On a physical machine, the ipi hardware uses IOCSR registers; however,
in VM mode every vcpu access to an IOCSR register traps into the
hypervisor. SWI is an interrupt mechanism like SGI on ARM: software
can send an interrupt to a CPU, except that on LoongArch SWI can
currently only be sent to the local CPU. So SWI cannot be used for IPI
on a real hardware system, but it can be used in a VM when combined
with the hypercall method: the IPI is sent with a hypercall, an SWI
interrupt is injected into the destination vcpu, and the vcpu treats
the SWI interrupt as an IPI.
With PV IPI support there is one trap on IPI sending and no trap on
IPI receiving, whereas with the IOCSR hardware ipi method there is one
trap on IPI sending and two traps on IPI receiving.
IPI multicast support is also added for the VM; the idea comes from
x86 PV IPI. An IPI can be sent to up to 128 vcpus at one time, so with
IPI multicast support the number of traps is reduced greatly.
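The multicast idea can be sketched in plain C. This is only an illustration of the x86-style scheme the cover letter refers to, not the actual hypercall ABI of this series; the structure and function names below are hypothetical:

```c
#include <stdint.h>

/*
 * Illustrative model of a 128-vcpu multicast IPI descriptor, in the
 * style of the x86 KVM_HC_SEND_IPI hypercall: two 64-bit bitmaps plus
 * the lowest destination vcpu id.  One hypercall (one trap) can then
 * replace up to 128 per-vcpu traps.  Names here are hypothetical.
 */
struct pv_ipi_batch {
	uint64_t bitmap[2];	/* destination vcpus, relative to min */
	int min;		/* lowest destination vcpu id */
};

/*
 * Mark one vcpu as a destination; returns 0 on success, -1 if the
 * vcpu does not fit into the 128-bit window.
 */
static int pv_ipi_batch_add(struct pv_ipi_batch *b, int cpu)
{
	int off = cpu - b->min;

	if (off < 0 || off >= 128)
		return -1;
	b->bitmap[off / 64] |= 1ULL << (off % 64);
	return 0;
}
```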
Here is the microbenchmark data with the "perf bench futex wake"
testcase on a 3C5000 single-way machine; there are 16 cpus on a 3C5000
single-way machine, and the VM has 16 vcpus as well. The benchmark
reports the time in ms to wake up 16 threads; smaller is better.
physical machine 0.0176 ms
VM original 0.1140 ms
VM with pv ipi patch 0.0481 ms
---
Changes in V5:
1. Refresh function/macro names from review comments.
Changes in V4:
1. Rename the pv ipi hook functions call_func_ipi() and
call_func_single_ipi() to send_ipi_mask()/send_ipi_single(), since pv
ipi is used for both remote function call and reschedule notification.
2. Refresh the changelog.
Changes in V3:
1. Add 128-vcpu ipi multicast support like x86.
2. Change the cpucfg base address from 0x10000000 to 0x40000000, in
order to avoid conflict with future hardware usage.
3. Adjust the patch order in this patchset, moving patch
Refine-ipi-ops-on-LoongArch-platform to the first position.
Changes in V2:
1. Add hw cpuid map support since ipi routing uses the hw cpuid.
2. Refine the changelog description.
3. Add hypercall statistics support for vcpu.
4. Make the percpu pv ipi message buffer cacheline-aligned.
5. Refine the pv ipi send logic: do not send an ipi message if there
is already a pending ipi message.
---
Bibo Mao (6):
LoongArch/smp: Refine some ipi functions on LoongArch platform
LoongArch: KVM: Add hypercall instruction emulation support
LoongArch: KVM: Add cpucfg area for kvm hypervisor
LoongArch: Add paravirt interface for guest kernel
LoongArch: KVM: Add vcpu search support from physical cpuid
LoongArch: Add pv ipi support on LoongArch system
arch/loongarch/Kconfig | 9 +
arch/loongarch/include/asm/Kbuild | 1 -
arch/loongarch/include/asm/hardirq.h | 5 +
arch/loongarch/include/asm/inst.h | 1 +
arch/loongarch/include/asm/irq.h | 10 +-
arch/loongarch/include/asm/kvm_host.h | 27 +++
arch/loongarch/include/asm/kvm_para.h | 156 ++++++++++++++++++
arch/loongarch/include/asm/kvm_vcpu.h | 1 +
arch/loongarch/include/asm/loongarch.h | 11 ++
arch/loongarch/include/asm/paravirt.h | 27 +++
.../include/asm/paravirt_api_clock.h | 1 +
arch/loongarch/include/asm/smp.h | 31 ++--
arch/loongarch/include/uapi/asm/Kbuild | 2 -
arch/loongarch/kernel/Makefile | 1 +
arch/loongarch/kernel/irq.c | 24 +--
arch/loongarch/kernel/paravirt.c | 153 +++++++++++++++++
arch/loongarch/kernel/perf_event.c | 14 +-
arch/loongarch/kernel/setup.c | 2 +
arch/loongarch/kernel/smp.c | 60 ++++---
arch/loongarch/kernel/time.c | 12 +-
arch/loongarch/kvm/exit.c | 125 ++++++++++++--
arch/loongarch/kvm/vcpu.c | 94 ++++++++++-
arch/loongarch/kvm/vm.c | 11 ++
23 files changed, 676 insertions(+), 102 deletions(-)
create mode 100644 arch/loongarch/include/asm/kvm_para.h
create mode 100644 arch/loongarch/include/asm/paravirt.h
create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
delete mode 100644 arch/loongarch/include/uapi/asm/Kbuild
create mode 100644 arch/loongarch/kernel/paravirt.c
base-commit: 39133352cbed6626956d38ed72012f49b0421e7b
--
2.39.3
On LoongArch systems, there is a hypercall instruction dedicated to
virtualization. When this instruction is executed on the host side, an
illegal instruction exception is reported; when it is executed in VM
mode, it traps into the host.
When the hypercall is emulated, the A0 register is set to the value
KVM_HCALL_INVALID_CODE rather than injecting an EXCCODE_INE invalid
instruction exception, so the VM can continue executing the following
code.
Signed-off-by: Bibo Mao <[email protected]>
---
arch/loongarch/include/asm/Kbuild | 1 -
arch/loongarch/include/asm/kvm_para.h | 26 ++++++++++++++++++++++++++
arch/loongarch/include/uapi/asm/Kbuild | 2 --
arch/loongarch/kvm/exit.c | 10 ++++++++++
4 files changed, 36 insertions(+), 3 deletions(-)
create mode 100644 arch/loongarch/include/asm/kvm_para.h
delete mode 100644 arch/loongarch/include/uapi/asm/Kbuild
diff --git a/arch/loongarch/include/asm/Kbuild b/arch/loongarch/include/asm/Kbuild
index 93783fa24f6e..22991a6f0e2b 100644
--- a/arch/loongarch/include/asm/Kbuild
+++ b/arch/loongarch/include/asm/Kbuild
@@ -23,4 +23,3 @@ generic-y += poll.h
generic-y += param.h
generic-y += posix_types.h
generic-y += resource.h
-generic-y += kvm_para.h
diff --git a/arch/loongarch/include/asm/kvm_para.h b/arch/loongarch/include/asm/kvm_para.h
new file mode 100644
index 000000000000..d48f993ae206
--- /dev/null
+++ b/arch/loongarch/include/asm/kvm_para.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_KVM_PARA_H
+#define _ASM_LOONGARCH_KVM_PARA_H
+
+/*
+ * LoongArch hypercall return code
+ */
+#define KVM_HCALL_STATUS_SUCCESS 0
+#define KVM_HCALL_INVALID_CODE -1UL
+#define KVM_HCALL_INVALID_PARAMETER -2UL
+
+static inline unsigned int kvm_arch_para_features(void)
+{
+ return 0;
+}
+
+static inline unsigned int kvm_arch_para_hints(void)
+{
+ return 0;
+}
+
+static inline bool kvm_check_and_clear_guest_paused(void)
+{
+ return false;
+}
+#endif /* _ASM_LOONGARCH_KVM_PARA_H */
diff --git a/arch/loongarch/include/uapi/asm/Kbuild b/arch/loongarch/include/uapi/asm/Kbuild
deleted file mode 100644
index 4aa680ca2e5f..000000000000
--- a/arch/loongarch/include/uapi/asm/Kbuild
+++ /dev/null
@@ -1,2 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0
-generic-y += kvm_para.h
diff --git a/arch/loongarch/kvm/exit.c b/arch/loongarch/kvm/exit.c
index ed1d89d53e2e..923bbca9bd22 100644
--- a/arch/loongarch/kvm/exit.c
+++ b/arch/loongarch/kvm/exit.c
@@ -685,6 +685,15 @@ static int kvm_handle_lasx_disabled(struct kvm_vcpu *vcpu)
return RESUME_GUEST;
}
+static int kvm_handle_hypercall(struct kvm_vcpu *vcpu)
+{
+ update_pc(&vcpu->arch);
+
+ /* Treat it as a noop instruction, only set return value */
+ vcpu->arch.gprs[LOONGARCH_GPR_A0] = KVM_HCALL_INVALID_CODE;
+ return RESUME_GUEST;
+}
+
/*
* LoongArch KVM callback handling for unimplemented guest exiting
*/
@@ -716,6 +725,7 @@ static exit_handle_fn kvm_fault_tables[EXCCODE_INT_START] = {
[EXCCODE_LSXDIS] = kvm_handle_lsx_disabled,
[EXCCODE_LASXDIS] = kvm_handle_lasx_disabled,
[EXCCODE_GSPR] = kvm_handle_gspr,
+ [EXCCODE_HVC] = kvm_handle_hypercall,
};
int kvm_handle_fault(struct kvm_vcpu *vcpu, int fault)
--
2.39.3
This is a code refinement of ipi handling on the LoongArch platform;
there are three modifications:
1. Add a generic function get_percpu_irq(), replacing the percpu irq
functions get_ipi_irq()/get_pmc_irq()/get_timer_irq() with
get_percpu_irq().
2. Change the definition of the action parameter taken by
loongson_send_ipi_single() and loongson_send_ipi_mask(): it is now
defined in plain decimal encoding on the ipi sender side rather than
binary bitmap encoding. The ipi hardware sender takes the decimal
code and converts it into a bitmap in the ipi message buffer, so the
ipi receiver still gets the binary bitmap encoding.
3. Add the structure smp_ops on the LoongArch platform so that pv ipi
can be hooked in later.
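The encoding change in item 2 can be illustrated with a small userspace sketch: the sender now passes the plain decimal ACTION_* code, and the ipi hardware sets the matching bit in the receiver's message buffer, which is exactly the old bitmap encoding (SMP_* = BIT(ACTION_*)). The conversion function below is only a model of what the hardware does, not kernel code:

```c
#include <stdint.h>

/* Decimal action codes used on the ipi sender side (from this patch) */
#define ACTION_BOOT_CPU		0
#define ACTION_RESCHEDULE	1
#define ACTION_CALL_FUNCTION	2

/*
 * Model of what the ipi hardware does with the decimal code: it sets
 * the matching bit in the ipi message buffer, so the receiver still
 * sees the old bitmap encoding (SMP_BOOT_CPU 0x1, SMP_RESCHEDULE 0x2,
 * SMP_CALL_FUNCTION 0x4).
 */
static uint32_t ipi_action_to_bitmap(unsigned int action)
{
	return 1U << action;
}
```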
Signed-off-by: Bibo Mao <[email protected]>
---
arch/loongarch/include/asm/hardirq.h | 4 ++
arch/loongarch/include/asm/irq.h | 10 ++++-
arch/loongarch/include/asm/smp.h | 31 +++++++--------
arch/loongarch/kernel/irq.c | 22 +----------
arch/loongarch/kernel/perf_event.c | 14 +------
arch/loongarch/kernel/smp.c | 58 +++++++++++++++++++---------
arch/loongarch/kernel/time.c | 12 +-----
7 files changed, 71 insertions(+), 80 deletions(-)
diff --git a/arch/loongarch/include/asm/hardirq.h b/arch/loongarch/include/asm/hardirq.h
index 0ef3b18f8980..9f0038e19c7f 100644
--- a/arch/loongarch/include/asm/hardirq.h
+++ b/arch/loongarch/include/asm/hardirq.h
@@ -12,6 +12,10 @@
extern void ack_bad_irq(unsigned int irq);
#define ack_bad_irq ack_bad_irq
+enum ipi_msg_type {
+ IPI_RESCHEDULE,
+ IPI_CALL_FUNCTION,
+};
#define NR_IPI 2
typedef struct {
diff --git a/arch/loongarch/include/asm/irq.h b/arch/loongarch/include/asm/irq.h
index 218b4da0ea90..00101b6d601e 100644
--- a/arch/loongarch/include/asm/irq.h
+++ b/arch/loongarch/include/asm/irq.h
@@ -117,8 +117,16 @@ extern struct fwnode_handle *liointc_handle;
extern struct fwnode_handle *pch_lpc_handle;
extern struct fwnode_handle *pch_pic_handle[MAX_IO_PICS];
-extern irqreturn_t loongson_ipi_interrupt(int irq, void *dev);
+static inline int get_percpu_irq(int vector)
+{
+ struct irq_domain *d;
+
+ d = irq_find_matching_fwnode(cpuintc_handle, DOMAIN_BUS_ANY);
+ if (d)
+ return irq_create_mapping(d, vector);
+ return -EINVAL;
+}
#include <asm-generic/irq.h>
#endif /* _ASM_IRQ_H */
diff --git a/arch/loongarch/include/asm/smp.h b/arch/loongarch/include/asm/smp.h
index f81e5f01d619..75d30529748c 100644
--- a/arch/loongarch/include/asm/smp.h
+++ b/arch/loongarch/include/asm/smp.h
@@ -12,6 +12,13 @@
#include <linux/threads.h>
#include <linux/cpumask.h>
+struct smp_ops {
+ void (*init_ipi)(void);
+ void (*send_ipi_mask)(const struct cpumask *mask, unsigned int action);
+ void (*send_ipi_single)(int cpu, unsigned int action);
+};
+
+extern struct smp_ops smp_ops;
extern int smp_num_siblings;
extern int num_processors;
extern int disabled_cpus;
@@ -24,8 +31,6 @@ void loongson_prepare_cpus(unsigned int max_cpus);
void loongson_boot_secondary(int cpu, struct task_struct *idle);
void loongson_init_secondary(void);
void loongson_smp_finish(void);
-void loongson_send_ipi_single(int cpu, unsigned int action);
-void loongson_send_ipi_mask(const struct cpumask *mask, unsigned int action);
#ifdef CONFIG_HOTPLUG_CPU
int loongson_cpu_disable(void);
void loongson_cpu_die(unsigned int cpu);
@@ -59,9 +64,12 @@ extern int __cpu_logical_map[NR_CPUS];
#define cpu_physical_id(cpu) cpu_logical_map(cpu)
-#define SMP_BOOT_CPU 0x1
-#define SMP_RESCHEDULE 0x2
-#define SMP_CALL_FUNCTION 0x4
+#define ACTION_BOOT_CPU 0
+#define ACTION_RESCHEDULE 1
+#define ACTION_CALL_FUNCTION 2
+#define SMP_BOOT_CPU BIT(ACTION_BOOT_CPU)
+#define SMP_RESCHEDULE BIT(ACTION_RESCHEDULE)
+#define SMP_CALL_FUNCTION BIT(ACTION_CALL_FUNCTION)
struct secondary_data {
unsigned long stack;
@@ -71,7 +79,8 @@ extern struct secondary_data cpuboot_data;
extern asmlinkage void smpboot_entry(void);
extern asmlinkage void start_secondary(void);
-
+extern void arch_send_call_function_single_ipi(int cpu);
+extern void arch_send_call_function_ipi_mask(const struct cpumask *mask);
extern void calculate_cpu_foreign_map(void);
/*
@@ -79,16 +88,6 @@ extern void calculate_cpu_foreign_map(void);
*/
extern void show_ipi_list(struct seq_file *p, int prec);
-static inline void arch_send_call_function_single_ipi(int cpu)
-{
- loongson_send_ipi_single(cpu, SMP_CALL_FUNCTION);
-}
-
-static inline void arch_send_call_function_ipi_mask(const struct cpumask *mask)
-{
- loongson_send_ipi_mask(mask, SMP_CALL_FUNCTION);
-}
-
#ifdef CONFIG_HOTPLUG_CPU
static inline int __cpu_disable(void)
{
diff --git a/arch/loongarch/kernel/irq.c b/arch/loongarch/kernel/irq.c
index 883e5066ae44..ce36897d1e5a 100644
--- a/arch/loongarch/kernel/irq.c
+++ b/arch/loongarch/kernel/irq.c
@@ -87,23 +87,9 @@ static void __init init_vec_parent_group(void)
acpi_table_parse(ACPI_SIG_MCFG, early_pci_mcfg_parse);
}
-static int __init get_ipi_irq(void)
-{
- struct irq_domain *d = irq_find_matching_fwnode(cpuintc_handle, DOMAIN_BUS_ANY);
-
- if (d)
- return irq_create_mapping(d, INT_IPI);
-
- return -EINVAL;
-}
-
void __init init_IRQ(void)
{
int i;
-#ifdef CONFIG_SMP
- int r, ipi_irq;
- static int ipi_dummy_dev;
-#endif
unsigned int order = get_order(IRQ_STACK_SIZE);
struct page *page;
@@ -113,13 +99,7 @@ void __init init_IRQ(void)
init_vec_parent_group();
irqchip_init();
#ifdef CONFIG_SMP
- ipi_irq = get_ipi_irq();
- if (ipi_irq < 0)
- panic("IPI IRQ mapping failed\n");
- irq_set_percpu_devid(ipi_irq);
- r = request_percpu_irq(ipi_irq, loongson_ipi_interrupt, "IPI", &ipi_dummy_dev);
- if (r < 0)
- panic("IPI IRQ request failed\n");
+ smp_ops.init_ipi();
#endif
for (i = 0; i < NR_IRQS; i++)
diff --git a/arch/loongarch/kernel/perf_event.c b/arch/loongarch/kernel/perf_event.c
index 0491bf453cd4..3265c8f33223 100644
--- a/arch/loongarch/kernel/perf_event.c
+++ b/arch/loongarch/kernel/perf_event.c
@@ -456,16 +456,6 @@ static void loongarch_pmu_disable(struct pmu *pmu)
static DEFINE_MUTEX(pmu_reserve_mutex);
static atomic_t active_events = ATOMIC_INIT(0);
-static int get_pmc_irq(void)
-{
- struct irq_domain *d = irq_find_matching_fwnode(cpuintc_handle, DOMAIN_BUS_ANY);
-
- if (d)
- return irq_create_mapping(d, INT_PCOV);
-
- return -EINVAL;
-}
-
static void reset_counters(void *arg);
static int __hw_perf_event_init(struct perf_event *event);
@@ -473,7 +463,7 @@ static void hw_perf_event_destroy(struct perf_event *event)
{
if (atomic_dec_and_mutex_lock(&active_events, &pmu_reserve_mutex)) {
on_each_cpu(reset_counters, NULL, 1);
- free_irq(get_pmc_irq(), &loongarch_pmu);
+ free_irq(get_percpu_irq(INT_PCOV), &loongarch_pmu);
mutex_unlock(&pmu_reserve_mutex);
}
}
@@ -562,7 +552,7 @@ static int loongarch_pmu_event_init(struct perf_event *event)
if (event->cpu >= 0 && !cpu_online(event->cpu))
return -ENODEV;
- irq = get_pmc_irq();
+ irq = get_percpu_irq(INT_PCOV);
flags = IRQF_PERCPU | IRQF_NOBALANCING | IRQF_NO_THREAD | IRQF_NO_SUSPEND | IRQF_SHARED;
if (!atomic_inc_not_zero(&active_events)) {
mutex_lock(&pmu_reserve_mutex);
diff --git a/arch/loongarch/kernel/smp.c b/arch/loongarch/kernel/smp.c
index 2b49d30eb7c0..2182e7cc2ed6 100644
--- a/arch/loongarch/kernel/smp.c
+++ b/arch/loongarch/kernel/smp.c
@@ -66,11 +66,6 @@ static cpumask_t cpu_core_setup_map;
struct secondary_data cpuboot_data;
static DEFINE_PER_CPU(int, cpu_state);
-enum ipi_msg_type {
- IPI_RESCHEDULE,
- IPI_CALL_FUNCTION,
-};
-
static const char *ipi_types[NR_IPI] __tracepoint_string = {
[IPI_RESCHEDULE] = "Rescheduling interrupts",
[IPI_CALL_FUNCTION] = "Function call interrupts",
@@ -123,24 +118,19 @@ static u32 ipi_read_clear(int cpu)
static void ipi_write_action(int cpu, u32 action)
{
- unsigned int irq = 0;
-
- while ((irq = ffs(action))) {
- uint32_t val = IOCSR_IPI_SEND_BLOCKING;
+ uint32_t val;
- val |= (irq - 1);
- val |= (cpu << IOCSR_IPI_SEND_CPU_SHIFT);
- iocsr_write32(val, LOONGARCH_IOCSR_IPI_SEND);
- action &= ~BIT(irq - 1);
- }
+ val = IOCSR_IPI_SEND_BLOCKING | action;
+ val |= (cpu << IOCSR_IPI_SEND_CPU_SHIFT);
+ iocsr_write32(val, LOONGARCH_IOCSR_IPI_SEND);
}
-void loongson_send_ipi_single(int cpu, unsigned int action)
+static void loongson_send_ipi_single(int cpu, unsigned int action)
{
ipi_write_action(cpu_logical_map(cpu), (u32)action);
}
-void loongson_send_ipi_mask(const struct cpumask *mask, unsigned int action)
+static void loongson_send_ipi_mask(const struct cpumask *mask, unsigned int action)
{
unsigned int i;
@@ -148,6 +138,16 @@ void loongson_send_ipi_mask(const struct cpumask *mask, unsigned int action)
ipi_write_action(cpu_logical_map(i), (u32)action);
}
+void arch_send_call_function_single_ipi(int cpu)
+{
+ smp_ops.send_ipi_single(cpu, ACTION_CALL_FUNCTION);
+}
+
+void arch_send_call_function_ipi_mask(const struct cpumask *mask)
+{
+ smp_ops.send_ipi_mask(mask, ACTION_CALL_FUNCTION);
+}
+
/*
* This function sends a 'reschedule' IPI to another CPU.
* it goes straight through and wastes no time serializing
@@ -155,11 +155,11 @@ void loongson_send_ipi_mask(const struct cpumask *mask, unsigned int action)
*/
void arch_smp_send_reschedule(int cpu)
{
- loongson_send_ipi_single(cpu, SMP_RESCHEDULE);
+ smp_ops.send_ipi_single(cpu, ACTION_RESCHEDULE);
}
EXPORT_SYMBOL_GPL(arch_smp_send_reschedule);
-irqreturn_t loongson_ipi_interrupt(int irq, void *dev)
+static irqreturn_t loongson_ipi_interrupt(int irq, void *dev)
{
unsigned int action;
unsigned int cpu = smp_processor_id();
@@ -179,6 +179,26 @@ irqreturn_t loongson_ipi_interrupt(int irq, void *dev)
return IRQ_HANDLED;
}
+static void loongson_init_ipi(void)
+{
+ int r, ipi_irq;
+
+ ipi_irq = get_percpu_irq(INT_IPI);
+ if (ipi_irq < 0)
+ panic("IPI IRQ mapping failed\n");
+
+ irq_set_percpu_devid(ipi_irq);
+ r = request_percpu_irq(ipi_irq, loongson_ipi_interrupt, "IPI", &irq_stat);
+ if (r < 0)
+ panic("IPI IRQ request failed\n");
+}
+
+struct smp_ops smp_ops = {
+ .init_ipi = loongson_init_ipi,
+ .send_ipi_single = loongson_send_ipi_single,
+ .send_ipi_mask = loongson_send_ipi_mask,
+};
+
static void __init fdt_smp_setup(void)
{
#ifdef CONFIG_OF
@@ -256,7 +276,7 @@ void loongson_boot_secondary(int cpu, struct task_struct *idle)
csr_mail_send(entry, cpu_logical_map(cpu), 0);
- loongson_send_ipi_single(cpu, SMP_BOOT_CPU);
+ loongson_send_ipi_single(cpu, ACTION_BOOT_CPU);
}
/*
diff --git a/arch/loongarch/kernel/time.c b/arch/loongarch/kernel/time.c
index e7015f7b70e3..fd5354f9be7c 100644
--- a/arch/loongarch/kernel/time.c
+++ b/arch/loongarch/kernel/time.c
@@ -123,16 +123,6 @@ void sync_counter(void)
csr_write64(init_offset, LOONGARCH_CSR_CNTC);
}
-static int get_timer_irq(void)
-{
- struct irq_domain *d = irq_find_matching_fwnode(cpuintc_handle, DOMAIN_BUS_ANY);
-
- if (d)
- return irq_create_mapping(d, INT_TI);
-
- return -EINVAL;
-}
-
int constant_clockevent_init(void)
{
unsigned int cpu = smp_processor_id();
@@ -142,7 +132,7 @@ int constant_clockevent_init(void)
static int irq = 0, timer_irq_installed = 0;
if (!timer_irq_installed) {
- irq = get_timer_irq();
+ irq = get_percpu_irq(INT_TI);
if (irq < 0)
pr_err("Failed to map irq %d (timer)\n", irq);
}
--
2.39.3
The cpucfg instruction can be used to get processor features, and it
traps when executed in VM mode, so it can also be used to provide CPU
features to a VM. On real hardware, cpucfg areas 0 - 20 are used.
Here one dedicated area 0x40000000 -- 0x400000ff is used for the KVM
hypervisor to provide PV features, and the area can be extended for
other hypervisors in the future. This area will never be used by real
hardware; it is only used by software.
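A guest probes this area by reading CPUCFG_KVM_SIG and comparing the 32-bit result byte-wise against the "KVM\0" signature, as the later paravirt patch does. A minimal userspace model of that comparison (read_cpucfg() itself is a privileged instruction and is not modeled here):

```c
#include <string.h>

/*
 * Model of the signature check in kvm_para_available(): the 32-bit
 * value returned for cpucfg index CPUCFG_KVM_SIG is compared byte-wise
 * against the "KVM\0" signature, which sidesteps endianness concerns.
 */
static int signature_is_kvm(unsigned int config)
{
	return !memcmp(&config, "KVM\0", 4);
}
```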
Signed-off-by: Bibo Mao <[email protected]>
---
arch/loongarch/include/asm/inst.h | 1 +
arch/loongarch/include/asm/loongarch.h | 10 ++++++
arch/loongarch/kvm/exit.c | 46 +++++++++++++++++---------
3 files changed, 41 insertions(+), 16 deletions(-)
diff --git a/arch/loongarch/include/asm/inst.h b/arch/loongarch/include/asm/inst.h
index d8f637f9e400..ad120f924905 100644
--- a/arch/loongarch/include/asm/inst.h
+++ b/arch/loongarch/include/asm/inst.h
@@ -67,6 +67,7 @@ enum reg2_op {
revhd_op = 0x11,
extwh_op = 0x16,
extwb_op = 0x17,
+ cpucfg_op = 0x1b,
iocsrrdb_op = 0x19200,
iocsrrdh_op = 0x19201,
iocsrrdw_op = 0x19202,
diff --git a/arch/loongarch/include/asm/loongarch.h b/arch/loongarch/include/asm/loongarch.h
index 46366e783c84..a1d22e8b6f94 100644
--- a/arch/loongarch/include/asm/loongarch.h
+++ b/arch/loongarch/include/asm/loongarch.h
@@ -158,6 +158,16 @@
#define CPUCFG48_VFPU_CG BIT(2)
#define CPUCFG48_RAM_CG BIT(3)
+/*
+ * cpucfg index area: 0x40000000 -- 0x400000ff
+ * SW emulation for KVM hypervisor
+ */
+#define CPUCFG_KVM_BASE 0x40000000UL
+#define CPUCFG_KVM_SIZE 0x100
+#define CPUCFG_KVM_SIG CPUCFG_KVM_BASE
+#define KVM_SIGNATURE "KVM\0"
+#define CPUCFG_KVM_FEATURE (CPUCFG_KVM_BASE + 4)
+
#ifndef __ASSEMBLY__
/* CSR */
diff --git a/arch/loongarch/kvm/exit.c b/arch/loongarch/kvm/exit.c
index 923bbca9bd22..6a38fd59d86d 100644
--- a/arch/loongarch/kvm/exit.c
+++ b/arch/loongarch/kvm/exit.c
@@ -206,10 +206,37 @@ int kvm_emu_idle(struct kvm_vcpu *vcpu)
return EMULATE_DONE;
}
-static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
+static int kvm_emu_cpucfg(struct kvm_vcpu *vcpu, larch_inst inst)
{
int rd, rj;
unsigned int index;
+
+ rd = inst.reg2_format.rd;
+ rj = inst.reg2_format.rj;
+ ++vcpu->stat.cpucfg_exits;
+ index = vcpu->arch.gprs[rj];
+
+ /*
+ * By LoongArch Reference Manual 2.2.10.5
+ * Return value is 0 for undefined cpucfg index
+ */
+ switch (index) {
+ case 0 ... (KVM_MAX_CPUCFG_REGS - 1):
+ vcpu->arch.gprs[rd] = vcpu->arch.cpucfg[index];
+ break;
+ case CPUCFG_KVM_SIG:
+ vcpu->arch.gprs[rd] = *(unsigned int *)KVM_SIGNATURE;
+ break;
+ default:
+ vcpu->arch.gprs[rd] = 0;
+ break;
+ }
+
+ return EMULATE_DONE;
+}
+
+static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
+{
unsigned long curr_pc;
larch_inst inst;
enum emulation_result er = EMULATE_DONE;
@@ -224,21 +251,8 @@ static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
er = EMULATE_FAIL;
switch (((inst.word >> 24) & 0xff)) {
case 0x0: /* CPUCFG GSPR */
- if (inst.reg2_format.opcode == 0x1B) {
- rd = inst.reg2_format.rd;
- rj = inst.reg2_format.rj;
- ++vcpu->stat.cpucfg_exits;
- index = vcpu->arch.gprs[rj];
- er = EMULATE_DONE;
- /*
- * By LoongArch Reference Manual 2.2.10.5
- * return value is 0 for undefined cpucfg index
- */
- if (index < KVM_MAX_CPUCFG_REGS)
- vcpu->arch.gprs[rd] = vcpu->arch.cpucfg[index];
- else
- vcpu->arch.gprs[rd] = 0;
- }
+ if (inst.reg2_format.opcode == cpucfg_op)
+ er = kvm_emu_cpucfg(vcpu, inst);
break;
case 0x4: /* CSR{RD,WR,XCHG} GSPR */
er = kvm_handle_csr(vcpu, inst);
--
2.39.3
The paravirt interface pv_ipi_init() is added here for the guest
kernel. It first checks whether the system is running in VM mode; if
so, it calls kvm_para_available() to detect the current VMM type. Only
the KVM VMM type is detected for now, so the paravirt functions work
only when the current VMM is the KVM hypervisor, since KVM is the
only hypervisor supported on LoongArch at present.
pv_ipi_init() has no real effect yet; it is a dummy function.
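For reference, the hypercall code layout introduced in this patch composes a vendor id and a function code. A quick check of that arithmetic (the function code value 1 below is just an example, not an ABI assignment; extra parentheses are added for macro hygiene):

```c
/*
 * Hypercall code field layout from this patch: vendor id in the upper
 * bits, function code in the low 8 bits.
 */
#define HYPERVISOR_KVM			1
#define HYPERVISOR_VENDOR_SHIFT		8
#define HYPERCALL_CODE(vendor, code)	(((vendor) << HYPERVISOR_VENDOR_SHIFT) + (code))
```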
Signed-off-by: Bibo Mao <[email protected]>
---
arch/loongarch/Kconfig | 9 ++++
arch/loongarch/include/asm/kvm_para.h | 7 ++++
arch/loongarch/include/asm/paravirt.h | 27 ++++++++++++
.../include/asm/paravirt_api_clock.h | 1 +
arch/loongarch/kernel/Makefile | 1 +
arch/loongarch/kernel/paravirt.c | 41 +++++++++++++++++++
arch/loongarch/kernel/setup.c | 1 +
7 files changed, 87 insertions(+)
create mode 100644 arch/loongarch/include/asm/paravirt.h
create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
create mode 100644 arch/loongarch/kernel/paravirt.c
diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 929f68926b34..fdaae9a0435c 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -587,6 +587,15 @@ config CPU_HAS_PREFETCH
bool
default y
+config PARAVIRT
+ bool "Enable paravirtualization code"
+ depends on AS_HAS_LVZ_EXTENSION
+ help
+ This changes the kernel so it can modify itself when it is run
+ under a hypervisor, potentially improving performance significantly
+ over full virtualization. However, when run without a hypervisor
+ the kernel is theoretically slower and slightly larger.
+
config ARCH_SUPPORTS_KEXEC
def_bool y
diff --git a/arch/loongarch/include/asm/kvm_para.h b/arch/loongarch/include/asm/kvm_para.h
index d48f993ae206..af5d677a9052 100644
--- a/arch/loongarch/include/asm/kvm_para.h
+++ b/arch/loongarch/include/asm/kvm_para.h
@@ -2,6 +2,13 @@
#ifndef _ASM_LOONGARCH_KVM_PARA_H
#define _ASM_LOONGARCH_KVM_PARA_H
+/*
+ * Hypercall code field
+ */
+#define HYPERVISOR_KVM 1
+#define HYPERVISOR_VENDOR_SHIFT 8
+#define HYPERCALL_CODE(vendor, code) ((vendor << HYPERVISOR_VENDOR_SHIFT) + code)
+
/*
* LoongArch hypercall return code
*/
diff --git a/arch/loongarch/include/asm/paravirt.h b/arch/loongarch/include/asm/paravirt.h
new file mode 100644
index 000000000000..58f7b7b89f2c
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_PARAVIRT_H
+#define _ASM_LOONGARCH_PARAVIRT_H
+
+#ifdef CONFIG_PARAVIRT
+#include <linux/static_call_types.h>
+struct static_key;
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 dummy_steal_clock(int cpu);
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+ return static_call(pv_steal_clock)(cpu);
+}
+
+int pv_ipi_init(void);
+#else
+static inline int pv_ipi_init(void)
+{
+ return 0;
+}
+
+#endif // CONFIG_PARAVIRT
+#endif
diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h b/arch/loongarch/include/asm/paravirt_api_clock.h
new file mode 100644
index 000000000000..65ac7cee0dad
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt_api_clock.h
@@ -0,0 +1 @@
+#include <asm/paravirt.h>
diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
index 3c808c680370..662e6e9de12d 100644
--- a/arch/loongarch/kernel/Makefile
+++ b/arch/loongarch/kernel/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
obj-$(CONFIG_STACKTRACE) += stacktrace.o
obj-$(CONFIG_PROC_FS) += proc.o
+obj-$(CONFIG_PARAVIRT) += paravirt.o
obj-$(CONFIG_SMP) += smp.o
diff --git a/arch/loongarch/kernel/paravirt.c b/arch/loongarch/kernel/paravirt.c
new file mode 100644
index 000000000000..5cf794e8490f
--- /dev/null
+++ b/arch/loongarch/kernel/paravirt.c
@@ -0,0 +1,41 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/export.h>
+#include <linux/types.h>
+#include <linux/jump_label.h>
+#include <linux/kvm_para.h>
+#include <asm/paravirt.h>
+#include <linux/static_call.h>
+
+struct static_key paravirt_steal_enabled;
+struct static_key paravirt_steal_rq_enabled;
+
+static u64 native_steal_clock(int cpu)
+{
+ return 0;
+}
+
+DEFINE_STATIC_CALL(pv_steal_clock, native_steal_clock);
+
+static bool kvm_para_available(void)
+{
+ static int hypervisor_type;
+ int config;
+
+ if (!hypervisor_type) {
+ config = read_cpucfg(CPUCFG_KVM_SIG);
+ if (!memcmp(&config, KVM_SIGNATURE, 4))
+ hypervisor_type = HYPERVISOR_KVM;
+ }
+
+ return hypervisor_type == HYPERVISOR_KVM;
+}
+
+int __init pv_ipi_init(void)
+{
+ if (!cpu_has_hypervisor)
+ return 0;
+ if (!kvm_para_available())
+ return 0;
+
+ return 1;
+}
diff --git a/arch/loongarch/kernel/setup.c b/arch/loongarch/kernel/setup.c
index edf2bba80130..b79a1244b56f 100644
--- a/arch/loongarch/kernel/setup.c
+++ b/arch/loongarch/kernel/setup.c
@@ -43,6 +43,7 @@
#include <asm/efi.h>
#include <asm/loongson.h>
#include <asm/numa.h>
+#include <asm/paravirt.h>
#include <asm/pgalloc.h>
#include <asm/sections.h>
#include <asm/setup.h>
--
2.39.3
The physical cpuid is used for interrupt routing by irqchips such as
the ipi/msi/extioi interrupt controllers. The physical cpuid is stored
in the CSR register LOONGARCH_CSR_CPUID; it cannot be changed once the
vcpu is created, and two vcpus cannot share the same physical cpuid.
Different irqchips declare different physical cpuid sizes: the max
cpuid value for CSR LOONGARCH_CSR_CPUID on 3A5000 is 512, the max
cpuid supported by the IPI hardware is 1024, 256 for the extioi
irqchip, and 65536 for the MSI irqchip.
The smallest value among all the interrupt controllers is selected for
now, so the max cpuid size is defined as 256 by KVM, which comes from
the extioi irqchip.
Signed-off-by: Bibo Mao <[email protected]>
---
arch/loongarch/include/asm/kvm_host.h | 26 ++++++++
arch/loongarch/include/asm/kvm_vcpu.h | 1 +
arch/loongarch/kvm/vcpu.c | 93 ++++++++++++++++++++++++++-
arch/loongarch/kvm/vm.c | 11 ++++
4 files changed, 130 insertions(+), 1 deletion(-)
diff --git a/arch/loongarch/include/asm/kvm_host.h b/arch/loongarch/include/asm/kvm_host.h
index 2d62f7b0d377..3ba16ef1fe69 100644
--- a/arch/loongarch/include/asm/kvm_host.h
+++ b/arch/loongarch/include/asm/kvm_host.h
@@ -64,6 +64,30 @@ struct kvm_world_switch {
#define MAX_PGTABLE_LEVELS 4
+/*
+ * Physical cpu id is used for interrupt routing, there are different
+ * definitions about physical cpuid on different hardwares.
+ * For LOONGARCH_CSR_CPUID register, max cpuid size is 512
+ * For IPI HW, max dest CPUID size 1024
+ * For extioi interrupt controller, max dest CPUID size is 256
+ * For MSI interrupt controller, max supported CPUID size is 65536
+ *
+ * Currently max CPUID is defined as 256 for KVM hypervisor, in future
+ * it will be expanded to 4096, including 16 packages at most. And every
+ * package supports at most 256 vcpus
+ */
+#define KVM_MAX_PHYID 256
+
+struct kvm_phyid_info {
+ struct kvm_vcpu *vcpu;
+ bool enabled;
+};
+
+struct kvm_phyid_map {
+ int max_phyid;
+ struct kvm_phyid_info phys_map[KVM_MAX_PHYID];
+};
+
struct kvm_arch {
/* Guest physical mm */
kvm_pte_t *pgd;
@@ -71,6 +95,8 @@ struct kvm_arch {
unsigned long invalid_ptes[MAX_PGTABLE_LEVELS];
unsigned int pte_shifts[MAX_PGTABLE_LEVELS];
unsigned int root_level;
+ spinlock_t phyid_map_lock;
+ struct kvm_phyid_map *phyid_map;
s64 time_offset;
struct kvm_context __percpu *vmcs;
diff --git a/arch/loongarch/include/asm/kvm_vcpu.h b/arch/loongarch/include/asm/kvm_vcpu.h
index 0cb4fdb8a9b5..9f53950959da 100644
--- a/arch/loongarch/include/asm/kvm_vcpu.h
+++ b/arch/loongarch/include/asm/kvm_vcpu.h
@@ -81,6 +81,7 @@ void kvm_save_timer(struct kvm_vcpu *vcpu);
void kvm_restore_timer(struct kvm_vcpu *vcpu);
int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq);
+struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid);
/*
* Loongarch KVM guest interrupt handling
diff --git a/arch/loongarch/kvm/vcpu.c b/arch/loongarch/kvm/vcpu.c
index 27701991886d..40296d8ef297 100644
--- a/arch/loongarch/kvm/vcpu.c
+++ b/arch/loongarch/kvm/vcpu.c
@@ -274,6 +274,95 @@ static int _kvm_getcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 *val)
return 0;
}
+static inline int kvm_set_cpuid(struct kvm_vcpu *vcpu, u64 val)
+{
+ int cpuid;
+ struct loongarch_csrs *csr = vcpu->arch.csr;
+ struct kvm_phyid_map *map;
+
+ if (val >= KVM_MAX_PHYID)
+ return -EINVAL;
+
+ cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_ESTAT);
+ map = vcpu->kvm->arch.phyid_map;
+ spin_lock(&vcpu->kvm->arch.phyid_map_lock);
+ if (map->phys_map[cpuid].enabled) {
+ /*
+ * Cpuid is already set before
+ * Forbid changing different cpuid at runtime
+ */
+ if (cpuid != val) {
+ /*
+ * Cpuid 0 is initial value for vcpu, maybe invalid
+ * unset value for vcpu
+ */
+ if (cpuid) {
+ spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
+ return -EINVAL;
+ }
+ } else {
+ /* Discard duplicated cpuid set */
+ spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
+ return 0;
+ }
+ }
+
+ if (map->phys_map[val].enabled) {
+ /*
+ * New cpuid is already set with other vcpu
+ * Forbid sharing the same cpuid between different vcpus
+ */
+ if (map->phys_map[val].vcpu != vcpu) {
+ spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
+ return -EINVAL;
+ }
+
+ /* Discard duplicated cpuid set operation */
+ spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
+ return 0;
+ }
+
+ kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, val);
+ map->phys_map[val].enabled = true;
+ map->phys_map[val].vcpu = vcpu;
+ if (map->max_phyid < val)
+ map->max_phyid = val;
+ spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
+ return 0;
+}
+
+struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid)
+{
+ struct kvm_phyid_map *map;
+
+ if (cpuid >= KVM_MAX_PHYID)
+ return NULL;
+
+ map = kvm->arch.phyid_map;
+ if (map->phys_map[cpuid].enabled)
+ return map->phys_map[cpuid].vcpu;
+
+ return NULL;
+}
+
+static inline void kvm_drop_cpuid(struct kvm_vcpu *vcpu)
+{
+ int cpuid;
+ struct loongarch_csrs *csr = vcpu->arch.csr;
+ struct kvm_phyid_map *map;
+
+ map = vcpu->kvm->arch.phyid_map;
+ cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_ESTAT);
+ if (cpuid >= KVM_MAX_PHYID)
+ return;
+
+ if (map->phys_map[cpuid].enabled) {
+ map->phys_map[cpuid].vcpu = NULL;
+ map->phys_map[cpuid].enabled = false;
+ kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, 0);
+ }
+}
+
static int _kvm_setcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 val)
{
int ret = 0, gintc;
@@ -291,7 +380,8 @@ static int _kvm_setcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 val)
kvm_set_sw_gcsr(csr, LOONGARCH_CSR_ESTAT, gintc);
return ret;
- }
+ } else if (id == LOONGARCH_CSR_CPUID)
+ return kvm_set_cpuid(vcpu, val);
kvm_write_sw_gcsr(csr, id, val);
@@ -925,6 +1015,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
hrtimer_cancel(&vcpu->arch.swtimer);
kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
+ kvm_drop_cpuid(vcpu);
kfree(vcpu->arch.csr);
/*
* If the vCPU is freed and reused as another vCPU, we don't want the
diff --git a/arch/loongarch/kvm/vm.c b/arch/loongarch/kvm/vm.c
index 0a37f6fa8f2d..6006a28653ad 100644
--- a/arch/loongarch/kvm/vm.c
+++ b/arch/loongarch/kvm/vm.c
@@ -30,6 +30,14 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
if (!kvm->arch.pgd)
return -ENOMEM;
+ kvm->arch.phyid_map = kvzalloc(sizeof(struct kvm_phyid_map),
+ GFP_KERNEL_ACCOUNT);
+ if (!kvm->arch.phyid_map) {
+ free_page((unsigned long)kvm->arch.pgd);
+ kvm->arch.pgd = NULL;
+ return -ENOMEM;
+ }
+
kvm_init_vmcs(kvm);
kvm->arch.gpa_size = BIT(cpu_vabits - 1);
kvm->arch.root_level = CONFIG_PGTABLE_LEVELS - 1;
@@ -44,6 +52,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
for (i = 0; i <= kvm->arch.root_level; i++)
kvm->arch.pte_shifts[i] = PAGE_SHIFT + i * (PAGE_SHIFT - 3);
+ spin_lock_init(&kvm->arch.phyid_map_lock);
return 0;
}
@@ -51,7 +60,9 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
{
kvm_destroy_vcpus(kvm);
free_page((unsigned long)kvm->arch.pgd);
+ kvfree(kvm->arch.phyid_map);
kvm->arch.pgd = NULL;
+ kvm->arch.phyid_map = NULL;
}
int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
--
2.39.3
On LoongArch systems, IPI hardware uses IOCSR registers: there is one
IOCSR register access on IPI sending, and two IOCSR accesses on IPI
receiving, in the IPI interrupt handler. In VM mode every IOCSR access
causes the VM to trap into the hypervisor, so one IPI hardware
notification costs three traps.
PV IPI is added for the VM: the hypercall instruction is used on the
IPI sender side, and the hypervisor injects an SWI into the destination
vcpu. In the SWI interrupt handler only the ESTAT CSR register is
written to clear the irq, and ESTAT CSR access does not trap into the
hypervisor. So with PV IPI supported there is one trap on the sender
side and no trap on the receiver side; one IPI notification costs only
one trap.
This patch also adds IPI multicast support, with a method similar to
x86. With multicast support, an IPI notification can be sent to at most
128 vcpus at one time, which greatly reduces the number of traps into
the hypervisor.
Signed-off-by: Bibo Mao <[email protected]>
---
arch/loongarch/include/asm/hardirq.h | 1 +
arch/loongarch/include/asm/kvm_host.h | 1 +
arch/loongarch/include/asm/kvm_para.h | 123 +++++++++++++++++++++++++
arch/loongarch/include/asm/loongarch.h | 1 +
arch/loongarch/kernel/irq.c | 2 +-
arch/loongarch/kernel/paravirt.c | 112 ++++++++++++++++++++++
arch/loongarch/kernel/setup.c | 1 +
arch/loongarch/kernel/smp.c | 2 +-
arch/loongarch/kvm/exit.c | 73 ++++++++++++++-
arch/loongarch/kvm/vcpu.c | 1 +
10 files changed, 313 insertions(+), 4 deletions(-)
diff --git a/arch/loongarch/include/asm/hardirq.h b/arch/loongarch/include/asm/hardirq.h
index 9f0038e19c7f..b26d596a73aa 100644
--- a/arch/loongarch/include/asm/hardirq.h
+++ b/arch/loongarch/include/asm/hardirq.h
@@ -21,6 +21,7 @@ enum ipi_msg_type {
typedef struct {
unsigned int ipi_irqs[NR_IPI];
unsigned int __softirq_pending;
+ atomic_t message ____cacheline_aligned_in_smp;
} ____cacheline_aligned irq_cpustat_t;
DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
diff --git a/arch/loongarch/include/asm/kvm_host.h b/arch/loongarch/include/asm/kvm_host.h
index 3ba16ef1fe69..0b96c6303cf7 100644
--- a/arch/loongarch/include/asm/kvm_host.h
+++ b/arch/loongarch/include/asm/kvm_host.h
@@ -43,6 +43,7 @@ struct kvm_vcpu_stat {
u64 idle_exits;
u64 cpucfg_exits;
u64 signal_exits;
+ u64 hypercall_exits;
};
#define KVM_MEM_HUGEPAGE_CAPABLE (1UL << 0)
diff --git a/arch/loongarch/include/asm/kvm_para.h b/arch/loongarch/include/asm/kvm_para.h
index af5d677a9052..a82bffbbf8a1 100644
--- a/arch/loongarch/include/asm/kvm_para.h
+++ b/arch/loongarch/include/asm/kvm_para.h
@@ -8,6 +8,9 @@
#define HYPERVISOR_KVM 1
#define HYPERVISOR_VENDOR_SHIFT 8
#define HYPERCALL_CODE(vendor, code) ((vendor << HYPERVISOR_VENDOR_SHIFT) + code)
+#define KVM_HCALL_CODE_PV_SERVICE 0
+#define KVM_HCALL_PV_SERVICE HYPERCALL_CODE(HYPERVISOR_KVM, KVM_HCALL_CODE_PV_SERVICE)
+#define KVM_HCALL_FUNC_PV_IPI 1
/*
* LoongArch hypercall return code
@@ -16,6 +19,126 @@
#define KVM_HCALL_INVALID_CODE -1UL
#define KVM_HCALL_INVALID_PARAMETER -2UL
+/*
+ * Hypercall interface for KVM hypervisor
+ *
+ * a0: function identifier
+ * a1-a6: up to 6 arguments
+ * Return value will be placed in v0.
+ */
+static __always_inline long kvm_hypercall(u64 fid)
+{
+ register long ret asm("v0");
+ register unsigned long fun asm("a0") = fid;
+
+ __asm__ __volatile__(
+ "hvcl "__stringify(KVM_HCALL_PV_SERVICE)
+ : "=r" (ret)
+ : "r" (fun)
+ : "memory"
+ );
+
+ return ret;
+}
+
+static __always_inline long kvm_hypercall1(u64 fid, unsigned long arg0)
+{
+ register long ret asm("v0");
+ register unsigned long fun asm("a0") = fid;
+ register unsigned long a1 asm("a1") = arg0;
+
+ __asm__ __volatile__(
+ "hvcl "__stringify(KVM_HCALL_PV_SERVICE)
+ : "=r" (ret)
+ : "r" (fun), "r" (a1)
+ : "memory"
+ );
+
+ return ret;
+}
+
+static __always_inline long kvm_hypercall2(u64 fid,
+ unsigned long arg0, unsigned long arg1)
+{
+ register long ret asm("v0");
+ register unsigned long fun asm("a0") = fid;
+ register unsigned long a1 asm("a1") = arg0;
+ register unsigned long a2 asm("a2") = arg1;
+
+ __asm__ __volatile__(
+ "hvcl "__stringify(KVM_HCALL_PV_SERVICE)
+ : "=r" (ret)
+ : "r" (fun), "r" (a1), "r" (a2)
+ : "memory"
+ );
+
+ return ret;
+}
+
+static __always_inline long kvm_hypercall3(u64 fid,
+ unsigned long arg0, unsigned long arg1, unsigned long arg2)
+{
+ register long ret asm("v0");
+ register unsigned long fun asm("a0") = fid;
+ register unsigned long a1 asm("a1") = arg0;
+ register unsigned long a2 asm("a2") = arg1;
+ register unsigned long a3 asm("a3") = arg2;
+
+ __asm__ __volatile__(
+ "hvcl "__stringify(KVM_HCALL_PV_SERVICE)
+ : "=r" (ret)
+ : "r" (fun), "r" (a1), "r" (a2), "r" (a3)
+ : "memory"
+ );
+
+ return ret;
+}
+
+static __always_inline long kvm_hypercall4(u64 fid,
+ unsigned long arg0, unsigned long arg1, unsigned long arg2,
+ unsigned long arg3)
+{
+ register long ret asm("v0");
+ register unsigned long fun asm("a0") = fid;
+ register unsigned long a1 asm("a1") = arg0;
+ register unsigned long a2 asm("a2") = arg1;
+ register unsigned long a3 asm("a3") = arg2;
+ register unsigned long a4 asm("a4") = arg3;
+
+ __asm__ __volatile__(
+ "hvcl "__stringify(KVM_HCALL_PV_SERVICE)
+ : "=r" (ret)
+ : "r"(fun), "r" (a1), "r" (a2), "r" (a3), "r" (a4)
+ : "memory"
+ );
+
+ return ret;
+}
+
+static __always_inline long kvm_hypercall5(u64 fid,
+ unsigned long arg0, unsigned long arg1, unsigned long arg2,
+ unsigned long arg3, unsigned long arg4)
+{
+ register long ret asm("v0");
+ register unsigned long fun asm("a0") = fid;
+ register unsigned long a1 asm("a1") = arg0;
+ register unsigned long a2 asm("a2") = arg1;
+ register unsigned long a3 asm("a3") = arg2;
+ register unsigned long a4 asm("a4") = arg3;
+ register unsigned long a5 asm("a5") = arg4;
+
+ __asm__ __volatile__(
+ "hvcl "__stringify(KVM_HCALL_PV_SERVICE)
+ : "=r" (ret)
+ : "r"(fun), "r" (a1), "r" (a2), "r" (a3), "r" (a4), "r" (a5)
+ : "memory"
+ );
+
+ return ret;
+}
+
static inline unsigned int kvm_arch_para_features(void)
{
return 0;
diff --git a/arch/loongarch/include/asm/loongarch.h b/arch/loongarch/include/asm/loongarch.h
index a1d22e8b6f94..0ad36704cb4b 100644
--- a/arch/loongarch/include/asm/loongarch.h
+++ b/arch/loongarch/include/asm/loongarch.h
@@ -167,6 +167,7 @@
#define CPUCFG_KVM_SIG CPUCFG_KVM_BASE
#define KVM_SIGNATURE "KVM\0"
#define CPUCFG_KVM_FEATURE (CPUCFG_KVM_BASE + 4)
+#define KVM_FEATURE_PV_IPI BIT(1)
#ifndef __ASSEMBLY__
diff --git a/arch/loongarch/kernel/irq.c b/arch/loongarch/kernel/irq.c
index ce36897d1e5a..4863e6c1b739 100644
--- a/arch/loongarch/kernel/irq.c
+++ b/arch/loongarch/kernel/irq.c
@@ -113,5 +113,5 @@ void __init init_IRQ(void)
per_cpu(irq_stack, i), per_cpu(irq_stack, i) + IRQ_STACK_SIZE);
}
- set_csr_ecfg(ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI | ECFGF_PMC);
+ set_csr_ecfg(ECFGF_SIP0 | ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI | ECFGF_PMC);
}
diff --git a/arch/loongarch/kernel/paravirt.c b/arch/loongarch/kernel/paravirt.c
index 5cf794e8490f..4c30e1c73c72 100644
--- a/arch/loongarch/kernel/paravirt.c
+++ b/arch/loongarch/kernel/paravirt.c
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/export.h>
#include <linux/types.h>
+#include <linux/interrupt.h>
#include <linux/jump_label.h>
#include <linux/kvm_para.h>
#include <asm/paravirt.h>
@@ -16,6 +17,103 @@ static u64 native_steal_clock(int cpu)
DEFINE_STATIC_CALL(pv_steal_clock, native_steal_clock);
+#ifdef CONFIG_SMP
+static void pv_send_ipi_single(int cpu, unsigned int action)
+{
+ unsigned int min, old;
+ unsigned long bitmap = 0;
+ irq_cpustat_t *info = &per_cpu(irq_stat, cpu);
+
+ action = BIT(action);
+ old = atomic_fetch_or(action, &info->message);
+ if (old == 0) {
+ min = cpu_logical_map(cpu);
+ bitmap = 1;
+ kvm_hypercall3(KVM_HCALL_FUNC_PV_IPI, bitmap, 0, min);
+ }
+}
+
+#define KVM_IPI_CLUSTER_SIZE (2 * BITS_PER_LONG)
+static void pv_send_ipi_mask(const struct cpumask *mask, unsigned int action)
+{
+ unsigned int cpu, i, min = 0, max = 0, old;
+ __uint128_t bitmap = 0;
+ irq_cpustat_t *info;
+
+ if (cpumask_empty(mask))
+ return;
+
+ action = BIT(action);
+ for_each_cpu(i, mask) {
+ info = &per_cpu(irq_stat, i);
+ old = atomic_fetch_or(action, &info->message);
+ if (old)
+ continue;
+
+ cpu = cpu_logical_map(i);
+ if (!bitmap) {
+ min = max = cpu;
+ } else if (cpu > min && cpu < min + KVM_IPI_CLUSTER_SIZE) {
+ max = cpu > max ? cpu : max;
+ } else if (cpu < min && (max - cpu) < KVM_IPI_CLUSTER_SIZE) {
+ bitmap <<= min - cpu;
+ min = cpu;
+ } else {
+ /*
+ * Physical cpuids are sorted in ascending order, so
+ * for the next mask calculation, send the IPI here
+ * directly and skip the remaining cpus
+ */
+ kvm_hypercall3(KVM_HCALL_FUNC_PV_IPI,
+ (unsigned long)bitmap,
+ (unsigned long)(bitmap >> BITS_PER_LONG), min);
+ min = max = cpu;
+ bitmap = 0;
+ }
+ __set_bit(cpu - min, (unsigned long *)&bitmap);
+ }
+
+ if (bitmap)
+ kvm_hypercall3(KVM_HCALL_FUNC_PV_IPI, (unsigned long)bitmap,
+ (unsigned long)(bitmap >> BITS_PER_LONG), min);
+}
+
+static irqreturn_t loongson_do_swi(int irq, void *dev)
+{
+ irq_cpustat_t *info;
+ long action;
+
+ /* Clear swi interrupt */
+ clear_csr_estat(1 << INT_SWI0);
+ info = this_cpu_ptr(&irq_stat);
+ action = atomic_xchg(&info->message, 0);
+ if (action & SMP_CALL_FUNCTION) {
+ generic_smp_call_function_interrupt();
+ info->ipi_irqs[IPI_CALL_FUNCTION]++;
+ }
+
+ if (action & SMP_RESCHEDULE) {
+ scheduler_ipi();
+ info->ipi_irqs[IPI_RESCHEDULE]++;
+ }
+
+ return IRQ_HANDLED;
+}
+
+static void pv_init_ipi(void)
+{
+ int r, swi0;
+
+ swi0 = get_percpu_irq(INT_SWI0);
+ if (swi0 < 0)
+ panic("SWI0 IRQ mapping failed\n");
+ irq_set_percpu_devid(swi0);
+ r = request_percpu_irq(swi0, loongson_do_swi, "SWI0", &irq_stat);
+ if (r < 0)
+ panic("SWI0 IRQ request failed\n");
+}
+#endif
+
static bool kvm_para_available(void)
{
static int hypervisor_type;
@@ -32,10 +130,24 @@ static bool kvm_para_available(void)
int __init pv_ipi_init(void)
{
+ int feature;
+
if (!cpu_has_hypervisor)
return 0;
if (!kvm_para_available())
return 0;
+ /*
+ * Check whether the KVM hypervisor supports pv ipi or not
+ */
+ feature = read_cpucfg(CPUCFG_KVM_FEATURE);
+#ifdef CONFIG_SMP
+ if (feature & KVM_FEATURE_PV_IPI) {
+ smp_ops.init_ipi = pv_init_ipi;
+ smp_ops.send_ipi_single = pv_send_ipi_single;
+ smp_ops.send_ipi_mask = pv_send_ipi_mask;
+ }
+#endif
+
return 1;
}
diff --git a/arch/loongarch/kernel/setup.c b/arch/loongarch/kernel/setup.c
index b79a1244b56f..c95ed3224b7d 100644
--- a/arch/loongarch/kernel/setup.c
+++ b/arch/loongarch/kernel/setup.c
@@ -368,6 +368,7 @@ void __init platform_init(void)
pr_info("The BIOS Version: %s\n", b_info.bios_version);
efi_runtime_init();
+ pv_ipi_init();
}
static void __init check_kernel_sections_mem(void)
diff --git a/arch/loongarch/kernel/smp.c b/arch/loongarch/kernel/smp.c
index 2182e7cc2ed6..9e9fda1fe18a 100644
--- a/arch/loongarch/kernel/smp.c
+++ b/arch/loongarch/kernel/smp.c
@@ -285,7 +285,7 @@ void loongson_boot_secondary(int cpu, struct task_struct *idle)
void loongson_init_secondary(void)
{
unsigned int cpu = smp_processor_id();
- unsigned int imask = ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 |
+ unsigned int imask = ECFGF_SIP0 | ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 |
ECFGF_IPI | ECFGF_PMC | ECFGF_TIMER;
change_csr_ecfg(ECFG0_IM, imask);
diff --git a/arch/loongarch/kvm/exit.c b/arch/loongarch/kvm/exit.c
index 6a38fd59d86d..46940e97975b 100644
--- a/arch/loongarch/kvm/exit.c
+++ b/arch/loongarch/kvm/exit.c
@@ -227,6 +227,9 @@ static int kvm_emu_cpucfg(struct kvm_vcpu *vcpu, larch_inst inst)
case CPUCFG_KVM_SIG:
vcpu->arch.gprs[rd] = *(unsigned int *)KVM_SIGNATURE;
break;
+ case CPUCFG_KVM_FEATURE:
+ vcpu->arch.gprs[rd] = KVM_FEATURE_PV_IPI;
+ break;
default:
vcpu->arch.gprs[rd] = 0;
break;
@@ -699,12 +702,78 @@ static int kvm_handle_lasx_disabled(struct kvm_vcpu *vcpu)
return RESUME_GUEST;
}
+static int kvm_pv_send_ipi(struct kvm_vcpu *vcpu)
+{
+ unsigned long ipi_bitmap;
+ unsigned int min, cpu, i;
+ struct kvm_vcpu *dest;
+
+ min = vcpu->arch.gprs[LOONGARCH_GPR_A3];
+ for (i = 0; i < 2; i++, min += BITS_PER_LONG) {
+ ipi_bitmap = vcpu->arch.gprs[LOONGARCH_GPR_A1 + i];
+ if (!ipi_bitmap)
+ continue;
+
+ cpu = find_first_bit((void *)&ipi_bitmap, BITS_PER_LONG);
+ while (cpu < BITS_PER_LONG) {
+ dest = kvm_get_vcpu_by_cpuid(vcpu->kvm, cpu + min);
+ cpu = find_next_bit((void *)&ipi_bitmap, BITS_PER_LONG,
+ cpu + 1);
+ if (!dest)
+ continue;
+
+ /*
+ * Send SWI0 to dest vcpu to emulate IPI interrupt
+ */
+ kvm_queue_irq(dest, INT_SWI0);
+ kvm_vcpu_kick(dest);
+ }
+ }
+
+ return 0;
+}
+
+/*
+ * Hypercall emulation always returns to the guest; the caller should check retval.
+ */
+static void kvm_handle_pv_service(struct kvm_vcpu *vcpu)
+{
+ unsigned long func = vcpu->arch.gprs[LOONGARCH_GPR_A0];
+ long ret;
+
+ switch (func) {
+ case KVM_HCALL_FUNC_PV_IPI:
+ kvm_pv_send_ipi(vcpu);
+ ret = KVM_HCALL_STATUS_SUCCESS;
+ break;
+ default:
+ ret = KVM_HCALL_INVALID_CODE;
+ break;
+ }
+
+ vcpu->arch.gprs[LOONGARCH_GPR_A0] = ret;
+}
+
static int kvm_handle_hypercall(struct kvm_vcpu *vcpu)
{
+ larch_inst inst;
+ unsigned int code;
+
+ inst.word = vcpu->arch.badi;
+ code = inst.reg0i15_format.immediate;
update_pc(&vcpu->arch);
- /* Treat it as noop intruction, only set return value */
- vcpu->arch.gprs[LOONGARCH_GPR_A0] = KVM_HCALL_INVALID_CODE;
+ switch (code) {
+ case KVM_HCALL_PV_SERVICE:
+ vcpu->stat.hypercall_exits++;
+ kvm_handle_pv_service(vcpu);
+ break;
+ default:
+ /* Treat it as a nop instruction; only set return value */
+ vcpu->arch.gprs[LOONGARCH_GPR_A0] = KVM_HCALL_INVALID_CODE;
+ break;
+ }
+
return RESUME_GUEST;
}
diff --git a/arch/loongarch/kvm/vcpu.c b/arch/loongarch/kvm/vcpu.c
index 40296d8ef297..24fd5e4647f3 100644
--- a/arch/loongarch/kvm/vcpu.c
+++ b/arch/loongarch/kvm/vcpu.c
@@ -19,6 +19,7 @@ const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
STATS_DESC_COUNTER(VCPU, idle_exits),
STATS_DESC_COUNTER(VCPU, cpucfg_exits),
STATS_DESC_COUNTER(VCPU, signal_exits),
+ STATS_DESC_COUNTER(VCPU, hypercall_exits)
};
const struct kvm_stats_header kvm_vcpu_stats_header = {
--
2.39.3
Hi, Bibo,
On Thu, Feb 22, 2024 at 11:28 AM Bibo Mao <[email protected]> wrote:
>
> The cpucfg instruction can be used to get processor features, and it
> traps when executed in VM mode, so it can also be used to provide CPU
> features to a VM. On real hardware, cpucfg areas 0 - 20 are used. Here
> a dedicated area 0x40000000 -- 0x400000ff is used for the KVM
> hypervisor to provide PV features, and the area can be extended for
> other hypervisors in the future. This area will never be used by real
> HW; it is only used by software.
After reading and thinking, I find that the hypercall method, which is
used in our production kernel, is better than this cpucfg method,
because hypercall is simpler and more straightforward, and we don't
need to worry about conflicts with real hardware.
Huacai
>
> Signed-off-by: Bibo Mao <[email protected]>
> ---
> arch/loongarch/include/asm/inst.h | 1 +
> arch/loongarch/include/asm/loongarch.h | 10 ++++++
> arch/loongarch/kvm/exit.c | 46 +++++++++++++++++---------
> 3 files changed, 41 insertions(+), 16 deletions(-)
>
> diff --git a/arch/loongarch/include/asm/inst.h b/arch/loongarch/include/asm/inst.h
> index d8f637f9e400..ad120f924905 100644
> --- a/arch/loongarch/include/asm/inst.h
> +++ b/arch/loongarch/include/asm/inst.h
> @@ -67,6 +67,7 @@ enum reg2_op {
> revhd_op = 0x11,
> extwh_op = 0x16,
> extwb_op = 0x17,
> + cpucfg_op = 0x1b,
> iocsrrdb_op = 0x19200,
> iocsrrdh_op = 0x19201,
> iocsrrdw_op = 0x19202,
> diff --git a/arch/loongarch/include/asm/loongarch.h b/arch/loongarch/include/asm/loongarch.h
> index 46366e783c84..a1d22e8b6f94 100644
> --- a/arch/loongarch/include/asm/loongarch.h
> +++ b/arch/loongarch/include/asm/loongarch.h
> @@ -158,6 +158,16 @@
> #define CPUCFG48_VFPU_CG BIT(2)
> #define CPUCFG48_RAM_CG BIT(3)
>
> +/*
> + * cpucfg index area: 0x40000000 -- 0x400000ff
> + * SW emulation for KVM hypervirsor
> + */
> +#define CPUCFG_KVM_BASE 0x40000000UL
> +#define CPUCFG_KVM_SIZE 0x100
> +#define CPUCFG_KVM_SIG CPUCFG_KVM_BASE
> +#define KVM_SIGNATURE "KVM\0"
> +#define CPUCFG_KVM_FEATURE (CPUCFG_KVM_BASE + 4)
> +
> #ifndef __ASSEMBLY__
>
> /* CSR */
> diff --git a/arch/loongarch/kvm/exit.c b/arch/loongarch/kvm/exit.c
> index 923bbca9bd22..6a38fd59d86d 100644
> --- a/arch/loongarch/kvm/exit.c
> +++ b/arch/loongarch/kvm/exit.c
> @@ -206,10 +206,37 @@ int kvm_emu_idle(struct kvm_vcpu *vcpu)
> return EMULATE_DONE;
> }
>
> -static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
> +static int kvm_emu_cpucfg(struct kvm_vcpu *vcpu, larch_inst inst)
> {
> int rd, rj;
> unsigned int index;
> +
> + rd = inst.reg2_format.rd;
> + rj = inst.reg2_format.rj;
> + ++vcpu->stat.cpucfg_exits;
> + index = vcpu->arch.gprs[rj];
> +
> + /*
> + * By LoongArch Reference Manual 2.2.10.5
> + * Return value is 0 for undefined cpucfg index
> + */
> + switch (index) {
> + case 0 ... (KVM_MAX_CPUCFG_REGS - 1):
> + vcpu->arch.gprs[rd] = vcpu->arch.cpucfg[index];
> + break;
> + case CPUCFG_KVM_SIG:
> + vcpu->arch.gprs[rd] = *(unsigned int *)KVM_SIGNATURE;
> + break;
> + default:
> + vcpu->arch.gprs[rd] = 0;
> + break;
> + }
> +
> + return EMULATE_DONE;
> +}
> +
> +static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
> +{
> unsigned long curr_pc;
> larch_inst inst;
> enum emulation_result er = EMULATE_DONE;
> @@ -224,21 +251,8 @@ static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
> er = EMULATE_FAIL;
> switch (((inst.word >> 24) & 0xff)) {
> case 0x0: /* CPUCFG GSPR */
> - if (inst.reg2_format.opcode == 0x1B) {
> - rd = inst.reg2_format.rd;
> - rj = inst.reg2_format.rj;
> - ++vcpu->stat.cpucfg_exits;
> - index = vcpu->arch.gprs[rj];
> - er = EMULATE_DONE;
> - /*
> - * By LoongArch Reference Manual 2.2.10.5
> - * return value is 0 for undefined cpucfg index
> - */
> - if (index < KVM_MAX_CPUCFG_REGS)
> - vcpu->arch.gprs[rd] = vcpu->arch.cpucfg[index];
> - else
> - vcpu->arch.gprs[rd] = 0;
> - }
> + if (inst.reg2_format.opcode == cpucfg_op)
> + er = kvm_emu_cpucfg(vcpu, inst);
> break;
> case 0x4: /* CSR{RD,WR,XCHG} GSPR */
> er = kvm_handle_csr(vcpu, inst);
> --
> 2.39.3
>
Hi, Bibo,
On Thu, Feb 22, 2024 at 11:28 AM Bibo Mao <[email protected]> wrote:
>
> The paravirt interface pv_ipi_init() is added here for the guest
> kernel. It first checks whether the system runs in VM mode; if so, it
> calls kvm_para_available() to detect the current VMM type. Only the
> KVM VMM type is detected for now, so the paravirt functions work only
> if the current VMM is the KVM hypervisor, since KVM is the only
> hypervisor supported on LoongArch at present.
>
> pv_ipi_init() has no effect yet; it is a dummy function.
>
> Signed-off-by: Bibo Mao <[email protected]>
> ---
> arch/loongarch/Kconfig | 9 ++++
> arch/loongarch/include/asm/kvm_para.h | 7 ++++
> arch/loongarch/include/asm/paravirt.h | 27 ++++++++++++
> .../include/asm/paravirt_api_clock.h | 1 +
> arch/loongarch/kernel/Makefile | 1 +
> arch/loongarch/kernel/paravirt.c | 41 +++++++++++++++++++
> arch/loongarch/kernel/setup.c | 1 +
> 7 files changed, 87 insertions(+)
> create mode 100644 arch/loongarch/include/asm/paravirt.h
> create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
> create mode 100644 arch/loongarch/kernel/paravirt.c
>
> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
> index 929f68926b34..fdaae9a0435c 100644
> --- a/arch/loongarch/Kconfig
> +++ b/arch/loongarch/Kconfig
> @@ -587,6 +587,15 @@ config CPU_HAS_PREFETCH
> bool
> default y
>
> +config PARAVIRT
> + bool "Enable paravirtualization code"
> + depends on AS_HAS_LVZ_EXTENSION
> + help
> + This changes the kernel so it can modify itself when it is run
> + under a hypervisor, potentially improving performance significantly
> + over full virtualization. However, when run without a hypervisor
> + the kernel is theoretically slower and slightly larger.
> +
> config ARCH_SUPPORTS_KEXEC
> def_bool y
>
> diff --git a/arch/loongarch/include/asm/kvm_para.h b/arch/loongarch/include/asm/kvm_para.h
> index d48f993ae206..af5d677a9052 100644
> --- a/arch/loongarch/include/asm/kvm_para.h
> +++ b/arch/loongarch/include/asm/kvm_para.h
> @@ -2,6 +2,13 @@
> #ifndef _ASM_LOONGARCH_KVM_PARA_H
> #define _ASM_LOONGARCH_KVM_PARA_H
>
> +/*
> + * Hypercall code field
> + */
> +#define HYPERVISOR_KVM 1
> +#define HYPERVISOR_VENDOR_SHIFT 8
> +#define HYPERCALL_CODE(vendor, code) ((vendor << HYPERVISOR_VENDOR_SHIFT) + code)
> +
> /*
> * LoongArch hypercall return code
> */
> diff --git a/arch/loongarch/include/asm/paravirt.h b/arch/loongarch/include/asm/paravirt.h
> new file mode 100644
> index 000000000000..58f7b7b89f2c
> --- /dev/null
> +++ b/arch/loongarch/include/asm/paravirt.h
> @@ -0,0 +1,27 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_LOONGARCH_PARAVIRT_H
> +#define _ASM_LOONGARCH_PARAVIRT_H
> +
> +#ifdef CONFIG_PARAVIRT
> +#include <linux/static_call_types.h>
> +struct static_key;
> +extern struct static_key paravirt_steal_enabled;
> +extern struct static_key paravirt_steal_rq_enabled;
> +
> +u64 dummy_steal_clock(int cpu);
> +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
> +
> +static inline u64 paravirt_steal_clock(int cpu)
> +{
> + return static_call(pv_steal_clock)(cpu);
> +}
> +
> +int pv_ipi_init(void);
> +#else
> +static inline int pv_ipi_init(void)
> +{
> + return 0;
> +}
> +
> +#endif // CONFIG_PARAVIRT
> +#endif
> diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h b/arch/loongarch/include/asm/paravirt_api_clock.h
> new file mode 100644
> index 000000000000..65ac7cee0dad
> --- /dev/null
> +++ b/arch/loongarch/include/asm/paravirt_api_clock.h
> @@ -0,0 +1 @@
> +#include <asm/paravirt.h>
> diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
> index 3c808c680370..662e6e9de12d 100644
> --- a/arch/loongarch/kernel/Makefile
> +++ b/arch/loongarch/kernel/Makefile
> @@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
> obj-$(CONFIG_STACKTRACE) += stacktrace.o
>
> obj-$(CONFIG_PROC_FS) += proc.o
> +obj-$(CONFIG_PARAVIRT) += paravirt.o
>
> obj-$(CONFIG_SMP) += smp.o
>
> diff --git a/arch/loongarch/kernel/paravirt.c b/arch/loongarch/kernel/paravirt.c
> new file mode 100644
> index 000000000000..5cf794e8490f
> --- /dev/null
> +++ b/arch/loongarch/kernel/paravirt.c
> @@ -0,0 +1,41 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/export.h>
> +#include <linux/types.h>
> +#include <linux/jump_label.h>
> +#include <linux/kvm_para.h>
> +#include <asm/paravirt.h>
> +#include <linux/static_call.h>
> +
> +struct static_key paravirt_steal_enabled;
> +struct static_key paravirt_steal_rq_enabled;
> +
> +static u64 native_steal_clock(int cpu)
> +{
> + return 0;
> +}
> +
> +DEFINE_STATIC_CALL(pv_steal_clock, native_steal_clock);
> +
> +static bool kvm_para_available(void)
> +{
> + static int hypervisor_type;
> + int config;
> +
> + if (!hypervisor_type) {
> + config = read_cpucfg(CPUCFG_KVM_SIG);
> + if (!memcmp(&config, KVM_SIGNATURE, 4))
> + hypervisor_type = HYPERVISOR_KVM;
> + }
> +
> + return hypervisor_type == HYPERVISOR_KVM;
> +}
> +
> +int __init pv_ipi_init(void)
> +{
> + if (!cpu_has_hypervisor)
> + return 0;
> + if (!kvm_para_available())
> + return 0;
> +
> + return 1;
> +}
pv_ipi_init() and its declaration should also be moved to the last
patch. And if you think this patch is too small, you can squash the
whole patch into the last one.
Huacai
> diff --git a/arch/loongarch/kernel/setup.c b/arch/loongarch/kernel/setup.c
> index edf2bba80130..b79a1244b56f 100644
> --- a/arch/loongarch/kernel/setup.c
> +++ b/arch/loongarch/kernel/setup.c
> @@ -43,6 +43,7 @@
> #include <asm/efi.h>
> #include <asm/loongson.h>
> #include <asm/numa.h>
> +#include <asm/paravirt.h>
> #include <asm/pgalloc.h>
> #include <asm/sections.h>
> #include <asm/setup.h>
> --
> 2.39.3
>
Hi, Bibo,
On Thu, Feb 22, 2024 at 11:28 AM Bibo Mao <[email protected]> wrote:
>
> On LoongArch systems, IPI hardware uses IOCSR registers: there is one
> IOCSR register access on IPI sending, and two IOCSR accesses on IPI
> receiving, in the IPI interrupt handler. In VM mode every IOCSR access
> causes the VM to trap into the hypervisor, so one IPI hardware
> notification costs three traps.
>
> PV IPI is added for the VM: the hypercall instruction is used on the
> IPI sender side, and the hypervisor injects an SWI into the destination
> vcpu. In the SWI interrupt handler only the ESTAT CSR register is
> written to clear the irq, and ESTAT CSR access does not trap into the
> hypervisor. So with PV IPI supported there is one trap on the sender
> side and no trap on the receiver side; one IPI notification costs only
> one trap.
>
> This patch also adds IPI multicast support, with a method similar to
> x86. With multicast support, an IPI notification can be sent to at most
> 128 vcpus at one time, which greatly reduces the number of traps into
> the hypervisor.
>
> Signed-off-by: Bibo Mao <[email protected]>
> ---
> arch/loongarch/include/asm/hardirq.h | 1 +
> arch/loongarch/include/asm/kvm_host.h | 1 +
> arch/loongarch/include/asm/kvm_para.h | 123 +++++++++++++++++++++++++
> arch/loongarch/include/asm/loongarch.h | 1 +
> arch/loongarch/kernel/irq.c | 2 +-
> arch/loongarch/kernel/paravirt.c | 112 ++++++++++++++++++++++
> arch/loongarch/kernel/setup.c | 1 +
> arch/loongarch/kernel/smp.c | 2 +-
> arch/loongarch/kvm/exit.c | 73 ++++++++++++++-
> arch/loongarch/kvm/vcpu.c | 1 +
> 10 files changed, 313 insertions(+), 4 deletions(-)
>
> diff --git a/arch/loongarch/include/asm/hardirq.h b/arch/loongarch/include/asm/hardirq.h
> index 9f0038e19c7f..b26d596a73aa 100644
> --- a/arch/loongarch/include/asm/hardirq.h
> +++ b/arch/loongarch/include/asm/hardirq.h
> @@ -21,6 +21,7 @@ enum ipi_msg_type {
> typedef struct {
> unsigned int ipi_irqs[NR_IPI];
> unsigned int __softirq_pending;
> + atomic_t message ____cacheline_aligned_in_smp;
> } ____cacheline_aligned irq_cpustat_t;
>
> DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
> diff --git a/arch/loongarch/include/asm/kvm_host.h b/arch/loongarch/include/asm/kvm_host.h
> index 3ba16ef1fe69..0b96c6303cf7 100644
> --- a/arch/loongarch/include/asm/kvm_host.h
> +++ b/arch/loongarch/include/asm/kvm_host.h
> @@ -43,6 +43,7 @@ struct kvm_vcpu_stat {
> u64 idle_exits;
> u64 cpucfg_exits;
> u64 signal_exits;
> + u64 hypercall_exits;
> };
>
> #define KVM_MEM_HUGEPAGE_CAPABLE (1UL << 0)
> diff --git a/arch/loongarch/include/asm/kvm_para.h b/arch/loongarch/include/asm/kvm_para.h
> index af5d677a9052..a82bffbbf8a1 100644
> --- a/arch/loongarch/include/asm/kvm_para.h
> +++ b/arch/loongarch/include/asm/kvm_para.h
> @@ -8,6 +8,9 @@
> #define HYPERVISOR_KVM 1
> #define HYPERVISOR_VENDOR_SHIFT 8
> #define HYPERCALL_CODE(vendor, code) ((vendor << HYPERVISOR_VENDOR_SHIFT) + code)
> +#define KVM_HCALL_CODE_PV_SERVICE 0
> +#define KVM_HCALL_PV_SERVICE HYPERCALL_CODE(HYPERVISOR_KVM, KVM_HCALL_CODE_PV_SERVICE)
> +#define KVM_HCALL_FUNC_PV_IPI 1
>
> /*
> * LoongArch hypercall return code
> @@ -16,6 +19,126 @@
> #define KVM_HCALL_INVALID_CODE -1UL
> #define KVM_HCALL_INVALID_PARAMETER -2UL
>
> +/*
> + * Hypercall interface for KVM hypervisor
> + *
> + * a0: function identifier
> + * a1-a6: args
> + * Return value will be placed in v0.
> + * Up to 6 arguments are passed in a1, a2, a3, a4, a5, a6.
> + */
> +static __always_inline long kvm_hypercall(u64 fid)
> +{
> + register long ret asm("v0");
> + register unsigned long fun asm("a0") = fid;
> +
> + __asm__ __volatile__(
> + "hvcl "__stringify(KVM_HCALL_PV_SERVICE)
> + : "=r" (ret)
> + : "r" (fun)
> + : "memory"
> + );
> +
> + return ret;
> +}
> +
> +static __always_inline long kvm_hypercall1(u64 fid, unsigned long arg0)
> +{
> + register long ret asm("v0");
> + register unsigned long fun asm("a0") = fid;
> + register unsigned long a1 asm("a1") = arg0;
> +
> + __asm__ __volatile__(
> + "hvcl "__stringify(KVM_HCALL_PV_SERVICE)
> + : "=r" (ret)
> + : "r" (fun), "r" (a1)
> + : "memory"
> + );
> +
> + return ret;
> +}
> +
> +static __always_inline long kvm_hypercall2(u64 fid,
> + unsigned long arg0, unsigned long arg1)
> +{
> + register long ret asm("v0");
> + register unsigned long fun asm("a0") = fid;
> + register unsigned long a1 asm("a1") = arg0;
> + register unsigned long a2 asm("a2") = arg1;
> +
> + __asm__ __volatile__(
> + "hvcl "__stringify(KVM_HCALL_PV_SERVICE)
> + : "=r" (ret)
> + : "r" (fun), "r" (a1), "r" (a2)
> + : "memory"
> + );
> +
> + return ret;
> +}
> +
> +static __always_inline long kvm_hypercall3(u64 fid,
> + unsigned long arg0, unsigned long arg1, unsigned long arg2)
> +{
> + register long ret asm("v0");
> + register unsigned long fun asm("a0") = fid;
> + register unsigned long a1 asm("a1") = arg0;
> + register unsigned long a2 asm("a2") = arg1;
> + register unsigned long a3 asm("a3") = arg2;
> +
> + __asm__ __volatile__(
> + "hvcl "__stringify(KVM_HCALL_PV_SERVICE)
> + : "=r" (ret)
> + : "r" (fun), "r" (a1), "r" (a2), "r" (a3)
> + : "memory"
> + );
> +
> + return ret;
> +}
> +
> +static __always_inline long kvm_hypercall4(u64 fid,
> + unsigned long arg0, unsigned long arg1, unsigned long arg2,
> + unsigned long arg3)
> +{
> + register long ret asm("v0");
> + register unsigned long fun asm("a0") = fid;
> + register unsigned long a1 asm("a1") = arg0;
> + register unsigned long a2 asm("a2") = arg1;
> + register unsigned long a3 asm("a3") = arg2;
> + register unsigned long a4 asm("a4") = arg3;
> +
> + __asm__ __volatile__(
> + "hvcl "__stringify(KVM_HCALL_PV_SERVICE)
> + : "=r" (ret)
> + : "r"(fun), "r" (a1), "r" (a2), "r" (a3), "r" (a4)
> + : "memory"
> + );
> +
> + return ret;
> +}
> +
> +static __always_inline long kvm_hypercall5(u64 fid,
> + unsigned long arg0, unsigned long arg1, unsigned long arg2,
> + unsigned long arg3, unsigned long arg4)
> +{
> + register long ret asm("v0");
> + register unsigned long fun asm("a0") = fid;
> + register unsigned long a1 asm("a1") = arg0;
> + register unsigned long a2 asm("a2") = arg1;
> + register unsigned long a3 asm("a3") = arg2;
> + register unsigned long a4 asm("a4") = arg3;
> + register unsigned long a5 asm("a5") = arg4;
> +
> + __asm__ __volatile__(
> + "hvcl "__stringify(KVM_HCALL_PV_SERVICE)
> + : "=r" (ret)
> + : "r"(fun), "r" (a1), "r" (a2), "r" (a3), "r" (a4), "r" (a5)
> + : "memory"
> + );
> +
> + return ret;
> +}
> +
> +
> static inline unsigned int kvm_arch_para_features(void)
> {
> return 0;
> diff --git a/arch/loongarch/include/asm/loongarch.h b/arch/loongarch/include/asm/loongarch.h
> index a1d22e8b6f94..0ad36704cb4b 100644
> --- a/arch/loongarch/include/asm/loongarch.h
> +++ b/arch/loongarch/include/asm/loongarch.h
> @@ -167,6 +167,7 @@
> #define CPUCFG_KVM_SIG CPUCFG_KVM_BASE
> #define KVM_SIGNATURE "KVM\0"
> #define CPUCFG_KVM_FEATURE (CPUCFG_KVM_BASE + 4)
> +#define KVM_FEATURE_PV_IPI BIT(1)
>
> #ifndef __ASSEMBLY__
>
> diff --git a/arch/loongarch/kernel/irq.c b/arch/loongarch/kernel/irq.c
> index ce36897d1e5a..4863e6c1b739 100644
> --- a/arch/loongarch/kernel/irq.c
> +++ b/arch/loongarch/kernel/irq.c
> @@ -113,5 +113,5 @@ void __init init_IRQ(void)
> per_cpu(irq_stack, i), per_cpu(irq_stack, i) + IRQ_STACK_SIZE);
> }
>
> - set_csr_ecfg(ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI | ECFGF_PMC);
> + set_csr_ecfg(ECFGF_SIP0 | ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI | ECFGF_PMC);
> }
> diff --git a/arch/loongarch/kernel/paravirt.c b/arch/loongarch/kernel/paravirt.c
> index 5cf794e8490f..4c30e1c73c72 100644
> --- a/arch/loongarch/kernel/paravirt.c
> +++ b/arch/loongarch/kernel/paravirt.c
> @@ -1,6 +1,7 @@
> // SPDX-License-Identifier: GPL-2.0
> #include <linux/export.h>
> #include <linux/types.h>
> +#include <linux/interrupt.h>
> #include <linux/jump_label.h>
> #include <linux/kvm_para.h>
> #include <asm/paravirt.h>
> @@ -16,6 +17,103 @@ static u64 native_steal_clock(int cpu)
>
> DEFINE_STATIC_CALL(pv_steal_clock, native_steal_clock);
>
> +#ifdef CONFIG_SMP
> +static void pv_send_ipi_single(int cpu, unsigned int action)
> +{
> + unsigned int min, old;
> + unsigned long bitmap = 0;
> + irq_cpustat_t *info = &per_cpu(irq_stat, cpu);
> +
> + action = BIT(action);
> + old = atomic_fetch_or(action, &info->message);
> + if (old == 0) {
> + min = cpu_logical_map(cpu);
> + bitmap = 1;
> + kvm_hypercall3(KVM_HCALL_FUNC_PV_IPI, bitmap, 0, min);
> + }
Early-return style would make this a little easier to read, which means:
if (old)
return;
min = ......
> +}
> +
> +#define KVM_IPI_CLUSTER_SIZE (2 * BITS_PER_LONG)
> +static void pv_send_ipi_mask(const struct cpumask *mask, unsigned int action)
> +{
> + unsigned int cpu, i, min = 0, max = 0, old;
> + __uint128_t bitmap = 0;
> + irq_cpustat_t *info;
> +
> + if (cpumask_empty(mask))
> + return;
> +
> + action = BIT(action);
> + for_each_cpu(i, mask) {
> + info = &per_cpu(irq_stat, i);
> + old = atomic_fetch_or(action, &info->message);
> + if (old)
> + continue;
> +
> + cpu = cpu_logical_map(i);
> + if (!bitmap) {
> + min = max = cpu;
> + } else if (cpu > min && cpu < min + KVM_IPI_CLUSTER_SIZE) {
> + max = cpu > max ? cpu : max;
> + } else if (cpu < min && (max - cpu) < KVM_IPI_CLUSTER_SIZE) {
> + bitmap <<= min - cpu;
> + min = cpu;
> + } else {
> + /*
> + * Physical cpuid is sorted in ascending order,
> + * so for the next mask calculation, send the IPI
> + * here directly and restart with the remaining cpus
> + */
> + kvm_hypercall3(KVM_HCALL_FUNC_PV_IPI,
> + (unsigned long)bitmap,
> + (unsigned long)(bitmap >> BITS_PER_LONG), min);
> + min = max = cpu;
> + bitmap = 0;
> + }
> + __set_bit(cpu - min, (unsigned long *)&bitmap);
> + }
> +
> + if (bitmap)
> + kvm_hypercall3(KVM_HCALL_FUNC_PV_IPI, (unsigned long)bitmap,
> + (unsigned long)(bitmap >> BITS_PER_LONG), min);
> +}
> +
> +static irqreturn_t loongson_do_swi(int irq, void *dev)
> +{
> + irq_cpustat_t *info;
> + long action;
> +
> + /* Clear swi interrupt */
> + clear_csr_estat(1 << INT_SWI0);
> + info = this_cpu_ptr(&irq_stat);
> + action = atomic_xchg(&info->message, 0);
> + if (action & SMP_CALL_FUNCTION) {
> + generic_smp_call_function_interrupt();
> + info->ipi_irqs[IPI_CALL_FUNCTION]++;
> + }
> +
> + if (action & SMP_RESCHEDULE) {
> + scheduler_ipi();
> + info->ipi_irqs[IPI_RESCHEDULE]++;
> + }
> +
> + return IRQ_HANDLED;
> +}
> +
> +static void pv_init_ipi(void)
> +{
> + int r, swi0;
> +
> + swi0 = get_percpu_irq(INT_SWI0);
> + if (swi0 < 0)
> + panic("SWI0 IRQ mapping failed\n");
> + irq_set_percpu_devid(swi0);
> + r = request_percpu_irq(swi0, loongson_do_swi, "SWI0", &irq_stat);
> + if (r < 0)
> + panic("SWI0 IRQ request failed\n");
> +}
> +#endif
> +
> static bool kvm_para_available(void)
> {
> static int hypervisor_type;
> @@ -32,10 +130,24 @@ static bool kvm_para_available(void)
>
> int __init pv_ipi_init(void)
> {
> + int feature;
> +
> if (!cpu_has_hypervisor)
> return 0;
> if (!kvm_para_available())
> return 0;
>
> + /*
> + * check whether KVM hypervisor supports pv_ipi or not
> + */
> + feature = read_cpucfg(CPUCFG_KVM_FEATURE);
> +#ifdef CONFIG_SMP
> + if (feature & KVM_FEATURE_PV_IPI) {
> + smp_ops.init_ipi = pv_init_ipi;
> + smp_ops.send_ipi_single = pv_send_ipi_single;
> + smp_ops.send_ipi_mask = pv_send_ipi_mask;
> + }
> +#endif
> +
> return 1;
> }
> diff --git a/arch/loongarch/kernel/setup.c b/arch/loongarch/kernel/setup.c
> index b79a1244b56f..c95ed3224b7d 100644
> --- a/arch/loongarch/kernel/setup.c
> +++ b/arch/loongarch/kernel/setup.c
> @@ -368,6 +368,7 @@ void __init platform_init(void)
> pr_info("The BIOS Version: %s\n", b_info.bios_version);
>
> efi_runtime_init();
> + pv_ipi_init();
Moving the call site to loongson_smp_setup() would be better.
Huacai
> }
>
> static void __init check_kernel_sections_mem(void)
> diff --git a/arch/loongarch/kernel/smp.c b/arch/loongarch/kernel/smp.c
> index 2182e7cc2ed6..9e9fda1fe18a 100644
> --- a/arch/loongarch/kernel/smp.c
> +++ b/arch/loongarch/kernel/smp.c
> @@ -285,7 +285,7 @@ void loongson_boot_secondary(int cpu, struct task_struct *idle)
> void loongson_init_secondary(void)
> {
> unsigned int cpu = smp_processor_id();
> - unsigned int imask = ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 |
> + unsigned int imask = ECFGF_SIP0 | ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 |
> ECFGF_IPI | ECFGF_PMC | ECFGF_TIMER;
>
> change_csr_ecfg(ECFG0_IM, imask);
> diff --git a/arch/loongarch/kvm/exit.c b/arch/loongarch/kvm/exit.c
> index 6a38fd59d86d..46940e97975b 100644
> --- a/arch/loongarch/kvm/exit.c
> +++ b/arch/loongarch/kvm/exit.c
> @@ -227,6 +227,9 @@ static int kvm_emu_cpucfg(struct kvm_vcpu *vcpu, larch_inst inst)
> case CPUCFG_KVM_SIG:
> vcpu->arch.gprs[rd] = *(unsigned int *)KVM_SIGNATURE;
> break;
> + case CPUCFG_KVM_FEATURE:
> + vcpu->arch.gprs[rd] = KVM_FEATURE_PV_IPI;
> + break;
> default:
> vcpu->arch.gprs[rd] = 0;
> break;
> @@ -699,12 +702,78 @@ static int kvm_handle_lasx_disabled(struct kvm_vcpu *vcpu)
> return RESUME_GUEST;
> }
>
> +static int kvm_pv_send_ipi(struct kvm_vcpu *vcpu)
> +{
> + unsigned long ipi_bitmap;
> + unsigned int min, cpu, i;
> + struct kvm_vcpu *dest;
> +
> + min = vcpu->arch.gprs[LOONGARCH_GPR_A3];
> + for (i = 0; i < 2; i++) {
> + ipi_bitmap = vcpu->arch.gprs[LOONGARCH_GPR_A1 + i];
> + if (!ipi_bitmap)
> + continue;
> +
> + cpu = find_first_bit((void *)&ipi_bitmap, BITS_PER_LONG);
> + while (cpu < BITS_PER_LONG) {
> + dest = kvm_get_vcpu_by_cpuid(vcpu->kvm, cpu + min);
> + cpu = find_next_bit((void *)&ipi_bitmap, BITS_PER_LONG,
> + cpu + 1);
> + if (!dest)
> + continue;
> +
> + /*
> + * Send SWI0 to dest vcpu to emulate IPI interrupt
> + */
> + kvm_queue_irq(dest, INT_SWI0);
> + kvm_vcpu_kick(dest);
> + }
> + }
> +
> + return 0;
> +}
> +
> +/*
> + * Hypercall emulation always returns to the guest; callers should check retval.
> + */
> +static void kvm_handle_pv_service(struct kvm_vcpu *vcpu)
> +{
> + unsigned long func = vcpu->arch.gprs[LOONGARCH_GPR_A0];
> + long ret;
> +
> + switch (func) {
> + case KVM_HCALL_FUNC_PV_IPI:
> + kvm_pv_send_ipi(vcpu);
> + ret = KVM_HCALL_STATUS_SUCCESS;
> + break;
> + default:
> + ret = KVM_HCALL_INVALID_CODE;
> + break;
> + }
> +
> + vcpu->arch.gprs[LOONGARCH_GPR_A0] = ret;
> +}
> +
> static int kvm_handle_hypercall(struct kvm_vcpu *vcpu)
> {
> + larch_inst inst;
> + unsigned int code;
> +
> + inst.word = vcpu->arch.badi;
> + code = inst.reg0i15_format.immediate;
> update_pc(&vcpu->arch);
>
> - /* Treat it as noop intruction, only set return value */
> - vcpu->arch.gprs[LOONGARCH_GPR_A0] = KVM_HCALL_INVALID_CODE;
> + switch (code) {
> + case KVM_HCALL_PV_SERVICE:
> + vcpu->stat.hypercall_exits++;
> + kvm_handle_pv_service(vcpu);
> + break;
> + default:
> + /* Treat it as noop instruction, only set return value */
> + vcpu->arch.gprs[LOONGARCH_GPR_A0] = KVM_HCALL_INVALID_CODE;
> + break;
> + }
> +
> return RESUME_GUEST;
> }
>
> diff --git a/arch/loongarch/kvm/vcpu.c b/arch/loongarch/kvm/vcpu.c
> index 40296d8ef297..24fd5e4647f3 100644
> --- a/arch/loongarch/kvm/vcpu.c
> +++ b/arch/loongarch/kvm/vcpu.c
> @@ -19,6 +19,7 @@ const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
> STATS_DESC_COUNTER(VCPU, idle_exits),
> STATS_DESC_COUNTER(VCPU, cpucfg_exits),
> STATS_DESC_COUNTER(VCPU, signal_exits),
> + STATS_DESC_COUNTER(VCPU, hypercall_exits)
> };
>
> const struct kvm_stats_header kvm_vcpu_stats_header = {
> --
> 2.39.3
>
>
On 2024/2/24 5:13 PM, Huacai Chen wrote:
> Hi, Bibo,
>
> On Thu, Feb 22, 2024 at 11:28 AM Bibo Mao <[email protected]> wrote:
>>
>> Instruction cpucfg can be used to get processor features, and there
>> is a trap exception when it is executed in VM mode, so it can also
>> be used to provide cpu features to a VM. On real hardware only the
>> cpucfg area 0 - 20 is used. Here the dedicated area 0x40000000 --
>> 0x400000ff is used for the KVM hypervisor to provide PV features,
>> and the area can be extended for other hypervisors in the future.
>> This area will never be used by real HW; it is only used by software.
> After reading and thinking, I find that the hypercall method, which is
> used in our production kernel, is better than this cpucfg method,
> because hypercall is simpler and more straightforward, plus we don't
> have to worry about conflicting with the real hardware.
No, I do not think so. cpucfg is simpler than hypercall: hypercall only
takes effect when the system runs in guest mode, and in some scenarios,
such as TCG mode, hypercall is an illegal instruction while cpucfg still
works. The extioi virtualization extension will be added later, and
cpucfg can be used to get extioi features; it would be undesirable for
the extioi driver to depend on the PARA_VIRT macro just because
hypercall is used to get features.
Regards
Bibo Mao
>
> Huacai
>
>>
>> Signed-off-by: Bibo Mao <[email protected]>
>> ---
>> arch/loongarch/include/asm/inst.h | 1 +
>> arch/loongarch/include/asm/loongarch.h | 10 ++++++
>> arch/loongarch/kvm/exit.c | 46 +++++++++++++++++---------
>> 3 files changed, 41 insertions(+), 16 deletions(-)
>>
>> diff --git a/arch/loongarch/include/asm/inst.h b/arch/loongarch/include/asm/inst.h
>> index d8f637f9e400..ad120f924905 100644
>> --- a/arch/loongarch/include/asm/inst.h
>> +++ b/arch/loongarch/include/asm/inst.h
>> @@ -67,6 +67,7 @@ enum reg2_op {
>> revhd_op = 0x11,
>> extwh_op = 0x16,
>> extwb_op = 0x17,
>> + cpucfg_op = 0x1b,
>> iocsrrdb_op = 0x19200,
>> iocsrrdh_op = 0x19201,
>> iocsrrdw_op = 0x19202,
>> diff --git a/arch/loongarch/include/asm/loongarch.h b/arch/loongarch/include/asm/loongarch.h
>> index 46366e783c84..a1d22e8b6f94 100644
>> --- a/arch/loongarch/include/asm/loongarch.h
>> +++ b/arch/loongarch/include/asm/loongarch.h
>> @@ -158,6 +158,16 @@
>> #define CPUCFG48_VFPU_CG BIT(2)
>> #define CPUCFG48_RAM_CG BIT(3)
>>
>> +/*
>> + * cpucfg index area: 0x40000000 -- 0x400000ff
>> + * SW emulation for KVM hypervisor
>> + */
>> +#define CPUCFG_KVM_BASE 0x40000000UL
>> +#define CPUCFG_KVM_SIZE 0x100
>> +#define CPUCFG_KVM_SIG CPUCFG_KVM_BASE
>> +#define KVM_SIGNATURE "KVM\0"
>> +#define CPUCFG_KVM_FEATURE (CPUCFG_KVM_BASE + 4)
>> +
>> #ifndef __ASSEMBLY__
>>
>> /* CSR */
>> diff --git a/arch/loongarch/kvm/exit.c b/arch/loongarch/kvm/exit.c
>> index 923bbca9bd22..6a38fd59d86d 100644
>> --- a/arch/loongarch/kvm/exit.c
>> +++ b/arch/loongarch/kvm/exit.c
>> @@ -206,10 +206,37 @@ int kvm_emu_idle(struct kvm_vcpu *vcpu)
>> return EMULATE_DONE;
>> }
>>
>> -static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
>> +static int kvm_emu_cpucfg(struct kvm_vcpu *vcpu, larch_inst inst)
>> {
>> int rd, rj;
>> unsigned int index;
>> +
>> + rd = inst.reg2_format.rd;
>> + rj = inst.reg2_format.rj;
>> + ++vcpu->stat.cpucfg_exits;
>> + index = vcpu->arch.gprs[rj];
>> +
>> + /*
>> + * By LoongArch Reference Manual 2.2.10.5
>> + * Return value is 0 for undefined cpucfg index
>> + */
>> + switch (index) {
>> + case 0 ... (KVM_MAX_CPUCFG_REGS - 1):
>> + vcpu->arch.gprs[rd] = vcpu->arch.cpucfg[index];
>> + break;
>> + case CPUCFG_KVM_SIG:
>> + vcpu->arch.gprs[rd] = *(unsigned int *)KVM_SIGNATURE;
>> + break;
>> + default:
>> + vcpu->arch.gprs[rd] = 0;
>> + break;
>> + }
>> +
>> + return EMULATE_DONE;
>> +}
>> +
>> +static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
>> +{
>> unsigned long curr_pc;
>> larch_inst inst;
>> enum emulation_result er = EMULATE_DONE;
>> @@ -224,21 +251,8 @@ static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
>> er = EMULATE_FAIL;
>> switch (((inst.word >> 24) & 0xff)) {
>> case 0x0: /* CPUCFG GSPR */
>> - if (inst.reg2_format.opcode == 0x1B) {
>> - rd = inst.reg2_format.rd;
>> - rj = inst.reg2_format.rj;
>> - ++vcpu->stat.cpucfg_exits;
>> - index = vcpu->arch.gprs[rj];
>> - er = EMULATE_DONE;
>> - /*
>> - * By LoongArch Reference Manual 2.2.10.5
>> - * return value is 0 for undefined cpucfg index
>> - */
>> - if (index < KVM_MAX_CPUCFG_REGS)
>> - vcpu->arch.gprs[rd] = vcpu->arch.cpucfg[index];
>> - else
>> - vcpu->arch.gprs[rd] = 0;
>> - }
>> + if (inst.reg2_format.opcode == cpucfg_op)
>> + er = kvm_emu_cpucfg(vcpu, inst);
>> break;
>> case 0x4: /* CSR{RD,WR,XCHG} GSPR */
>> er = kvm_handle_csr(vcpu, inst);
>> --
>> 2.39.3
>>
On 2024/2/24 5:13 PM, Huacai Chen wrote:
> Hi, Bibo,
>
> On Thu, Feb 22, 2024 at 11:28 AM Bibo Mao <[email protected]> wrote:
>>
>> Instruction cpucfg can be used to get processor features, and there
>> is a trap exception when it is executed in VM mode, so it can also
>> be used to provide cpu features to a VM. On real hardware only the
>> cpucfg area 0 - 20 is used. Here the dedicated area 0x40000000 --
>> 0x400000ff is used for the KVM hypervisor to provide PV features,
>> and the area can be extended for other hypervisors in the future.
>> This area will never be used by real HW; it is only used by software.
> After reading and thinking, I find that the hypercall method, which is
> used in our production kernel, is better than this cpucfg method,
> because hypercall is simpler and more straightforward, plus we don't
> have to worry about conflicting with the real hardware.
About the cpucfg area 0x40000000 -- 0x400000ff: I have negotiated this
with the chip designers, and this area will never be used by real hardware.
Regards
Bibo Mao
>
> Huacai
>
On 2024/2/24 5:15 PM, Huacai Chen wrote:
> Hi, Bibo,
>
> On Thu, Feb 22, 2024 at 11:28 AM Bibo Mao <[email protected]> wrote:
>>
>> The paravirt interface pv_ipi_init() is added here for the guest
>> kernel. It first checks whether the system runs in VM mode; if so,
>> it calls kvm_para_available() to detect the current VMM type. Only
>> the KVM VMM type is detected for now, so the paravirt functions work
>> only when the current VMM is the KVM hypervisor, since KVM is the
>> only hypervisor supported on LoongArch at present.
>>
>> pv_ipi_init() has no effect yet; it is a dummy function.
>>
>> Signed-off-by: Bibo Mao <[email protected]>
>> ---
>> arch/loongarch/Kconfig | 9 ++++
>> arch/loongarch/include/asm/kvm_para.h | 7 ++++
>> arch/loongarch/include/asm/paravirt.h | 27 ++++++++++++
>> .../include/asm/paravirt_api_clock.h | 1 +
>> arch/loongarch/kernel/Makefile | 1 +
>> arch/loongarch/kernel/paravirt.c | 41 +++++++++++++++++++
>> arch/loongarch/kernel/setup.c | 1 +
>> 7 files changed, 87 insertions(+)
>> create mode 100644 arch/loongarch/include/asm/paravirt.h
>> create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
>> create mode 100644 arch/loongarch/kernel/paravirt.c
>>
>> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
>> index 929f68926b34..fdaae9a0435c 100644
>> --- a/arch/loongarch/Kconfig
>> +++ b/arch/loongarch/Kconfig
>> @@ -587,6 +587,15 @@ config CPU_HAS_PREFETCH
>> bool
>> default y
>>
>> +config PARAVIRT
>> + bool "Enable paravirtualization code"
>> + depends on AS_HAS_LVZ_EXTENSION
>> + help
>> + This changes the kernel so it can modify itself when it is run
>> + under a hypervisor, potentially improving performance significantly
>> + over full virtualization. However, when run without a hypervisor
>> + the kernel is theoretically slower and slightly larger.
>> +
>> config ARCH_SUPPORTS_KEXEC
>> def_bool y
>>
>> diff --git a/arch/loongarch/include/asm/kvm_para.h b/arch/loongarch/include/asm/kvm_para.h
>> index d48f993ae206..af5d677a9052 100644
>> --- a/arch/loongarch/include/asm/kvm_para.h
>> +++ b/arch/loongarch/include/asm/kvm_para.h
>> @@ -2,6 +2,13 @@
>> #ifndef _ASM_LOONGARCH_KVM_PARA_H
>> #define _ASM_LOONGARCH_KVM_PARA_H
>>
>> +/*
>> + * Hypercall code field
>> + */
>> +#define HYPERVISOR_KVM 1
>> +#define HYPERVISOR_VENDOR_SHIFT 8
>> +#define HYPERCALL_CODE(vendor, code) ((vendor << HYPERVISOR_VENDOR_SHIFT) + code)
>> +
>> /*
>> * LoongArch hypercall return code
>> */
>> diff --git a/arch/loongarch/include/asm/paravirt.h b/arch/loongarch/include/asm/paravirt.h
>> new file mode 100644
>> index 000000000000..58f7b7b89f2c
>> --- /dev/null
>> +++ b/arch/loongarch/include/asm/paravirt.h
>> @@ -0,0 +1,27 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#ifndef _ASM_LOONGARCH_PARAVIRT_H
>> +#define _ASM_LOONGARCH_PARAVIRT_H
>> +
>> +#ifdef CONFIG_PARAVIRT
>> +#include <linux/static_call_types.h>
>> +struct static_key;
>> +extern struct static_key paravirt_steal_enabled;
>> +extern struct static_key paravirt_steal_rq_enabled;
>> +
>> +u64 dummy_steal_clock(int cpu);
>> +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
>> +
>> +static inline u64 paravirt_steal_clock(int cpu)
>> +{
>> + return static_call(pv_steal_clock)(cpu);
>> +}
>> +
>> +int pv_ipi_init(void);
>> +#else
>> +static inline int pv_ipi_init(void)
>> +{
>> + return 0;
>> +}
>> +
>> +#endif // CONFIG_PARAVIRT
>> +#endif
>> diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h b/arch/loongarch/include/asm/paravirt_api_clock.h
>> new file mode 100644
>> index 000000000000..65ac7cee0dad
>> --- /dev/null
>> +++ b/arch/loongarch/include/asm/paravirt_api_clock.h
>> @@ -0,0 +1 @@
>> +#include <asm/paravirt.h>
>> diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
>> index 3c808c680370..662e6e9de12d 100644
>> --- a/arch/loongarch/kernel/Makefile
>> +++ b/arch/loongarch/kernel/Makefile
>> @@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
>> obj-$(CONFIG_STACKTRACE) += stacktrace.o
>>
>> obj-$(CONFIG_PROC_FS) += proc.o
>> +obj-$(CONFIG_PARAVIRT) += paravirt.o
>>
>> obj-$(CONFIG_SMP) += smp.o
>>
>> diff --git a/arch/loongarch/kernel/paravirt.c b/arch/loongarch/kernel/paravirt.c
>> new file mode 100644
>> index 000000000000..5cf794e8490f
>> --- /dev/null
>> +++ b/arch/loongarch/kernel/paravirt.c
>> @@ -0,0 +1,41 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +#include <linux/export.h>
>> +#include <linux/types.h>
>> +#include <linux/jump_label.h>
>> +#include <linux/kvm_para.h>
>> +#include <asm/paravirt.h>
>> +#include <linux/static_call.h>
>> +
>> +struct static_key paravirt_steal_enabled;
>> +struct static_key paravirt_steal_rq_enabled;
>> +
>> +static u64 native_steal_clock(int cpu)
>> +{
>> + return 0;
>> +}
>> +
>> +DEFINE_STATIC_CALL(pv_steal_clock, native_steal_clock);
>> +
>> +static bool kvm_para_available(void)
>> +{
>> + static int hypervisor_type;
>> + int config;
>> +
>> + if (!hypervisor_type) {
>> + config = read_cpucfg(CPUCFG_KVM_SIG);
>> + if (!memcmp(&config, KVM_SIGNATURE, 4))
>> + hypervisor_type = HYPERVISOR_KVM;
>> + }
>> +
>> + return hypervisor_type == HYPERVISOR_KVM;
>> +}
>> +
>> +int __init pv_ipi_init(void)
>> +{
>> + if (!cpu_has_hypervisor)
>> + return 0;
>> + if (!kvm_para_available())
>> + return 0;
>> +
>> + return 1;
>> +}
> pv_ipi_init() and its declaration should also be moved to the last
> patch. And if you think this patch is too small, you can squash the
> whole patch to the last one.
I can move the whole patch to the last one, but I do not think it is
reasonable to move only the function pv_ipi_init() to the last patch.
Regards
Bibo Mao
>
> Huacai
>
>> diff --git a/arch/loongarch/kernel/setup.c b/arch/loongarch/kernel/setup.c
>> index edf2bba80130..b79a1244b56f 100644
>> --- a/arch/loongarch/kernel/setup.c
>> +++ b/arch/loongarch/kernel/setup.c
>> @@ -43,6 +43,7 @@
>> #include <asm/efi.h>
>> #include <asm/loongson.h>
>> #include <asm/numa.h>
>> +#include <asm/paravirt.h>
>> #include <asm/pgalloc.h>
>> #include <asm/sections.h>
>> #include <asm/setup.h>
>> --
>> 2.39.3
>>
On 2024/2/24 5:19 PM, Huacai Chen wrote:
> Hi, Bibo,
>
> On Thu, Feb 22, 2024 at 11:28 AM Bibo Mao <[email protected]> wrote:
>>
>> On LoongArch systems the ipi hardware uses iocsr registers: there is
>> one iocsr register access when sending an ipi, and two iocsr accesses
>> when receiving one in the ipi interrupt handler. In VM mode every
>> iocsr access causes the VM to trap into the hypervisor, so a single
>> hw ipi notification costs three traps.
>>
>> PV ipi is added for the VM: the hypercall instruction is used on the
>> ipi sender side, and the hypervisor injects a SWI into the destination
>> vcpu. In the SWI interrupt handler only the estat CSR register is
>> written to clear the irq, and estat CSR accesses do not trap into the
>> hypervisor. So with pv ipi there is one trap on the sender side and no
>> trap on the receiver side -- only one trap per ipi notification.
>>
>> This patch also adds ipi multicast support, using a method similar to
>> x86. With multicast support an ipi notification can be sent to at most
>> 128 vcpus at a time, which greatly reduces the number of traps into
>> the hypervisor.
>>
>> Signed-off-by: Bibo Mao <[email protected]>
>> ---
>> arch/loongarch/include/asm/hardirq.h | 1 +
>> arch/loongarch/include/asm/kvm_host.h | 1 +
>> arch/loongarch/include/asm/kvm_para.h | 123 +++++++++++++++++++++++++
>> arch/loongarch/include/asm/loongarch.h | 1 +
>> arch/loongarch/kernel/irq.c | 2 +-
>> arch/loongarch/kernel/paravirt.c | 112 ++++++++++++++++++++++
>> arch/loongarch/kernel/setup.c | 1 +
>> arch/loongarch/kernel/smp.c | 2 +-
>> arch/loongarch/kvm/exit.c | 73 ++++++++++++++-
>> arch/loongarch/kvm/vcpu.c | 1 +
>> 10 files changed, 313 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/loongarch/include/asm/hardirq.h b/arch/loongarch/include/asm/hardirq.h
>> index 9f0038e19c7f..b26d596a73aa 100644
>> --- a/arch/loongarch/include/asm/hardirq.h
>> +++ b/arch/loongarch/include/asm/hardirq.h
>> @@ -21,6 +21,7 @@ enum ipi_msg_type {
>> typedef struct {
>> unsigned int ipi_irqs[NR_IPI];
>> unsigned int __softirq_pending;
>> + atomic_t message ____cacheline_aligned_in_smp;
>> } ____cacheline_aligned irq_cpustat_t;
>>
>> DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
>> diff --git a/arch/loongarch/include/asm/kvm_host.h b/arch/loongarch/include/asm/kvm_host.h
>> index 3ba16ef1fe69..0b96c6303cf7 100644
>> --- a/arch/loongarch/include/asm/kvm_host.h
>> +++ b/arch/loongarch/include/asm/kvm_host.h
>> @@ -43,6 +43,7 @@ struct kvm_vcpu_stat {
>> u64 idle_exits;
>> u64 cpucfg_exits;
>> u64 signal_exits;
>> + u64 hypercall_exits;
>> };
>>
>> #define KVM_MEM_HUGEPAGE_CAPABLE (1UL << 0)
>> diff --git a/arch/loongarch/include/asm/kvm_para.h b/arch/loongarch/include/asm/kvm_para.h
>> index af5d677a9052..a82bffbbf8a1 100644
>> --- a/arch/loongarch/include/asm/kvm_para.h
>> +++ b/arch/loongarch/include/asm/kvm_para.h
>> @@ -8,6 +8,9 @@
>> #define HYPERVISOR_KVM 1
>> #define HYPERVISOR_VENDOR_SHIFT 8
>> #define HYPERCALL_CODE(vendor, code) ((vendor << HYPERVISOR_VENDOR_SHIFT) + code)
>> +#define KVM_HCALL_CODE_PV_SERVICE 0
>> +#define KVM_HCALL_PV_SERVICE HYPERCALL_CODE(HYPERVISOR_KVM, KVM_HCALL_CODE_PV_SERVICE)
>> +#define KVM_HCALL_FUNC_PV_IPI 1
>>
>> /*
>> * LoongArch hypercall return code
>> @@ -16,6 +19,126 @@
>> #define KVM_HCALL_INVALID_CODE -1UL
>> #define KVM_HCALL_INVALID_PARAMETER -2UL
>>
>> +/*
>> + * Hypercall interface for KVM hypervisor
>> + *
>> + * a0: function identifier
>> + * a1-a6: args
>> + * Return value will be placed in v0.
>> + * Up to 6 arguments are passed in a1, a2, a3, a4, a5, a6.
>> + */
>> +static __always_inline long kvm_hypercall(u64 fid)
>> +{
>> + register long ret asm("v0");
>> + register unsigned long fun asm("a0") = fid;
>> +
>> + __asm__ __volatile__(
>> + "hvcl "__stringify(KVM_HCALL_PV_SERVICE)
>> + : "=r" (ret)
>> + : "r" (fun)
>> + : "memory"
>> + );
>> +
>> + return ret;
>> +}
>> +
>> +static __always_inline long kvm_hypercall1(u64 fid, unsigned long arg0)
>> +{
>> + register long ret asm("v0");
>> + register unsigned long fun asm("a0") = fid;
>> + register unsigned long a1 asm("a1") = arg0;
>> +
>> + __asm__ __volatile__(
>> + "hvcl "__stringify(KVM_HCALL_PV_SERVICE)
>> + : "=r" (ret)
>> + : "r" (fun), "r" (a1)
>> + : "memory"
>> + );
>> +
>> + return ret;
>> +}
>> +
>> +static __always_inline long kvm_hypercall2(u64 fid,
>> + unsigned long arg0, unsigned long arg1)
>> +{
>> + register long ret asm("v0");
>> + register unsigned long fun asm("a0") = fid;
>> + register unsigned long a1 asm("a1") = arg0;
>> + register unsigned long a2 asm("a2") = arg1;
>> +
>> + __asm__ __volatile__(
>> + "hvcl "__stringify(KVM_HCALL_PV_SERVICE)
>> + : "=r" (ret)
>> + : "r" (fun), "r" (a1), "r" (a2)
>> + : "memory"
>> + );
>> +
>> + return ret;
>> +}
>> +
>> +static __always_inline long kvm_hypercall3(u64 fid,
>> + unsigned long arg0, unsigned long arg1, unsigned long arg2)
>> +{
>> + register long ret asm("v0");
>> + register unsigned long fun asm("a0") = fid;
>> + register unsigned long a1 asm("a1") = arg0;
>> + register unsigned long a2 asm("a2") = arg1;
>> + register unsigned long a3 asm("a3") = arg2;
>> +
>> + __asm__ __volatile__(
>> + "hvcl "__stringify(KVM_HCALL_PV_SERVICE)
>> + : "=r" (ret)
>> + : "r" (fun), "r" (a1), "r" (a2), "r" (a3)
>> + : "memory"
>> + );
>> +
>> + return ret;
>> +}
>> +
>> +static __always_inline long kvm_hypercall4(u64 fid,
>> + unsigned long arg0, unsigned long arg1, unsigned long arg2,
>> + unsigned long arg3)
>> +{
>> + register long ret asm("v0");
>> + register unsigned long fun asm("a0") = fid;
>> + register unsigned long a1 asm("a1") = arg0;
>> + register unsigned long a2 asm("a2") = arg1;
>> + register unsigned long a3 asm("a3") = arg2;
>> + register unsigned long a4 asm("a4") = arg3;
>> +
>> + __asm__ __volatile__(
>> + "hvcl "__stringify(KVM_HCALL_PV_SERVICE)
>> + : "=r" (ret)
>> + : "r"(fun), "r" (a1), "r" (a2), "r" (a3), "r" (a4)
>> + : "memory"
>> + );
>> +
>> + return ret;
>> +}
>> +
>> +static __always_inline long kvm_hypercall5(u64 fid,
>> + unsigned long arg0, unsigned long arg1, unsigned long arg2,
>> + unsigned long arg3, unsigned long arg4)
>> +{
>> + register long ret asm("v0");
>> + register unsigned long fun asm("a0") = fid;
>> + register unsigned long a1 asm("a1") = arg0;
>> + register unsigned long a2 asm("a2") = arg1;
>> + register unsigned long a3 asm("a3") = arg2;
>> + register unsigned long a4 asm("a4") = arg3;
>> + register unsigned long a5 asm("a5") = arg4;
>> +
>> + __asm__ __volatile__(
>> + "hvcl "__stringify(KVM_HCALL_PV_SERVICE)
>> + : "=r" (ret)
>> + : "r"(fun), "r" (a1), "r" (a2), "r" (a3), "r" (a4), "r" (a5)
>> + : "memory"
>> + );
>> +
>> + return ret;
>> +}
>> +
>> +
>> static inline unsigned int kvm_arch_para_features(void)
>> {
>> return 0;
>> diff --git a/arch/loongarch/include/asm/loongarch.h b/arch/loongarch/include/asm/loongarch.h
>> index a1d22e8b6f94..0ad36704cb4b 100644
>> --- a/arch/loongarch/include/asm/loongarch.h
>> +++ b/arch/loongarch/include/asm/loongarch.h
>> @@ -167,6 +167,7 @@
>> #define CPUCFG_KVM_SIG CPUCFG_KVM_BASE
>> #define KVM_SIGNATURE "KVM\0"
>> #define CPUCFG_KVM_FEATURE (CPUCFG_KVM_BASE + 4)
>> +#define KVM_FEATURE_PV_IPI BIT(1)
>>
>> #ifndef __ASSEMBLY__
>>
>> diff --git a/arch/loongarch/kernel/irq.c b/arch/loongarch/kernel/irq.c
>> index ce36897d1e5a..4863e6c1b739 100644
>> --- a/arch/loongarch/kernel/irq.c
>> +++ b/arch/loongarch/kernel/irq.c
>> @@ -113,5 +113,5 @@ void __init init_IRQ(void)
>> per_cpu(irq_stack, i), per_cpu(irq_stack, i) + IRQ_STACK_SIZE);
>> }
>>
>> - set_csr_ecfg(ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI | ECFGF_PMC);
>> + set_csr_ecfg(ECFGF_SIP0 | ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI | ECFGF_PMC);
>> }
>> diff --git a/arch/loongarch/kernel/paravirt.c b/arch/loongarch/kernel/paravirt.c
>> index 5cf794e8490f..4c30e1c73c72 100644
>> --- a/arch/loongarch/kernel/paravirt.c
>> +++ b/arch/loongarch/kernel/paravirt.c
>> @@ -1,6 +1,7 @@
>> // SPDX-License-Identifier: GPL-2.0
>> #include <linux/export.h>
>> #include <linux/types.h>
>> +#include <linux/interrupt.h>
>> #include <linux/jump_label.h>
>> #include <linux/kvm_para.h>
>> #include <asm/paravirt.h>
>> @@ -16,6 +17,103 @@ static u64 native_steal_clock(int cpu)
>>
>> DEFINE_STATIC_CALL(pv_steal_clock, native_steal_clock);
>>
>> +#ifdef CONFIG_SMP
>> +static void pv_send_ipi_single(int cpu, unsigned int action)
>> +{
>> + unsigned int min, old;
>> + unsigned long bitmap = 0;
>> + irq_cpustat_t *info = &per_cpu(irq_stat, cpu);
>> +
>> + action = BIT(action);
>> + old = atomic_fetch_or(action, &info->message);
>> + if (old == 0) {
>> + min = cpu_logical_map(cpu);
>> + bitmap = 1;
>> + kvm_hypercall3(KVM_HCALL_FUNC_PV_IPI, bitmap, 0, min);
>> + }
> Early-return style can make it a little easier, which means:
>
> if (old)
> return;
>
> min = ......
>
Will do in the next patch.
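For the record, the suggested early-return shape can be checked outside the kernel. The sketch below is a user-space model under stated assumptions: the per-cpu irq_stat.message word becomes a plain C11 atomic, cpu_logical_map() is taken as the identity, and the hypercall is replaced by a recorder. None of these names are the kernel's; this only illustrates the control flow.

```c
#include <assert.h>
#include <stdatomic.h>

/* Mock model of the early-return form of pv_send_ipi_single().
 * message[] stands in for per-cpu irq_stat.message; mock_hypercall()
 * records what the kvm_hypercall3() in the patch would send. */

#define NCPUS 4

static atomic_uint message[NCPUS];
static int hypercalls;              /* number of PV IPI hypercalls emitted */
static unsigned long last_min;

static void mock_hypercall(unsigned long bitmap, unsigned long ext,
			   unsigned long min)
{
	(void)bitmap; (void)ext;
	hypercalls++;
	last_min = min;
}

static void pv_send_ipi_single(int cpu, unsigned int action)
{
	unsigned int old = atomic_fetch_or(&message[cpu], 1U << action);

	if (old)	/* an IPI is already pending: merge, no extra trap */
		return;

	mock_hypercall(1UL, 0UL, cpu);	/* one-bit bitmap, min = target */
}
```

The model makes the point of the atomic_fetch_or() visible: a second send to the same cpu merges into the pending word and costs no further hypercall trap.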
>> +}
>> +
>> +#define KVM_IPI_CLUSTER_SIZE (2 * BITS_PER_LONG)
>> +static void pv_send_ipi_mask(const struct cpumask *mask, unsigned int action)
>> +{
>> + unsigned int cpu, i, min = 0, max = 0, old;
>> + __uint128_t bitmap = 0;
>> + irq_cpustat_t *info;
>> +
>> + if (cpumask_empty(mask))
>> + return;
>> +
>> + action = BIT(action);
>> + for_each_cpu(i, mask) {
>> + info = &per_cpu(irq_stat, i);
>> + old = atomic_fetch_or(action, &info->message);
>> + if (old)
>> + continue;
>> +
>> + cpu = cpu_logical_map(i);
>> + if (!bitmap) {
>> + min = max = cpu;
>> + } else if (cpu > min && cpu < min + KVM_IPI_CLUSTER_SIZE) {
>> + max = cpu > max ? cpu : max;
>> + } else if (cpu < min && (max - cpu) < KVM_IPI_CLUSTER_SIZE) {
>> + bitmap <<= min - cpu;
>> + min = cpu;
>> + } else {
>> + /*
>> + * Physical cpuids are sorted in ascending order;
>> + * for the next mask calculation, send the IPI
>> + * here directly and skip the remaining cpus
>> + */
>> + kvm_hypercall3(KVM_HCALL_FUNC_PV_IPI,
>> + (unsigned long)bitmap,
>> + (unsigned long)(bitmap >> BITS_PER_LONG), min);
>> + min = max = cpu;
>> + bitmap = 0;
>> + }
>> + __set_bit(cpu - min, (unsigned long *)&bitmap);
>> + }
>> +
>> + if (bitmap)
>> + kvm_hypercall3(KVM_HCALL_FUNC_PV_IPI, (unsigned long)bitmap,
>> + (unsigned long)(bitmap >> BITS_PER_LONG), min);
>> +}
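The cluster encoding above is subtle enough to deserve a standalone model. The sketch below reproduces the min/max window and the downward bitmap shift in user space: the hypercall is mocked by recording (bitmap_lo, bitmap_hi, min) tuples, and cpu ids arrive as a plain array rather than a cpumask. All names here are illustrative, not the kernel's.

```c
#include <assert.h>

/* Mock model of the 128-cpu cluster encoding in pv_send_ipi_mask():
 * cpuids within KVM_IPI_CLUSTER_SIZE of the running minimum share one
 * (bitmap, min) pair; a cpu that does not fit flushes the pending pair. */

#define CLUSTER 128	/* 2 * BITS_PER_LONG on 64-bit */

struct call { unsigned long lo, hi, min; };
static struct call calls[8];
static int ncalls;

static void flush(__uint128_t bitmap, unsigned int min)
{
	calls[ncalls].lo  = (unsigned long)bitmap;
	calls[ncalls].hi  = (unsigned long)(bitmap >> 64);
	calls[ncalls].min = min;
	ncalls++;
}

static void encode(const unsigned int *cpus, int n)
{
	unsigned int min = 0, max = 0, cpu;
	__uint128_t bitmap = 0;
	int i;

	for (i = 0; i < n; i++) {
		cpu = cpus[i];
		if (!bitmap) {
			min = max = cpu;
		} else if (cpu > min && cpu < min + CLUSTER) {
			max = cpu > max ? cpu : max;
		} else if (cpu < min && (max - cpu) < CLUSTER) {
			bitmap <<= min - cpu;	/* widen window downwards */
			min = cpu;
		} else {
			flush(bitmap, min);	/* out of range: emit, restart */
			min = max = cpu;
			bitmap = 0;
		}
		bitmap |= (__uint128_t)1 << (cpu - min);
	}
	if (bitmap)
		flush(bitmap, min);
}
```

Running it over cpus {3, 4, 70, 200} shows one hypercall covering the first three (bit 67, i.e. cpu 70, lands in the high word) and a second for the outlier.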
>> +
>> +static irqreturn_t loongson_do_swi(int irq, void *dev)
>> +{
>> + irq_cpustat_t *info;
>> + long action;
>> +
>> + /* Clear swi interrupt */
>> + clear_csr_estat(1 << INT_SWI0);
>> + info = this_cpu_ptr(&irq_stat);
>> + action = atomic_xchg(&info->message, 0);
>> + if (action & SMP_CALL_FUNCTION) {
>> + generic_smp_call_function_interrupt();
>> + info->ipi_irqs[IPI_CALL_FUNCTION]++;
>> + }
>> +
>> + if (action & SMP_RESCHEDULE) {
>> + scheduler_ipi();
>> + info->ipi_irqs[IPI_RESCHEDULE]++;
>> + }
>> +
>> + return IRQ_HANDLED;
>> +}
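The receive path is also easy to model in user space. One atomic exchange drains every pending message bit, so however many senders set bits, the receiver handles them in a single pass (and, with PV, with zero receive-side traps). The action constants and counters below are mock stand-ins for the kernel's SMP_* flags and ipi_irqs[] counters, not the real definitions.

```c
#include <assert.h>
#include <stdatomic.h>

/* Mock model of loongson_do_swi(): atomic_exchange() drains all pending
 * message bits at once; each set bit is dispatched to its handler. */

#define ACT_CALL_FUNCTION (1u << 0)
#define ACT_RESCHEDULE    (1u << 1)

static atomic_uint message;		/* models irq_stat.message */
static int call_fn_count, resched_count;/* model ipi_irqs[] counters */

static void do_swi(void)
{
	unsigned int action = atomic_exchange(&message, 0);

	if (action & ACT_CALL_FUNCTION)
		call_fn_count++;
	if (action & ACT_RESCHEDULE)
		resched_count++;
}
```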
>> +
>> +static void pv_init_ipi(void)
>> +{
>> + int r, swi0;
>> +
>> + swi0 = get_percpu_irq(INT_SWI0);
>> + if (swi0 < 0)
>> + panic("SWI0 IRQ mapping failed\n");
>> + irq_set_percpu_devid(swi0);
>> + r = request_percpu_irq(swi0, loongson_do_swi, "SWI0", &irq_stat);
>> + if (r < 0)
>> + panic("SWI0 IRQ request failed\n");
>> +}
>> +#endif
>> +
>> static bool kvm_para_available(void)
>> {
>> static int hypervisor_type;
>> @@ -32,10 +130,24 @@ static bool kvm_para_available(void)
>>
>> int __init pv_ipi_init(void)
>> {
>> + int feature;
>> +
>> if (!cpu_has_hypervisor)
>> return 0;
>> if (!kvm_para_available())
>> return 0;
>>
>> + /*
>> + * check whether KVM hypervisor supports pv_ipi or not
>> + */
>> + feature = read_cpucfg(CPUCFG_KVM_FEATURE);
>> +#ifdef CONFIG_SMP
>> + if (feature & KVM_FEATURE_PV_IPI) {
>> + smp_ops.init_ipi = pv_init_ipi;
>> + smp_ops.send_ipi_single = pv_send_ipi_single;
>> + smp_ops.send_ipi_mask = pv_send_ipi_mask;
>> + }
>> +#endif
>> +
>> return 1;
>> }
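The two-step detection flow above (signature first, then the feature word) can be sketched end to end. In this model read_cpucfg() is replaced by a small table; on real hardware it is the cpucfg instruction trapping into KVM. The function names here mirror the patch but the bodies are illustrative, not the kernel code.

```c
#include <assert.h>
#include <string.h>

#define CPUCFG_KVM_BASE     0x40000000u
#define CPUCFG_KVM_SIG      CPUCFG_KVM_BASE
#define CPUCFG_KVM_FEATURE  (CPUCFG_KVM_BASE + 4)
#define KVM_FEATURE_PV_IPI  (1u << 1)

/* Mock of the trapped cpucfg read emulated by the hypervisor. */
static unsigned int mock_cpucfg(unsigned int index)
{
	unsigned int sig;

	switch (index) {
	case CPUCFG_KVM_SIG:
		memcpy(&sig, "KVM\0", 4);	/* "KVM\0" as a u32 */
		return sig;
	case CPUCFG_KVM_FEATURE:
		return KVM_FEATURE_PV_IPI;
	default:
		return 0;			/* undefined index reads 0 */
	}
}

static int kvm_para_available(void)
{
	unsigned int sig;

	memcpy(&sig, "KVM\0", 4);
	return mock_cpucfg(CPUCFG_KVM_SIG) == sig;
}

static int pv_ipi_usable(void)
{
	if (!kvm_para_available())
		return 0;
	return !!(mock_cpucfg(CPUCFG_KVM_FEATURE) & KVM_FEATURE_PV_IPI);
}
```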
>> diff --git a/arch/loongarch/kernel/setup.c b/arch/loongarch/kernel/setup.c
>> index b79a1244b56f..c95ed3224b7d 100644
>> --- a/arch/loongarch/kernel/setup.c
>> +++ b/arch/loongarch/kernel/setup.c
>> @@ -368,6 +368,7 @@ void __init platform_init(void)
>> pr_info("The BIOS Version: %s\n", b_info.bios_version);
>>
>> efi_runtime_init();
>> + pv_ipi_init();
> Moving the callsite to loongson_smp_setup() is better.
Will do in the next patch.
Regards
Bibo Mao
>
> Huacai
>
>> }
>>
>> static void __init check_kernel_sections_mem(void)
>> diff --git a/arch/loongarch/kernel/smp.c b/arch/loongarch/kernel/smp.c
>> index 2182e7cc2ed6..9e9fda1fe18a 100644
>> --- a/arch/loongarch/kernel/smp.c
>> +++ b/arch/loongarch/kernel/smp.c
>> @@ -285,7 +285,7 @@ void loongson_boot_secondary(int cpu, struct task_struct *idle)
>> void loongson_init_secondary(void)
>> {
>> unsigned int cpu = smp_processor_id();
>> - unsigned int imask = ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 |
>> + unsigned int imask = ECFGF_SIP0 | ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 |
>> ECFGF_IPI | ECFGF_PMC | ECFGF_TIMER;
>>
>> change_csr_ecfg(ECFG0_IM, imask);
>> diff --git a/arch/loongarch/kvm/exit.c b/arch/loongarch/kvm/exit.c
>> index 6a38fd59d86d..46940e97975b 100644
>> --- a/arch/loongarch/kvm/exit.c
>> +++ b/arch/loongarch/kvm/exit.c
>> @@ -227,6 +227,9 @@ static int kvm_emu_cpucfg(struct kvm_vcpu *vcpu, larch_inst inst)
>> case CPUCFG_KVM_SIG:
>> vcpu->arch.gprs[rd] = *(unsigned int *)KVM_SIGNATURE;
>> break;
>> + case CPUCFG_KVM_FEATURE:
>> + vcpu->arch.gprs[rd] = KVM_FEATURE_PV_IPI;
>> + break;
>> default:
>> vcpu->arch.gprs[rd] = 0;
>> break;
>> @@ -699,12 +702,78 @@ static int kvm_handle_lasx_disabled(struct kvm_vcpu *vcpu)
>> return RESUME_GUEST;
>> }
>>
>> +static int kvm_pv_send_ipi(struct kvm_vcpu *vcpu)
>> +{
>> + unsigned long ipi_bitmap;
>> + unsigned int min, cpu, i;
>> + struct kvm_vcpu *dest;
>> +
>> + min = vcpu->arch.gprs[LOONGARCH_GPR_A3];
>> + for (i = 0; i < 2; i++, min += BITS_PER_LONG) {
>> + ipi_bitmap = vcpu->arch.gprs[LOONGARCH_GPR_A1 + i];
>> + if (!ipi_bitmap)
>> + continue;
>> +
>> + cpu = find_first_bit((void *)&ipi_bitmap, BITS_PER_LONG);
>> + while (cpu < BITS_PER_LONG) {
>> + dest = kvm_get_vcpu_by_cpuid(vcpu->kvm, cpu + min);
>> + cpu = find_next_bit((void *)&ipi_bitmap, BITS_PER_LONG,
>> + cpu + 1);
>> + if (!dest)
>> + continue;
>> +
>> + /*
>> + * Send SWI0 to dest vcpu to emulate IPI interrupt
>> + */
>> + kvm_queue_irq(dest, INT_SWI0);
>> + kvm_vcpu_kick(dest);
>> + }
>> + }
>> +
>> + return 0;
>> +}
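The host-side decode can be modeled the same way as the guest-side encode. The sketch below assumes the second bitmap word addresses cpuids from min + 64 upward, which is what the guest-side clustering produces; the vcpu kick is replaced by collecting resolved cpuids into an array. Everything here is a mock, not the KVM code.

```c
#include <assert.h>

/* Mock of the kvm_pv_send_ipi() walk: two 64-bit bitmap words (a1, a2)
 * plus a base cpuid (a3); each set bit resolves to min + word*64 + bit. */

static int decode_targets(unsigned long long lo, unsigned long long hi,
			  unsigned int min, unsigned int *out)
{
	unsigned long long maps[2] = { lo, hi };
	int i, bit, n = 0;

	for (i = 0; i < 2; i++)
		for (bit = 0; bit < 64; bit++)
			if (maps[i] & (1ULL << bit))
				out[n++] = min + i * 64 + bit;

	return n;
}
```

Feeding it the pair produced by the guest for cpus {3, 4, 70} (lo = 0x3, hi = 0x8, min = 3) recovers exactly those cpuids.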
>> +
>> +/*
>> + * Hypercall emulation always returns to the guest, and the caller should check retval.
>> + */
>> +static void kvm_handle_pv_service(struct kvm_vcpu *vcpu)
>> +{
>> + unsigned long func = vcpu->arch.gprs[LOONGARCH_GPR_A0];
>> + long ret;
>> +
>> + switch (func) {
>> + case KVM_HCALL_FUNC_PV_IPI:
>> + kvm_pv_send_ipi(vcpu);
>> + ret = KVM_HCALL_STATUS_SUCCESS;
>> + break;
>> + default:
>> + ret = KVM_HCALL_INVALID_CODE;
>> + break;
>> + }
>> +
>> + vcpu->arch.gprs[LOONGARCH_GPR_A0] = ret;
>> +}
>> +
>> static int kvm_handle_hypercall(struct kvm_vcpu *vcpu)
>> {
>> + larch_inst inst;
>> + unsigned int code;
>> +
>> + inst.word = vcpu->arch.badi;
>> + code = inst.reg0i15_format.immediate;
>> update_pc(&vcpu->arch);
>>
>> - /* Treat it as noop intruction, only set return value */
>> - vcpu->arch.gprs[LOONGARCH_GPR_A0] = KVM_HCALL_INVALID_CODE;
>> + switch (code) {
>> + case KVM_HCALL_PV_SERVICE:
>> + vcpu->stat.hypercall_exits++;
>> + kvm_handle_pv_service(vcpu);
>> + break;
>> + default:
>> + /* Treat it as a no-op instruction, only set return value */
>> + vcpu->arch.gprs[LOONGARCH_GPR_A0] = KVM_HCALL_INVALID_CODE;
>> + break;
>> + }
>> +
>> return RESUME_GUEST;
>> }
>>
>> diff --git a/arch/loongarch/kvm/vcpu.c b/arch/loongarch/kvm/vcpu.c
>> index 40296d8ef297..24fd5e4647f3 100644
>> --- a/arch/loongarch/kvm/vcpu.c
>> +++ b/arch/loongarch/kvm/vcpu.c
>> @@ -19,6 +19,7 @@ const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
>> STATS_DESC_COUNTER(VCPU, idle_exits),
>> STATS_DESC_COUNTER(VCPU, cpucfg_exits),
>> STATS_DESC_COUNTER(VCPU, signal_exits),
>> + STATS_DESC_COUNTER(VCPU, hypercall_exits)
>> };
>>
>> const struct kvm_stats_header kvm_vcpu_stats_header = {
>> --
>> 2.39.3
>>
>>
Hi,
On 2/26/24 10:04, maobibo wrote:
> On 2024/2/24 5:13 PM, Huacai Chen wrote:
>> Hi, Bibo,
>>
>> On Thu, Feb 22, 2024 at 11:28 AM Bibo Mao <[email protected]> wrote:
>>>
>>> Instruction cpucfg can be used to get processor features. And there
>>> is trap exception when it is executed in VM mode, and also it is
>>> to provide cpu features to VM. On real hardware cpucfg area 0 - 20
>>> is used. Here one specified area 0x40000000 -- 0x400000ff is used
>> for KVM hypervisor to provide PV features, and the area can be extended
>>> for other hypervisors in future. This area will never be used for
>>> real HW, it is only used by software.
>> After reading and thinking, I find that the hypercall method which is
>> used in our productive kernel is better than this cpucfg method.
>> Because hypercall is more simple and straightforward, plus we don't
>> worry about conflicting with the real hardware.
> No, I do not think so. cpucfg is simpler than hypercall; hypercall only
> takes effect when the system runs in guest mode. In some scenarios, like
> TCG mode, hypercall is an illegal instruction, however cpucfg still works.
While the CPUCFG instruction is universally available, it's also
unprivileged, so any additional CPUCFG behavior also automatically
becomes UAPI, which likely isn't what you expect. Hypervisor
implementation details shouldn't be leaked to userland because it has no
reason to care -- even though userland learns about the capabilities, it
cannot actually access the resources, because relevant CSRs and/or
instructions are privileged. Worse, the unnecessary exposure of
information could be a problem security-wise.
A possible way to preserve the unprivileged CPUCFG behavior would be
acting differently based on guest CSR.CRMD.PLV: only returning data for
the new configuration space when guest is not in PLV3. But this behavior
isn't explicitly allowed nor disallowed in the LoongArch manuals, and is
in my opinion unnecessarily complex.
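For concreteness, the PLV-gating idea sketched in the previous paragraph could look like the model below: hide the 0x40000000 leaves whenever the trapping guest context is PLV3. The CRMD.PLV field position (bits [1:0]) follows the LoongArch manual; the rest (function name, the signature constant returned for the whole range) is purely a mock of the emulation path, not proposed kernel code.

```c
#include <assert.h>

#define CPUCFG_KVM_BASE 0x40000000u
#define CPUCFG_KVM_SIZE 0x100u
#define PLV_MASK        0x3u	/* CSR.CRMD bits [1:0] */

/* Mock cpucfg emulation gated on the guest's privilege level. */
static unsigned int emu_cpucfg(unsigned int index, unsigned int guest_crmd)
{
	unsigned int plv = guest_crmd & PLV_MASK;

	if (index >= CPUCFG_KVM_BASE &&
	    index < CPUCFG_KVM_BASE + CPUCFG_KVM_SIZE) {
		if (plv == 3)
			return 0;	/* hide hypervisor leaves from PLV3 */
		return 0x004d564bu;	/* "KVM\0" for any KVM leaf (mock) */
	}
	return 0;			/* undefined index reads as 0 */
}
```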
And regarding the lack of hypcall support from QEMU system mode
emulation on TCG, I'd argue it's simply a matter of adding support in
target/loongarch64. This would be attractive because it will enable easy
development and testing of hypervisor software with QEMU -- both locally
and in CI.
> Extioi virtualization extension will be added later, cpucfg can be
> used to get extioi features. It is unlikely that extioi driver depends
> on PARA_VIRT macro if hypercall is used to get features.
And the EXTIOI feature too isn't something usable from unprivileged
code, so I don't think it will affect the conclusions above.
--
WANG "xen0n" Xuerui
Linux/LoongArch mailing list: https://lore.kernel.org/loongarch/
On Mon, Feb 26, 2024 at 10:04 AM maobibo <[email protected]> wrote:
>
>
>
> On 2024/2/24 5:13 PM, Huacai Chen wrote:
> > Hi, Bibo,
> >
> > On Thu, Feb 22, 2024 at 11:28 AM Bibo Mao <[email protected]> wrote:
> >>
> >> Instruction cpucfg can be used to get processor features. And there
> >> is trap exception when it is executed in VM mode, and also it is
> >> to provide cpu features to VM. On real hardware cpucfg area 0 - 20
> >> is used. Here one specified area 0x40000000 -- 0x400000ff is used
> >> for KVM hypervisor to provide PV features, and the area can be extended
> >> for other hypervisors in future. This area will never be used for
> >> real HW, it is only used by software.
> > After reading and thinking, I find that the hypercall method which is
> > used in our productive kernel is better than this cpucfg method.
> > Because hypercall is more simple and straightforward, plus we don't
> > worry about conflicting with the real hardware.
> No, I do not think so. cpucfg is simpler than hypercall; hypercall only
> takes effect when the system runs in guest mode. In some scenarios, like
> TCG mode, hypercall is an illegal instruction, however cpucfg still works.
Nearly all architectures use hypercall except x86 for its historical
reasons. If we use CPUCFG, then the hypervisor information is
unnecessarily leaked to userspace, and this may be a security issue.
Meanwhile, I don't think TCG mode needs PV features.
I consulted with Jiaxun before, and maybe he can give some more comments.
>
> Extioi virtualization extension will be added later, cpucfg can be used
> to get extioi features. It is unlikely that extioi driver depends on
> PARA_VIRT macro if hypercall is used to get features.
CPUCFG is per-core information, if we really need something about
extioi, it should be in iocsr (LOONGARCH_IOCSR_FEATURES).
Huacai
>
> Regards
> Bibo Mao
>
> >
> > Huacai
> >
> >>
> >> Signed-off-by: Bibo Mao <[email protected]>
> >> ---
> >> arch/loongarch/include/asm/inst.h | 1 +
> >> arch/loongarch/include/asm/loongarch.h | 10 ++++++
> >> arch/loongarch/kvm/exit.c | 46 +++++++++++++++++---------
> >> 3 files changed, 41 insertions(+), 16 deletions(-)
> >>
> >> diff --git a/arch/loongarch/include/asm/inst.h b/arch/loongarch/include/asm/inst.h
> >> index d8f637f9e400..ad120f924905 100644
> >> --- a/arch/loongarch/include/asm/inst.h
> >> +++ b/arch/loongarch/include/asm/inst.h
> >> @@ -67,6 +67,7 @@ enum reg2_op {
> >> revhd_op = 0x11,
> >> extwh_op = 0x16,
> >> extwb_op = 0x17,
> >> + cpucfg_op = 0x1b,
> >> iocsrrdb_op = 0x19200,
> >> iocsrrdh_op = 0x19201,
> >> iocsrrdw_op = 0x19202,
> >> diff --git a/arch/loongarch/include/asm/loongarch.h b/arch/loongarch/include/asm/loongarch.h
> >> index 46366e783c84..a1d22e8b6f94 100644
> >> --- a/arch/loongarch/include/asm/loongarch.h
> >> +++ b/arch/loongarch/include/asm/loongarch.h
> >> @@ -158,6 +158,16 @@
> >> #define CPUCFG48_VFPU_CG BIT(2)
> >> #define CPUCFG48_RAM_CG BIT(3)
> >>
> >> +/*
> >> + * cpucfg index area: 0x40000000 -- 0x400000ff
> >> + * SW emulation for KVM hypervisor
> >> + */
> >> +#define CPUCFG_KVM_BASE 0x40000000UL
> >> +#define CPUCFG_KVM_SIZE 0x100
> >> +#define CPUCFG_KVM_SIG CPUCFG_KVM_BASE
> >> +#define KVM_SIGNATURE "KVM\0"
> >> +#define CPUCFG_KVM_FEATURE (CPUCFG_KVM_BASE + 4)
> >> +
> >> #ifndef __ASSEMBLY__
> >>
> >> /* CSR */
> >> diff --git a/arch/loongarch/kvm/exit.c b/arch/loongarch/kvm/exit.c
> >> index 923bbca9bd22..6a38fd59d86d 100644
> >> --- a/arch/loongarch/kvm/exit.c
> >> +++ b/arch/loongarch/kvm/exit.c
> >> @@ -206,10 +206,37 @@ int kvm_emu_idle(struct kvm_vcpu *vcpu)
> >> return EMULATE_DONE;
> >> }
> >>
> >> -static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
> >> +static int kvm_emu_cpucfg(struct kvm_vcpu *vcpu, larch_inst inst)
> >> {
> >> int rd, rj;
> >> unsigned int index;
> >> +
> >> + rd = inst.reg2_format.rd;
> >> + rj = inst.reg2_format.rj;
> >> + ++vcpu->stat.cpucfg_exits;
> >> + index = vcpu->arch.gprs[rj];
> >> +
> >> + /*
> >> + * By LoongArch Reference Manual 2.2.10.5
> >> + * Return value is 0 for undefined cpucfg index
> >> + */
> >> + switch (index) {
> >> + case 0 ... (KVM_MAX_CPUCFG_REGS - 1):
> >> + vcpu->arch.gprs[rd] = vcpu->arch.cpucfg[index];
> >> + break;
> >> + case CPUCFG_KVM_SIG:
> >> + vcpu->arch.gprs[rd] = *(unsigned int *)KVM_SIGNATURE;
> >> + break;
> >> + default:
> >> + vcpu->arch.gprs[rd] = 0;
> >> + break;
> >> + }
> >> +
> >> + return EMULATE_DONE;
> >> +}
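A small aside on the signature leaf in the hunk above: the emulation returns the four bytes "KVM\0" reinterpreted as an unsigned int, which on little-endian LoongArch is 0x004d564b. The sketch below demonstrates that encoding; the memcpy() form sidesteps the aliasing concerns of the *(unsigned int *) cast in the patch, and is only an illustration, not a suggested change.

```c
#include <assert.h>
#include <string.h>

/* "KVM\0" reinterpreted as a u32: 'K'=0x4b, 'V'=0x56, 'M'=0x4d, '\0'. */
static unsigned int kvm_signature(void)
{
	unsigned int sig;

	memcpy(&sig, "KVM\0", 4);
	return sig;		/* 0x004d564b on a little-endian host */
}
```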
> >> +
> >> +static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
> >> +{
> >> unsigned long curr_pc;
> >> larch_inst inst;
> >> enum emulation_result er = EMULATE_DONE;
> >> @@ -224,21 +251,8 @@ static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
> >> er = EMULATE_FAIL;
> >> switch (((inst.word >> 24) & 0xff)) {
> >> case 0x0: /* CPUCFG GSPR */
> >> - if (inst.reg2_format.opcode == 0x1B) {
> >> - rd = inst.reg2_format.rd;
> >> - rj = inst.reg2_format.rj;
> >> - ++vcpu->stat.cpucfg_exits;
> >> - index = vcpu->arch.gprs[rj];
> >> - er = EMULATE_DONE;
> >> - /*
> >> - * By LoongArch Reference Manual 2.2.10.5
> >> - * return value is 0 for undefined cpucfg index
> >> - */
> >> - if (index < KVM_MAX_CPUCFG_REGS)
> >> - vcpu->arch.gprs[rd] = vcpu->arch.cpucfg[index];
> >> - else
> >> - vcpu->arch.gprs[rd] = 0;
> >> - }
> >> + if (inst.reg2_format.opcode == cpucfg_op)
> >> + er = kvm_emu_cpucfg(vcpu, inst);
> >> break;
> >> case 0x4: /* CSR{RD,WR,XCHG} GSPR */
> >> er = kvm_handle_csr(vcpu, inst);
> >> --
> >> 2.39.3
> >>
>
>
On 2024/2/26 1:25 PM, WANG Xuerui wrote:
> Hi,
>
> On 2/26/24 10:04, maobibo wrote:
>> On 2024/2/24 5:13 PM, Huacai Chen wrote:
>>> Hi, Bibo,
>>>
>>> On Thu, Feb 22, 2024 at 11:28 AM Bibo Mao <[email protected]> wrote:
>>>>
>>>> Instruction cpucfg can be used to get processor features. And there
>>>> is trap exception when it is executed in VM mode, and also it is
>>>> to provide cpu features to VM. On real hardware cpucfg area 0 - 20
>>>> is used. Here one specified area 0x40000000 -- 0x400000ff is used
>>>> for KVM hypervisor to provide PV features, and the area can be extended
>>>> for other hypervisors in future. This area will never be used for
>>>> real HW, it is only used by software.
>>> After reading and thinking, I find that the hypercall method which is
>>> used in our productive kernel is better than this cpucfg method.
>>> Because hypercall is more simple and straightforward, plus we don't
>>> worry about conflicting with the real hardware.
>> No, I do not think so. cpucfg is simpler than hypercall; hypercall only
>> takes effect when the system runs in guest mode. In some scenarios, like
>> TCG mode, hypercall is an illegal instruction, however cpucfg still works.
>
> While the CPUCFG instruction is universally available, it's also
> unprivileged, so any additional CPUCFG behavior also automatically
> becomes UAPI, which likely isn't what you expect. Hypervisor
> implementation details shouldn't be leaked to userland because it has no
> reason to care -- even though userland learns about the capabilities, it
> cannot actually access the resources, because relevant CSRs and/or
> instructions are privileged. Worse, the unnecessary exposure of
> information could be a problem security-wise.
cpucfg is read-only and only reports current hw cpu features; why do you
think there is a security issue? Is there a security issue with cpucfg2
and cpucfg6, since they can also be accessed from user space?
The PMU feature is advertised in cpucfg6, yet the PMU driver runs in
kernel mode.
>
> A possible way to preserve the unprivileged CPUCFG behavior would be
> acting differently based on guest CSR.CRMD.PLV: only returning data for
> the new configuration space when guest is not in PLV3. But this behavior
> isn't explicitly allowed nor disallowed in the LoongArch manuals, and is
> in my opinion unnecessarily complex.
>
> And regarding the lack of hypcall support from QEMU system mode
> emulation on TCG, I'd argue it's simply a matter of adding support in
> target/loongarch64. This would be attractive because it will enable easy
> development and testing of hypervisor software with QEMU -- both locally
> and in CI.
The hypercall instruction is part of the hardware-assisted virtualization
extension LVZ; do you think adding only the hypercall instruction without
LVZ is possible?
>
>> Extioi virtualization extension will be added later, cpucfg can be
>> used to get extioi features. It is unlikely that extioi driver depends
>> on PARA_VIRT macro if hypercall is used to get features.
> And the EXTIOI feature too isn't something usable from unprivileged
> code, so I don't think it will affect the conclusions above.
Sorry, I do not know what you mean.
Regards
Bibo Mao
>
On 2/26/24 16:00, maobibo wrote:
>
>
> On 2024/2/26 1:25 PM, WANG Xuerui wrote:
>> Hi,
>>
>> On 2/26/24 10:04, maobibo wrote:
>>> On 2024/2/24 5:13 PM, Huacai Chen wrote:
>>>> Hi, Bibo,
>>>>
>>>> On Thu, Feb 22, 2024 at 11:28 AM Bibo Mao <[email protected]> wrote:
>>>>>
>>>>> Instruction cpucfg can be used to get processor features. And there
>>>>> is trap exception when it is executed in VM mode, and also it is
>>>>> to provide cpu features to VM. On real hardware cpucfg area 0 - 20
>>>>> is used. Here one specified area 0x40000000 -- 0x400000ff is used
>>>>> for KVM hypervisor to provide PV features, and the area can be
>>>>> extended
>>>>> for other hypervisors in future. This area will never be used for
>>>>> real HW, it is only used by software.
>>>> After reading and thinking, I find that the hypercall method which is
>>>> used in our productive kernel is better than this cpucfg method.
>>>> Because hypercall is more simple and straightforward, plus we don't
>>>> worry about conflicting with the real hardware.
>>> No, I do not think so. cpucfg is simpler than hypercall; hypercall only
>>> takes effect when the system runs in guest mode. In some scenarios, like
>>> TCG mode, hypercall is an illegal instruction, however cpucfg still works.
>>
>> While the CPUCFG instruction is universally available, it's also
>> unprivileged, so any additional CPUCFG behavior also automatically
>> becomes UAPI, which likely isn't what you expect. Hypervisor
>> implementation details shouldn't be leaked to userland because it has
>> no reason to care -- even though userland learns about the
>> capabilities, it cannot actually access the resources, because
>> relevant CSRs and/or instructions are privileged. Worse, the
>> unnecessary exposure of information could be a problem security-wise.
> cpucfg is read-only and only reports current hw cpu features; why do you
> think there is a security issue? Is there a security issue with cpucfg2
> and cpucfg6, since they can also be accessed from user space?
>
> The PMU feature is advertised in cpucfg6, yet the PMU driver runs in
> kernel mode.
These CPUCFG leaves were added before the existence of LoongArch was
publicized, without community review. If early drafts of the manual had
been available to community reviewers, I at least would have strongly
NAKed them.
>>
>> A possible way to preserve the unprivileged CPUCFG behavior would be
>> acting differently based on guest CSR.CRMD.PLV: only returning data
>> for the new configuration space when guest is not in PLV3. But this
>> behavior isn't explicitly allowed nor disallowed in the LoongArch
>> manuals, and is in my opinion unnecessarily complex.
>>
>> And regarding the lack of hypcall support from QEMU system mode
>> emulation on TCG, I'd argue it's simply a matter of adding support in
>> target/loongarch64. This would be attractive because it will enable
>> easy development and testing of hypervisor software with QEMU -- both
>> locally and in CI.
> The hypercall instruction is part of the hardware-assisted virtualization
> extension LVZ; do you think adding only the hypercall instruction without
> LVZ is possible?
I cannot comment on the actual feasibility of doing so, because I don't
have access to the LVZ manuals, which *still* aren't publicly available.
But my intuition says it should be a more-or-less trivial processor mode
transition, like with syscall -- whether that's indeed the case I can't
(dis)prove.
>>> Extioi virtualization extension will be added later, cpucfg can be
>>> used to get extioi features. It is unlikely that extioi driver
>>> depends on PARA_VIRT macro if hypercall is used to get features.
>> And the EXTIOI feature too isn't something usable from unprivileged
>> code, so I don't think it will affect the conclusions above.
> Sorry, I do not know what you mean.
I was just saying this example provided no additional information, at
least for me -- while it's appreciated that you informed the community
of your intended future use case, as I stated in the first paragraph of
my reply, it looks essentially the same because both PV and EXTIOI are
privileged features.
--
WANG "xen0n" Xuerui
Linux/LoongArch mailing list: https://lore.kernel.org/loongarch/
On 2024/2/26 2:12 PM, Huacai Chen wrote:
> On Mon, Feb 26, 2024 at 10:04 AM maobibo <[email protected]> wrote:
>>
>>
>>
>> On 2024/2/24 5:13 PM, Huacai Chen wrote:
>>> Hi, Bibo,
>>>
>>> On Thu, Feb 22, 2024 at 11:28 AM Bibo Mao <[email protected]> wrote:
>>>>
>>>> Instruction cpucfg can be used to get processor features. And there
>>>> is trap exception when it is executed in VM mode, and also it is
>>>> to provide cpu features to VM. On real hardware cpucfg area 0 - 20
>>>> is used. Here one specified area 0x40000000 -- 0x400000ff is used
> >>>> for KVM hypervisor to provide PV features, and the area can be extended
>>>> for other hypervisors in future. This area will never be used for
>>>> real HW, it is only used by software.
>>> After reading and thinking, I find that the hypercall method which is
>>> used in our productive kernel is better than this cpucfg method.
>>> Because hypercall is more simple and straightforward, plus we don't
>>> worry about conflicting with the real hardware.
>> No, I do not think so. cpucfg is simpler than hypercall; hypercall only
>> takes effect when the system runs in guest mode. In some scenarios, like
>> TCG mode, hypercall is an illegal instruction, however cpucfg still works.
> Nearly all architectures use hypercall except x86 for its historical
Only x86 supports multiple hypervisors, and multiple hypervisors exist
on x86 alone. That is an advantage, not a historical reason.
> reasons. If we use CPUCFG, then the hypervisor information is
> unnecessarily leaked to userspace, and this may be a security issue.
> Meanwhile, I don't think TCG mode needs PV features.
Besides PV features, there are other features that differ from real hw,
such as virtio devices and the virtual interrupt controller.
Regards
Bibo Mao
>
> I consulted with Jiaxun before, and maybe he can give some more comments.
>
>>
>> Extioi virtualization extension will be added later, cpucfg can be used
>> to get extioi features. It is unlikely that extioi driver depends on
>> PARA_VIRT macro if hypercall is used to get features.
> CPUCFG is per-core information, if we really need something about
> extioi, it should be in iocsr (LOONGARCH_IOCSR_FEATURES).
>
>
> Huacai
>
>>
>> Regards
>> Bibo Mao
>>
>>>
>>> Huacai
>>>
>>>>
>>>> Signed-off-by: Bibo Mao <[email protected]>
>>>> [...]
>>
>>
On 26 Feb 2024, at 8:04 AM, maobibo wrote:
> On 2024/2/26 2:12 PM, Huacai Chen wrote:
>> On Mon, Feb 26, 2024 at 10:04 AM maobibo <[email protected]> wrote:
>>>
>>>
>>>
>>> On 2024/2/24 5:13 PM, Huacai Chen wrote:
>>>> Hi, Bibo,
>>>>
>>>> On Thu, Feb 22, 2024 at 11:28 AM Bibo Mao <[email protected]> wrote:
>>>>>
>>>>> Instruction cpucfg can be used to get processor features. And there
>>>>> is trap exception when it is executed in VM mode, and also it is
>>>>> to provide cpu features to VM. On real hardware cpucfg area 0 - 20
>>>>> is used. Here one specified area 0x40000000 -- 0x400000ff is used
>>>>> for KVM hypervisor to provide PV features, and the area can be extended
>>>>> for other hypervisors in future. This area will never be used for
>>>>> real HW, it is only used by software.
>>>> After reading and thinking, I find that the hypercall method which is
>>>> used in our productive kernel is better than this cpucfg method.
>>>> Because hypercall is more simple and straightforward, plus we don't
>>>> worry about conflicting with the real hardware.
>>> No, I do not think so. cpucfg is simpler than hypercall; hypercall only
>>> takes effect when the system runs in guest mode. In some scenarios, like
>>> TCG mode, hypercall is an illegal instruction, however cpucfg still works.
>> Nearly all architectures use hypercall except x86 for its historical
> Only x86 supports multiple hypervisors, and multiple hypervisors exist
> on x86 alone. That is an advantage, not a historical reason.
I do believe that none of this stuff should be exposed to guest user
space, for security reasons.
Also, different hypervisor implementations may have different PV feature
behavior; using hypcall to perform feature detection can pass more
information to help us cope with hypervisor diversity.
>
>> reasons. If we use CPUCFG, then the hypervisor information is
>> unnecessarily leaked to userspace, and this may be a security issue.
>> Meanwhile, I don't think TCG mode needs PV features.
> Besides PV features, there are other features that differ from real hw,
> such as virtio devices and the virtual interrupt controller.
Those are *device*-level pieces of information; they must be passed via
firmware interfaces to keep processor emulation sane.
Thanks
>
> Regards
> Bibo Mao
>
>>
>> I consulted with Jiaxun before, and maybe he can give some more comments.
>>
>>>
>>> Extioi virtualization extension will be added later, cpucfg can be used
>>> to get extioi features. It is unlikely that extioi driver depends on
>>> PARA_VIRT macro if hypercall is used to get features.
>> CPUCFG is per-core information, if we really need something about
>> extioi, it should be in iocsr (LOONGARCH_IOCSR_FEATURES).
>>
>>
>> Huacai
>>
>>>
>>> Regards
>>> Bibo Mao
>>>
>>>>
>>>> Huacai
>>>>
>>>>>
>>>>> Signed-off-by: Bibo Mao <[email protected]>
>>>>> [...]
>>>>> + break;
>>>>> + case CPUCFG_KVM_SIG:
>>>>> + vcpu->arch.gprs[rd] = *(unsigned int *)KVM_SIGNATURE;
>>>>> + break;
>>>>> + default:
>>>>> + vcpu->arch.gprs[rd] = 0;
>>>>> + break;
>>>>> + }
>>>>> +
>>>>> + return EMULATE_DONE;
>>>>> +}
>>>>> +
>>>>> +static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
>>>>> +{
>>>>> unsigned long curr_pc;
>>>>> larch_inst inst;
>>>>> enum emulation_result er = EMULATE_DONE;
>>>>> @@ -224,21 +251,8 @@ static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
>>>>> er = EMULATE_FAIL;
>>>>> switch (((inst.word >> 24) & 0xff)) {
>>>>> case 0x0: /* CPUCFG GSPR */
>>>>> - if (inst.reg2_format.opcode == 0x1B) {
>>>>> - rd = inst.reg2_format.rd;
>>>>> - rj = inst.reg2_format.rj;
>>>>> - ++vcpu->stat.cpucfg_exits;
>>>>> - index = vcpu->arch.gprs[rj];
>>>>> - er = EMULATE_DONE;
>>>>> - /*
>>>>> - * By LoongArch Reference Manual 2.2.10.5
>>>>> - * return value is 0 for undefined cpucfg index
>>>>> - */
>>>>> - if (index < KVM_MAX_CPUCFG_REGS)
>>>>> -                       vcpu->arch.gprs[rd] = vcpu->arch.cpucfg[index];
>>>>> - else
>>>>> - vcpu->arch.gprs[rd] = 0;
>>>>> - }
>>>>> + if (inst.reg2_format.opcode == cpucfg_op)
>>>>> + er = kvm_emu_cpucfg(vcpu, inst);
>>>>> break;
>>>>> case 0x4: /* CSR{RD,WR,XCHG} GSPR */
>>>>> er = kvm_handle_csr(vcpu, inst);
>>>>> --
>>>>> 2.39.3
>>>>>
>>>
>>>
--
- Jiaxun
On 2024/2/27 4:02 AM, Jiaxun Yang wrote:
>
>
> On Feb 26, 2024, at 8:04 AM, maobibo wrote:
>> On 2024/2/26 2:12 PM, Huacai Chen wrote:
>>> On Mon, Feb 26, 2024 at 10:04 AM maobibo <[email protected]> wrote:
>>>>
>>>>
>>>>
>>>> On 2024/2/24 5:13 PM, Huacai Chen wrote:
>>>>> Hi, Bibo,
>>>>>
>>>>> On Thu, Feb 22, 2024 at 11:28 AM Bibo Mao <[email protected]> wrote:
>>>>>>
>>>>>> Instruction cpucfg can be used to get processor features. And there
>>>>>> is trap exception when it is executed in VM mode, and also it is
>>>>>> to provide cpu features to VM. On real hardware cpucfg area 0 - 20
>>>>>> is used. Here one specified area 0x40000000 -- 0x400000ff is used
>>>>>> for KVM hypervisor to privide PV features, and the area can be extended
>>>>>> for other hypervisors in future. This area will never be used for
>>>>>> real HW, it is only used by software.
>>>>> After reading and thinking, I find that the hypercall method which is
>>>>> used in our productive kernel is better than this cpucfg method.
>>>>> Because hypercall is more simple and straightforward, plus we don't
>>>>> worry about conflicting with the real hardware.
>>>> No, I do not think so. cpucfg is simper than hypercall, hypercall can
>>>> be in effect when system runs in guest mode. In some scenario like TCG
>>>> mode, hypercall is illegal intruction, however cpucfg can work.
>>> Nearly all architectures use hypercall except x86 for its historical
>> Only x86 support multiple hypervisors and there is multiple hypervisor
>> in x86 only. It is an advantage, not historical reason.
>
> I do believe that all those stuff should not be exposed to guest user space
> for security reasons.
Can you add PLV checking when cpucfg 0x40000000-0x400000FF is emulated?
If the guest is in user mode the return value is zero, and if it is in
kernel mode the emulated value is returned. That would avoid leaking
information.
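A minimal sketch of the PLV check proposed here, written as plain host-independent C so the logic is easy to follow. The window constants mirror the posted patch; the helper name, the plv enum and the way the guest privilege level is passed in are made up for illustration, not actual KVM code:

```c
#include <stdint.h>

/* Constants mirroring the posted patch. */
#define CPUCFG_KVM_BASE  0x40000000u
#define CPUCFG_KVM_SIZE  0x100u
#define CPUCFG_KVM_SIG   CPUCFG_KVM_BASE

/* On LoongArch, PLV0 is kernel mode and PLV3 is user mode. */
enum plv { PLV0 = 0, PLV3 = 3 };

/* Emulated read of the software-defined CPUCFG window: kernel mode
 * sees the real emulated value, user mode reads back zero, so no
 * hypervisor information leaks to guest user space. */
uint32_t emu_cpucfg_read(uint32_t index, enum plv guest_plv)
{
	if (index >= CPUCFG_KVM_BASE &&
	    index < CPUCFG_KVM_BASE + CPUCFG_KVM_SIZE) {
		if (guest_plv != PLV0)
			return 0;	/* hide the PV window from user space */
		if (index == CPUCFG_KVM_SIG) {
			/* assemble "KVM\0" as a little-endian word */
			const char s[4] = { 'K', 'V', 'M', '\0' };
			uint32_t sig = 0;
			for (int i = 3; i >= 0; i--)
				sig = (sig << 8) | (uint8_t)s[i];
			return sig;
		}
		return 0;	/* unimplemented words in the window */
	}
	/* indexes outside the window (incl. real HW words) not modeled here */
	return 0;
}
```

With this shape a user-space cpucfg of 0x40000000 still returns a value (zero) rather than raising an exception.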
>
> Also for different implementations of hypervisors they may have different
> PV features behavior, using hypcall to perform feature detection
> can pass more information to help us cope with hypervisor diversity.
How can different hypervisors be detected in the first place? On x86,
MSRs are used to detect all hypervisors; on ARM64, Hyper-V uses
acpi_gbl_FADT and KVM always uses SMC, since host mode can execute the smc
instruction without an exception on ARM64.
I do not see why hypercall is better than cpucfg on LoongArch: cpucfg
is a basic instruction, whereas hypercall is not, it is part of the LVZ feature.
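For what it's worth, the guest-side signature probe is tiny either way. This is an illustrative sketch only: read_cpucfg_sig() is a hypothetical stand-in for the actual cpucfg instruction (which would need LoongArch inline assembly), returning the word the patch makes KVM report at index 0x40000000:

```c
#include <stdint.h>
#include <string.h>

#define CPUCFG_KVM_SIG 0x40000000u

/* Stand-in for `cpucfg` with $rj = CPUCFG_KVM_SIG; a real guest would
 * execute the instruction and trap into the hypervisor. Here we just
 * return the word the patch makes KVM report, so the detection logic
 * below can be exercised anywhere. */
static uint32_t read_cpucfg_sig(void)
{
	uint32_t word;
	memcpy(&word, "KVM\0", sizeof(word));	/* "KVM\0" as stored in memory */
	return word;
}

/* Returns 1 when the hypervisor signature word spells "KVM\0". */
int kvm_para_available(void)
{
	uint32_t sig = read_cpucfg_sig();
	return memcmp(&sig, "KVM\0", sizeof(sig)) == 0;
}
```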
>>
>>> reasons. If we use CPUCFG, then the hypervisor information is
>>> unnecessarily leaked to userspace, and this may be a security issue.
>>> Meanwhile, I don't think TCG mode needs PV features.
>> Besides PV features, there is other features different with real hw such
>> as virtio device, virtual interrupt controller.
>
> Those are *device* level information, they must be passed in firmware
> interfaces to keep processor emulation sane.
File arch/x86/hyperv/hv_apic.c can be used as a reference: APIC features
come from ms_hyperv.hints and the HYPERV_CPUID_ENLIGHTMENT_INFO cpuid
leaf, and need not be passed through a firmware interface.
Regards
Bibo Mao
>
> Thanks
>
>>
>> Regards
>> Bibo Mao
>>
>>>
>>> I consulted with Jiaxun before, and maybe he can give some more comments.
>>>
>>>>
>>>> Extioi virtualization extension will be added later, cpucfg can be used
>>>> to get extioi features. It is unlikely that extioi driver depends on
>>>> PARA_VIRT macro if hypercall is used to get features.
>>> CPUCFG is per-core information, if we really need something about
>>> extioi, it should be in iocsr (LOONGARCH_IOCSR_FEATURES).
>>>
>>>
>>> Huacai
>>>
>>>>
>>>> Regards
>>>> Bibo Mao
>>>>
>>>>>
>>>>> Huacai
>>>>>
>>>>>>
>>>>>> Signed-off-by: Bibo Mao <[email protected]>
>>>>>> ---
>>>>>> arch/loongarch/include/asm/inst.h | 1 +
>>>>>> arch/loongarch/include/asm/loongarch.h | 10 ++++++
>>>>>> arch/loongarch/kvm/exit.c | 46 +++++++++++++++++---------
>>>>>> 3 files changed, 41 insertions(+), 16 deletions(-)
>>>>>>
>>>>>> diff --git a/arch/loongarch/include/asm/inst.h b/arch/loongarch/include/asm/inst.h
>>>>>> index d8f637f9e400..ad120f924905 100644
>>>>>> --- a/arch/loongarch/include/asm/inst.h
>>>>>> +++ b/arch/loongarch/include/asm/inst.h
>>>>>> @@ -67,6 +67,7 @@ enum reg2_op {
>>>>>> revhd_op = 0x11,
>>>>>> extwh_op = 0x16,
>>>>>> extwb_op = 0x17,
>>>>>> + cpucfg_op = 0x1b,
>>>>>> iocsrrdb_op = 0x19200,
>>>>>> iocsrrdh_op = 0x19201,
>>>>>> iocsrrdw_op = 0x19202,
>>>>>> diff --git a/arch/loongarch/include/asm/loongarch.h b/arch/loongarch/include/asm/loongarch.h
>>>>>> index 46366e783c84..a1d22e8b6f94 100644
>>>>>> --- a/arch/loongarch/include/asm/loongarch.h
>>>>>> +++ b/arch/loongarch/include/asm/loongarch.h
>>>>>> @@ -158,6 +158,16 @@
>>>>>> #define CPUCFG48_VFPU_CG BIT(2)
>>>>>> #define CPUCFG48_RAM_CG BIT(3)
>>>>>>
>>>>>> +/*
>>>>>> + * cpucfg index area: 0x40000000 -- 0x400000ff
>>>>>> + * SW emulation for KVM hypervirsor
>>>>>> + */
>>>>>> +#define CPUCFG_KVM_BASE 0x40000000UL
>>>>>> +#define CPUCFG_KVM_SIZE 0x100
>>>>>> +#define CPUCFG_KVM_SIG CPUCFG_KVM_BASE
>>>>>> +#define KVM_SIGNATURE "KVM\0"
>>>>>> +#define CPUCFG_KVM_FEATURE (CPUCFG_KVM_BASE + 4)
>>>>>> +
>>>>>> #ifndef __ASSEMBLY__
>>>>>>
>>>>>> /* CSR */
>>>>>> diff --git a/arch/loongarch/kvm/exit.c b/arch/loongarch/kvm/exit.c
>>>>>> index 923bbca9bd22..6a38fd59d86d 100644
>>>>>> --- a/arch/loongarch/kvm/exit.c
>>>>>> +++ b/arch/loongarch/kvm/exit.c
>>>>>> @@ -206,10 +206,37 @@ int kvm_emu_idle(struct kvm_vcpu *vcpu)
>>>>>> return EMULATE_DONE;
>>>>>> }
>>>>>>
>>>>>> -static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
>>>>>> +static int kvm_emu_cpucfg(struct kvm_vcpu *vcpu, larch_inst inst)
>>>>>> {
>>>>>> int rd, rj;
>>>>>> unsigned int index;
>>>>>> +
>>>>>> + rd = inst.reg2_format.rd;
>>>>>> + rj = inst.reg2_format.rj;
>>>>>> + ++vcpu->stat.cpucfg_exits;
>>>>>> + index = vcpu->arch.gprs[rj];
>>>>>> +
>>>>>> + /*
>>>>>> + * By LoongArch Reference Manual 2.2.10.5
>>>>>> + * Return value is 0 for undefined cpucfg index
>>>>>> + */
>>>>>> + switch (index) {
>>>>>> + case 0 ... (KVM_MAX_CPUCFG_REGS - 1):
>>>>>> + vcpu->arch.gprs[rd] = vcpu->arch.cpucfg[index];
>>>>>> + break;
>>>>>> + case CPUCFG_KVM_SIG:
>>>>>> + vcpu->arch.gprs[rd] = *(unsigned int *)KVM_SIGNATURE;
>>>>>> + break;
>>>>>> + default:
>>>>>> + vcpu->arch.gprs[rd] = 0;
>>>>>> + break;
>>>>>> + }
>>>>>> +
>>>>>> + return EMULATE_DONE;
>>>>>> +}
>>>>>> +
>>>>>> +static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
>>>>>> +{
>>>>>> unsigned long curr_pc;
>>>>>> larch_inst inst;
>>>>>> enum emulation_result er = EMULATE_DONE;
>>>>>> @@ -224,21 +251,8 @@ static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
>>>>>> er = EMULATE_FAIL;
>>>>>> switch (((inst.word >> 24) & 0xff)) {
>>>>>> case 0x0: /* CPUCFG GSPR */
>>>>>> - if (inst.reg2_format.opcode == 0x1B) {
>>>>>> - rd = inst.reg2_format.rd;
>>>>>> - rj = inst.reg2_format.rj;
>>>>>> - ++vcpu->stat.cpucfg_exits;
>>>>>> - index = vcpu->arch.gprs[rj];
>>>>>> - er = EMULATE_DONE;
>>>>>> - /*
>>>>>> - * By LoongArch Reference Manual 2.2.10.5
>>>>>> - * return value is 0 for undefined cpucfg index
>>>>>> - */
>>>>>> - if (index < KVM_MAX_CPUCFG_REGS)
>>>>>> - vcpu->arch.gprs[rd] = vcpu->arch.cpucfg[index];
>>>>>> - else
>>>>>> - vcpu->arch.gprs[rd] = 0;
>>>>>> - }
>>>>>> + if (inst.reg2_format.opcode == cpucfg_op)
>>>>>> + er = kvm_emu_cpucfg(vcpu, inst);
>>>>>> break;
>>>>>> case 0x4: /* CSR{RD,WR,XCHG} GSPR */
>>>>>> er = kvm_handle_csr(vcpu, inst);
>>>>>> --
>>>>>> 2.39.3
>>>>>>
>>>>
>>>>
>
On Feb 27, 2024, at 3:14 AM, maobibo wrote:
> On 2024/2/27 4:02 AM, Jiaxun Yang wrote:
>>
>>
>> On Feb 26, 2024, at 8:04 AM, maobibo wrote:
>>> On 2024/2/26 2:12 PM, Huacai Chen wrote:
>>>> On Mon, Feb 26, 2024 at 10:04 AM maobibo <[email protected]> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 2024/2/24 5:13 PM, Huacai Chen wrote:
>>>>>> Hi, Bibo,
>>>>>>
>>>>>> On Thu, Feb 22, 2024 at 11:28 AM Bibo Mao <[email protected]> wrote:
>>>>>>>
>>>>>>> Instruction cpucfg can be used to get processor features. And there
>>>>>>> is trap exception when it is executed in VM mode, and also it is
>>>>>>> to provide cpu features to VM. On real hardware cpucfg area 0 - 20
>>>>>>> is used. Here one specified area 0x40000000 -- 0x400000ff is used
>>>>>>> for KVM hypervisor to privide PV features, and the area can be extended
>>>>>>> for other hypervisors in future. This area will never be used for
>>>>>>> real HW, it is only used by software.
>>>>>> After reading and thinking, I find that the hypercall method which is
>>>>>> used in our productive kernel is better than this cpucfg method.
>>>>>> Because hypercall is more simple and straightforward, plus we don't
>>>>>> worry about conflicting with the real hardware.
>>>>> No, I do not think so. cpucfg is simper than hypercall, hypercall can
>>>>> be in effect when system runs in guest mode. In some scenario like TCG
>>>>> mode, hypercall is illegal intruction, however cpucfg can work.
>>>> Nearly all architectures use hypercall except x86 for its historical
>>> Only x86 support multiple hypervisors and there is multiple hypervisor
>>> in x86 only. It is an advantage, not historical reason.
>>
>> I do believe that all those stuff should not be exposed to guest user space
>> for security reasons.
> Can you add PLV checking when cpucfg 0x40000000-0x400000FF is emulated?
> if it is user mode return value is zero and it is kernel mode emulated
> value will be returned. It can avoid information leaking.
Please don’t do insane stuff here; applications are not expecting an
exception from cpucfg.
>
>>
>> Also for different implementations of hypervisors they may have different
>> PV features behavior, using hypcall to perform feature detection
>> can pass more information to help us cope with hypervisor diversity.
> How do different hypervisors can be detected firstly? On x86 MSR is
> used for all hypervisors detection and on ARM64 hyperv used
> acpi_gbl_FADT and kvm use smc forcely, host mode can execute smc
> instruction without exception on ARM64.
Those are hypercall ABI design choices; that information can come from
firmware or from privileged CSRs on LoongArch.
>
> I do not know why hypercall is better than cpucfg on LoongArch, cpucfg
> is basic intruction however hypercall is not, it is part of LVZ feature.
KVM can only work with LVZ, right?
>
>>>
>>>> reasons. If we use CPUCFG, then the hypervisor information is
>>>> unnecessarily leaked to userspace, and this may be a security issue.
>>>> Meanwhile, I don't think TCG mode needs PV features.
>>> Besides PV features, there is other features different with real hw such
>>> as virtio device, virtual interrupt controller.
>>
>> Those are *device* level information, they must be passed in firmware
>> interfaces to keep processor emulation sane.
> File arch/x86/hyperv/hv_apic.c can be referenced, apic features comes
> from ms_hyperv.hints and HYPERV_CPUID_ENLIGHTMENT_INFO cpuid info, not
> must be passed by firmware interface.
That’s not KVM, that’s Hyper-V. In the Linux kernel we enjoy the benefits
of better modularity in device abstractions; please don’t break it.
Thanks
>
> Regards
> Bibo Mao
>>
>> Thanks
>>
>>>
>>> Regards
>>> Bibo Mao
>>>
>>>>
>>>> I consulted with Jiaxun before, and maybe he can give some more comments.
>>>>
>>>>>
>>>>> Extioi virtualization extension will be added later, cpucfg can be used
>>>>> to get extioi features. It is unlikely that extioi driver depends on
>>>>> PARA_VIRT macro if hypercall is used to get features.
>>>> CPUCFG is per-core information, if we really need something about
>>>> extioi, it should be in iocsr (LOONGARCH_IOCSR_FEATURES).
>>>>
>>>>
>>>> Huacai
>>>>
>>>>>
>>>>> Regards
>>>>> Bibo Mao
>>>>>
>>>>>>
>>>>>> Huacai
>>>>>>
>>>>>>>
>>>>>>> Signed-off-by: Bibo Mao <[email protected]>
>>>>>>> ---
>>>>>>> arch/loongarch/include/asm/inst.h | 1 +
>>>>>>> arch/loongarch/include/asm/loongarch.h | 10 ++++++
>>>>>>> arch/loongarch/kvm/exit.c | 46 +++++++++++++++++---------
>>>>>>> 3 files changed, 41 insertions(+), 16 deletions(-)
>>>>>>>
>>>>>>> diff --git a/arch/loongarch/include/asm/inst.h b/arch/loongarch/include/asm/inst.h
>>>>>>> index d8f637f9e400..ad120f924905 100644
>>>>>>> --- a/arch/loongarch/include/asm/inst.h
>>>>>>> +++ b/arch/loongarch/include/asm/inst.h
>>>>>>> @@ -67,6 +67,7 @@ enum reg2_op {
>>>>>>> revhd_op = 0x11,
>>>>>>> extwh_op = 0x16,
>>>>>>> extwb_op = 0x17,
>>>>>>> + cpucfg_op = 0x1b,
>>>>>>> iocsrrdb_op = 0x19200,
>>>>>>> iocsrrdh_op = 0x19201,
>>>>>>> iocsrrdw_op = 0x19202,
>>>>>>> diff --git a/arch/loongarch/include/asm/loongarch.h b/arch/loongarch/include/asm/loongarch.h
>>>>>>> index 46366e783c84..a1d22e8b6f94 100644
>>>>>>> --- a/arch/loongarch/include/asm/loongarch.h
>>>>>>> +++ b/arch/loongarch/include/asm/loongarch.h
>>>>>>> @@ -158,6 +158,16 @@
>>>>>>> #define CPUCFG48_VFPU_CG BIT(2)
>>>>>>> #define CPUCFG48_RAM_CG BIT(3)
>>>>>>>
>>>>>>> +/*
>>>>>>> + * cpucfg index area: 0x40000000 -- 0x400000ff
>>>>>>> + * SW emulation for KVM hypervirsor
>>>>>>> + */
>>>>>>> +#define CPUCFG_KVM_BASE 0x40000000UL
>>>>>>> +#define CPUCFG_KVM_SIZE 0x100
>>>>>>> +#define CPUCFG_KVM_SIG CPUCFG_KVM_BASE
>>>>>>> +#define KVM_SIGNATURE "KVM\0"
>>>>>>> +#define CPUCFG_KVM_FEATURE (CPUCFG_KVM_BASE + 4)
>>>>>>> +
>>>>>>> #ifndef __ASSEMBLY__
>>>>>>>
>>>>>>> /* CSR */
>>>>>>> diff --git a/arch/loongarch/kvm/exit.c b/arch/loongarch/kvm/exit.c
>>>>>>> index 923bbca9bd22..6a38fd59d86d 100644
>>>>>>> --- a/arch/loongarch/kvm/exit.c
>>>>>>> +++ b/arch/loongarch/kvm/exit.c
>>>>>>> @@ -206,10 +206,37 @@ int kvm_emu_idle(struct kvm_vcpu *vcpu)
>>>>>>> return EMULATE_DONE;
>>>>>>> }
>>>>>>>
>>>>>>> -static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
>>>>>>> +static int kvm_emu_cpucfg(struct kvm_vcpu *vcpu, larch_inst inst)
>>>>>>> {
>>>>>>> int rd, rj;
>>>>>>> unsigned int index;
>>>>>>> +
>>>>>>> + rd = inst.reg2_format.rd;
>>>>>>> + rj = inst.reg2_format.rj;
>>>>>>> + ++vcpu->stat.cpucfg_exits;
>>>>>>> + index = vcpu->arch.gprs[rj];
>>>>>>> +
>>>>>>> + /*
>>>>>>> + * By LoongArch Reference Manual 2.2.10.5
>>>>>>> + * Return value is 0 for undefined cpucfg index
>>>>>>> + */
>>>>>>> + switch (index) {
>>>>>>> + case 0 ... (KVM_MAX_CPUCFG_REGS - 1):
>>>>>>> + vcpu->arch.gprs[rd] = vcpu->arch.cpucfg[index];
>>>>>>> + break;
>>>>>>> + case CPUCFG_KVM_SIG:
>>>>>>> + vcpu->arch.gprs[rd] = *(unsigned int *)KVM_SIGNATURE;
>>>>>>> + break;
>>>>>>> + default:
>>>>>>> + vcpu->arch.gprs[rd] = 0;
>>>>>>> + break;
>>>>>>> + }
>>>>>>> +
>>>>>>> + return EMULATE_DONE;
>>>>>>> +}
>>>>>>> +
>>>>>>> +static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
>>>>>>> +{
>>>>>>> unsigned long curr_pc;
>>>>>>> larch_inst inst;
>>>>>>> enum emulation_result er = EMULATE_DONE;
>>>>>>> @@ -224,21 +251,8 @@ static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
>>>>>>> er = EMULATE_FAIL;
>>>>>>> switch (((inst.word >> 24) & 0xff)) {
>>>>>>> case 0x0: /* CPUCFG GSPR */
>>>>>>> - if (inst.reg2_format.opcode == 0x1B) {
>>>>>>> - rd = inst.reg2_format.rd;
>>>>>>> - rj = inst.reg2_format.rj;
>>>>>>> - ++vcpu->stat.cpucfg_exits;
>>>>>>> - index = vcpu->arch.gprs[rj];
>>>>>>> - er = EMULATE_DONE;
>>>>>>> - /*
>>>>>>> - * By LoongArch Reference Manual 2.2.10.5
>>>>>>> - * return value is 0 for undefined cpucfg index
>>>>>>> - */
>>>>>>> - if (index < KVM_MAX_CPUCFG_REGS)
>>>>>>> - vcpu->arch.gprs[rd] = vcpu->arch.cpucfg[index];
>>>>>>> - else
>>>>>>> - vcpu->arch.gprs[rd] = 0;
>>>>>>> - }
>>>>>>> + if (inst.reg2_format.opcode == cpucfg_op)
>>>>>>> + er = kvm_emu_cpucfg(vcpu, inst);
>>>>>>> break;
>>>>>>> case 0x4: /* CSR{RD,WR,XCHG} GSPR */
>>>>>>> er = kvm_handle_csr(vcpu, inst);
>>>>>>> --
>>>>>>> 2.39.3
>>>>>>>
>>>>>
>>>>>
>>
--
- Jiaxun
On 2024/2/27 1:23 PM, Jiaxun Yang wrote:
>
>
> On Feb 27, 2024, at 3:14 AM, maobibo wrote:
>> On 2024/2/27 4:02 AM, Jiaxun Yang wrote:
>>>
>>>
>>> On Feb 26, 2024, at 8:04 AM, maobibo wrote:
>>>> On 2024/2/26 2:12 PM, Huacai Chen wrote:
>>>>> On Mon, Feb 26, 2024 at 10:04 AM maobibo <[email protected]> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2024/2/24 5:13 PM, Huacai Chen wrote:
>>>>>>> Hi, Bibo,
>>>>>>>
>>>>>>> On Thu, Feb 22, 2024 at 11:28 AM Bibo Mao <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Instruction cpucfg can be used to get processor features. And there
>>>>>>>> is trap exception when it is executed in VM mode, and also it is
>>>>>>>> to provide cpu features to VM. On real hardware cpucfg area 0 - 20
>>>>>>>> is used. Here one specified area 0x40000000 -- 0x400000ff is used
>>>>>>>> for KVM hypervisor to privide PV features, and the area can be extended
>>>>>>>> for other hypervisors in future. This area will never be used for
>>>>>>>> real HW, it is only used by software.
>>>>>>> After reading and thinking, I find that the hypercall method which is
>>>>>>> used in our productive kernel is better than this cpucfg method.
>>>>>>> Because hypercall is more simple and straightforward, plus we don't
>>>>>>> worry about conflicting with the real hardware.
>>>>>> No, I do not think so. cpucfg is simper than hypercall, hypercall can
>>>>>> be in effect when system runs in guest mode. In some scenario like TCG
>>>>>> mode, hypercall is illegal intruction, however cpucfg can work.
>>>>> Nearly all architectures use hypercall except x86 for its historical
>>>> Only x86 support multiple hypervisors and there is multiple hypervisor
>>>> in x86 only. It is an advantage, not historical reason.
>>>
>>> I do believe that all those stuff should not be exposed to guest user space
>>> for security reasons.
>> Can you add PLV checking when cpucfg 0x40000000-0x400000FF is emulated?
>> if it is user mode return value is zero and it is kernel mode emulated
>> value will be returned. It can avoid information leaking.
>
> Please don’t do insane stuff here, applications are not expecting exception from
> cpucfg.
Sorry, I do not understand. Can you describe the behavior of the cpucfg
instruction as seen by applications? Why would there be no exception for cpucfg?
>
>>
>>>
>>> Also for different implementations of hypervisors they may have different
>>> PV features behavior, using hypcall to perform feature detection
>>> can pass more information to help us cope with hypervisor diversity.
>> How do different hypervisors can be detected firstly? On x86 MSR is
>> used for all hypervisors detection and on ARM64 hyperv used
>> acpi_gbl_FADT and kvm use smc forcely, host mode can execute smc
>> instruction without exception on ARM64.
>
> That’s hypcall ABI design choices, those information can come from firmware
> or privileged CSRs on LoongArch.
Firstly, firmware or privileged CSRs are not related to hypercall ABI
design choices. With the CSR instructions the CSR ID is encoded in the
instruction itself, so the CSR ID range is 16K; for the cpucfg
instruction the index is passed in a register, so the range is UINT_MAX
at least.
With the CSR method it is difficult to find an area unused by hardware
because of the small CSR ID range; with cpucfg, however, it is easy.
Here I doubt whether you really know about LoongArch LVZ.
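The range argument can be made concrete: csrrd/csrwr carry the CSR number in a 14-bit immediate (hence the 16K limit), while cpucfg takes its index from a general-purpose register, so a software-only window far above the hardware-defined words costs nothing. A sketch, with illustrative names and assuming hardware words 0-20 plus the 0x40000000 window from the patch:

```c
#include <stdint.h>

enum cfg_owner { CFG_HW, CFG_KVM, CFG_UNDEF };

/* Classify a cpucfg index: hardware-defined words, the proposed
 * software-only KVM window, or undefined (reads back all zeroes
 * per LoongArch Reference Manual 2.2.10.5). */
enum cfg_owner cpucfg_index_owner(uint32_t index)
{
	if (index <= 20)
		return CFG_HW;
	if (index >= 0x40000000u && index <= 0x400000ffu)
		return CFG_KVM;
	return CFG_UNDEF;
}
```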
>
>>
>> I do not know why hypercall is better than cpucfg on LoongArch, cpucfg
>> is basic intruction however hypercall is not, it is part of LVZ feature.
>
> KVM can only work with LVZ right?
The Linux kernel needs to boot well in both TCG and KVM mode, and
hypercall is an illegal instruction in TCG mode.
Regards
Bibo Mao
>
>>
>>>>
>>>>> reasons. If we use CPUCFG, then the hypervisor information is
>>>>> unnecessarily leaked to userspace, and this may be a security issue.
>>>>> Meanwhile, I don't think TCG mode needs PV features.
>>>> Besides PV features, there is other features different with real hw such
>>>> as virtio device, virtual interrupt controller.
>>>
>>> Those are *device* level information, they must be passed in firmware
>>> interfaces to keep processor emulation sane.
>> File arch/x86/hyperv/hv_apic.c can be referenced, apic features comes
>> from ms_hyperv.hints and HYPERV_CPUID_ENLIGHTMENT_INFO cpuid info, not
>> must be passed by firmware interface.
>
> That’s not KVM, that’s Hyper V. At Linux Kernel we enjoy the benefits of better
> modularity on device abstractions, please don’t break it.
>
> Thanks
>
>>
>> Regards
>> Bibo Mao
>>>
>>> Thanks
>>>
>>>>
>>>> Regards
>>>> Bibo Mao
>>>>
>>>>>
>>>>> I consulted with Jiaxun before, and maybe he can give some more comments.
>>>>>
>>>>>>
>>>>>> Extioi virtualization extension will be added later, cpucfg can be used
>>>>>> to get extioi features. It is unlikely that extioi driver depends on
>>>>>> PARA_VIRT macro if hypercall is used to get features.
>>>>> CPUCFG is per-core information, if we really need something about
>>>>> extioi, it should be in iocsr (LOONGARCH_IOCSR_FEATURES).
>>>>>
>>>>>
>>>>> Huacai
>>>>>
>>>>>>
>>>>>> Regards
>>>>>> Bibo Mao
>>>>>>
>>>>>>>
>>>>>>> Huacai
>>>>>>>
>>>>>>>>
>>>>>>>> Signed-off-by: Bibo Mao <[email protected]>
>>>>>>>> ---
>>>>>>>> arch/loongarch/include/asm/inst.h | 1 +
>>>>>>>> arch/loongarch/include/asm/loongarch.h | 10 ++++++
>>>>>>>> arch/loongarch/kvm/exit.c | 46 +++++++++++++++++---------
>>>>>>>> 3 files changed, 41 insertions(+), 16 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/arch/loongarch/include/asm/inst.h b/arch/loongarch/include/asm/inst.h
>>>>>>>> index d8f637f9e400..ad120f924905 100644
>>>>>>>> --- a/arch/loongarch/include/asm/inst.h
>>>>>>>> +++ b/arch/loongarch/include/asm/inst.h
>>>>>>>> @@ -67,6 +67,7 @@ enum reg2_op {
>>>>>>>> revhd_op = 0x11,
>>>>>>>> extwh_op = 0x16,
>>>>>>>> extwb_op = 0x17,
>>>>>>>> + cpucfg_op = 0x1b,
>>>>>>>> iocsrrdb_op = 0x19200,
>>>>>>>> iocsrrdh_op = 0x19201,
>>>>>>>> iocsrrdw_op = 0x19202,
>>>>>>>> diff --git a/arch/loongarch/include/asm/loongarch.h b/arch/loongarch/include/asm/loongarch.h
>>>>>>>> index 46366e783c84..a1d22e8b6f94 100644
>>>>>>>> --- a/arch/loongarch/include/asm/loongarch.h
>>>>>>>> +++ b/arch/loongarch/include/asm/loongarch.h
>>>>>>>> @@ -158,6 +158,16 @@
>>>>>>>> #define CPUCFG48_VFPU_CG BIT(2)
>>>>>>>> #define CPUCFG48_RAM_CG BIT(3)
>>>>>>>>
>>>>>>>> +/*
>>>>>>>> + * cpucfg index area: 0x40000000 -- 0x400000ff
>>>>>>>> + * SW emulation for KVM hypervirsor
>>>>>>>> + */
>>>>>>>> +#define CPUCFG_KVM_BASE 0x40000000UL
>>>>>>>> +#define CPUCFG_KVM_SIZE 0x100
>>>>>>>> +#define CPUCFG_KVM_SIG CPUCFG_KVM_BASE
>>>>>>>> +#define KVM_SIGNATURE "KVM\0"
>>>>>>>> +#define CPUCFG_KVM_FEATURE (CPUCFG_KVM_BASE + 4)
>>>>>>>> +
>>>>>>>> #ifndef __ASSEMBLY__
>>>>>>>>
>>>>>>>> /* CSR */
>>>>>>>> diff --git a/arch/loongarch/kvm/exit.c b/arch/loongarch/kvm/exit.c
>>>>>>>> index 923bbca9bd22..6a38fd59d86d 100644
>>>>>>>> --- a/arch/loongarch/kvm/exit.c
>>>>>>>> +++ b/arch/loongarch/kvm/exit.c
>>>>>>>> @@ -206,10 +206,37 @@ int kvm_emu_idle(struct kvm_vcpu *vcpu)
>>>>>>>> return EMULATE_DONE;
>>>>>>>> }
>>>>>>>>
>>>>>>>> -static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
>>>>>>>> +static int kvm_emu_cpucfg(struct kvm_vcpu *vcpu, larch_inst inst)
>>>>>>>> {
>>>>>>>> int rd, rj;
>>>>>>>> unsigned int index;
>>>>>>>> +
>>>>>>>> + rd = inst.reg2_format.rd;
>>>>>>>> + rj = inst.reg2_format.rj;
>>>>>>>> + ++vcpu->stat.cpucfg_exits;
>>>>>>>> + index = vcpu->arch.gprs[rj];
>>>>>>>> +
>>>>>>>> + /*
>>>>>>>> + * By LoongArch Reference Manual 2.2.10.5
>>>>>>>> + * Return value is 0 for undefined cpucfg index
>>>>>>>> + */
>>>>>>>> + switch (index) {
>>>>>>>> + case 0 ... (KVM_MAX_CPUCFG_REGS - 1):
>>>>>>>> + vcpu->arch.gprs[rd] = vcpu->arch.cpucfg[index];
>>>>>>>> + break;
>>>>>>>> + case CPUCFG_KVM_SIG:
>>>>>>>> + vcpu->arch.gprs[rd] = *(unsigned int *)KVM_SIGNATURE;
>>>>>>>> + break;
>>>>>>>> + default:
>>>>>>>> + vcpu->arch.gprs[rd] = 0;
>>>>>>>> + break;
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + return EMULATE_DONE;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
>>>>>>>> +{
>>>>>>>> unsigned long curr_pc;
>>>>>>>> larch_inst inst;
>>>>>>>> enum emulation_result er = EMULATE_DONE;
>>>>>>>> @@ -224,21 +251,8 @@ static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
>>>>>>>> er = EMULATE_FAIL;
>>>>>>>> switch (((inst.word >> 24) & 0xff)) {
>>>>>>>> case 0x0: /* CPUCFG GSPR */
>>>>>>>> - if (inst.reg2_format.opcode == 0x1B) {
>>>>>>>> - rd = inst.reg2_format.rd;
>>>>>>>> - rj = inst.reg2_format.rj;
>>>>>>>> - ++vcpu->stat.cpucfg_exits;
>>>>>>>> - index = vcpu->arch.gprs[rj];
>>>>>>>> - er = EMULATE_DONE;
>>>>>>>> - /*
>>>>>>>> - * By LoongArch Reference Manual 2.2.10.5
>>>>>>>> - * return value is 0 for undefined cpucfg index
>>>>>>>> - */
>>>>>>>> - if (index < KVM_MAX_CPUCFG_REGS)
>>>>>>>> - vcpu->arch.gprs[rd] = vcpu->arch.cpucfg[index];
>>>>>>>> - else
>>>>>>>> - vcpu->arch.gprs[rd] = 0;
>>>>>>>> - }
>>>>>>>> + if (inst.reg2_format.opcode == cpucfg_op)
>>>>>>>> + er = kvm_emu_cpucfg(vcpu, inst);
>>>>>>>> break;
>>>>>>>> case 0x4: /* CSR{RD,WR,XCHG} GSPR */
>>>>>>>> er = kvm_handle_csr(vcpu, inst);
>>>>>>>> --
>>>>>>>> 2.39.3
>>>>>>>>
>>>>>>
>>>>>>
>>>
>
On 2/27/24 11:14, maobibo wrote:
>
>
> On 2024/2/27 4:02 AM, Jiaxun Yang wrote:
>>
>>
>> On Feb 26, 2024, at 8:04 AM, maobibo wrote:
>>> On 2024/2/26 2:12 PM, Huacai Chen wrote:
>>>> On Mon, Feb 26, 2024 at 10:04 AM maobibo <[email protected]> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 2024/2/24 5:13 PM, Huacai Chen wrote:
>>>>>> Hi, Bibo,
>>>>>>
>>>>>> On Thu, Feb 22, 2024 at 11:28 AM Bibo Mao <[email protected]>
>>>>>> wrote:
>>>>>>>
>>>>>>> Instruction cpucfg can be used to get processor features. And there
>>>>>>> is trap exception when it is executed in VM mode, and also it is
>>>>>>> to provide cpu features to VM. On real hardware cpucfg area 0 - 20
>>>>>>> is used. Here one specified area 0x40000000 -- 0x400000ff is used
>>>>>>> for KVM hypervisor to privide PV features, and the area can be
>>>>>>> extended
>>>>>>> for other hypervisors in future. This area will never be used for
>>>>>>> real HW, it is only used by software.
>>>>>> After reading and thinking, I find that the hypercall method which is
>>>>>> used in our productive kernel is better than this cpucfg method.
>>>>>> Because hypercall is more simple and straightforward, plus we don't
>>>>>> worry about conflicting with the real hardware.
>>>>> No, I do not think so. cpucfg is simper than hypercall, hypercall can
>>>>> be in effect when system runs in guest mode. In some scenario like TCG
>>>>> mode, hypercall is illegal intruction, however cpucfg can work.
>>>> Nearly all architectures use hypercall except x86 for its historical
>>> Only x86 support multiple hypervisors and there is multiple hypervisor
>>> in x86 only. It is an advantage, not historical reason.
>>
>> I do believe that all those stuff should not be exposed to guest user
>> space
>> for security reasons.
> Can you add PLV checking when cpucfg 0x40000000-0x400000FF is emulated?
> if it is user mode return value is zero and it is kernel mode emulated
> value will be returned. It can avoid information leaking.
I've suggested this approach in another reply [1], but I've rechecked
the manual, and it turns out this behavior is not permitted by the
current wording. See LoongArch Reference Manual v1.10, Volume 1, Section
2.2.10.5 "CPUCFG":
> CPUCFG 访问未定义的配置字将读回全 0 值。
>
> Reads of undefined CPUCFG configuration words shall return all-zeroes.
This sentence mentions no distinction based on privilege modes, so it
can only mean the behavior applies universally regardless of privilege
modes.
I think if you want to make CPUCFG behavior PLV-dependent, you may have
to ask the LoongArch spec editors, internally or in public, for a new
spec revision.
(There are already multiple third-party LoongArch implementers as of
late 2023, so any ISA-level change like this would best be coordinated,
to minimize surprises.)
[1]:
https://lore.kernel.org/loongarch/[email protected]/
--
WANG "xen0n" Xuerui
Linux/LoongArch mailing list: https://lore.kernel.org/loongarch/
On Tue, 2024-02-27 at 15:09 +0800, maobibo wrote:
> It is difficult to find an area unused by HW for CSR method since the
> small CSR ID range.
We may use IOCSR instead. In kernel/cpu-probe.c there are already some
IOCSR reads.
> It is difficult to find an area unused by HW for CSR method since the
> small CSR ID range. However it is easy for cpucfg. Here I doubt whether
> you really know about LoongArch LVZ.
It's unfair to accuse others of not knowing about LVZ, considering
the lack of public documentation.
--
Xi Ruoyao <[email protected]>
School of Aerospace Science and Technology, Xidian University
On 2024/2/27 5:05 PM, Xi Ruoyao wrote:
> On Tue, 2024-02-27 at 15:09 +0800, maobibo wrote:
>
>> It is difficult to find an area unused by HW for CSR method since the
>> small CSR ID range.
>
> We may use IOCSR instead. In kernel/cpu-probe.c there are already some
> IOCSR reads.
Yes, IOCSR can be used, and one IOCSR area can be reserved for software.
In general CSR/CPUCFG describes CPU features and IOCSR describes
board/device features.
The CSR area is limited to 16K, so it is difficult for the hardware guys
to reserve a special area for software use; IOCSR/CPUCFG is possible, however.
Regards
Bibo Mao
>
>> It is difficult to find an area unused by HW for CSR method since the
>> small CSR ID range. However it is easy for cpucfg. Here I doubt whether
>> you really know about LoongArch LVZ.
>
> It's unfair to accuse the others for not knowing about LVZ considering
> the lack of public documentation.
>
On 2024/2/27 at 5:10 PM, WANG Xuerui wrote:
> On 2/27/24 11:14, maobibo wrote:
>>
>>
>> On 2024/2/27 上午4:02, Jiaxun Yang wrote:
>>>
>>>
>>> 在2024年2月26日二月 上午8:04,maobibo写道:
>>>> On 2024/2/26 下午2:12, Huacai Chen wrote:
>>>>> On Mon, Feb 26, 2024 at 10:04 AM maobibo <[email protected]> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2024/2/24 下午5:13, Huacai Chen wrote:
>>>>>>> Hi, Bibo,
>>>>>>>
>>>>>>> On Thu, Feb 22, 2024 at 11:28 AM Bibo Mao <[email protected]>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Instruction cpucfg can be used to get processor features. And there
>>>>>>>> is trap exception when it is executed in VM mode, and also it is
>>>>>>>> to provide cpu features to VM. On real hardware cpucfg area 0 - 20
>>>>>>>> is used. Here one specified area 0x40000000 -- 0x400000ff is used
>>>>>>>> for KVM hypervisor to privide PV features, and the area can be
>>>>>>>> extended
>>>>>>>> for other hypervisors in future. This area will never be used for
>>>>>>>> real HW, it is only used by software.
>>>>>>> After reading and thinking, I find that the hypercall method
>>>>>>> which is
>>>>>>> used in our productive kernel is better than this cpucfg method.
>>>>>>> Because hypercall is more simple and straightforward, plus we don't
>>>>>>> worry about conflicting with the real hardware.
>>>>>> No, I do not think so. cpucfg is simper than hypercall, hypercall can
>>>>>> be in effect when system runs in guest mode. In some scenario like
>>>>>> TCG
>>>>>> mode, hypercall is illegal intruction, however cpucfg can work.
>>>>> Nearly all architectures use hypercall except x86 for its historical
>>>> Only x86 support multiple hypervisors and there is multiple hypervisor
>>>> in x86 only. It is an advantage, not historical reason.
>>>
>>> I do believe that all those stuff should not be exposed to guest user
>>> space
>>> for security reasons.
>> Can you add PLV checking when cpucfg 0x40000000-0x400000FF is
>> emulated? if it is user mode return value is zero and it is kernel
>> mode emulated value will be returned. It can avoid information leaking.
>
> I've suggested this approach in another reply [1], but I've rechecked
> the manual, and it turns out this behavior is not permitted by the
> current wording. See LoongArch Reference Manual v1.10, Volume 1, Section
> 2.2.10.5 "CPUCFG":
>
> > CPUCFG 访问未定义的配置字将读回全 0 值。
> >
> > Reads of undefined CPUCFG configuration words shall return all-zeroes.
>
> This sentence mentions no distinction based on privilege modes, so it
> can only mean the behavior applies universally regardless of privilege
> modes.
>
> I think if you want to make CPUCFG behavior PLV-dependent, you may have
> to ask the LoongArch spec editors, internally or in public, for a new
> spec revision.
No, the behavior of CPUCFG0 - CPUCFG21 is unchanged; only the CPUCFG
0x40000000 area can be defined by software, since that area is reserved
for software use.
>
> (There are already multiple third-party LoongArch implementers as of
> late 2023, so any ISA-level change like this would best be coordinated,
> to minimize surprises.)
With the Intel SDM, Vol. 4, page 4-23:
https://www.intel.com/content/dam/develop/external/us/en/documents/335592-sdm-vol-4.pdf
there is one line: "MSR address range between 40000000H - 400000FFH is
marked as a specially reserved range. All existing and future
processors will not implement any features using any MSR in this
range."
It only says that the range is reserved; it does not define the
detailed software behavior. The software behavior is defined by each
hypervisor, such as:
https://github.com/MicrosoftDocs/Virtualization-Documentation/blob/main/tlfs/Requirements%20for%20Implementing%20the%20Microsoft%20Hypervisor%20Interface.pdf
https://kb.vmware.com/s/article/1009458
If the hypercall method is used, there should also be an ABI document,
like aarch64's:
https://documentation-service.arm.com/static/6013e5faeee5236980d08619
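For a sense of what such an ABI document would pin down, here is a rough sketch in portable C of a function-ID based hypercall convention in the SMCCC style (the function IDs, feature bit, and register assignment are invented for illustration; the real LoongArch KVM ABI is exactly what would need documenting):

```c
#include <stdint.h>

/* Hypothetical function IDs in a PV namespace (illustration only). */
#define HC_FEATURES 0x01  /* query the hypervisor feature bitmap */
#define HC_IPI_SEND 0x02  /* send IPIs to a set of vcpus */

#define FEAT_PV_IPI (1u << 0)

/* Model of the hypervisor-side dispatcher: a0 carries the function ID,
 * a1/a2 carry arguments, and the result is returned in a0 -- the same
 * shape in which a syscall ABI is specified. */
int64_t hypercall(uint64_t a0, uint64_t a1, uint64_t a2)
{
    switch (a0) {
    case HC_FEATURES:
        return FEAT_PV_IPI;  /* this hypervisor implements PV IPI */
    case HC_IPI_SEND:
        /* a1/a2: a 128-bit bitmap of target vcpus, as in the
         * x86-style multicast PV IPI scheme. */
        (void)a1;
        (void)a2;
        return 0;
    default:
        return -1;  /* unknown function ID */
    }
}
```

A guest would typically call HC_FEATURES once at boot and cache the bitmap, so the per-IPI fast path costs exactly one trap.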
Regards
Bibo Mao
>
> [1]:
> https://lore.kernel.org/loongarch/[email protected]/
>
>
On 2/27/24 18:12, maobibo wrote:
>
>
> On 2024/2/27 下午5:10, WANG Xuerui wrote:
>> On 2/27/24 11:14, maobibo wrote:
>>>
>>>
>>> On 2024/2/27 上午4:02, Jiaxun Yang wrote:
>>>>
>>>>
>>>> 在2024年2月26日二月 上午8:04,maobibo写道:
>>>>> On 2024/2/26 下午2:12, Huacai Chen wrote:
>>>>>> On Mon, Feb 26, 2024 at 10:04 AM maobibo <[email protected]> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 2024/2/24 下午5:13, Huacai Chen wrote:
>>>>>>>> Hi, Bibo,
>>>>>>>>
>>>>>>>> On Thu, Feb 22, 2024 at 11:28 AM Bibo Mao <[email protected]>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Instruction cpucfg can be used to get processor features. And
>>>>>>>>> there
>>>>>>>>> is trap exception when it is executed in VM mode, and also it is
>>>>>>>>> to provide cpu features to VM. On real hardware cpucfg area 0 - 20
>>>>>>>>> is used. Here one specified area 0x40000000 -- 0x400000ff is used
>>>>>>>>> for KVM hypervisor to privide PV features, and the area can be
>>>>>>>>> extended
>>>>>>>>> for other hypervisors in future. This area will never be used for
>>>>>>>>> real HW, it is only used by software.
>>>>>>>> After reading and thinking, I find that the hypercall method
>>>>>>>> which is
>>>>>>>> used in our productive kernel is better than this cpucfg method.
>>>>>>>> Because hypercall is more simple and straightforward, plus we don't
>>>>>>>> worry about conflicting with the real hardware.
>>>>>>> No, I do not think so. cpucfg is simper than hypercall, hypercall
>>>>>>> can
>>>>>>> be in effect when system runs in guest mode. In some scenario
>>>>>>> like TCG
>>>>>>> mode, hypercall is illegal intruction, however cpucfg can work.
>>>>>> Nearly all architectures use hypercall except x86 for its historical
>>>>> Only x86 support multiple hypervisors and there is multiple hypervisor
>>>>> in x86 only. It is an advantage, not historical reason.
>>>>
>>>> I do believe that all those stuff should not be exposed to guest
>>>> user space
>>>> for security reasons.
>>> Can you add PLV checking when cpucfg 0x40000000-0x400000FF is
>>> emulated? if it is user mode return value is zero and it is kernel
>>> mode emulated value will be returned. It can avoid information leaking.
>>
>> I've suggested this approach in another reply [1], but I've rechecked
>> the manual, and it turns out this behavior is not permitted by the
>> current wording. See LoongArch Reference Manual v1.10, Volume 1,
>> Section 2.2.10.5 "CPUCFG":
>>
>> > CPUCFG 访问未定义的配置字将读回全 0 值。
>> >
>> > Reads of undefined CPUCFG configuration words shall return all-zeroes.
>>
>> This sentence mentions no distinction based on privilege modes, so it
>> can only mean the behavior applies universally regardless of privilege
>> modes.
>>
>> I think if you want to make CPUCFG behavior PLV-dependent, you may
>> have to ask the LoongArch spec editors, internally or in public, for a
>> new spec revision.
> No, CPUCFG behavior between CPUCFG0-CPUCFG21 is unchanged, only that it
> can be defined by software since CPUCFG 0x400000000 is used by software.
The 0x40000000 range is not mentioned in the manuals. I know you've
confirmed it privately with the HW team, but it needs to be properly
documented before public projects can rely on it.
>> (There are already multiple third-party LoongArch implementers as of
>> late 2023, so any ISA-level change like this would best be
>> coordinated, to minimize surprises.)
> With document Vol 4-23
> https://www.intel.com/content/dam/develop/external/us/en/documents/335592-sdm-vol-4.pdf
>
> There is one line "MSR address range between 40000000H - 400000FFH is
> marked as a specially reserved range. All existing and
> future processors will not implement any features using any MSR in this
> range."
Thanks for providing this info; now at least we know why this specific
0x400000XX range was chosen.
>
> It only says that it is reserved, it does not say detailed software
> behavior. Software behavior is defined in hypervisor such as:
> https://github.com/MicrosoftDocs/Virtualization-Documentation/blob/main/tlfs/Requirements%20for%20Implementing%20the%20Microsoft%20Hypervisor%20Interface.pdf
> https://kb.vmware.com/s/article/1009458
>
> If hypercall method is used, there should be ABI also like aarch64:
> https://documentation-service.arm.com/static/6013e5faeee5236980d08619
Yes, proper documentation of the public API surface is always necessary
*before* doing the real work. Since right now the hypercall provider is
Linux KVM, maybe we can document the existing and planned hypercall
usage and ABI in the kernel docs along with the code changes.
--
WANG "xen0n" Xuerui
Linux/LoongArch mailing list: https://lore.kernel.org/loongarch/
Hi, Paolo,
I'm very sorry to bother you, now we have a controversy about how to
query hypervisor features (PV IPI, PV timer and maybe PV spinlock) on
LoongArch, and we have two choices:
1, Get from CPUCFG registers;
2, Get from a hypercall.
CPUCFG registers are unprivileged and can be read from user space, so
the first method can unnecessarily leak information to userspace; on
the other hand, it is more or less similar to x86's CPUID solution.
Hypercall is of course a privileged method (so no info leak issues),
and this method is used by ARM/ARM64 and most other architectures.
Besides, LoongArch's CPUCFG is supposed to provide per-core features,
while not all hypervisor features are per-core information (PV IPI is,
of course, but others may or may not be, while 'extioi' is obviously
not). Bibo thinks that only CPUCFG has enough space to contain all
hypervisor features (CSR and IOCSR are not enough).
However it is obvious that we don't need any register space to hold
these features, because they are held in the hypervisor's memory. The
only information that needs register space is "whether I am in a VM"
(in this case we really cannot use hypercall), but this feature is
already in IOCSR (LOONGARCH_IOCSR_FEATURES).
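The two-step detection flow described above can be sketched in portable C (the stand-in IOCSR read, the hypervisor bit position, and the hypercall helper are all invented for illustration; the real flag lives in LOONGARCH_IOCSR_FEATURES):

```c
#include <stdint.h>

#define IOCSR_FEATURE_HYPERVISOR (1u << 8)  /* illustrative "in a VM" bit */
#define FEAT_PV_IPI              (1u << 0)

/* Stand-ins for the real IOCSR read and the feature hypercall. */
uint32_t iocsr_features = IOCSR_FEATURE_HYPERVISOR;
uint32_t hypercall_get_features(void) { return FEAT_PV_IPI; }

/* Step 1: "am I in a VM?" must come from register space, because a
 * hypercall is an illegal instruction on bare metal.  Step 2: once we
 * know we are in a VM, the feature bits can stay in the hypervisor's
 * memory and be fetched with a single hypercall. */
uint32_t probe_pv_features(void)
{
    if (!(iocsr_features & IOCSR_FEATURE_HYPERVISOR))
        return 0;                     /* bare metal: no PV features */
    return hypercall_get_features();  /* VM: ask the hypervisor */
}
```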
Now my question is: for a new architecture, which method is more
preferable, maintainable and extensible? Of course both are OK for the
current purpose of this patch, but I think you can give us more useful
information from a professional point of view.
More details are available in this thread, about the 3rd patch. Any
suggestions are welcome.
Huacai
On Tue, Feb 27, 2024 at 6:19 PM WANG Xuerui <[email protected]> wrote:
>
> On 2/27/24 18:12, maobibo wrote:
> >
> >
> > On 2024/2/27 下午5:10, WANG Xuerui wrote:
> >> On 2/27/24 11:14, maobibo wrote:
> >>>
> >>>
> >>> On 2024/2/27 上午4:02, Jiaxun Yang wrote:
> >>>>
> >>>>
> >>>> 在2024年2月26日二月 上午8:04,maobibo写道:
> >>>>> On 2024/2/26 下午2:12, Huacai Chen wrote:
> >>>>>> On Mon, Feb 26, 2024 at 10:04 AM maobibo <maobibo@loongsoncn> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 2024/2/24 下午5:13, Huacai Chen wrote:
> >>>>>>>> Hi, Bibo,
> >>>>>>>>
> >>>>>>>> On Thu, Feb 22, 2024 at 11:28 AM Bibo Mao <[email protected]>
> >>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> Instruction cpucfg can be used to get processor features. And
> >>>>>>>>> there
> >>>>>>>>> is trap exception when it is executed in VM mode, and also it is
> >>>>>>>>> to provide cpu features to VM. On real hardware cpucfg area 0 - 20
> >>>>>>>>> is used. Here one specified area 0x40000000 -- 0x400000ff is used
> >>>>>>>>> for KVM hypervisor to privide PV features, and the area can be
> >>>>>>>>> extended
> >>>>>>>>> for other hypervisors in future. This area will never be used for
> >>>>>>>>> real HW, it is only used by software.
> >>>>>>>> After reading and thinking, I find that the hypercall method
> >>>>>>>> which is
> >>>>>>>> used in our productive kernel is better than this cpucfg method.
> >>>>>>>> Because hypercall is more simple and straightforward, plus we don't
> >>>>>>>> worry about conflicting with the real hardware.
> >>>>>>> No, I do not think so. cpucfg is simper than hypercall, hypercall
> >>>>>>> can
> >>>>>>> be in effect when system runs in guest mode. In some scenario
> >>>>>>> like TCG
> >>>>>>> mode, hypercall is illegal intruction, however cpucfg can work.
> >>>>>> Nearly all architectures use hypercall except x86 for its historical
> >>>>> Only x86 support multiple hypervisors and there is multiple hypervisor
> >>>>> in x86 only. It is an advantage, not historical reason.
> >>>>
> >>>> I do believe that all those stuff should not be exposed to guest
> >>>> user space
> >>>> for security reasons.
> >>> Can you add PLV checking when cpucfg 0x40000000-0x400000FF is
> >>> emulated? if it is user mode return value is zero and it is kernel
> >>> mode emulated value will be returned. It can avoid information leaking.
> >>
> >> I've suggested this approach in another reply [1], but I've rechecked
> >> the manual, and it turns out this behavior is not permitted by the
> >> current wording. See LoongArch Reference Manual v1.10, Volume 1,
> >> Section 2.2.10.5 "CPUCFG":
> >>
> >> > CPUCFG 访问未定义的配置字将读回全 0 值。
> >> >
> >> > Reads of undefined CPUCFG configuration words shall return all-zeroes.
> >>
> >> This sentence mentions no distinction based on privilege modes, so it
> >> can only mean the behavior applies universally regardless of privilege
> >> modes.
> >>
> >> I think if you want to make CPUCFG behavior PLV-dependent, you may
> >> have to ask the LoongArch spec editors, internally or in public, for a
> >> new spec revision.
> > No, CPUCFG behavior between CPUCFG0-CPUCFG21 is unchanged, only that it
> > can be defined by software since CPUCFG 0x400000000 is used by software.
>
> The 0x40000000 range is not mentioned in the manuals. I know you've
> confirmed privately with HW team but this needs to be properly
> documented for public projects to properly rely on.
>
> >> (There are already multiple third-party LoongArch implementers as of
> >> late 2023, so any ISA-level change like this would best be
> >> coordinated, to minimize surprises.)
> > With document Vol 4-23
> > https://www.intel.com/content/dam/develop/external/us/en/documents/335592-sdm-vol-4.pdf
> >
> > There is one line "MSR address range between 40000000H - 400000FFH is
> > marked as a specially reserved range. All existing and
> > future processors will not implement any features using any MSR in this
> > range."
>
> Thanks for providing this info, now at least we know why it's this
> specific range of 0x400000XX that's chosen.
>
> >
> > It only says that it is reserved, it does not say detailed software
> > behavior. Software behavior is defined in hypervisor such as:
> > https://github.com/MicrosoftDocs/Virtualization-Documentation/blob/main/tlfs/Requirements%20for%20Implementing%20the%20Microsoft%20Hypervisor%20Interface.pdf
> > https://kb.vmware.com/s/article/1009458
> >
> > If hypercall method is used, there should be ABI also like aarch64:
> > https://documentation-service.arm.com/static/6013e5faeee5236980d08619
>
> Yes proper documentation of public API surface is always necessary
> *before* doing real work. Because right now the hypercall provider is
> Linux KVM, maybe we can document the existing and planned hypercall
> usage and ABI in the kernel docs along with code changes.
>
> --
> WANG "xen0n" Xuerui
>
> Linux/LoongArch mailing list: https://lore.kernel.org/loongarch/
>
On 2024/2/27 at 6:19 PM, WANG Xuerui wrote:
> On 2/27/24 18:12, maobibo wrote:
>>
>>
>> On 2024/2/27 下午5:10, WANG Xuerui wrote:
>>> On 2/27/24 11:14, maobibo wrote:
>>>>
>>>>
>>>> On 2024/2/27 上午4:02, Jiaxun Yang wrote:
>>>>>
>>>>>
>>>>> 在2024年2月26日二月 上午8:04,maobibo写道:
>>>>>> On 2024/2/26 下午2:12, Huacai Chen wrote:
>>>>>>> On Mon, Feb 26, 2024 at 10:04 AM maobibo <[email protected]>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2024/2/24 下午5:13, Huacai Chen wrote:
>>>>>>>>> Hi, Bibo,
>>>>>>>>>
>>>>>>>>> On Thu, Feb 22, 2024 at 11:28 AM Bibo Mao <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Instruction cpucfg can be used to get processor features. And
>>>>>>>>>> there
>>>>>>>>>> is trap exception when it is executed in VM mode, and also it is
>>>>>>>>>> to provide cpu features to VM. On real hardware cpucfg area 0
>>>>>>>>>> - 20
>>>>>>>>>> is used. Here one specified area 0x40000000 -- 0x400000ff is
>>>>>>>>>> used
>>>>>>>>>> for KVM hypervisor to privide PV features, and the area can be
>>>>>>>>>> extended
>>>>>>>>>> for other hypervisors in future. This area will never be used for
>>>>>>>>>> real HW, it is only used by software.
>>>>>>>>> After reading and thinking, I find that the hypercall method
>>>>>>>>> which is
>>>>>>>>> used in our productive kernel is better than this cpucfg method.
>>>>>>>>> Because hypercall is more simple and straightforward, plus we
>>>>>>>>> don't
>>>>>>>>> worry about conflicting with the real hardware.
>>>>>>>> No, I do not think so. cpucfg is simper than hypercall,
>>>>>>>> hypercall can
>>>>>>>> be in effect when system runs in guest mode. In some scenario
>>>>>>>> like TCG
>>>>>>>> mode, hypercall is illegal intruction, however cpucfg can work.
>>>>>>> Nearly all architectures use hypercall except x86 for its historical
>>>>>> Only x86 support multiple hypervisors and there is multiple
>>>>>> hypervisor
>>>>>> in x86 only. It is an advantage, not historical reason.
>>>>>
>>>>> I do believe that all those stuff should not be exposed to guest
>>>>> user space
>>>>> for security reasons.
>>>> Can you add PLV checking when cpucfg 0x40000000-0x400000FF is
>>>> emulated? if it is user mode return value is zero and it is kernel
>>>> mode emulated value will be returned. It can avoid information leaking.
>>>
>>> I've suggested this approach in another reply [1], but I've rechecked
>>> the manual, and it turns out this behavior is not permitted by the
>>> current wording. See LoongArch Reference Manual v1.10, Volume 1,
>>> Section 2.2.10.5 "CPUCFG":
>>>
>>> > CPUCFG 访问未定义的配置字将读回全 0 值。
>>> >
>>> > Reads of undefined CPUCFG configuration words shall return
>>> all-zeroes.
>>>
>>> This sentence mentions no distinction based on privilege modes, so it
>>> can only mean the behavior applies universally regardless of
>>> privilege modes.
>>>
>>> I think if you want to make CPUCFG behavior PLV-dependent, you may
>>> have to ask the LoongArch spec editors, internally or in public, for
>>> a new spec revision.
>> No, CPUCFG behavior between CPUCFG0-CPUCFG21 is unchanged, only that
>> it can be defined by software since CPUCFG 0x400000000 is used by
>> software.
>
> The 0x40000000 range is not mentioned in the manuals. I know you've
> confirmed privately with HW team but this needs to be properly
> documented for public projects to properly rely on.
>
>>> (There are already multiple third-party LoongArch implementers as of
>>> late 2023, so any ISA-level change like this would best be
>>> coordinated, to minimize surprises.)
>> With document Vol 4-23
>> https://www.intel.com/content/dam/develop/external/us/en/documents/335592-sdm-vol-4.pdf
>>
>>
>> There is one line "MSR address range between 40000000H - 400000FFH is
>> marked as a specially reserved range. All existing and
>> future processors will not implement any features using any MSR in
>> this range."
>
> Thanks for providing this info, now at least we know why it's this
> specific range of 0x400000XX that's chosen.
>
>>
>> It only says that it is reserved, it does not say detailed software
>> behavior. Software behavior is defined in hypervisor such as:
>> https://github.com/MicrosoftDocs/Virtualization-Documentation/blob/main/tlfs/Requirements%20for%20Implementing%20the%20Microsoft%20Hypervisor%20Interface.pdf
>>
>> https://kb.vmware.com/s/article/1009458
>>
>> If hypercall method is used, there should be ABI also like aarch64:
>> https://documentation-service.arm.com/static/6013e5faeee5236980d08619
>
> Yes proper documentation of public API surface is always necessary
> *before* doing real work. Because right now the hypercall provider is
> Linux KVM, maybe we can document the existing and planned hypercall
> usage and ABI in the kernel docs along with code changes.
>
Yes, I will document the hypercalls in the kernel docs.
The SMC calling convention is an ABI between the OS and secure
firmware; LoongArch has no secure mode, so such a document is not
necessary here. The hypercall calling convention is tied to the
hypervisor software, not to the CPU HW vendor. Each hypervisor may have
its own hypercall calling convention, just like syscall ABIs.
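One common way to reconcile per-hypervisor conventions, borrowed from x86 where guests read a vendor signature from CPUID leaf 0x40000000, would be to put a signature word at the base of the software-defined range so the guest can select the matching ABI. A hypothetical sketch (the signature value and leaf layout are invented for illustration):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical: the base software leaf returns a hypervisor signature,
 * the way x86 guests read "KVMKVMKVM" from CPUID 0x40000000. */
uint32_t cpucfg_emulated(uint32_t leaf)
{
    if (leaf == 0x40000000u) {
        uint32_t sig;
        memcpy(&sig, "KVM", 4);  /* illustrative signature word */
        return sig;
    }
    return 0;  /* undefined words read back zero */
}

/* Guest side: pick the hypercall convention matching the signature. */
int guest_is_kvm(void)
{
    uint32_t sig = cpucfg_emulated(0x40000000u);
    return memcmp(&sig, "KVM", 4) == 0;
}
```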
Regards
Bibo Mao