2024-04-28 10:05:44

by Bibo Mao

Subject: [PATCH v8 0/6] LoongArch: Add pv ipi support on LoongArch VM

On a physical machine, the IPI hardware uses IOCSR registers. However, when
the system runs in VM mode, every vcpu access to an IOCSR register traps
into the hypervisor. SWI is an interrupt mechanism like SGI on ARM: software
can send an interrupt to a CPU, except that on LoongArch SWI can currently
only be sent to the local CPU. So SWI cannot be used for IPI on real
hardware, but it can be used in a VM when combined with a hypercall: the
IPI is sent with a hypercall, the hypervisor injects an SWI interrupt into
the destination vcpu, and the vcpu treats the SWI interrupt as an IPI.

With PV IPI, there is one trap on IPI sending and no trap on IPI
receiving. With the IOCSR hardware IPI method, there is one trap on IPI
sending and two traps on IPI receiving.

IPI multicast support is also added for VM; the idea comes from x86 PV IPI.
An IPI can be sent to up to 128 vcpus at one time, so multicast greatly
reduces the number of traps.
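
As a rough illustration of the multicast idea (a sketch only, not code from
this series), the snippet below packs an ascending list of target cpuids
into batches made of a base cpuid plus two 64-bit bitmap words. Each batch
covers up to 128 consecutive cpuids and would cost one hypercall, i.e. one
trap. The names pv_ipi_batch and pv_ipi_pack are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define PV_IPI_BATCH_BITS 128	/* two 64-bit bitmap words per batch */

/* Hypothetical container for the arguments of one multicast hypercall. */
struct pv_ipi_batch {
	uint64_t bitmap[2];	/* bit n set => target cpuid is min + n */
	unsigned int min;	/* lowest cpuid covered by this batch */
};

/*
 * Pack cpuids (assumed sorted in ascending order) into batches.
 * Returns the number of batches used, which models the number of
 * hypercalls a sender would need.
 */
size_t pv_ipi_pack(const unsigned int *cpuids, size_t n,
		   struct pv_ipi_batch *out, size_t max_out)
{
	size_t used = 0;

	for (size_t i = 0; i < n; i++) {
		unsigned int id = cpuids[i];
		unsigned int off;

		/* Start a new batch when id falls outside the 128-bit window. */
		if (used == 0 || id - out[used - 1].min >= PV_IPI_BATCH_BITS) {
			if (used == max_out)
				break;
			out[used].bitmap[0] = 0;
			out[used].bitmap[1] = 0;
			out[used].min = id;
			used++;
		}

		off = id - out[used - 1].min;
		out[used - 1].bitmap[off / 64] |= UINT64_C(1) << (off % 64);
	}

	return used;
}
```

With targets 0, 5, 64, 127, 128 and 300, this model needs three batches
instead of six single-target notifications.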

Here is the microbenchmark data for the "perf bench futex wake" testcase on
a 3C5000 single-way machine. The machine has 16 cpus and the VM also has
16 vcpus. The data is the time in ms to wake up 16 threads; smaller is
better.

    physical machine        0.0176 ms
    VM original             0.1140 ms
    VM with pv ipi patch    0.0481 ms

The series boots successfully with 128/256 vcpus, and passes the runltp
command from package ltp-20230516.

---
v7 --- v8:
1. Remove kernel PLV mode checking from cpucfg emulation for hypervisor
feature inquiry.
2. Remove document about loongarch hypercall ABI per request of huacai,
will add English/Chinese docs together later.

v6 --- v7:
1. Refine LoongArch virt document by review comments.
2. Add function kvm_read_reg()/kvm_write_reg() in hypercall emulation,
and later it can be used for other trap emulations.

v5 --- v6:
1. Add privilege checking when emulating cpucfg at index 0x40000000 --
0x400000FF, return 0 if not executed in kernel mode.
2. Add document about LoongArch pv ipi in the newly created directory
Documentation/virt/kvm/loongarch/
3. Fix pv ipi handling in kvm backend function kvm_pv_send_ipi(),
where min should be increased by BITS_PER_LONG for the second bitmap,
otherwise VM with more than 64 vcpus fails to boot.
4. Adjust patch order and refine code per review comments.

v4 --- v5:
1. Refresh function/macro name from review comments.

v3 --- v4:
1. Modify pv ipi hook function names call_func_ipi() and
call_func_single_ipi() to send_ipi_mask()/send_ipi_single(), since pv
ipi is used for both remote function call and reschedule notification.
2. Refresh changelog.

v2 --- v3:
1. Add 128 vcpu ipi multicast support like x86
2. Change cpucfg base address from 0x10000000 to 0x40000000, in order
to avoid conflict with future hw usage
3. Adjust patch order in this patchset, move patch
Refine-ipi-ops-on-LoongArch-platform to the first one.

v1 --- v2:
1. Add hw cpuid map support since ipi routing uses hw cpuid
2. Refine changelog description
3. Add hypercall statistic support for vcpu
4. Set percpu pv ipi message buffer aligned with cacheline
5. Refine pv ipi send logic, do not send ipi message if there is
a pending ipi message.
---
Bibo Mao (6):
LoongArch/smp: Refine some ipi functions on LoongArch platform
LoongArch: KVM: Add hypercall instruction emulation support
LoongArch: KVM: Add cpucfg area for kvm hypervisor
LoongArch: KVM: Add vcpu search support from physical cpuid
LoongArch: KVM: Add pv ipi support on kvm side
LoongArch: Add pv ipi support on guest kernel side

arch/loongarch/Kconfig | 9 +
arch/loongarch/include/asm/Kbuild | 1 -
arch/loongarch/include/asm/hardirq.h | 5 +
arch/loongarch/include/asm/inst.h | 1 +
arch/loongarch/include/asm/irq.h | 10 +-
arch/loongarch/include/asm/kvm_host.h | 27 +++
arch/loongarch/include/asm/kvm_para.h | 155 ++++++++++++++++++
arch/loongarch/include/asm/kvm_vcpu.h | 11 ++
arch/loongarch/include/asm/loongarch.h | 11 ++
arch/loongarch/include/asm/paravirt.h | 27 +++
.../include/asm/paravirt_api_clock.h | 1 +
arch/loongarch/include/asm/smp.h | 31 ++--
arch/loongarch/include/uapi/asm/Kbuild | 2 -
arch/loongarch/kernel/Makefile | 1 +
arch/loongarch/kernel/irq.c | 24 +--
arch/loongarch/kernel/paravirt.c | 151 +++++++++++++++++
arch/loongarch/kernel/perf_event.c | 14 +-
arch/loongarch/kernel/smp.c | 62 ++++---
arch/loongarch/kernel/time.c | 12 +-
arch/loongarch/kvm/exit.c | 132 +++++++++++++--
arch/loongarch/kvm/vcpu.c | 94 ++++++++++-
arch/loongarch/kvm/vm.c | 11 ++
22 files changed, 690 insertions(+), 102 deletions(-)
create mode 100644 arch/loongarch/include/asm/kvm_para.h
create mode 100644 arch/loongarch/include/asm/paravirt.h
create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
delete mode 100644 arch/loongarch/include/uapi/asm/Kbuild
create mode 100644 arch/loongarch/kernel/paravirt.c


base-commit: 5eb4573ea63d0c83bf58fb7c243fc2c2b6966c02
--
2.39.3



2024-04-28 10:06:00

by Bibo Mao

Subject: [PATCH v8 4/6] LoongArch: KVM: Add vcpu search support from physical cpuid

Physical cpuid is used for interrupt routing by irqchips such as the
ipi/msi/extioi interrupt controllers. Physical cpuid is stored in the
CSR register LOONGARCH_CSR_CPUID; it cannot be changed once the vcpu is
created, and two vcpus cannot share the same physical cpuid.

Different irqchips declare different sizes for physical cpuid: the max
cpuid value for CSR LOONGARCH_CSR_CPUID on 3A5000 is 512, the max cpuid
supported by IPI hardware is 1024, 256 for the extioi irqchip, and 65536
for the MSI irqchip.

The smallest value among all interrupt controllers is selected for now,
so the max cpuid size is defined as 256 by KVM, which comes from the
extioi irqchip.
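
The rules above (fixed table of 256 entries, no change after assignment, no
sharing between vcpus) can be modelled in user space roughly as follows.
This is a simplified sketch without locking, not the kernel code in this
patch; model_vcpu, model_phyid_map and the model_* functions are
hypothetical names, and cpuid 0 is treated as "not yet assigned" as in the
patch comments.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_PHYID 256	/* smallest irqchip limit (extioi) */

/* Stand-in for struct kvm_vcpu; cpuid models the software CSR.CPUID value. */
struct model_vcpu {
	unsigned int cpuid;
};

struct model_phyid_map {
	struct {
		struct model_vcpu *vcpu;
		bool enabled;
	} slot[MODEL_MAX_PHYID];
};

/* Claim physical cpuid val for vcpu v; returns 0 or -EINVAL. */
int model_set_cpuid(struct model_phyid_map *map, struct model_vcpu *v,
		    unsigned long val)
{
	if (val >= MODEL_MAX_PHYID)
		return -EINVAL;

	/* Slot taken: only a repeated set by the same vcpu is accepted. */
	if (map->slot[val].enabled)
		return map->slot[val].vcpu == v ? 0 : -EINVAL;

	/* A vcpu that already owns a non-zero cpuid may not change it. */
	if (v->cpuid != 0 && map->slot[v->cpuid].vcpu == v)
		return -EINVAL;

	map->slot[val].vcpu = v;
	map->slot[val].enabled = true;
	v->cpuid = (unsigned int)val;
	return 0;
}

/* Reverse lookup used for interrupt routing: cpuid -> vcpu, or NULL. */
struct model_vcpu *model_get_vcpu_by_cpuid(struct model_phyid_map *map,
					   unsigned long cpuid)
{
	if (cpuid >= MODEL_MAX_PHYID || !map->slot[cpuid].enabled)
		return NULL;

	return map->slot[cpuid].vcpu;
}
```

A flat array indexed by cpuid keeps the lookup O(1), which matters because
the lookup sits on the interrupt injection path.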

Signed-off-by: Bibo Mao <[email protected]>
---
arch/loongarch/include/asm/kvm_host.h | 26 ++++++++
arch/loongarch/include/asm/kvm_vcpu.h | 1 +
arch/loongarch/kvm/vcpu.c | 93 ++++++++++++++++++++++++++-
arch/loongarch/kvm/vm.c | 11 ++++
4 files changed, 130 insertions(+), 1 deletion(-)

diff --git a/arch/loongarch/include/asm/kvm_host.h b/arch/loongarch/include/asm/kvm_host.h
index 2d62f7b0d377..3ba16ef1fe69 100644
--- a/arch/loongarch/include/asm/kvm_host.h
+++ b/arch/loongarch/include/asm/kvm_host.h
@@ -64,6 +64,30 @@ struct kvm_world_switch {

#define MAX_PGTABLE_LEVELS 4

+/*
+ * Physical cpu id is used for interrupt routing, there are different
+ * definitions about physical cpuid on different hardwares.
+ * For LOONGARCH_CSR_CPUID register, max cpuid size is 512
+ * For IPI HW, max dest CPUID size is 1024
+ * For extioi interrupt controller, max dest CPUID size is 256
+ * For MSI interrupt controller, max supported CPUID size is 65536
+ *
+ * Currently max CPUID is defined as 256 for KVM hypervisor, in future
+ * it will be expanded to 4096, including 16 packages at most. And every
+ * package supports at most 256 vcpus
+ */
+#define KVM_MAX_PHYID 256
+
+struct kvm_phyid_info {
+ struct kvm_vcpu *vcpu;
+ bool enabled;
+};
+
+struct kvm_phyid_map {
+ int max_phyid;
+ struct kvm_phyid_info phys_map[KVM_MAX_PHYID];
+};
+
struct kvm_arch {
/* Guest physical mm */
kvm_pte_t *pgd;
@@ -71,6 +95,8 @@ struct kvm_arch {
unsigned long invalid_ptes[MAX_PGTABLE_LEVELS];
unsigned int pte_shifts[MAX_PGTABLE_LEVELS];
unsigned int root_level;
+ spinlock_t phyid_map_lock;
+ struct kvm_phyid_map *phyid_map;

s64 time_offset;
struct kvm_context __percpu *vmcs;
diff --git a/arch/loongarch/include/asm/kvm_vcpu.h b/arch/loongarch/include/asm/kvm_vcpu.h
index 0cb4fdb8a9b5..9f53950959da 100644
--- a/arch/loongarch/include/asm/kvm_vcpu.h
+++ b/arch/loongarch/include/asm/kvm_vcpu.h
@@ -81,6 +81,7 @@ void kvm_save_timer(struct kvm_vcpu *vcpu);
void kvm_restore_timer(struct kvm_vcpu *vcpu);

int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq);
+struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid);

/*
* Loongarch KVM guest interrupt handling
diff --git a/arch/loongarch/kvm/vcpu.c b/arch/loongarch/kvm/vcpu.c
index 3a8779065f73..b633fd28b8db 100644
--- a/arch/loongarch/kvm/vcpu.c
+++ b/arch/loongarch/kvm/vcpu.c
@@ -274,6 +274,95 @@ static int _kvm_getcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 *val)
return 0;
}

+static inline int kvm_set_cpuid(struct kvm_vcpu *vcpu, u64 val)
+{
+ int cpuid;
+ struct loongarch_csrs *csr = vcpu->arch.csr;
+ struct kvm_phyid_map *map;
+
+ if (val >= KVM_MAX_PHYID)
+ return -EINVAL;
+
+ cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_CPUID);
+ map = vcpu->kvm->arch.phyid_map;
+ spin_lock(&vcpu->kvm->arch.phyid_map_lock);
+ if (map->phys_map[cpuid].enabled) {
+ /*
+ * Cpuid is already set before
+ * Forbid changing different cpuid at runtime
+ */
+ if (cpuid != val) {
+ /*
+ * Cpuid 0 is initial value for vcpu, maybe invalid
+ * unset value for vcpu
+ */
+ if (cpuid) {
+ spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
+ return -EINVAL;
+ }
+ } else {
+ /* Discard duplicated cpuid set */
+ spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
+ return 0;
+ }
+ }
+
+ if (map->phys_map[val].enabled) {
+ /*
+ * New cpuid is already set with other vcpu
+ * Forbid sharing the same cpuid between different vcpus
+ */
+ if (map->phys_map[val].vcpu != vcpu) {
+ spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
+ return -EINVAL;
+ }
+
+ /* Discard duplicated cpuid set operation */
+ spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
+ return 0;
+ }
+
+ kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, val);
+ map->phys_map[val].enabled = true;
+ map->phys_map[val].vcpu = vcpu;
+ if (map->max_phyid < val)
+ map->max_phyid = val;
+ spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
+ return 0;
+}
+
+struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid)
+{
+ struct kvm_phyid_map *map;
+
+ if (cpuid >= KVM_MAX_PHYID)
+ return NULL;
+
+ map = kvm->arch.phyid_map;
+ if (map->phys_map[cpuid].enabled)
+ return map->phys_map[cpuid].vcpu;
+
+ return NULL;
+}
+
+static inline void kvm_drop_cpuid(struct kvm_vcpu *vcpu)
+{
+ int cpuid;
+ struct loongarch_csrs *csr = vcpu->arch.csr;
+ struct kvm_phyid_map *map;
+
+ map = vcpu->kvm->arch.phyid_map;
+ cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_CPUID);
+ if (cpuid >= KVM_MAX_PHYID)
+ return;
+
+ if (map->phys_map[cpuid].enabled) {
+ map->phys_map[cpuid].vcpu = NULL;
+ map->phys_map[cpuid].enabled = false;
+ kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, 0);
+ }
+}
+
static int _kvm_setcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 val)
{
int ret = 0, gintc;
@@ -291,7 +380,8 @@ static int _kvm_setcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 val)
kvm_set_sw_gcsr(csr, LOONGARCH_CSR_ESTAT, gintc);

return ret;
- }
+ } else if (id == LOONGARCH_CSR_CPUID)
+ return kvm_set_cpuid(vcpu, val);

kvm_write_sw_gcsr(csr, id, val);

@@ -943,6 +1033,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
hrtimer_cancel(&vcpu->arch.swtimer);
kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
+ kvm_drop_cpuid(vcpu);
kfree(vcpu->arch.csr);

/*
* If the vCPU is freed and reused as another vCPU, we don't want the
diff --git a/arch/loongarch/kvm/vm.c b/arch/loongarch/kvm/vm.c
index 0a37f6fa8f2d..6006a28653ad 100644
--- a/arch/loongarch/kvm/vm.c
+++ b/arch/loongarch/kvm/vm.c
@@ -30,6 +30,14 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
if (!kvm->arch.pgd)
return -ENOMEM;

+ kvm->arch.phyid_map = kvzalloc(sizeof(struct kvm_phyid_map),
+ GFP_KERNEL_ACCOUNT);
+ if (!kvm->arch.phyid_map) {
+ free_page((unsigned long)kvm->arch.pgd);
+ kvm->arch.pgd = NULL;
+ return -ENOMEM;
+ }
+
kvm_init_vmcs(kvm);
kvm->arch.gpa_size = BIT(cpu_vabits - 1);
kvm->arch.root_level = CONFIG_PGTABLE_LEVELS - 1;
@@ -44,6 +52,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
for (i = 0; i <= kvm->arch.root_level; i++)
kvm->arch.pte_shifts[i] = PAGE_SHIFT + i * (PAGE_SHIFT - 3);

+ spin_lock_init(&kvm->arch.phyid_map_lock);
return 0;
}

@@ -51,7 +60,9 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
{
kvm_destroy_vcpus(kvm);
free_page((unsigned long)kvm->arch.pgd);
+ kvfree(kvm->arch.phyid_map);
kvm->arch.pgd = NULL;
+ kvm->arch.phyid_map = NULL;
}

int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
--
2.39.3


2024-04-28 10:06:22

by Bibo Mao

Subject: [PATCH v8 3/6] LoongArch: KVM: Add cpucfg area for kvm hypervisor

The cpucfg instruction can be used to get processor features. It raises
a trap exception when executed in VM mode, so the hypervisor can use it
to provide cpu features to the VM. On real hardware the cpucfg area
0 - 20 is used. Here a dedicated area 0x40000000 -- 0x400000ff is used
by the KVM hypervisor to provide PV features, and the area can be
extended for other hypervisors in the future. This area will never be
used by real HW, it is only used by software.
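
A user-space model of the index dispatch and of how a guest might probe it
is sketched below. It is an illustration under assumptions, not the code in
this patch: the model_* names are hypothetical, MODEL_MAX_CPUCFG_REGS is
set to 21 only because the text above says indices 0 - 20 are used by
hardware, and KVM_FEATURE_PV_IPI with its feature-word handling comes from
a later patch in this series.

```c
#include <stdint.h>
#include <string.h>

#define CPUCFG_KVM_BASE		0x40000000u
#define CPUCFG_KVM_SIG		CPUCFG_KVM_BASE
#define CPUCFG_KVM_FEATURE	(CPUCFG_KVM_BASE + 4)
#define KVM_SIGNATURE		"KVM\0"
#define KVM_FEATURE_PV_IPI	(1u << 1)
#define MODEL_MAX_CPUCFG_REGS	21	/* assumption: hw indices 0..20 */

/* The first four signature bytes viewed as one 32-bit word. */
uint32_t model_signature_word(void)
{
	uint32_t word;

	memcpy(&word, KVM_SIGNATURE, sizeof(word));
	return word;
}

/* Model of the hypervisor side: map a cpucfg index to a value. */
uint32_t model_cpucfg(const uint32_t *hw_regs, uint32_t index)
{
	if (index < MODEL_MAX_CPUCFG_REGS)
		return hw_regs[index];

	switch (index) {
	case CPUCFG_KVM_SIG:
		return model_signature_word();
	case CPUCFG_KVM_FEATURE:
		return KVM_FEATURE_PV_IPI;
	default:
		return 0;	/* undefined index reads as 0 */
	}
}

/* Model of the guest side: check the signature, then the feature bit. */
int model_guest_has_pv_ipi(const uint32_t *hw_regs)
{
	if (model_cpucfg(hw_regs, CPUCFG_KVM_SIG) != model_signature_word())
		return 0;

	return (model_cpucfg(hw_regs, CPUCFG_KVM_FEATURE) &
		KVM_FEATURE_PV_IPI) != 0;
}
```

Because undefined indices read as 0, a guest running on bare metal or on a
hypervisor without this area would see a signature mismatch and fall back
to the hardware IPI path.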

Signed-off-by: Bibo Mao <[email protected]>
---
arch/loongarch/include/asm/inst.h | 1 +
arch/loongarch/include/asm/loongarch.h | 10 +++++
arch/loongarch/kvm/exit.c | 53 ++++++++++++++++++--------
3 files changed, 48 insertions(+), 16 deletions(-)

diff --git a/arch/loongarch/include/asm/inst.h b/arch/loongarch/include/asm/inst.h
index d8f637f9e400..ad120f924905 100644
--- a/arch/loongarch/include/asm/inst.h
+++ b/arch/loongarch/include/asm/inst.h
@@ -67,6 +67,7 @@ enum reg2_op {
revhd_op = 0x11,
extwh_op = 0x16,
extwb_op = 0x17,
+ cpucfg_op = 0x1b,
iocsrrdb_op = 0x19200,
iocsrrdh_op = 0x19201,
iocsrrdw_op = 0x19202,
diff --git a/arch/loongarch/include/asm/loongarch.h b/arch/loongarch/include/asm/loongarch.h
index 46366e783c84..a1d22e8b6f94 100644
--- a/arch/loongarch/include/asm/loongarch.h
+++ b/arch/loongarch/include/asm/loongarch.h
@@ -158,6 +158,16 @@
#define CPUCFG48_VFPU_CG BIT(2)
#define CPUCFG48_RAM_CG BIT(3)

+/*
+ * cpucfg index area: 0x40000000 -- 0x400000ff
+ * SW emulation for KVM hypervisor
+ */
+#define CPUCFG_KVM_BASE 0x40000000UL
+#define CPUCFG_KVM_SIZE 0x100
+#define CPUCFG_KVM_SIG CPUCFG_KVM_BASE
+#define KVM_SIGNATURE "KVM\0"
+#define CPUCFG_KVM_FEATURE (CPUCFG_KVM_BASE + 4)
+
#ifndef __ASSEMBLY__

/* CSR */
diff --git a/arch/loongarch/kvm/exit.c b/arch/loongarch/kvm/exit.c
index 923bbca9bd22..552a2fedbe44 100644
--- a/arch/loongarch/kvm/exit.c
+++ b/arch/loongarch/kvm/exit.c
@@ -206,10 +206,44 @@ int kvm_emu_idle(struct kvm_vcpu *vcpu)
return EMULATE_DONE;
}

-static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
+static int kvm_emu_cpucfg(struct kvm_vcpu *vcpu, larch_inst inst)
{
int rd, rj;
unsigned int index;
+ unsigned long plv;
+
+ rd = inst.reg2_format.rd;
+ rj = inst.reg2_format.rj;
+ ++vcpu->stat.cpucfg_exits;
+ index = vcpu->arch.gprs[rj];
+
+ /*
+ * By LoongArch Reference Manual 2.2.10.5
+ * Return value is 0 for undefined cpucfg index
+ *
+ * Disable preemption since hw gcsr is accessed
+ */
+ preempt_disable();
+ plv = kvm_read_hw_gcsr(LOONGARCH_CSR_CRMD) >> CSR_CRMD_PLV_SHIFT;
+ switch (index) {
+ case 0 ... (KVM_MAX_CPUCFG_REGS - 1):
+ vcpu->arch.gprs[rd] = vcpu->arch.cpucfg[index];
+ break;
+ case CPUCFG_KVM_SIG:
+ /* Cpucfg emulation between 0x40000000 -- 0x400000ff */
+ vcpu->arch.gprs[rd] = *(unsigned int *)KVM_SIGNATURE;
+ break;
+ default:
+ vcpu->arch.gprs[rd] = 0;
+ break;
+ }
+
+ preempt_enable();
+ return EMULATE_DONE;
+}
+
+static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
+{
unsigned long curr_pc;
larch_inst inst;
enum emulation_result er = EMULATE_DONE;
@@ -224,21 +258,8 @@ static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
er = EMULATE_FAIL;
switch (((inst.word >> 24) & 0xff)) {
case 0x0: /* CPUCFG GSPR */
- if (inst.reg2_format.opcode == 0x1B) {
- rd = inst.reg2_format.rd;
- rj = inst.reg2_format.rj;
- ++vcpu->stat.cpucfg_exits;
- index = vcpu->arch.gprs[rj];
- er = EMULATE_DONE;
- /*
- * By LoongArch Reference Manual 2.2.10.5
- * return value is 0 for undefined cpucfg index
- */
- if (index < KVM_MAX_CPUCFG_REGS)
- vcpu->arch.gprs[rd] = vcpu->arch.cpucfg[index];
- else
- vcpu->arch.gprs[rd] = 0;
- }
+ if (inst.reg2_format.opcode == cpucfg_op)
+ er = kvm_emu_cpucfg(vcpu, inst);
break;
case 0x4: /* CSR{RD,WR,XCHG} GSPR */
er = kvm_handle_csr(vcpu, inst);
--
2.39.3


2024-04-28 10:06:47

by Bibo Mao

Subject: [PATCH v8 2/6] LoongArch: KVM: Add hypercall instruction emulation support

On LoongArch systems, there is a hypercall instruction dedicated to
virtualization. When the instruction is executed on the host side, an
illegal instruction exception is reported; when it is executed in VM
mode, it traps into the host.

When the hypercall is emulated, the A0 register is set to
KVM_HCALL_INVALID_CODE rather than injecting an EXCCODE_INE invalid
instruction exception, so the VM can continue executing the next
instruction.
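
The behaviour described above can be modelled in a few lines. This is a
hedged user-space sketch, not the handler in this patch: model_vcpu_regs
and the model_* functions are hypothetical, and advancing the PC by one
4-byte instruction is an assumption about what update_pc() does.

```c
#define MODEL_GPR_A0			4	/* a0 is general register r4 */
#define MODEL_INSN_SIZE			4	/* instructions are 32 bits wide */
#define MODEL_HCALL_INVALID_CODE	((unsigned long)-1)

/* Minimal stand-in for the guest register state kept per vcpu. */
struct model_vcpu_regs {
	unsigned long pc;
	unsigned long gprs[32];
};

/* Emulate an unsupported hypercall: skip it and report an error in a0. */
void model_handle_hypercall(struct model_vcpu_regs *regs)
{
	regs->pc += MODEL_INSN_SIZE;
	regs->gprs[MODEL_GPR_A0] = MODEL_HCALL_INVALID_CODE;
}

/* Guest-side view: a negative value in a0 means the call was rejected. */
int model_guest_hypercall_failed(const struct model_vcpu_regs *regs)
{
	return (long)regs->gprs[MODEL_GPR_A0] < 0;
}
```

Returning an error code instead of injecting an exception lets a guest
probe for hypercalls without installing an exception fixup.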

Signed-off-by: Bibo Mao <[email protected]>
---
arch/loongarch/include/asm/Kbuild | 1 -
arch/loongarch/include/asm/kvm_para.h | 26 ++++++++++++++++++++++++++
arch/loongarch/include/uapi/asm/Kbuild | 2 --
arch/loongarch/kvm/exit.c | 10 ++++++++++
4 files changed, 36 insertions(+), 3 deletions(-)
create mode 100644 arch/loongarch/include/asm/kvm_para.h
delete mode 100644 arch/loongarch/include/uapi/asm/Kbuild

diff --git a/arch/loongarch/include/asm/Kbuild b/arch/loongarch/include/asm/Kbuild
index 2dbec7853ae8..c862672ed953 100644
--- a/arch/loongarch/include/asm/Kbuild
+++ b/arch/loongarch/include/asm/Kbuild
@@ -26,4 +26,3 @@ generic-y += poll.h
generic-y += param.h
generic-y += posix_types.h
generic-y += resource.h
-generic-y += kvm_para.h
diff --git a/arch/loongarch/include/asm/kvm_para.h b/arch/loongarch/include/asm/kvm_para.h
new file mode 100644
index 000000000000..d48f993ae206
--- /dev/null
+++ b/arch/loongarch/include/asm/kvm_para.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_KVM_PARA_H
+#define _ASM_LOONGARCH_KVM_PARA_H
+
+/*
+ * LoongArch hypercall return code
+ */
+#define KVM_HCALL_STATUS_SUCCESS 0
+#define KVM_HCALL_INVALID_CODE -1UL
+#define KVM_HCALL_INVALID_PARAMETER -2UL
+
+static inline unsigned int kvm_arch_para_features(void)
+{
+ return 0;
+}
+
+static inline unsigned int kvm_arch_para_hints(void)
+{
+ return 0;
+}
+
+static inline bool kvm_check_and_clear_guest_paused(void)
+{
+ return false;
+}
+#endif /* _ASM_LOONGARCH_KVM_PARA_H */
diff --git a/arch/loongarch/include/uapi/asm/Kbuild b/arch/loongarch/include/uapi/asm/Kbuild
deleted file mode 100644
index 4aa680ca2e5f..000000000000
--- a/arch/loongarch/include/uapi/asm/Kbuild
+++ /dev/null
@@ -1,2 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0
-generic-y += kvm_para.h
diff --git a/arch/loongarch/kvm/exit.c b/arch/loongarch/kvm/exit.c
index ed1d89d53e2e..923bbca9bd22 100644
--- a/arch/loongarch/kvm/exit.c
+++ b/arch/loongarch/kvm/exit.c
@@ -685,6 +685,15 @@ static int kvm_handle_lasx_disabled(struct kvm_vcpu *vcpu)
return RESUME_GUEST;
}

+static int kvm_handle_hypercall(struct kvm_vcpu *vcpu)
+{
+ update_pc(&vcpu->arch);
+
+ /* Treat it as noop instruction, only set return value */
+ vcpu->arch.gprs[LOONGARCH_GPR_A0] = KVM_HCALL_INVALID_CODE;
+ return RESUME_GUEST;
+}
+
/*
* LoongArch KVM callback handling for unimplemented guest exiting
*/
@@ -716,6 +725,7 @@ static exit_handle_fn kvm_fault_tables[EXCCODE_INT_START] = {
[EXCCODE_LSXDIS] = kvm_handle_lsx_disabled,
[EXCCODE_LASXDIS] = kvm_handle_lasx_disabled,
[EXCCODE_GSPR] = kvm_handle_gspr,
+ [EXCCODE_HVC] = kvm_handle_hypercall,
};

int kvm_handle_fault(struct kvm_vcpu *vcpu, int fault)
--
2.39.3


2024-04-28 10:07:13

by Bibo Mao

Subject: [PATCH v8 5/6] LoongArch: KVM: Add pv ipi support on kvm side

On LoongArch systems, the ipi hw uses iocsr registers: there is one iocsr
register access on ipi sending, and two iocsr accesses on ipi receiving,
in the ipi interrupt handler. In VM mode every iocsr access causes the
VM to trap into the hypervisor, so one ipi hw notification costs three
traps.

PV ipi is added for VM: the hypercall instruction is used by the ipi
sender, and the hypervisor injects SWI into the destination vcpu. In the
SWI interrupt handler, only the estat CSR register is written to clear
the irq, and estat CSR access does not trap into the hypervisor. So with
pv ipi there is one trap on the sender side and no trap on the receiver
side, i.e. only one trap per ipi notification.

This patch also adds ipi multicast support; the method is similar to
x86. With ipi multicast support, an ipi notification can be sent to at
most 128 vcpus at one time, which greatly reduces the number of traps
into the hypervisor.
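
The multicast decoding on the hypervisor side can be pictured with the
user-space sketch below: two 64-bit bitmap words plus a base cpuid are
expanded into the list of destination cpuids. It only models the bitmap
walk and is not the kernel implementation; model_pv_ipi_targets is a
hypothetical name, and the real code looks up each vcpu and queues SWI0
instead of filling an array.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Expand (lo, hi, min) into destination cpuids. Bit n of lo targets
 * cpuid min + n, bit n of hi targets cpuid min + 64 + n.
 * Returns the number of cpuids written to dest (at most max_dest).
 */
size_t model_pv_ipi_targets(uint64_t lo, uint64_t hi, unsigned int min,
			    unsigned int *dest, size_t max_dest)
{
	const uint64_t words[2] = { lo, hi };
	size_t n = 0;

	for (unsigned int i = 0; i < 2; i++) {
		uint64_t w = words[i];

		/* Shift the word right until no set bits remain. */
		for (unsigned int bit = 0; w != 0; bit++, w >>= 1) {
			if ((w & 1) && n < max_dest)
				dest[n++] = min + i * 64 + bit;
		}
	}

	return n;
}
```

One call of this kind stands for a single trap, however many of the 128
bits are set.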

Signed-off-by: Bibo Mao <[email protected]>
---
arch/loongarch/include/asm/kvm_host.h | 1 +
arch/loongarch/include/asm/kvm_para.h | 129 +++++++++++++++++++++++++
arch/loongarch/include/asm/kvm_vcpu.h | 10 ++
arch/loongarch/include/asm/loongarch.h | 1 +
arch/loongarch/kvm/exit.c | 73 +++++++++++++-
arch/loongarch/kvm/vcpu.c | 1 +
6 files changed, 213 insertions(+), 2 deletions(-)

diff --git a/arch/loongarch/include/asm/kvm_host.h b/arch/loongarch/include/asm/kvm_host.h
index 3ba16ef1fe69..0b96c6303cf7 100644
--- a/arch/loongarch/include/asm/kvm_host.h
+++ b/arch/loongarch/include/asm/kvm_host.h
@@ -43,6 +43,7 @@ struct kvm_vcpu_stat {
u64 idle_exits;
u64 cpucfg_exits;
u64 signal_exits;
+ u64 hypercall_exits;
};

#define KVM_MEM_HUGEPAGE_CAPABLE (1UL << 0)
diff --git a/arch/loongarch/include/asm/kvm_para.h b/arch/loongarch/include/asm/kvm_para.h
index d48f993ae206..a5809a854bae 100644
--- a/arch/loongarch/include/asm/kvm_para.h
+++ b/arch/loongarch/include/asm/kvm_para.h
@@ -2,6 +2,16 @@
#ifndef _ASM_LOONGARCH_KVM_PARA_H
#define _ASM_LOONGARCH_KVM_PARA_H

+/*
+ * Hypercall code field
+ */
+#define HYPERVISOR_KVM 1
+#define HYPERVISOR_VENDOR_SHIFT 8
+#define HYPERCALL_CODE(vendor, code) ((vendor << HYPERVISOR_VENDOR_SHIFT) + code)
+#define KVM_HCALL_CODE_PV_SERVICE 0
+#define KVM_HCALL_PV_SERVICE HYPERCALL_CODE(HYPERVISOR_KVM, KVM_HCALL_CODE_PV_SERVICE)
+#define KVM_HCALL_FUNC_PV_IPI 1
+
/*
* LoongArch hypercall return code
*/
@@ -9,6 +19,125 @@
#define KVM_HCALL_INVALID_CODE -1UL
#define KVM_HCALL_INVALID_PARAMETER -2UL

+/*
+ * Hypercall interface for KVM hypervisor
+ *
+ * a0: function identifier
+ * a1-a6: args
+ * Return value will be placed in a0.
+ * Up to 6 arguments are passed in a1, a2, a3, a4, a5, a6.
+ */
+static __always_inline long kvm_hypercall(u64 fid)
+{
+ register long ret asm("a0");
+ register unsigned long fun asm("a0") = fid;
+
+ __asm__ __volatile__(
+ "hvcl "__stringify(KVM_HCALL_PV_SERVICE)
+ : "=r" (ret)
+ : "r" (fun)
+ : "memory"
+ );
+
+ return ret;
+}
+
+static __always_inline long kvm_hypercall1(u64 fid, unsigned long arg0)
+{
+ register long ret asm("a0");
+ register unsigned long fun asm("a0") = fid;
+ register unsigned long a1 asm("a1") = arg0;
+
+ __asm__ __volatile__(
+ "hvcl "__stringify(KVM_HCALL_PV_SERVICE)
+ : "=r" (ret)
+ : "r" (fun), "r" (a1)
+ : "memory"
+ );
+
+ return ret;
+}
+
+static __always_inline long kvm_hypercall2(u64 fid,
+ unsigned long arg0, unsigned long arg1)
+{
+ register long ret asm("a0");
+ register unsigned long fun asm("a0") = fid;
+ register unsigned long a1 asm("a1") = arg0;
+ register unsigned long a2 asm("a2") = arg1;
+
+ __asm__ __volatile__(
+ "hvcl "__stringify(KVM_HCALL_PV_SERVICE)
+ : "=r" (ret)
+ : "r" (fun), "r" (a1), "r" (a2)
+ : "memory"
+ );
+
+ return ret;
+}
+
+static __always_inline long kvm_hypercall3(u64 fid,
+ unsigned long arg0, unsigned long arg1, unsigned long arg2)
+{
+ register long ret asm("a0");
+ register unsigned long fun asm("a0") = fid;
+ register unsigned long a1 asm("a1") = arg0;
+ register unsigned long a2 asm("a2") = arg1;
+ register unsigned long a3 asm("a3") = arg2;
+
+ __asm__ __volatile__(
+ "hvcl "__stringify(KVM_HCALL_PV_SERVICE)
+ : "=r" (ret)
+ : "r" (fun), "r" (a1), "r" (a2), "r" (a3)
+ : "memory"
+ );
+
+ return ret;
+}
+
+static __always_inline long kvm_hypercall4(u64 fid,
+ unsigned long arg0, unsigned long arg1, unsigned long arg2,
+ unsigned long arg3)
+{
+ register long ret asm("a0");
+ register unsigned long fun asm("a0") = fid;
+ register unsigned long a1 asm("a1") = arg0;
+ register unsigned long a2 asm("a2") = arg1;
+ register unsigned long a3 asm("a3") = arg2;
+ register unsigned long a4 asm("a4") = arg3;
+
+ __asm__ __volatile__(
+ "hvcl "__stringify(KVM_HCALL_PV_SERVICE)
+ : "=r" (ret)
+ : "r"(fun), "r" (a1), "r" (a2), "r" (a3), "r" (a4)
+ : "memory"
+ );
+
+ return ret;
+}
+
+static __always_inline long kvm_hypercall5(u64 fid,
+ unsigned long arg0, unsigned long arg1, unsigned long arg2,
+ unsigned long arg3, unsigned long arg4)
+{
+ register long ret asm("a0");
+ register unsigned long fun asm("a0") = fid;
+ register unsigned long a1 asm("a1") = arg0;
+ register unsigned long a2 asm("a2") = arg1;
+ register unsigned long a3 asm("a3") = arg2;
+ register unsigned long a4 asm("a4") = arg3;
+ register unsigned long a5 asm("a5") = arg4;
+
+ __asm__ __volatile__(
+ "hvcl "__stringify(KVM_HCALL_PV_SERVICE)
+ : "=r" (ret)
+ : "r"(fun), "r" (a1), "r" (a2), "r" (a3), "r" (a4), "r" (a5)
+ : "memory"
+ );
+
+ return ret;
+}
+
static inline unsigned int kvm_arch_para_features(void)
{
return 0;
diff --git a/arch/loongarch/include/asm/kvm_vcpu.h b/arch/loongarch/include/asm/kvm_vcpu.h
index 9f53950959da..de6b17262d8e 100644
--- a/arch/loongarch/include/asm/kvm_vcpu.h
+++ b/arch/loongarch/include/asm/kvm_vcpu.h
@@ -110,4 +110,14 @@ static inline int kvm_queue_exception(struct kvm_vcpu *vcpu,
return -1;
}

+static inline unsigned long kvm_read_reg(struct kvm_vcpu *vcpu, int num)
+{
+ return vcpu->arch.gprs[num];
+}
+
+static inline void kvm_write_reg(struct kvm_vcpu *vcpu, int num,
+ unsigned long val)
+{
+ vcpu->arch.gprs[num] = val;
+}
#endif /* __ASM_LOONGARCH_KVM_VCPU_H__ */
diff --git a/arch/loongarch/include/asm/loongarch.h b/arch/loongarch/include/asm/loongarch.h
index a1d22e8b6f94..0ad36704cb4b 100644
--- a/arch/loongarch/include/asm/loongarch.h
+++ b/arch/loongarch/include/asm/loongarch.h
@@ -167,6 +167,7 @@
#define CPUCFG_KVM_SIG CPUCFG_KVM_BASE
#define KVM_SIGNATURE "KVM\0"
#define CPUCFG_KVM_FEATURE (CPUCFG_KVM_BASE + 4)
+#define KVM_FEATURE_PV_IPI BIT(1)

#ifndef __ASSEMBLY__

diff --git a/arch/loongarch/kvm/exit.c b/arch/loongarch/kvm/exit.c
index 552a2fedbe44..faa9e1ba1a6a 100644
--- a/arch/loongarch/kvm/exit.c
+++ b/arch/loongarch/kvm/exit.c
@@ -233,6 +233,9 @@ static int kvm_emu_cpucfg(struct kvm_vcpu *vcpu, larch_inst inst)
/* Cpucfg emulation between 0x40000000 -- 0x400000ff */
vcpu->arch.gprs[rd] = *(unsigned int *)KVM_SIGNATURE;
break;
+ case CPUCFG_KVM_FEATURE:
+ vcpu->arch.gprs[rd] = KVM_FEATURE_PV_IPI;
+ break;
default:
vcpu->arch.gprs[rd] = 0;
break;
@@ -706,12 +709,78 @@ static int kvm_handle_lasx_disabled(struct kvm_vcpu *vcpu)
return RESUME_GUEST;
}

+static int kvm_pv_send_ipi(struct kvm_vcpu *vcpu)
+{
+ unsigned long ipi_bitmap;
+ unsigned int min, cpu, i;
+ struct kvm_vcpu *dest;
+
+ min = kvm_read_reg(vcpu, LOONGARCH_GPR_A3);
+ for (i = 0; i < 2; i++, min += BITS_PER_LONG) {
+ ipi_bitmap = kvm_read_reg(vcpu, LOONGARCH_GPR_A1 + i);
+ if (!ipi_bitmap)
+ continue;
+
+ cpu = find_first_bit((void *)&ipi_bitmap, BITS_PER_LONG);
+ while (cpu < BITS_PER_LONG) {
+ dest = kvm_get_vcpu_by_cpuid(vcpu->kvm, cpu + min);
+ cpu = find_next_bit((void *)&ipi_bitmap, BITS_PER_LONG,
+ cpu + 1);
+ if (!dest)
+ continue;
+
+ /*
+ * Send SWI0 to dest vcpu to emulate IPI interrupt
+ */
+ kvm_queue_irq(dest, INT_SWI0);
+ kvm_vcpu_kick(dest);
+ }
+ }
+
+ return 0;
+}
+
+/*
+ * Hypercall emulation always returns to guest, caller should check retval.
+ */
+static void kvm_handle_pv_service(struct kvm_vcpu *vcpu)
+{
+ unsigned long func = kvm_read_reg(vcpu, LOONGARCH_GPR_A0);
+ long ret;
+
+ switch (func) {
+ case KVM_HCALL_FUNC_PV_IPI:
+ kvm_pv_send_ipi(vcpu);
+ ret = KVM_HCALL_STATUS_SUCCESS;
+ break;
+ default:
+ ret = KVM_HCALL_INVALID_CODE;
+ break;
+ };
+
+ kvm_write_reg(vcpu, LOONGARCH_GPR_A0, ret);
+}
+
static int kvm_handle_hypercall(struct kvm_vcpu *vcpu)
{
+ larch_inst inst;
+ unsigned int code;
+
+ inst.word = vcpu->arch.badi;
+ code = inst.reg0i15_format.immediate;
update_pc(&vcpu->arch);

- /* Treat it as noop instruction, only set return value */
- vcpu->arch.gprs[LOONGARCH_GPR_A0] = KVM_HCALL_INVALID_CODE;
+ switch (code) {
+ case KVM_HCALL_PV_SERVICE:
+ vcpu->stat.hypercall_exits++;
+ kvm_handle_pv_service(vcpu);
+ break;
+ default:
+ /* Treat it as noop instruction, only set return value */
+ kvm_write_reg(vcpu, LOONGARCH_GPR_A0, KVM_HCALL_INVALID_CODE);
+ break;
+ }
+
return RESUME_GUEST;
}

diff --git a/arch/loongarch/kvm/vcpu.c b/arch/loongarch/kvm/vcpu.c
index b633fd28b8db..76f2086ab68b 100644
--- a/arch/loongarch/kvm/vcpu.c
+++ b/arch/loongarch/kvm/vcpu.c
@@ -19,6 +19,7 @@ const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
STATS_DESC_COUNTER(VCPU, idle_exits),
STATS_DESC_COUNTER(VCPU, cpucfg_exits),
STATS_DESC_COUNTER(VCPU, signal_exits),
+ STATS_DESC_COUNTER(VCPU, hypercall_exits)
};

const struct kvm_stats_header kvm_vcpu_stats_header = {
--
2.39.3


2024-04-28 10:07:22

by Bibo Mao

Subject: [PATCH v8 1/6] LoongArch/smp: Refine some ipi functions on LoongArch platform

This refines the ipi handling code on LoongArch platform. There are
three modifications.
1. Add generic function get_percpu_irq(), replacing percpu irq
functions such as get_ipi_irq()/get_pmc_irq()/get_timer_irq() with
get_percpu_irq().

2. Change the definition of the parameter action passed to
loongson_send_ipi_single() and loongson_send_ipi_mask(): at the ipi
sender side it is now a plain decimal action number rather than a binary
bitmap. The ipi hw sender takes the decimal code and the ipi hw converts
it into a bitmap in the ipi message buffer, so the ipi receiver still
gets the binary bitmap encoding.

3. Add structure smp_ops on LoongArch platform so that pv ipi can be used
later.
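
The relation between the two encodings in item 2 can be sketched as
follows. The ACTION_* and SMP_* values match the definitions added by this
patch; the mailbox word and the model_ipi_* helpers are hypothetical and
only stand in for the per-cpu ipi message buffer maintained by hardware.

```c
#include <stdint.h>

#define BIT(n)			(UINT32_C(1) << (n))

/* Sender side: plain action numbers. */
#define ACTION_BOOT_CPU		0
#define ACTION_RESCHEDULE	1
#define ACTION_CALL_FUNCTION	2

/* Receiver side: one bit per action. */
#define SMP_BOOT_CPU		BIT(ACTION_BOOT_CPU)
#define SMP_RESCHEDULE		BIT(ACTION_RESCHEDULE)
#define SMP_CALL_FUNCTION	BIT(ACTION_CALL_FUNCTION)

/* Model of the hw send path: an action number becomes a pending bit. */
void model_ipi_send(uint32_t *mailbox, unsigned int action)
{
	*mailbox |= BIT(action);
}

/* Model of the receive path: read and clear all pending action bits. */
uint32_t model_ipi_receive(uint32_t *mailbox)
{
	uint32_t pending = *mailbox;

	*mailbox = 0;
	return pending;
}
```

Several actions sent before the receiver runs simply accumulate as bits,
which is why the receiver tests against the SMP_* masks.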

Signed-off-by: Bibo Mao <[email protected]>
---
arch/loongarch/include/asm/hardirq.h | 4 ++
arch/loongarch/include/asm/irq.h | 10 ++++-
arch/loongarch/include/asm/smp.h | 31 +++++++--------
arch/loongarch/kernel/irq.c | 22 +----------
arch/loongarch/kernel/perf_event.c | 14 +------
arch/loongarch/kernel/smp.c | 58 +++++++++++++++++++---------
arch/loongarch/kernel/time.c | 12 +-----
7 files changed, 71 insertions(+), 80 deletions(-)

diff --git a/arch/loongarch/include/asm/hardirq.h b/arch/loongarch/include/asm/hardirq.h
index 0ef3b18f8980..9f0038e19c7f 100644
--- a/arch/loongarch/include/asm/hardirq.h
+++ b/arch/loongarch/include/asm/hardirq.h
@@ -12,6 +12,10 @@
extern void ack_bad_irq(unsigned int irq);
#define ack_bad_irq ack_bad_irq

+enum ipi_msg_type {
+ IPI_RESCHEDULE,
+ IPI_CALL_FUNCTION,
+};
#define NR_IPI 2

typedef struct {
diff --git a/arch/loongarch/include/asm/irq.h b/arch/loongarch/include/asm/irq.h
index 218b4da0ea90..00101b6d601e 100644
--- a/arch/loongarch/include/asm/irq.h
+++ b/arch/loongarch/include/asm/irq.h
@@ -117,8 +117,16 @@ extern struct fwnode_handle *liointc_handle;
extern struct fwnode_handle *pch_lpc_handle;
extern struct fwnode_handle *pch_pic_handle[MAX_IO_PICS];

-extern irqreturn_t loongson_ipi_interrupt(int irq, void *dev);
+static inline int get_percpu_irq(int vector)
+{
+ struct irq_domain *d;
+
+ d = irq_find_matching_fwnode(cpuintc_handle, DOMAIN_BUS_ANY);
+ if (d)
+ return irq_create_mapping(d, vector);

+ return -EINVAL;
+}
#include <asm-generic/irq.h>

#endif /* _ASM_IRQ_H */
diff --git a/arch/loongarch/include/asm/smp.h b/arch/loongarch/include/asm/smp.h
index f81e5f01d619..75d30529748c 100644
--- a/arch/loongarch/include/asm/smp.h
+++ b/arch/loongarch/include/asm/smp.h
@@ -12,6 +12,13 @@
#include <linux/threads.h>
#include <linux/cpumask.h>

+struct smp_ops {
+ void (*init_ipi)(void);
+ void (*send_ipi_mask)(const struct cpumask *mask, unsigned int action);
+ void (*send_ipi_single)(int cpu, unsigned int action);
+};
+
+extern struct smp_ops smp_ops;
extern int smp_num_siblings;
extern int num_processors;
extern int disabled_cpus;
@@ -24,8 +31,6 @@ void loongson_prepare_cpus(unsigned int max_cpus);
void loongson_boot_secondary(int cpu, struct task_struct *idle);
void loongson_init_secondary(void);
void loongson_smp_finish(void);
-void loongson_send_ipi_single(int cpu, unsigned int action);
-void loongson_send_ipi_mask(const struct cpumask *mask, unsigned int action);
#ifdef CONFIG_HOTPLUG_CPU
int loongson_cpu_disable(void);
void loongson_cpu_die(unsigned int cpu);
@@ -59,9 +64,12 @@ extern int __cpu_logical_map[NR_CPUS];

#define cpu_physical_id(cpu) cpu_logical_map(cpu)

-#define SMP_BOOT_CPU 0x1
-#define SMP_RESCHEDULE 0x2
-#define SMP_CALL_FUNCTION 0x4
+#define ACTION_BOOT_CPU 0
+#define ACTION_RESCHEDULE 1
+#define ACTION_CALL_FUNCTION 2
+#define SMP_BOOT_CPU BIT(ACTION_BOOT_CPU)
+#define SMP_RESCHEDULE BIT(ACTION_RESCHEDULE)
+#define SMP_CALL_FUNCTION BIT(ACTION_CALL_FUNCTION)

struct secondary_data {
unsigned long stack;
@@ -71,7 +79,8 @@ extern struct secondary_data cpuboot_data;

extern asmlinkage void smpboot_entry(void);
extern asmlinkage void start_secondary(void);
-
+extern void arch_send_call_function_single_ipi(int cpu);
+extern void arch_send_call_function_ipi_mask(const struct cpumask *mask);
extern void calculate_cpu_foreign_map(void);

/*
@@ -79,16 +88,6 @@ extern void calculate_cpu_foreign_map(void);
*/
extern void show_ipi_list(struct seq_file *p, int prec);

-static inline void arch_send_call_function_single_ipi(int cpu)
-{
- loongson_send_ipi_single(cpu, SMP_CALL_FUNCTION);
-}
-
-static inline void arch_send_call_function_ipi_mask(const struct cpumask *mask)
-{
- loongson_send_ipi_mask(mask, SMP_CALL_FUNCTION);
-}
-
#ifdef CONFIG_HOTPLUG_CPU
static inline int __cpu_disable(void)
{
diff --git a/arch/loongarch/kernel/irq.c b/arch/loongarch/kernel/irq.c
index 883e5066ae44..ce36897d1e5a 100644
--- a/arch/loongarch/kernel/irq.c
+++ b/arch/loongarch/kernel/irq.c
@@ -87,23 +87,9 @@ static void __init init_vec_parent_group(void)
acpi_table_parse(ACPI_SIG_MCFG, early_pci_mcfg_parse);
}

-static int __init get_ipi_irq(void)
-{
- struct irq_domain *d = irq_find_matching_fwnode(cpuintc_handle, DOMAIN_BUS_ANY);
-
- if (d)
- return irq_create_mapping(d, INT_IPI);
-
- return -EINVAL;
-}
-
void __init init_IRQ(void)
{
int i;
-#ifdef CONFIG_SMP
- int r, ipi_irq;
- static int ipi_dummy_dev;
-#endif
unsigned int order = get_order(IRQ_STACK_SIZE);
struct page *page;

@@ -113,13 +99,7 @@ void __init init_IRQ(void)
init_vec_parent_group();
irqchip_init();
#ifdef CONFIG_SMP
- ipi_irq = get_ipi_irq();
- if (ipi_irq < 0)
- panic("IPI IRQ mapping failed\n");
- irq_set_percpu_devid(ipi_irq);
- r = request_percpu_irq(ipi_irq, loongson_ipi_interrupt, "IPI", &ipi_dummy_dev);
- if (r < 0)
- panic("IPI IRQ request failed\n");
+ smp_ops.init_ipi();
#endif

for (i = 0; i < NR_IRQS; i++)
diff --git a/arch/loongarch/kernel/perf_event.c b/arch/loongarch/kernel/perf_event.c
index cac7cba81b65..f86a4b838dd7 100644
--- a/arch/loongarch/kernel/perf_event.c
+++ b/arch/loongarch/kernel/perf_event.c
@@ -456,16 +456,6 @@ static void loongarch_pmu_disable(struct pmu *pmu)
static DEFINE_MUTEX(pmu_reserve_mutex);
static atomic_t active_events = ATOMIC_INIT(0);

-static int get_pmc_irq(void)
-{
- struct irq_domain *d = irq_find_matching_fwnode(cpuintc_handle, DOMAIN_BUS_ANY);
-
- if (d)
- return irq_create_mapping(d, INT_PCOV);
-
- return -EINVAL;
-}
-
static void reset_counters(void *arg);
static int __hw_perf_event_init(struct perf_event *event);

@@ -473,7 +463,7 @@ static void hw_perf_event_destroy(struct perf_event *event)
{
if (atomic_dec_and_mutex_lock(&active_events, &pmu_reserve_mutex)) {
on_each_cpu(reset_counters, NULL, 1);
- free_irq(get_pmc_irq(), &loongarch_pmu);
+ free_irq(get_percpu_irq(INT_PCOV), &loongarch_pmu);
mutex_unlock(&pmu_reserve_mutex);
}
}
@@ -562,7 +552,7 @@ static int loongarch_pmu_event_init(struct perf_event *event)
if (event->cpu >= 0 && !cpu_online(event->cpu))
return -ENODEV;

- irq = get_pmc_irq();
+ irq = get_percpu_irq(INT_PCOV);
flags = IRQF_PERCPU | IRQF_NOBALANCING | IRQF_NO_THREAD | IRQF_NO_SUSPEND | IRQF_SHARED;
if (!atomic_inc_not_zero(&active_events)) {
mutex_lock(&pmu_reserve_mutex);
diff --git a/arch/loongarch/kernel/smp.c b/arch/loongarch/kernel/smp.c
index aabee0b280fe..1fce775be4f6 100644
--- a/arch/loongarch/kernel/smp.c
+++ b/arch/loongarch/kernel/smp.c
@@ -66,11 +66,6 @@ static cpumask_t cpu_core_setup_map;
struct secondary_data cpuboot_data;
static DEFINE_PER_CPU(int, cpu_state);

-enum ipi_msg_type {
- IPI_RESCHEDULE,
- IPI_CALL_FUNCTION,
-};
-
static const char *ipi_types[NR_IPI] __tracepoint_string = {
[IPI_RESCHEDULE] = "Rescheduling interrupts",
[IPI_CALL_FUNCTION] = "Function call interrupts",
@@ -190,24 +185,19 @@ static u32 ipi_read_clear(int cpu)

static void ipi_write_action(int cpu, u32 action)
{
- unsigned int irq = 0;
-
- while ((irq = ffs(action))) {
- uint32_t val = IOCSR_IPI_SEND_BLOCKING;
+ uint32_t val;

- val |= (irq - 1);
- val |= (cpu << IOCSR_IPI_SEND_CPU_SHIFT);
- iocsr_write32(val, LOONGARCH_IOCSR_IPI_SEND);
- action &= ~BIT(irq - 1);
- }
+ val = IOCSR_IPI_SEND_BLOCKING | action;
+ val |= (cpu << IOCSR_IPI_SEND_CPU_SHIFT);
+ iocsr_write32(val, LOONGARCH_IOCSR_IPI_SEND);
}

-void loongson_send_ipi_single(int cpu, unsigned int action)
+static void loongson_send_ipi_single(int cpu, unsigned int action)
{
ipi_write_action(cpu_logical_map(cpu), (u32)action);
}

-void loongson_send_ipi_mask(const struct cpumask *mask, unsigned int action)
+static void loongson_send_ipi_mask(const struct cpumask *mask, unsigned int action)
{
unsigned int i;

@@ -215,6 +205,16 @@ void loongson_send_ipi_mask(const struct cpumask *mask, unsigned int action)
ipi_write_action(cpu_logical_map(i), (u32)action);
}

+void arch_send_call_function_single_ipi(int cpu)
+{
+ smp_ops.send_ipi_single(cpu, ACTION_CALL_FUNCTION);
+}
+
+void arch_send_call_function_ipi_mask(const struct cpumask *mask)
+{
+ smp_ops.send_ipi_mask(mask, ACTION_CALL_FUNCTION);
+}
+
/*
* This function sends a 'reschedule' IPI to another CPU.
* it goes straight through and wastes no time serializing
@@ -222,11 +222,11 @@ void loongson_send_ipi_mask(const struct cpumask *mask, unsigned int action)
*/
void arch_smp_send_reschedule(int cpu)
{
- loongson_send_ipi_single(cpu, SMP_RESCHEDULE);
+ smp_ops.send_ipi_single(cpu, ACTION_RESCHEDULE);
}
EXPORT_SYMBOL_GPL(arch_smp_send_reschedule);

-irqreturn_t loongson_ipi_interrupt(int irq, void *dev)
+static irqreturn_t loongson_ipi_interrupt(int irq, void *dev)
{
unsigned int action;
unsigned int cpu = smp_processor_id();
@@ -246,6 +246,26 @@ irqreturn_t loongson_ipi_interrupt(int irq, void *dev)
return IRQ_HANDLED;
}

+static void loongson_init_ipi(void)
+{
+ int r, ipi_irq;
+
+ ipi_irq = get_percpu_irq(INT_IPI);
+ if (ipi_irq < 0)
+ panic("IPI IRQ mapping failed\n");
+
+ irq_set_percpu_devid(ipi_irq);
+ r = request_percpu_irq(ipi_irq, loongson_ipi_interrupt, "IPI", &irq_stat);
+ if (r < 0)
+ panic("IPI IRQ request failed\n");
+}
+
+struct smp_ops smp_ops = {
+ .init_ipi = loongson_init_ipi,
+ .send_ipi_single = loongson_send_ipi_single,
+ .send_ipi_mask = loongson_send_ipi_mask,
+};
+
static void __init fdt_smp_setup(void)
{
#ifdef CONFIG_OF
@@ -323,7 +343,7 @@ void loongson_boot_secondary(int cpu, struct task_struct *idle)

csr_mail_send(entry, cpu_logical_map(cpu), 0);

- loongson_send_ipi_single(cpu, SMP_BOOT_CPU);
+ loongson_send_ipi_single(cpu, ACTION_BOOT_CPU);
}

/*
diff --git a/arch/loongarch/kernel/time.c b/arch/loongarch/kernel/time.c
index e7015f7b70e3..fd5354f9be7c 100644
--- a/arch/loongarch/kernel/time.c
+++ b/arch/loongarch/kernel/time.c
@@ -123,16 +123,6 @@ void sync_counter(void)
csr_write64(init_offset, LOONGARCH_CSR_CNTC);
}

-static int get_timer_irq(void)
-{
- struct irq_domain *d = irq_find_matching_fwnode(cpuintc_handle, DOMAIN_BUS_ANY);
-
- if (d)
- return irq_create_mapping(d, INT_TI);
-
- return -EINVAL;
-}
-
int constant_clockevent_init(void)
{
unsigned int cpu = smp_processor_id();
@@ -142,7 +132,7 @@ int constant_clockevent_init(void)
static int irq = 0, timer_irq_installed = 0;

if (!timer_irq_installed) {
- irq = get_timer_irq();
+ irq = get_percpu_irq(INT_TI);
if (irq < 0)
pr_err("Failed to map irq %d (timer)\n", irq);
}
--
2.39.3
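
The refactoring in the patch above can be modelled in user space. This is a minimal sketch, not kernel code: every fake_* name and the fake_ipi_status array are hypothetical stand-ins. It shows the two ideas the diff introduces: IPI sending goes through a function-pointer table that a paravirt backend can override, and senders pass a bit index (ACTION_*) while receivers test a bit mask (SMP_*).

```c
#include <assert.h>
#include <stdint.h>

#define NR_FAKE_CPUS          4
#define BIT(n)                (1U << (n))

/* Same numbering as the patch: senders pass an index, receivers test a mask. */
#define ACTION_BOOT_CPU       0
#define ACTION_RESCHEDULE     1
#define ACTION_CALL_FUNCTION  2
#define SMP_BOOT_CPU          BIT(ACTION_BOOT_CPU)
#define SMP_RESCHEDULE        BIT(ACTION_RESCHEDULE)
#define SMP_CALL_FUNCTION     BIT(ACTION_CALL_FUNCTION)

/* Hypothetical stand-in for the per-CPU IPI status register. */
static uint32_t fake_ipi_status[NR_FAKE_CPUS];
/* Counts how often the hypothetical paravirt backend was used. */
static unsigned int fake_pv_calls;

struct smp_ops {
	void (*send_ipi_single)(int cpu, unsigned int action);
};

/* "Native" backend: the hardware sets bit <action> for the target CPU. */
static void fake_native_send_ipi_single(int cpu, unsigned int action)
{
	assert(cpu >= 0 && cpu < NR_FAKE_CPUS && action < 32);
	fake_ipi_status[cpu] |= BIT(action);
}

/* "Paravirt" backend: same visible effect, different transport. */
static void fake_pv_send_ipi_single(int cpu, unsigned int action)
{
	fake_pv_calls++;
	fake_native_send_ipi_single(cpu, action);
}

/* Default table; a later init step may overwrite the pointer. */
static struct smp_ops smp_ops = {
	.send_ipi_single = fake_native_send_ipi_single,
};

/* Callers never name a backend, they only go through smp_ops. */
static void fake_send_reschedule(int cpu)
{
	smp_ops.send_ipi_single(cpu, ACTION_RESCHEDULE);
}

/* Receiver side: fetch and clear the pending mask. */
static uint32_t fake_ipi_read_clear(int cpu)
{
	uint32_t action = fake_ipi_status[cpu];

	fake_ipi_status[cpu] = 0;
	return action;
}
```

Assigning fake_pv_send_ipi_single to smp_ops.send_ipi_single switches the backend without touching fake_send_reschedule(), which is the property the later paravirt patch relies on.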


2024-04-28 10:07:38

by Bibo Mao

[permalink] [raw]
Subject: [PATCH v8 6/6] LoongArch: Add pv ipi support on guest kernel side

Add the PARAVIRT option and pv ipi support on the guest kernel side. The
function pv_ipi_init() installs the ipi sending and ipi receiving hooks.
It first checks whether the system runs in VM mode. If so, it calls
kvm_para_available() to detect the current hypervisor type. Only KVM
detection is supported for now, so the paravirt functions work only if
the hypervisor is KVM, which is currently the only hypervisor supported
on LoongArch.
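
The detection step can be sketched in user space as follows. This is only an illustration under stated assumptions: fake_read_cpucfg() is a hypothetical model of the emulated cpucfg instruction, the signature is assumed to be the four bytes "KVM\0" (KVM_SIGNATURE is defined in another patch of the series), and the index 0x40000000 is taken from the changelog. The result caching done by kvm_para_available() is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FAKE_KVM_SIGNATURE   "KVM\0"       /* assumed 4-byte signature */
#define FAKE_CPUCFG_KVM_SIG  0x40000000u   /* base index per the changelog */

/*
 * Hypothetical model of the hypervisor side: under KVM the signature
 * index returns the signature bytes, any other index returns 0.
 */
static uint32_t fake_read_cpucfg(uint32_t index, bool under_kvm)
{
	uint32_t val = 0;

	/* The literal must provide at least sizeof(val) bytes. */
	assert(sizeof(FAKE_KVM_SIGNATURE) > sizeof(val));
	if (under_kvm && index == FAKE_CPUCFG_KVM_SIG)
		memcpy(&val, FAKE_KVM_SIGNATURE, sizeof(val));
	return val;
}

/* Guest side: compare the register value byte-wise with the signature. */
static bool fake_kvm_para_available(bool under_kvm)
{
	uint32_t config = fake_read_cpucfg(FAKE_CPUCFG_KVM_SIG, under_kvm);

	return memcmp(&config, FAKE_KVM_SIGNATURE, sizeof(config)) == 0;
}
```

Comparing with memcmp() rather than with an integer constant keeps the check independent of byte order, which appears to be why the patch does it that way.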

PV IPI uses a virtual IPI sender and a virtual IPI receiver. With the
virtual IPI sender, the ipi message is stored in DDR memory rather than
in emulated HW. IPI multicast is supported, so 128 vcpus can receive IPIs
at the same time, like the x86 KVM method. A hypercall is used for IPI
sending.
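
The multicast batching can be sketched in user space as below. It follows the same windowing logic as pv_send_ipi_mask() in this patch, but instead of issuing a hypercall it records each batch. The names fake_batch and fake_build_batches are hypothetical, the cpu ids are assumed to be distinct, and __uint128_t is a GCC/Clang extension, as in the patch itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_CLUSTER_SIZE 128u   /* 2 * BITS_PER_LONG on a 64-bit kernel */

/* One would-be hypercall: a base cpu id plus a 128-bit mask split in two. */
struct fake_batch {
	unsigned int min;
	uint64_t lo, hi;
};

static void fake_flush(struct fake_batch *out, size_t *nout, size_t max_out,
		       unsigned int min, __uint128_t bitmap)
{
	assert(*nout < max_out);
	out[*nout].min = min;
	out[*nout].lo = (uint64_t)bitmap;
	out[*nout].hi = (uint64_t)(bitmap >> 64);
	(*nout)++;
}

/* Group distinct cpu ids into windows of at most 128 consecutive ids. */
static size_t fake_build_batches(const unsigned int *ids, size_t n,
				 struct fake_batch *out, size_t max_out)
{
	__uint128_t bitmap = 0;
	unsigned int min = 0, max = 0;
	size_t nout = 0;

	for (size_t i = 0; i < n; i++) {
		unsigned int cpu = ids[i];

		if (!bitmap) {
			min = max = cpu;                 /* first id of a window */
		} else if (cpu > min && cpu < min + FAKE_CLUSTER_SIZE) {
			max = cpu > max ? cpu : max;     /* fits above the base */
		} else if (cpu < min && (max - cpu) < FAKE_CLUSTER_SIZE) {
			bitmap <<= min - cpu;            /* rebase the window down */
			min = cpu;
		} else {
			/* Does not fit: emit the current window, start a new one. */
			fake_flush(out, &nout, max_out, min, bitmap);
			min = max = cpu;
			bitmap = 0;
		}
		bitmap |= (__uint128_t)1 << (cpu - min);
	}

	if (bitmap)
		fake_flush(out, &nout, max_out, min, bitmap);
	return nout;
}
```

For the ids 0, 5, 127, 128, 300 this yields three batches (bases 0, 128 and 300), so five targets would cost three traps instead of five.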

With the virtual IPI receiver, HW SWI0 is used rather than the real IPI
HW. Since each VCPU has its own HW SWI0, like the HW timer, there is no
trap on IPI interrupt acknowledge. And since the IPI message is stored in
DDR, there is no trap when fetching the IPI message.
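
The interplay between sender and receiver through the in-memory message word can be sketched with C11 atomics. This is a user-space illustration only: struct fake_cpu and the fake_* functions are hypothetical, and the hypercall is replaced by a counter. Note that C11 atomic_fetch_or() takes (pointer, value), the reverse of the kernel helper used in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define BIT(n)                (1U << (n))
#define ACTION_RESCHEDULE     1
#define ACTION_CALL_FUNCTION  2

struct fake_cpu {
	atomic_uint message;      /* pending action bits, plain guest memory */
	unsigned int hypercalls;  /* how often a sender had to trap */
	unsigned int resched;     /* handled reschedule requests */
	unsigned int call_func;   /* handled function-call requests */
};

/*
 * Sender: publish the action bit first. Only the sender that finds the
 * word empty needs to notify the target; later senders piggyback on the
 * notification that is already pending. Returns true if it "trapped".
 */
static bool fake_pv_send(struct fake_cpu *c, unsigned int action)
{
	unsigned int old;

	assert(action < 32);
	old = atomic_fetch_or(&c->message, BIT(action));
	if (old)
		return false;
	c->hypercalls++;          /* stands in for the hypercall */
	return true;
}

/* Receiver: take all pending bits in one atomic exchange, then dispatch. */
static unsigned int fake_do_swi(struct fake_cpu *c)
{
	unsigned int action = atomic_exchange(&c->message, 0);

	if (action & BIT(ACTION_CALL_FUNCTION))
		c->call_func++;
	if (action & BIT(ACTION_RESCHEDULE))
		c->resched++;
	return action;
}
```

Because the receiver clears the whole word with a single exchange, a bit set by a sender is either seen by the current handler run or triggers a fresh notification, so no request is lost.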

Signed-off-by: Bibo Mao <[email protected]>
---
arch/loongarch/Kconfig | 9 ++
arch/loongarch/include/asm/hardirq.h | 1 +
arch/loongarch/include/asm/paravirt.h | 27 ++++
.../include/asm/paravirt_api_clock.h | 1 +
arch/loongarch/kernel/Makefile | 1 +
arch/loongarch/kernel/irq.c | 2 +-
arch/loongarch/kernel/paravirt.c | 151 ++++++++++++++++++
arch/loongarch/kernel/smp.c | 4 +-
8 files changed, 194 insertions(+), 2 deletions(-)
create mode 100644 arch/loongarch/include/asm/paravirt.h
create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
create mode 100644 arch/loongarch/kernel/paravirt.c

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 54ad04dacdee..0a1540a8853e 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -583,6 +583,15 @@ config CPU_HAS_PREFETCH
bool
default y

+config PARAVIRT
+ bool "Enable paravirtualization code"
+ depends on AS_HAS_LVZ_EXTENSION
+ help
+ This changes the kernel so it can modify itself when it is run
+ under a hypervisor, potentially improving performance significantly
+ over full virtualization. However, when run without a hypervisor
+ the kernel is theoretically slower and slightly larger.
+
config ARCH_SUPPORTS_KEXEC
def_bool y

diff --git a/arch/loongarch/include/asm/hardirq.h b/arch/loongarch/include/asm/hardirq.h
index 9f0038e19c7f..b26d596a73aa 100644
--- a/arch/loongarch/include/asm/hardirq.h
+++ b/arch/loongarch/include/asm/hardirq.h
@@ -21,6 +21,7 @@ enum ipi_msg_type {
typedef struct {
unsigned int ipi_irqs[NR_IPI];
unsigned int __softirq_pending;
+ atomic_t message ____cacheline_aligned_in_smp;
} ____cacheline_aligned irq_cpustat_t;

DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
diff --git a/arch/loongarch/include/asm/paravirt.h b/arch/loongarch/include/asm/paravirt.h
new file mode 100644
index 000000000000..58f7b7b89f2c
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_PARAVIRT_H
+#define _ASM_LOONGARCH_PARAVIRT_H
+
+#ifdef CONFIG_PARAVIRT
+#include <linux/static_call_types.h>
+struct static_key;
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 dummy_steal_clock(int cpu);
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+ return static_call(pv_steal_clock)(cpu);
+}
+
+int pv_ipi_init(void);
+#else
+static inline int pv_ipi_init(void)
+{
+ return 0;
+}
+
+#endif // CONFIG_PARAVIRT
+#endif
diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h b/arch/loongarch/include/asm/paravirt_api_clock.h
new file mode 100644
index 000000000000..65ac7cee0dad
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt_api_clock.h
@@ -0,0 +1 @@
+#include <asm/paravirt.h>
diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
index 3a7620b66bc6..c9bfeda89e40 100644
--- a/arch/loongarch/kernel/Makefile
+++ b/arch/loongarch/kernel/Makefile
@@ -51,6 +51,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
obj-$(CONFIG_STACKTRACE) += stacktrace.o

obj-$(CONFIG_PROC_FS) += proc.o
+obj-$(CONFIG_PARAVIRT) += paravirt.o

obj-$(CONFIG_SMP) += smp.o

diff --git a/arch/loongarch/kernel/irq.c b/arch/loongarch/kernel/irq.c
index ce36897d1e5a..4863e6c1b739 100644
--- a/arch/loongarch/kernel/irq.c
+++ b/arch/loongarch/kernel/irq.c
@@ -113,5 +113,5 @@ void __init init_IRQ(void)
per_cpu(irq_stack, i), per_cpu(irq_stack, i) + IRQ_STACK_SIZE);
}

- set_csr_ecfg(ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI | ECFGF_PMC);
+ set_csr_ecfg(ECFGF_SIP0 | ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI | ECFGF_PMC);
}
diff --git a/arch/loongarch/kernel/paravirt.c b/arch/loongarch/kernel/paravirt.c
new file mode 100644
index 000000000000..9044ed62045c
--- /dev/null
+++ b/arch/loongarch/kernel/paravirt.c
@@ -0,0 +1,151 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/export.h>
+#include <linux/types.h>
+#include <linux/interrupt.h>
+#include <linux/jump_label.h>
+#include <linux/kvm_para.h>
+#include <asm/paravirt.h>
+#include <linux/static_call.h>
+
+struct static_key paravirt_steal_enabled;
+struct static_key paravirt_steal_rq_enabled;
+
+static u64 native_steal_clock(int cpu)
+{
+ return 0;
+}
+
+DEFINE_STATIC_CALL(pv_steal_clock, native_steal_clock);
+
+#ifdef CONFIG_SMP
+static void pv_send_ipi_single(int cpu, unsigned int action)
+{
+ unsigned int min, old;
+ irq_cpustat_t *info = &per_cpu(irq_stat, cpu);
+
+ old = atomic_fetch_or(BIT(action), &info->message);
+ if (old)
+ return;
+
+ min = cpu_logical_map(cpu);
+ kvm_hypercall3(KVM_HCALL_FUNC_PV_IPI, 1, 0, min);
+}
+
+#define KVM_IPI_CLUSTER_SIZE (2 * BITS_PER_LONG)
+static void pv_send_ipi_mask(const struct cpumask *mask, unsigned int action)
+{
+ unsigned int cpu, i, min = 0, max = 0, old;
+ __uint128_t bitmap = 0;
+ irq_cpustat_t *info;
+
+ if (cpumask_empty(mask))
+ return;
+
+ action = BIT(action);
+ for_each_cpu(i, mask) {
+ info = &per_cpu(irq_stat, i);
+ old = atomic_fetch_or(action, &info->message);
+ if (old)
+ continue;
+
+ cpu = cpu_logical_map(i);
+ if (!bitmap) {
+ min = max = cpu;
+ } else if (cpu > min && cpu < min + KVM_IPI_CLUSTER_SIZE) {
+ max = cpu > max ? cpu : max;
+ } else if (cpu < min && (max - cpu) < KVM_IPI_CLUSTER_SIZE) {
+ bitmap <<= min - cpu;
+ min = cpu;
+ } else {
+ /*
+ * Physical cpuids are sorted in ascending order for
+ * the next mask calculation; send the IPI here
+ * directly and skip the remaining cpus
+ */
+ kvm_hypercall3(KVM_HCALL_FUNC_PV_IPI,
+ (unsigned long)bitmap,
+ (unsigned long)(bitmap >> BITS_PER_LONG), min);
+ min = max = cpu;
+ bitmap = 0;
+ }
+ __set_bit(cpu - min, (unsigned long *)&bitmap);
+ }
+
+ if (bitmap)
+ kvm_hypercall3(KVM_HCALL_FUNC_PV_IPI, (unsigned long)bitmap,
+ (unsigned long)(bitmap >> BITS_PER_LONG), min);
+}
+
+static irqreturn_t loongson_do_swi(int irq, void *dev)
+{
+ irq_cpustat_t *info;
+ long action;
+
+ /* Clear swi interrupt */
+ clear_csr_estat(1 << INT_SWI0);
+ info = this_cpu_ptr(&irq_stat);
+ action = atomic_xchg(&info->message, 0);
+ if (action & SMP_CALL_FUNCTION) {
+ generic_smp_call_function_interrupt();
+ info->ipi_irqs[IPI_CALL_FUNCTION]++;
+ }
+
+ if (action & SMP_RESCHEDULE) {
+ scheduler_ipi();
+ info->ipi_irqs[IPI_RESCHEDULE]++;
+ }
+
+ return IRQ_HANDLED;
+}
+
+static void pv_init_ipi(void)
+{
+ int r, swi0;
+
+ swi0 = get_percpu_irq(INT_SWI0);
+ if (swi0 < 0)
+ panic("SWI0 IRQ mapping failed\n");
+ irq_set_percpu_devid(swi0);
+ r = request_percpu_irq(swi0, loongson_do_swi, "SWI0", &irq_stat);
+ if (r < 0)
+ panic("SWI0 IRQ request failed\n");
+}
+#endif
+
+static bool kvm_para_available(void)
+{
+ static int hypervisor_type;
+ int config;
+
+ if (!hypervisor_type) {
+ config = read_cpucfg(CPUCFG_KVM_SIG);
+ if (!memcmp(&config, KVM_SIGNATURE, 4))
+ hypervisor_type = HYPERVISOR_KVM;
+ }
+
+ return hypervisor_type == HYPERVISOR_KVM;
+}
+
+int __init pv_ipi_init(void)
+{
+ int feature;
+
+ if (!cpu_has_hypervisor)
+ return 0;
+ if (!kvm_para_available())
+ return 0;
+
+ /*
+ * check whether KVM hypervisor supports pv_ipi or not
+ */
+ feature = read_cpucfg(CPUCFG_KVM_FEATURE);
+#ifdef CONFIG_SMP
+ if (feature & KVM_FEATURE_PV_IPI) {
+ smp_ops.init_ipi = pv_init_ipi;
+ smp_ops.send_ipi_single = pv_send_ipi_single;
+ smp_ops.send_ipi_mask = pv_send_ipi_mask;
+ }
+#endif
+
+ return 1;
+}
diff --git a/arch/loongarch/kernel/smp.c b/arch/loongarch/kernel/smp.c
index 1fce775be4f6..9eff7aa4c552 100644
--- a/arch/loongarch/kernel/smp.c
+++ b/arch/loongarch/kernel/smp.c
@@ -29,6 +29,7 @@
#include <asm/loongson.h>
#include <asm/mmu_context.h>
#include <asm/numa.h>
+#include <asm/paravirt.h>
#include <asm/processor.h>
#include <asm/setup.h>
#include <asm/time.h>
@@ -309,6 +310,7 @@ void __init loongson_smp_setup(void)
cpu_data[0].core = cpu_logical_map(0) % loongson_sysconf.cores_per_package;
cpu_data[0].package = cpu_logical_map(0) / loongson_sysconf.cores_per_package;

+ pv_ipi_init();
iocsr_write32(0xffffffff, LOONGARCH_IOCSR_IPI_EN);
pr_info("Detected %i available CPU(s)\n", loongson_sysconf.nr_cpus);
}
@@ -352,7 +354,7 @@ void loongson_boot_secondary(int cpu, struct task_struct *idle)
void loongson_init_secondary(void)
{
unsigned int cpu = smp_processor_id();
- unsigned int imask = ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 |
+ unsigned int imask = ECFGF_SIP0 | ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 |
ECFGF_IPI | ECFGF_PMC | ECFGF_TIMER;

change_csr_ecfg(ECFG0_IM, imask);
--
2.39.3


2024-04-28 16:38:51

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v8 3/6] LoongArch: KVM: Add cpucfg area for kvm hypervisor

Hi Bibo,

kernel test robot noticed the following build warnings:

[auto build test WARNING on 5eb4573ea63d0c83bf58fb7c243fc2c2b6966c02]

url: https://github.com/intel-lab-lkp/linux/commits/Bibo-Mao/LoongArch-smp-Refine-some-ipi-functions-on-LoongArch-platform/20240428-180850
base: 5eb4573ea63d0c83bf58fb7c243fc2c2b6966c02
patch link: https://lore.kernel.org/r/20240428100518.1642324-4-maobibo%40loongson.cn
patch subject: [PATCH v8 3/6] LoongArch: KVM: Add cpucfg area for kvm hypervisor
config: loongarch-defconfig (https://download.01.org/0day-ci/archive/20240429/[email protected]/config)
compiler: loongarch64-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240429/[email protected]/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/

All warnings (new ones prefixed by >>):

arch/loongarch/kvm/exit.c: In function 'kvm_emu_cpucfg':
>> arch/loongarch/kvm/exit.c:213:23: warning: variable 'plv' set but not used [-Wunused-but-set-variable]
213 | unsigned long plv;
| ^~~


vim +/plv +213 arch/loongarch/kvm/exit.c

208
209 static int kvm_emu_cpucfg(struct kvm_vcpu *vcpu, larch_inst inst)
210 {
211 int rd, rj;
212 unsigned int index;
> 213 unsigned long plv;
214
215 rd = inst.reg2_format.rd;
216 rj = inst.reg2_format.rj;
217 ++vcpu->stat.cpucfg_exits;
218 index = vcpu->arch.gprs[rj];
219
220 /*
221 * By LoongArch Reference Manual 2.2.10.5
222 * Return value is 0 for undefined cpucfg index
223 *
224 * Disable preemption since hw gcsr is accessed
225 */
226 preempt_disable();
227 plv = kvm_read_hw_gcsr(LOONGARCH_CSR_CRMD) >> CSR_CRMD_PLV_SHIFT;
228 switch (index) {
229 case 0 ... (KVM_MAX_CPUCFG_REGS - 1):
230 vcpu->arch.gprs[rd] = vcpu->arch.cpucfg[index];
231 break;
232 case CPUCFG_KVM_SIG:
233 /* Cpucfg emulation between 0x40000000 -- 0x400000ff */
234 vcpu->arch.gprs[rd] = *(unsigned int *)KVM_SIGNATURE;
235 break;
236 default:
237 vcpu->arch.gprs[rd] = 0;
238 break;
239 }
240
241 preempt_enable();
242 return EMULATE_DONE;
243 }
244

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

2024-05-06 01:45:38

by Huacai Chen

[permalink] [raw]
Subject: Re: [PATCH v8 0/6] LoongArch: Add pv ipi support on LoongArch VM

Hi, Bibo,

I had an off-list discussion with some KVM experts, and they think
user space has the right to know about PV features, so the cpucfg
solution is acceptable.

And I applied this series with some modifications at
https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson.git/log/?h=loongarch-kvm
You can test it now. But it seems the upstream qemu cannot enable PV IPI now.

I will reply to other patches about my modifications.

Huacai

On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao <[email protected]> wrote:
>
> On a physical machine, IPI HW uses IOCSR registers; however, accesses to
> IOCSR registers trap into the hypervisor when the system runs in VM
> mode. SWI is an interrupt mechanism like SGI on ARM: software can send
> an interrupt to a CPU, except that on LoongArch SWI can currently only be
> sent to the local CPU. So SWI cannot be used for IPI on a real HW system,
> but it can be used in a VM when combined with a hypercall. The IPI is sent
> with a hypercall and an SWI interrupt is injected into the vcpu, which
> treats the SWI interrupt as an IPI.
>
> With PV IPI, there is one trap on IPI sending and no trap on IPI
> receiving. With the IOCSR HW IPI method, there is one trap on IPI
> sending and two traps on IPI receiving.
>
> IPI multicast support is also added for VMs; the idea comes from x86 PV IPI.
> An IPI can be sent to 128 vcpus at one time. With IPI multicast support,
> the number of traps is reduced greatly.
>
> Here is the microbenchmark data from the "perf bench futex wake" testcase on
> a 3C5000 single-way machine. The machine has 16 cpus and the VM has 16
> vcpus as well. The data is the time in ms to wake up 16 threads; smaller
> is better.
>
> physical machine 0.0176 ms
> VM original 0.1140 ms
> VM with pv ipi patch 0.0481 ms
>
> It passes to boot with 128/256 vcpus, and passes to run runltp command
> with package ltp-20230516.
>
> ---
> v7 --- v8:
> 1. Remove kernel PLV mode checking with cpucfg emulation for hypervisor
> feature inquiry.
> 2. Remove document about loongarch hypercall ABI per request of huacai,
> will add English/Chinese docs at the same time later.
>
> v6 --- v7:
> 1. Refine LoongArch virt document by review comments.
> 2. Add function kvm_read_reg()/kvm_write_reg() in hypercall emulation,
> and later it can be used for other trap emulations.
>
> v5 --- v6:
> 1. Add privilege checking when emulating cpucfg at index 0x40000000 --
> 0x400000FF, return 0 if not executed at kernel mode.
> 2. Add document about LoongArch pv ipi in the newly created directory
> Documentation/virt/kvm/loongarch/
> 3. Fix pv ipi handling in kvm backend function kvm_pv_send_ipi(),
> where min should be increased by BITS_PER_LONG for the second bitmap,
> otherwise a VM with more than 64 vcpus fails to boot.
> 4. Adjust patch order and refine code per review comments.
>
> v4 --- v5:
> 1. Refresh function/macro name from review comments.
>
> v3 --- v4:
> 1. Rename pv ipi hook functions call_func_ipi() and
> call_func_single_ipi() to send_ipi_mask()/send_ipi_single(), since pv
> ipi is used for both remote function call and reschedule notification.
> 2. Refresh changelog.
>
> v2 --- v3:
> 1. Add 128 vcpu ipi multicast support like x86
> 2. Change cpucfg base address from 0x10000000 to 0x40000000, in order
> to avoid conflicts with future hw usage
> 3. Adjust patch order in this patchset, move patch
> Refine-ipi-ops-on-LoongArch-platform to the first one.
>
> v1 --- v2:
> 1. Add hw cpuid map support since ipi routing uses hw cpuid
> 2. Refine changelog description
> 3. Add hypercall statistic support for vcpu
> 4. Set percpu pv ipi message buffer aligned with cacheline
> 5. Refine pv ipi send logic, do not send an ipi message if there is
> already a pending ipi message.
> ---
> Bibo Mao (6):
> LoongArch/smp: Refine some ipi functions on LoongArch platform
> LoongArch: KVM: Add hypercall instruction emulation support
> LoongArch: KVM: Add cpucfg area for kvm hypervisor
> LoongArch: KVM: Add vcpu search support from physical cpuid
> LoongArch: KVM: Add pv ipi support on kvm side
> LoongArch: Add pv ipi support on guest kernel side
>
> arch/loongarch/Kconfig | 9 +
> arch/loongarch/include/asm/Kbuild | 1 -
> arch/loongarch/include/asm/hardirq.h | 5 +
> arch/loongarch/include/asm/inst.h | 1 +
> arch/loongarch/include/asm/irq.h | 10 +-
> arch/loongarch/include/asm/kvm_host.h | 27 +++
> arch/loongarch/include/asm/kvm_para.h | 155 ++++++++++++++++++
> arch/loongarch/include/asm/kvm_vcpu.h | 11 ++
> arch/loongarch/include/asm/loongarch.h | 11 ++
> arch/loongarch/include/asm/paravirt.h | 27 +++
> .../include/asm/paravirt_api_clock.h | 1 +
> arch/loongarch/include/asm/smp.h | 31 ++--
> arch/loongarch/include/uapi/asm/Kbuild | 2 -
> arch/loongarch/kernel/Makefile | 1 +
> arch/loongarch/kernel/irq.c | 24 +--
> arch/loongarch/kernel/paravirt.c | 151 +++++++++++++++++
> arch/loongarch/kernel/perf_event.c | 14 +-
> arch/loongarch/kernel/smp.c | 62 ++++---
> arch/loongarch/kernel/time.c | 12 +-
> arch/loongarch/kvm/exit.c | 132 +++++++++++++--
> arch/loongarch/kvm/vcpu.c | 94 ++++++++++-
> arch/loongarch/kvm/vm.c | 11 ++
> 22 files changed, 690 insertions(+), 102 deletions(-)
> create mode 100644 arch/loongarch/include/asm/kvm_para.h
> create mode 100644 arch/loongarch/include/asm/paravirt.h
> create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
> delete mode 100644 arch/loongarch/include/uapi/asm/Kbuild
> create mode 100644 arch/loongarch/kernel/paravirt.c
>
>
> base-commit: 5eb4573ea63d0c83bf58fb7c243fc2c2b6966c02
> --
> 2.39.3
>
>

2024-05-06 01:50:07

by Huacai Chen

[permalink] [raw]
Subject: Re: [PATCH v8 4/6] LoongArch: KVM: Add vcpu search support from physical cpuid

Hi, Bibo,

On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao <[email protected]> wrote:
>
> Physical cpuid is used for interrupt routing for irqchips such as
> ipi/msi/extioi interrupt controller. And physical cpuid is stored
> at CSR register LOONGARCH_CSR_CPUID, it can not be changed once vcpu
> is created and physical cpuid of two vcpus cannot be the same.
>
> Different irqchips have different size declaration about physical cpuid,
> max cpuid value for CSR LOONGARCH_CSR_CPUID on 3A5000 is 512, max cpuid
> supported by IPI hardware is 1024, 256 for extioi irqchip, and 65536
> for MSI irqchip.
>
> The smallest value from all interrupt controllers is selected now,
> and the max cpuid size is defined as 256 by KVM, which comes from
> the extioi irqchip.
>
> Signed-off-by: Bibo Mao <[email protected]>
> ---
> arch/loongarch/include/asm/kvm_host.h | 26 ++++++++
> arch/loongarch/include/asm/kvm_vcpu.h | 1 +
> arch/loongarch/kvm/vcpu.c | 93 ++++++++++++++++++++++++++-
> arch/loongarch/kvm/vm.c | 11 ++++
> 4 files changed, 130 insertions(+), 1 deletion(-)
>
> diff --git a/arch/loongarch/include/asm/kvm_host.h b/arch/loongarch/include/asm/kvm_host.h
> index 2d62f7b0d377..3ba16ef1fe69 100644
> --- a/arch/loongarch/include/asm/kvm_host.h
> +++ b/arch/loongarch/include/asm/kvm_host.h
> @@ -64,6 +64,30 @@ struct kvm_world_switch {
>
> #define MAX_PGTABLE_LEVELS 4
>
> +/*
> + * Physical cpu id is used for interrupt routing, there are different
> + * definitions about physical cpuid on different hardwares.
> + * For LOONGARCH_CSR_CPUID register, max cpuid size is 512
> + * For IPI HW, max dest CPUID size is 1024
> + * For extioi interrupt controller, max dest CPUID size is 256
> + * For MSI interrupt controller, max supported CPUID size is 65536
> + *
> + * Currently max CPUID is defined as 256 for KVM hypervisor, in future
> + * it will be expanded to 4096, including 16 packages at most. And every
> + * package supports at most 256 vcpus
> + */
> +#define KVM_MAX_PHYID 256
> +
> +struct kvm_phyid_info {
> + struct kvm_vcpu *vcpu;
> + bool enabled;
> +};
> +
> +struct kvm_phyid_map {
> + int max_phyid;
> + struct kvm_phyid_info phys_map[KVM_MAX_PHYID];
> +};
> +
> struct kvm_arch {
> /* Guest physical mm */
> kvm_pte_t *pgd;
> @@ -71,6 +95,8 @@ struct kvm_arch {
> unsigned long invalid_ptes[MAX_PGTABLE_LEVELS];
> unsigned int pte_shifts[MAX_PGTABLE_LEVELS];
> unsigned int root_level;
> + spinlock_t phyid_map_lock;
> + struct kvm_phyid_map *phyid_map;
>
> s64 time_offset;
> struct kvm_context __percpu *vmcs;
> diff --git a/arch/loongarch/include/asm/kvm_vcpu.h b/arch/loongarch/include/asm/kvm_vcpu.h
> index 0cb4fdb8a9b5..9f53950959da 100644
> --- a/arch/loongarch/include/asm/kvm_vcpu.h
> +++ b/arch/loongarch/include/asm/kvm_vcpu.h
> @@ -81,6 +81,7 @@ void kvm_save_timer(struct kvm_vcpu *vcpu);
> void kvm_restore_timer(struct kvm_vcpu *vcpu);
>
> int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq);
> +struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid);
>
> /*
> * Loongarch KVM guest interrupt handling
> diff --git a/arch/loongarch/kvm/vcpu.c b/arch/loongarch/kvm/vcpu.c
> index 3a8779065f73..b633fd28b8db 100644
> --- a/arch/loongarch/kvm/vcpu.c
> +++ b/arch/loongarch/kvm/vcpu.c
> @@ -274,6 +274,95 @@ static int _kvm_getcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 *val)
> return 0;
> }
>
> +static inline int kvm_set_cpuid(struct kvm_vcpu *vcpu, u64 val)
> +{
> + int cpuid;
> + struct loongarch_csrs *csr = vcpu->arch.csr;
> + struct kvm_phyid_map *map;
> +
> + if (val >= KVM_MAX_PHYID)
> + return -EINVAL;
> +
> + cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_CPUID);
> + map = vcpu->kvm->arch.phyid_map;
> + spin_lock(&vcpu->kvm->arch.phyid_map_lock);
> + if (map->phys_map[cpuid].enabled) {
> + /*
> + * Cpuid is already set before
> + * Forbid changing different cpuid at runtime
> + */
> + if (cpuid != val) {
> + /*
> + * Cpuid 0 is initial value for vcpu, maybe invalid
> + * unset value for vcpu
> + */
> + if (cpuid) {
> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> + return -EINVAL;
> + }
> + } else {
> + /* Discard duplicated cpuid set */
> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> + return 0;
> + }
> + }
I changed the logic and comments when applying; please double-check
whether it is correct.

> +
> + if (map->phys_map[val].enabled) {
> + /*
> + * New cpuid is already set with other vcpu
> + * Forbid sharing the same cpuid between different vcpus
> + */
> + if (map->phys_map[val].vcpu != vcpu) {
> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> + return -EINVAL;
> + }
> +
> + /* Discard duplicated cpuid set operation*/
> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> + return 0;
> + }
> +
> + kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, val);
> + map->phys_map[val].enabled = true;
> + map->phys_map[val].vcpu = vcpu;
> + if (map->max_phyid < val)
> + map->max_phyid = val;
> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> + return 0;
> +}
> +
> +struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid)
> +{
> + struct kvm_phyid_map *map;
> +
> + if (cpuid >= KVM_MAX_PHYID)
> + return NULL;
> +
> + map = kvm->arch.phyid_map;
> + if (map->phys_map[cpuid].enabled)
> + return map->phys_map[cpuid].vcpu;
> +
> + return NULL;
> +}
> +
> +static inline void kvm_drop_cpuid(struct kvm_vcpu *vcpu)
> +{
> + int cpuid;
> + struct loongarch_csrs *csr = vcpu->arch.csr;
> + struct kvm_phyid_map *map;
> +
> + map = vcpu->kvm->arch.phyid_map;
> + cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_CPUID);
> + if (cpuid >= KVM_MAX_PHYID)
> + return;
> +
> + if (map->phys_map[cpuid].enabled) {
> + map->phys_map[cpuid].vcpu = NULL;
> + map->phys_map[cpuid].enabled = false;
> + kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, 0);
> + }
> +}
While kvm_set_cpuid() is protected by a spinlock, do kvm_drop_cpuid()
and kvm_get_vcpu_by_cpuid() also need it?

> +
> static int _kvm_setcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 val)
> {
> int ret = 0, gintc;
> @@ -291,7 +380,8 @@ static int _kvm_setcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 val)
> kvm_set_sw_gcsr(csr, LOONGARCH_CSR_ESTAT, gintc);
>
> return ret;
> - }
> + } else if (id == LOONGARCH_CSR_CPUID)
> + return kvm_set_cpuid(vcpu, val);
>
> kvm_write_sw_gcsr(csr, id, val);
>
> @@ -943,6 +1033,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
> hrtimer_cancel(&vcpu->arch.swtimer);
> kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
> kfree(vcpu->arch.csr);
> + kvm_drop_cpuid(vcpu);
I think this line should be before the above kfree(), otherwise you
get a "use after free".

Huacai

>
> /*
> * If the vCPU is freed and reused as another vCPU, we don't want the
> diff --git a/arch/loongarch/kvm/vm.c b/arch/loongarch/kvm/vm.c
> index 0a37f6fa8f2d..6006a28653ad 100644
> --- a/arch/loongarch/kvm/vm.c
> +++ b/arch/loongarch/kvm/vm.c
> @@ -30,6 +30,14 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> if (!kvm->arch.pgd)
> return -ENOMEM;
>
> + kvm->arch.phyid_map = kvzalloc(sizeof(struct kvm_phyid_map),
> + GFP_KERNEL_ACCOUNT);
> + if (!kvm->arch.phyid_map) {
> + free_page((unsigned long)kvm->arch.pgd);
> + kvm->arch.pgd = NULL;
> + return -ENOMEM;
> + }
> +
> kvm_init_vmcs(kvm);
> kvm->arch.gpa_size = BIT(cpu_vabits - 1);
> kvm->arch.root_level = CONFIG_PGTABLE_LEVELS - 1;
> @@ -44,6 +52,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> for (i = 0; i <= kvm->arch.root_level; i++)
> kvm->arch.pte_shifts[i] = PAGE_SHIFT + i * (PAGE_SHIFT - 3);
>
> + spin_lock_init(&kvm->arch.phyid_map_lock);
> return 0;
> }
>
> @@ -51,7 +60,9 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
> {
> kvm_destroy_vcpus(kvm);
> free_page((unsigned long)kvm->arch.pgd);
> + kvfree(kvm->arch.phyid_map);
> kvm->arch.pgd = NULL;
> + kvm->arch.phyid_map = NULL;
> }
>
> int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> --
> 2.39.3
>

2024-05-06 01:53:31

by Huacai Chen

Subject: Re: [PATCH v8 6/6] LoongArch: Add pv ipi support on guest kernel side

Hi, Bibo,

On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao <[email protected]> wrote:
>
> PARAVIRT option and pv ipi is added on guest kernel side, function
> pv_ipi_init() is to add ipi sending and ipi receiving hooks. This function
> firstly checks whether system runs on VM mode. If kernel runs on VM mode,
> it will call function kvm_para_available() to detect current hypervisor
> type. Now only KVM type detection is supported, the paravirt function can
> work only if current hypervisor type is KVM, since there is only KVM
> supported on LoongArch now.
>
> PV IPI uses virtual IPI sender and virtual IPI receiver function. With
> virtual IPI sender, ipi message is stored in DDR memory rather than
> emulated HW. IPI multicast is supported, and 128 vcpus can receive IPIs
> at the same time like X86 KVM method. Hypercall method is used for IPI
> sending.
>
> With virtual IPI receiver, HW SW0 is used rather than real IPI HW. Since
> VCPU has separate HW SW0 like HW timer, there is no trap in IPI interrupt
> acknowledge. And IPI message is stored in DDR, no trap in get IPI message.
>
> Signed-off-by: Bibo Mao <[email protected]>
> ---
> arch/loongarch/Kconfig | 9 ++
> arch/loongarch/include/asm/hardirq.h | 1 +
> arch/loongarch/include/asm/paravirt.h | 27 ++++
> .../include/asm/paravirt_api_clock.h | 1 +
> arch/loongarch/kernel/Makefile | 1 +
> arch/loongarch/kernel/irq.c | 2 +-
> arch/loongarch/kernel/paravirt.c | 151 ++++++++++++++++++
> arch/loongarch/kernel/smp.c | 4 +-
> 8 files changed, 194 insertions(+), 2 deletions(-)
> create mode 100644 arch/loongarch/include/asm/paravirt.h
> create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
> create mode 100644 arch/loongarch/kernel/paravirt.c
>
> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
> index 54ad04dacdee..0a1540a8853e 100644
> --- a/arch/loongarch/Kconfig
> +++ b/arch/loongarch/Kconfig
> @@ -583,6 +583,15 @@ config CPU_HAS_PREFETCH
> bool
> default y
>
> +config PARAVIRT
> + bool "Enable paravirtualization code"
> + depends on AS_HAS_LVZ_EXTENSION
> + help
> + This changes the kernel so it can modify itself when it is run
> + under a hypervisor, potentially improving performance significantly
> + over full virtualization. However, when run without a hypervisor
> + the kernel is theoretically slower and slightly larger.
> +
> config ARCH_SUPPORTS_KEXEC
> def_bool y
>
> diff --git a/arch/loongarch/include/asm/hardirq.h b/arch/loongarch/include/asm/hardirq.h
> index 9f0038e19c7f..b26d596a73aa 100644
> --- a/arch/loongarch/include/asm/hardirq.h
> +++ b/arch/loongarch/include/asm/hardirq.h
> @@ -21,6 +21,7 @@ enum ipi_msg_type {
> typedef struct {
> unsigned int ipi_irqs[NR_IPI];
> unsigned int __softirq_pending;
> + atomic_t message ____cacheline_aligned_in_smp;
> } ____cacheline_aligned irq_cpustat_t;
>
> DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
> diff --git a/arch/loongarch/include/asm/paravirt.h b/arch/loongarch/include/asm/paravirt.h
> new file mode 100644
> index 000000000000..58f7b7b89f2c
> --- /dev/null
> +++ b/arch/loongarch/include/asm/paravirt.h
> @@ -0,0 +1,27 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_LOONGARCH_PARAVIRT_H
> +#define _ASM_LOONGARCH_PARAVIRT_H
> +
> +#ifdef CONFIG_PARAVIRT
> +#include <linux/static_call_types.h>
> +struct static_key;
> +extern struct static_key paravirt_steal_enabled;
> +extern struct static_key paravirt_steal_rq_enabled;
> +
> +u64 dummy_steal_clock(int cpu);
> +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
> +
> +static inline u64 paravirt_steal_clock(int cpu)
> +{
> + return static_call(pv_steal_clock)(cpu);
> +}
> +
> +int pv_ipi_init(void);
> +#else
> +static inline int pv_ipi_init(void)
> +{
> + return 0;
> +}
> +
> +#endif // CONFIG_PARAVIRT
> +#endif
> diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h b/arch/loongarch/include/asm/paravirt_api_clock.h
> new file mode 100644
> index 000000000000..65ac7cee0dad
> --- /dev/null
> +++ b/arch/loongarch/include/asm/paravirt_api_clock.h
> @@ -0,0 +1 @@
> +#include <asm/paravirt.h>
> diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
> index 3a7620b66bc6..c9bfeda89e40 100644
> --- a/arch/loongarch/kernel/Makefile
> +++ b/arch/loongarch/kernel/Makefile
> @@ -51,6 +51,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
> obj-$(CONFIG_STACKTRACE) += stacktrace.o
>
> obj-$(CONFIG_PROC_FS) += proc.o
> +obj-$(CONFIG_PARAVIRT) += paravirt.o
>
> obj-$(CONFIG_SMP) += smp.o
>
> diff --git a/arch/loongarch/kernel/irq.c b/arch/loongarch/kernel/irq.c
> index ce36897d1e5a..4863e6c1b739 100644
> --- a/arch/loongarch/kernel/irq.c
> +++ b/arch/loongarch/kernel/irq.c
> @@ -113,5 +113,5 @@ void __init init_IRQ(void)
> per_cpu(irq_stack, i), per_cpu(irq_stack, i) + IRQ_STACK_SIZE);
> }
>
> - set_csr_ecfg(ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI | ECFGF_PMC);
> + set_csr_ecfg(ECFGF_SIP0 | ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI | ECFGF_PMC);
> }
> diff --git a/arch/loongarch/kernel/paravirt.c b/arch/loongarch/kernel/paravirt.c
> new file mode 100644
> index 000000000000..9044ed62045c
> --- /dev/null
> +++ b/arch/loongarch/kernel/paravirt.c
> @@ -0,0 +1,151 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/export.h>
> +#include <linux/types.h>
> +#include <linux/interrupt.h>
> +#include <linux/jump_label.h>
> +#include <linux/kvm_para.h>
> +#include <asm/paravirt.h>
> +#include <linux/static_call.h>
> +
> +struct static_key paravirt_steal_enabled;
> +struct static_key paravirt_steal_rq_enabled;
> +
> +static u64 native_steal_clock(int cpu)
> +{
> + return 0;
> +}
> +
> +DEFINE_STATIC_CALL(pv_steal_clock, native_steal_clock);
> +
> +#ifdef CONFIG_SMP
> +static void pv_send_ipi_single(int cpu, unsigned int action)
> +{
> + unsigned int min, old;
> + irq_cpustat_t *info = &per_cpu(irq_stat, cpu);
> +
> + old = atomic_fetch_or(BIT(action), &info->message);
> + if (old)
> + return;
> +
> + min = cpu_logical_map(cpu);
> + kvm_hypercall3(KVM_HCALL_FUNC_PV_IPI, 1, 0, min);
> +}
> +
> +#define KVM_IPI_CLUSTER_SIZE (2 * BITS_PER_LONG)
> +static void pv_send_ipi_mask(const struct cpumask *mask, unsigned int action)
> +{
> + unsigned int cpu, i, min = 0, max = 0, old;
> + __uint128_t bitmap = 0;
> + irq_cpustat_t *info;
> +
> + if (cpumask_empty(mask))
> + return;
> +
> + action = BIT(action);
> + for_each_cpu(i, mask) {
> + info = &per_cpu(irq_stat, i);
> + old = atomic_fetch_or(action, &info->message);
> + if (old)
> + continue;
> +
> + cpu = cpu_logical_map(i);
> + if (!bitmap) {
> + min = max = cpu;
> + } else if (cpu > min && cpu < min + KVM_IPI_CLUSTER_SIZE) {
> + max = cpu > max ? cpu : max;
> + } else if (cpu < min && (max - cpu) < KVM_IPI_CLUSTER_SIZE) {
> + bitmap <<= min - cpu;
> + min = cpu;
> + } else {
> + /*
> + * Physical cpuid is sorted in ascending order. For the
> + * next mask calculation, send IPI here directly and
> + * skip the remaining cpus
> + */
> + kvm_hypercall3(KVM_HCALL_FUNC_PV_IPI,
> + (unsigned long)bitmap,
> + (unsigned long)(bitmap >> BITS_PER_LONG), min);
> + min = max = cpu;
> + bitmap = 0;
> + }
I changed the logic and comments when applying; please double-check
that the result is correct.

Huacai

> + __set_bit(cpu - min, (unsigned long *)&bitmap);
> + }
> +
> + if (bitmap)
> + kvm_hypercall3(KVM_HCALL_FUNC_PV_IPI, (unsigned long)bitmap,
> + (unsigned long)(bitmap >> BITS_PER_LONG), min);
> +}
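The windowing in pv_send_ipi_mask() above can be modeled in user space. What follows is only a minimal sketch, not the patch's code: it assumes the physical cpuids arrive in ascending order (the patch also handles a cpuid below the current minimum), it keeps the 128-bit bitmap as two 64-bit words, and fake_hypercall(), struct ipi_call and send_ipi_sorted() are hypothetical names that merely record what would be passed to the hypercall.

```c
#include <stdint.h>

#define IPI_CLUSTER_SIZE 128u  /* two 64-bit words, like KVM_IPI_CLUSTER_SIZE */
#define MAX_CALLS 8

/* One recorded "hypercall": bitmap low word, high word, and base cpuid. */
struct ipi_call {
	uint64_t lo, hi;
	unsigned int min;
};

static struct ipi_call calls[MAX_CALLS];
static int ncalls;

/* Hypothetical stand-in for the hypercall: only records its arguments. */
static void fake_hypercall(uint64_t lo, uint64_t hi, unsigned int min)
{
	if (ncalls < MAX_CALLS) {
		calls[ncalls].lo = lo;
		calls[ncalls].hi = hi;
		calls[ncalls].min = min;
		ncalls++;
	}
}

/* Batch ascending cpuids into windows of at most 128 ids each. */
static void send_ipi_sorted(const unsigned int *ids, int n)
{
	uint64_t lo = 0, hi = 0;
	unsigned int min = 0;
	int i;

	for (i = 0; i < n; i++) {
		unsigned int bit;

		if (!(lo | hi)) {
			/* Empty window: this id becomes the base. */
			min = ids[i];
		} else if (ids[i] - min >= IPI_CLUSTER_SIZE) {
			/* Id does not fit: flush and start a new window. */
			fake_hypercall(lo, hi, min);
			lo = hi = 0;
			min = ids[i];
		}

		bit = ids[i] - min;
		if (bit < 64)
			lo |= 1ULL << bit;
		else
			hi |= 1ULL << (bit - 64);
	}

	if (lo | hi)
		fake_hypercall(lo, hi, min);
}
```

With the ids 0, 1, 64, 127, 128 and 300, this sketch issues three modeled hypercalls: one for the window based at 0, one based at 128, and one based at 300.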
> +
> +static irqreturn_t loongson_do_swi(int irq, void *dev)
> +{
> + irq_cpustat_t *info;
> + long action;
> +
> + /* Clear swi interrupt */
> + clear_csr_estat(1 << INT_SWI0);
> + info = this_cpu_ptr(&irq_stat);
> + action = atomic_xchg(&info->message, 0);
> + if (action & SMP_CALL_FUNCTION) {
> + generic_smp_call_function_interrupt();
> + info->ipi_irqs[IPI_CALL_FUNCTION]++;
> + }
> +
> + if (action & SMP_RESCHEDULE) {
> + scheduler_ipi();
> + info->ipi_irqs[IPI_RESCHEDULE]++;
> + }
> +
> + return IRQ_HANDLED;
> +}
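The sender and receiver above share one per-cpu message word: the sender ORs in an action bit and notifies only if the word was previously empty, and the receiver atomically swaps the word with zero. A minimal user-space sketch of that handshake with C11 atomics follows. The action numbers, model_send() and model_receive() are hypothetical, and note that C11 atomic_fetch_or() takes the object first, unlike the kernel helper of the same name.

```c
#include <stdatomic.h>

/* Hypothetical action numbers, for illustration only. */
#define ACTION_RESCHEDULE    0u
#define ACTION_CALL_FUNCTION 1u

static atomic_uint message;   /* models irq_stat.message of one target cpu */
static int notifications;     /* counts modeled hypercalls */

/* Sender side: set the action bit, notify only on the empty -> non-empty edge. */
static void model_send(unsigned int action)
{
	unsigned int old = atomic_fetch_or(&message, 1u << action);

	if (old)
		return;   /* a notification is already pending for the target */
	notifications++;  /* stands in for the hypercall */
}

/* Receiver side: take all pending action bits in one atomic exchange. */
static unsigned int model_receive(void)
{
	return atomic_exchange(&message, 0);
}
```

Two sends that land before the receiver runs cost one modeled notification, and the receiver sees both action bits at once.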
> +
> +static void pv_init_ipi(void)
> +{
> + int r, swi0;
> +
> + swi0 = get_percpu_irq(INT_SWI0);
> + if (swi0 < 0)
> + panic("SWI0 IRQ mapping failed\n");
> + irq_set_percpu_devid(swi0);
> + r = request_percpu_irq(swi0, loongson_do_swi, "SWI0", &irq_stat);
> + if (r < 0)
> + panic("SWI0 IRQ request failed\n");
> +}
> +#endif
> +
> +static bool kvm_para_available(void)
> +{
> + static int hypervisor_type;
> + int config;
> +
> + if (!hypervisor_type) {
> + config = read_cpucfg(CPUCFG_KVM_SIG);
> + if (!memcmp(&config, KVM_SIGNATURE, 4))
> + hypervisor_type = HYPERVISOR_KVM;
> + }
> +
> + return hypervisor_type == HYPERVISOR_KVM;
> +}
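The signature test in kvm_para_available() compares the four bytes of the cpucfg word against a string. A small user-space sketch of that comparison is below. MODEL_KVM_SIGNATURE is an assumption for illustration (the value of KVM_SIGNATURE is not shown in this mail), and is_kvm_signature() is a hypothetical helper.

```c
#include <stdint.h>
#include <string.h>

/* Assumed signature bytes; the real KVM_SIGNATURE is defined elsewhere. */
#define MODEL_KVM_SIGNATURE "KVM\0"

/* Return nonzero if the 32-bit config word holds the signature bytes. */
static int is_kvm_signature(uint32_t config)
{
	return memcmp(&config, MODEL_KVM_SIGNATURE, 4) == 0;
}
```

Because the comparison is byte-wise, the word matches only when it holds the signature in memory order.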
> +
> +int __init pv_ipi_init(void)
> +{
> + int feature;
> +
> + if (!cpu_has_hypervisor)
> + return 0;
> + if (!kvm_para_available())
> + return 0;
> +
> + /*
> + * check whether KVM hypervisor supports pv_ipi or not
> + */
> + feature = read_cpucfg(CPUCFG_KVM_FEATURE);
> +#ifdef CONFIG_SMP
> + if (feature & KVM_FEATURE_PV_IPI) {
> + smp_ops.init_ipi = pv_init_ipi;
> + smp_ops.send_ipi_single = pv_send_ipi_single;
> + smp_ops.send_ipi_mask = pv_send_ipi_mask;
> + }
> +#endif
> +
> + return 1;
> +}
> diff --git a/arch/loongarch/kernel/smp.c b/arch/loongarch/kernel/smp.c
> index 1fce775be4f6..9eff7aa4c552 100644
> --- a/arch/loongarch/kernel/smp.c
> +++ b/arch/loongarch/kernel/smp.c
> @@ -29,6 +29,7 @@
> #include <asm/loongson.h>
> #include <asm/mmu_context.h>
> #include <asm/numa.h>
> +#include <asm/paravirt.h>
> #include <asm/processor.h>
> #include <asm/setup.h>
> #include <asm/time.h>
> @@ -309,6 +310,7 @@ void __init loongson_smp_setup(void)
> cpu_data[0].core = cpu_logical_map(0) % loongson_sysconf.cores_per_package;
> cpu_data[0].package = cpu_logical_map(0) / loongson_sysconf.cores_per_package;
>
> + pv_ipi_init();
> iocsr_write32(0xffffffff, LOONGARCH_IOCSR_IPI_EN);
> pr_info("Detected %i available CPU(s)\n", loongson_sysconf.nr_cpus);
> }
> @@ -352,7 +354,7 @@ void loongson_boot_secondary(int cpu, struct task_struct *idle)
> void loongson_init_secondary(void)
> {
> unsigned int cpu = smp_processor_id();
> - unsigned int imask = ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 |
> + unsigned int imask = ECFGF_SIP0 | ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 |
> ECFGF_IPI | ECFGF_PMC | ECFGF_TIMER;
>
> change_csr_ecfg(ECFG0_IM, imask);
> --
> 2.39.3
>

2024-05-06 01:54:22

by Huacai Chen

Subject: Re: [PATCH v8 2/6] LoongArch: KVM: Add hypercall instruction emulation support

Hi, Bibo,

On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao <[email protected]> wrote:
>
> On LoongArch system, there is hypercall instruction special for
> virtualization. When system executes this instruction on host side,
> there is illegal instruction exception reported, however it will
> trap into host when it is executed in VM mode.
>
> When hypercall is emulated, A0 register is set with value
> KVM_HCALL_INVALID_CODE, rather than inject EXCCODE_INE invalid
> instruction exception. So VM can continue to execute the next code.
>
> Signed-off-by: Bibo Mao <[email protected]>
> ---
> arch/loongarch/include/asm/Kbuild | 1 -
> arch/loongarch/include/asm/kvm_para.h | 26 ++++++++++++++++++++++++++
> arch/loongarch/include/uapi/asm/Kbuild | 2 --
> arch/loongarch/kvm/exit.c | 10 ++++++++++
> 4 files changed, 36 insertions(+), 3 deletions(-)
> create mode 100644 arch/loongarch/include/asm/kvm_para.h
> delete mode 100644 arch/loongarch/include/uapi/asm/Kbuild
>
> diff --git a/arch/loongarch/include/asm/Kbuild b/arch/loongarch/include/asm/Kbuild
> index 2dbec7853ae8..c862672ed953 100644
> --- a/arch/loongarch/include/asm/Kbuild
> +++ b/arch/loongarch/include/asm/Kbuild
> @@ -26,4 +26,3 @@ generic-y += poll.h
> generic-y += param.h
> generic-y += posix_types.h
> generic-y += resource.h
> -generic-y += kvm_para.h
> diff --git a/arch/loongarch/include/asm/kvm_para.h b/arch/loongarch/include/asm/kvm_para.h
> new file mode 100644
> index 000000000000..d48f993ae206
> --- /dev/null
> +++ b/arch/loongarch/include/asm/kvm_para.h
> @@ -0,0 +1,26 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_LOONGARCH_KVM_PARA_H
> +#define _ASM_LOONGARCH_KVM_PARA_H
> +
> +/*
> + * LoongArch hypercall return code
> + */
> +#define KVM_HCALL_STATUS_SUCCESS 0
> +#define KVM_HCALL_INVALID_CODE -1UL
> +#define KVM_HCALL_INVALID_PARAMETER -2UL
> +
> +static inline unsigned int kvm_arch_para_features(void)
> +{
> + return 0;
> +}
> +
> +static inline unsigned int kvm_arch_para_hints(void)
> +{
> + return 0;
> +}
> +
> +static inline bool kvm_check_and_clear_guest_paused(void)
> +{
> + return false;
> +}
> +#endif /* _ASM_LOONGARCH_KVM_PARA_H */
> diff --git a/arch/loongarch/include/uapi/asm/Kbuild b/arch/loongarch/include/uapi/asm/Kbuild
> deleted file mode 100644
> index 4aa680ca2e5f..000000000000
> --- a/arch/loongarch/include/uapi/asm/Kbuild
> +++ /dev/null
> @@ -1,2 +0,0 @@
> -# SPDX-License-Identifier: GPL-2.0
> -generic-y += kvm_para.h
This file shouldn't be removed.

Huacai

> diff --git a/arch/loongarch/kvm/exit.c b/arch/loongarch/kvm/exit.c
> index ed1d89d53e2e..923bbca9bd22 100644
> --- a/arch/loongarch/kvm/exit.c
> +++ b/arch/loongarch/kvm/exit.c
> @@ -685,6 +685,15 @@ static int kvm_handle_lasx_disabled(struct kvm_vcpu *vcpu)
> return RESUME_GUEST;
> }
>
> +static int kvm_handle_hypercall(struct kvm_vcpu *vcpu)
> +{
> + update_pc(&vcpu->arch);
> +
> + /* Treat it as noop instruction, only set return value */
> + vcpu->arch.gprs[LOONGARCH_GPR_A0] = KVM_HCALL_INVALID_CODE;
> + return RESUME_GUEST;
> +}
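The effect of this handler on guest-visible state can be modeled in a few lines. This is a sketch under assumptions: struct model_vcpu and model_handle_hypercall() are hypothetical, update_pc() is assumed to step over one 4-byte instruction, and a0 is taken to be general register 4.

```c
#include <stdint.h>

#define MODEL_HCALL_INVALID_CODE ((uint64_t)-1)  /* mirrors -1UL on 64-bit */
#define MODEL_GPR_A0 4                           /* assumed: a0 is r4 */

/* Hypothetical, reduced vcpu state: program counter and 32 GPRs. */
struct model_vcpu {
	uint64_t pc;
	uint64_t gprs[32];
};

/* Skip the hypercall instruction and report "invalid code" in a0. */
static void model_handle_hypercall(struct model_vcpu *v)
{
	v->pc += 4;  /* assumed fixed 4-byte instruction width */
	v->gprs[MODEL_GPR_A0] = MODEL_HCALL_INVALID_CODE;
}
```

The guest resumes at the next instruction and can test a0 for the error value instead of taking an invalid-instruction exception.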
> +
> /*
> * LoongArch KVM callback handling for unimplemented guest exiting
> */
> @@ -716,6 +725,7 @@ static exit_handle_fn kvm_fault_tables[EXCCODE_INT_START] = {
> [EXCCODE_LSXDIS] = kvm_handle_lsx_disabled,
> [EXCCODE_LASXDIS] = kvm_handle_lasx_disabled,
> [EXCCODE_GSPR] = kvm_handle_gspr,
> + [EXCCODE_HVC] = kvm_handle_hypercall,
> };
>
> int kvm_handle_fault(struct kvm_vcpu *vcpu, int fault)
> --
> 2.39.3
>
>

2024-05-06 02:30:08

by Bibo Mao

Subject: Re: [PATCH v8 4/6] LoongArch: KVM: Add vcpu search support from physical cpuid

Huacai,

Many thanks for reviewing the pv ipi patchset.
My replies are inline.

On 2024/5/6 9:49 AM, Huacai Chen wrote:
> Hi, Bibo,
>
> On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao <[email protected]> wrote:
>>
>> Physical cpuid is used for interrupt routing for irqchips such as
>> ipi/msi/extioi interrupt controller. And physical cpuid is stored
>> at CSR register LOONGARCH_CSR_CPUID, it can not be changed once vcpu
>> is created and physical cpuid of two vcpus cannot be the same.
>>
>> Different irqchips have different size declaration about physical cpuid,
>> max cpuid value for CSR LOONGARCH_CSR_CPUID on 3A5000 is 512, max cpuid
>> supported by IPI hardware is 1024, 256 for extioi irqchip, and 65536
>> for MSI irqchip.
>>
>> The smallest value from all interrupt controllers is selected now,
>> and the max cpuid size is defined as 256 by KVM which comes from
>> extioi irqchip.
>>
>> Signed-off-by: Bibo Mao <[email protected]>
>> ---
>> arch/loongarch/include/asm/kvm_host.h | 26 ++++++++
>> arch/loongarch/include/asm/kvm_vcpu.h | 1 +
>> arch/loongarch/kvm/vcpu.c | 93 ++++++++++++++++++++++++++-
>> arch/loongarch/kvm/vm.c | 11 ++++
>> 4 files changed, 130 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/loongarch/include/asm/kvm_host.h b/arch/loongarch/include/asm/kvm_host.h
>> index 2d62f7b0d377..3ba16ef1fe69 100644
>> --- a/arch/loongarch/include/asm/kvm_host.h
>> +++ b/arch/loongarch/include/asm/kvm_host.h
>> @@ -64,6 +64,30 @@ struct kvm_world_switch {
>>
>> #define MAX_PGTABLE_LEVELS 4
>>
>> +/*
>> + * Physical cpu id is used for interrupt routing, there are different
>> + * definitions about physical cpuid on different hardwares.
>> + * For LOONGARCH_CSR_CPUID register, max cpuid size is 512
>> + * For IPI HW, max dest CPUID size is 1024
>> + * For extioi interrupt controller, max dest CPUID size is 256
>> + * For MSI interrupt controller, max supported CPUID size is 65536
>> + *
>> + * Currently max CPUID is defined as 256 for KVM hypervisor, in future
>> + * it will be expanded to 4096, including 16 packages at most. And every
>> + * package supports at most 256 vcpus
>> + */
>> +#define KVM_MAX_PHYID 256
>> +
>> +struct kvm_phyid_info {
>> + struct kvm_vcpu *vcpu;
>> + bool enabled;
>> +};
>> +
>> +struct kvm_phyid_map {
>> + int max_phyid;
>> + struct kvm_phyid_info phys_map[KVM_MAX_PHYID];
>> +};
>> +
>> struct kvm_arch {
>> /* Guest physical mm */
>> kvm_pte_t *pgd;
>> @@ -71,6 +95,8 @@ struct kvm_arch {
>> unsigned long invalid_ptes[MAX_PGTABLE_LEVELS];
>> unsigned int pte_shifts[MAX_PGTABLE_LEVELS];
>> unsigned int root_level;
>> + spinlock_t phyid_map_lock;
>> + struct kvm_phyid_map *phyid_map;
>>
>> s64 time_offset;
>> struct kvm_context __percpu *vmcs;
>> diff --git a/arch/loongarch/include/asm/kvm_vcpu.h b/arch/loongarch/include/asm/kvm_vcpu.h
>> index 0cb4fdb8a9b5..9f53950959da 100644
>> --- a/arch/loongarch/include/asm/kvm_vcpu.h
>> +++ b/arch/loongarch/include/asm/kvm_vcpu.h
>> @@ -81,6 +81,7 @@ void kvm_save_timer(struct kvm_vcpu *vcpu);
>> void kvm_restore_timer(struct kvm_vcpu *vcpu);
>>
>> int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq);
>> +struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid);
>>
>> /*
>> * Loongarch KVM guest interrupt handling
>> diff --git a/arch/loongarch/kvm/vcpu.c b/arch/loongarch/kvm/vcpu.c
>> index 3a8779065f73..b633fd28b8db 100644
>> --- a/arch/loongarch/kvm/vcpu.c
>> +++ b/arch/loongarch/kvm/vcpu.c
>> @@ -274,6 +274,95 @@ static int _kvm_getcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 *val)
>> return 0;
>> }
>>
>> +static inline int kvm_set_cpuid(struct kvm_vcpu *vcpu, u64 val)
>> +{
>> + int cpuid;
>> + struct loongarch_csrs *csr = vcpu->arch.csr;
>> + struct kvm_phyid_map *map;
>> +
>> + if (val >= KVM_MAX_PHYID)
>> + return -EINVAL;
>> +
>> + cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_CPUID);
>> + map = vcpu->kvm->arch.phyid_map;
>> + spin_lock(&vcpu->kvm->arch.phyid_map_lock);
>> + if (map->phys_map[cpuid].enabled) {
>> + /*
>> + * Cpuid is already set before
>> + * Forbid changing different cpuid at runtime
>> + */
>> + if (cpuid != val) {
>> + /*
>> + * Cpuid 0 is initial value for vcpu, maybe invalid
>> + * unset value for vcpu
>> + */
>> + if (cpuid) {
>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>> + return -EINVAL;
>> + }
>> + } else {
>> + /* Discard duplicated cpuid set */
>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>> + return 0;
>> + }
>> + }
> I have changed the logic and comments when I apply, you can double
> check whether it is correct.
Will do.

>
>> +
>> + if (map->phys_map[val].enabled) {
>> + /*
>> + * New cpuid is already set with other vcpu
>> + * Forbid sharing the same cpuid between different vcpus
>> + */
>> + if (map->phys_map[val].vcpu != vcpu) {
>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>> + return -EINVAL;
>> + }
>> +
>> + /* Discard duplicated cpuid set operation*/
>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>> + return 0;
>> + }
>> +
>> + kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, val);
>> + map->phys_map[val].enabled = true;
>> + map->phys_map[val].vcpu = vcpu;
>> + if (map->max_phyid < val)
>> + map->max_phyid = val;
>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>> + return 0;
>> +}
>> +
>> +struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid)
>> +{
>> + struct kvm_phyid_map *map;
>> +
>> + if (cpuid >= KVM_MAX_PHYID)
>> + return NULL;
>> +
>> + map = kvm->arch.phyid_map;
>> + if (map->phys_map[cpuid].enabled)
>> + return map->phys_map[cpuid].vcpu;
>> +
>> + return NULL;
>> +}
>> +
>> +static inline void kvm_drop_cpuid(struct kvm_vcpu *vcpu)
>> +{
>> + int cpuid;
>> + struct loongarch_csrs *csr = vcpu->arch.csr;
>> + struct kvm_phyid_map *map;
>> +
>> + map = vcpu->kvm->arch.phyid_map;
>> + cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_CPUID);
>> + if (cpuid >= KVM_MAX_PHYID)
>> + return;
>> +
>> + if (map->phys_map[cpuid].enabled) {
>> + map->phys_map[cpuid].vcpu = NULL;
>> + map->phys_map[cpuid].enabled = false;
>> + kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, 0);
>> + }
>> +}
> While kvm_set_cpuid() is protected by a spinlock, do kvm_drop_cpuid()
> and kvm_get_vcpu_by_cpuid() also need it?
When the VM powers on, vcpu threads can run at the same time, so
kvm_set_cpuid() takes a spinlock. kvm_drop_cpuid() is only called when a
vcpu is destroyed, such as on VM destroy or vcpu hot-remove.

I think it is impossible to send an IPI to a hot-removed cpu; the guest
kernel should ensure this.

We need to double-check whether cpu hot-add can run in parallel with
hot-remove. We can investigate and add locking after LoongArch cpu
hotplug is supported.
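If locking does turn out to be needed, taking the same lock in all three paths could look roughly like the user-space model below. It is only a sketch under assumptions: an atomic_flag spin loop stands in for the kernel spinlock, all model_* names are hypothetical, and it does not address the lifetime of the vcpu pointer after the lookup returns.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_PHYID 256

struct model_vcpu { int id; };

struct model_phyid_info {
	struct model_vcpu *vcpu;
	bool enabled;
};

static struct model_phyid_info phys_map[MODEL_MAX_PHYID];
static atomic_flag map_lock = ATOMIC_FLAG_INIT;

/* Minimal spin lock standing in for spin_lock()/spin_unlock(). */
static void map_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&map_lock, memory_order_acquire))
		;
}

static void map_lock_release(void)
{
	atomic_flag_clear_explicit(&map_lock, memory_order_release);
}

/* Claim a cpuid; fail if another vcpu already owns it. */
static int model_set_cpuid(struct model_vcpu *v, unsigned int cpuid)
{
	int ret = -1;

	if (cpuid >= MODEL_MAX_PHYID)
		return -1;

	map_lock_acquire();
	if (!phys_map[cpuid].enabled) {
		phys_map[cpuid].vcpu = v;
		phys_map[cpuid].enabled = true;
		ret = 0;
	} else if (phys_map[cpuid].vcpu == v) {
		ret = 0;  /* duplicated set by the same vcpu is ignored */
	}
	map_lock_release();
	return ret;
}

/* Lookup reads both fields under the lock, so it never sees a half update. */
static struct model_vcpu *model_get_vcpu(unsigned int cpuid)
{
	struct model_vcpu *v = NULL;

	if (cpuid >= MODEL_MAX_PHYID)
		return NULL;

	map_lock_acquire();
	if (phys_map[cpuid].enabled)
		v = phys_map[cpuid].vcpu;
	map_lock_release();
	return v;
}

/* Drop clears the entry under the same lock. */
static void model_drop_cpuid(unsigned int cpuid)
{
	if (cpuid >= MODEL_MAX_PHYID)
		return;

	map_lock_acquire();
	phys_map[cpuid].vcpu = NULL;
	phys_map[cpuid].enabled = false;
	map_lock_release();
}
```

Whether the extra lock traffic on the lookup path is acceptable for IPI delivery is a separate question that this model does not answer.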

>
>> +
>> static int _kvm_setcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 val)
>> {
>> int ret = 0, gintc;
>> @@ -291,7 +380,8 @@ static int _kvm_setcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 val)
>> kvm_set_sw_gcsr(csr, LOONGARCH_CSR_ESTAT, gintc);
>>
>> return ret;
>> - }
>> + } else if (id == LOONGARCH_CSR_CPUID)
>> + return kvm_set_cpuid(vcpu, val);
>>
>> kvm_write_sw_gcsr(csr, id, val);
>>
>> @@ -943,6 +1033,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
>> hrtimer_cancel(&vcpu->arch.swtimer);
>> kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
>> kfree(vcpu->arch.csr);
>> + kvm_drop_cpuid(vcpu);
> I think this line should be before the above kfree(), otherwise you
> get a "use after free".
Yes, that is a problem. kvm_drop_cpuid() should be called before kfree().
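The ordering issue can be shown with a small user-space model: the drop step reads the csr block, so it has to run while that block is still allocated. This is a sketch with hypothetical names (struct model_csrs, model_drop_cpuid(), model_vcpu_destroy()), with free() standing in for kfree().

```c
#include <stdlib.h>

/* Hypothetical, reduced state: only the cpuid lives in the csr block. */
struct model_csrs { unsigned long cpuid; };
struct model_vcpu { struct model_csrs *csr; };

static unsigned long last_dropped = (unsigned long)-1;

/* Reads and clears the cpuid, so v->csr must still be valid here. */
static void model_drop_cpuid(struct model_vcpu *v)
{
	last_dropped = v->csr->cpuid;
	v->csr->cpuid = 0;
}

static void model_vcpu_destroy(struct model_vcpu *v)
{
	model_drop_cpuid(v);  /* first: uses v->csr */
	free(v->csr);         /* then: release it */
	v->csr = NULL;
}
```

Swapping the first two statements of model_vcpu_destroy() would make model_drop_cpuid() dereference freed memory, which is the use-after-free pointed out in the review.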

Regards
Bibo Mao
>
> Huacai
>
>>
>> /*
>> * If the vCPU is freed and reused as another vCPU, we don't want the
>> diff --git a/arch/loongarch/kvm/vm.c b/arch/loongarch/kvm/vm.c
>> index 0a37f6fa8f2d..6006a28653ad 100644
>> --- a/arch/loongarch/kvm/vm.c
>> +++ b/arch/loongarch/kvm/vm.c
>> @@ -30,6 +30,14 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>> if (!kvm->arch.pgd)
>> return -ENOMEM;
>>
>> + kvm->arch.phyid_map = kvzalloc(sizeof(struct kvm_phyid_map),
>> + GFP_KERNEL_ACCOUNT);
>> + if (!kvm->arch.phyid_map) {
>> + free_page((unsigned long)kvm->arch.pgd);
>> + kvm->arch.pgd = NULL;
>> + return -ENOMEM;
>> + }
>> +
>> kvm_init_vmcs(kvm);
>> kvm->arch.gpa_size = BIT(cpu_vabits - 1);
>> kvm->arch.root_level = CONFIG_PGTABLE_LEVELS - 1;
>> @@ -44,6 +52,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>> for (i = 0; i <= kvm->arch.root_level; i++)
>> kvm->arch.pte_shifts[i] = PAGE_SHIFT + i * (PAGE_SHIFT - 3);
>>
>> + spin_lock_init(&kvm->arch.phyid_map_lock);
>> return 0;
>> }
>>
>> @@ -51,7 +60,9 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
>> {
>> kvm_destroy_vcpus(kvm);
>> free_page((unsigned long)kvm->arch.pgd);
>> + kvfree(kvm->arch.phyid_map);
>> kvm->arch.pgd = NULL;
>> + kvm->arch.phyid_map = NULL;
>> }
>>
>> int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>> --
>> 2.39.3
>>


2024-05-06 02:41:32

by Bibo Mao

Subject: Re: [PATCH v8 2/6] LoongArch: KVM: Add hypercall instruction emulation support



On 2024/5/6 9:54 AM, Huacai Chen wrote:
> Hi, Bibo,
>
> On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao <[email protected]> wrote:
>>
>> On LoongArch system, there is hypercall instruction special for
>> virtualization. When system executes this instruction on host side,
>> there is illegal instruction exception reported, however it will
>> trap into host when it is executed in VM mode.
>>
>> When hypercall is emulated, A0 register is set with value
>> KVM_HCALL_INVALID_CODE, rather than inject EXCCODE_INE invalid
>> instruction exception. So VM can continue to execute the next code.
>>
>> Signed-off-by: Bibo Mao <[email protected]>
>> ---
>> arch/loongarch/include/asm/Kbuild | 1 -
>> arch/loongarch/include/asm/kvm_para.h | 26 ++++++++++++++++++++++++++
>> arch/loongarch/include/uapi/asm/Kbuild | 2 --
>> arch/loongarch/kvm/exit.c | 10 ++++++++++
>> 4 files changed, 36 insertions(+), 3 deletions(-)
>> create mode 100644 arch/loongarch/include/asm/kvm_para.h
>> delete mode 100644 arch/loongarch/include/uapi/asm/Kbuild
>>
>> diff --git a/arch/loongarch/include/asm/Kbuild b/arch/loongarch/include/asm/Kbuild
>> index 2dbec7853ae8..c862672ed953 100644
>> --- a/arch/loongarch/include/asm/Kbuild
>> +++ b/arch/loongarch/include/asm/Kbuild
>> @@ -26,4 +26,3 @@ generic-y += poll.h
>> generic-y += param.h
>> generic-y += posix_types.h
>> generic-y += resource.h
>> -generic-y += kvm_para.h
>> diff --git a/arch/loongarch/include/asm/kvm_para.h b/arch/loongarch/include/asm/kvm_para.h
>> new file mode 100644
>> index 000000000000..d48f993ae206
>> --- /dev/null
>> +++ b/arch/loongarch/include/asm/kvm_para.h
>> @@ -0,0 +1,26 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#ifndef _ASM_LOONGARCH_KVM_PARA_H
>> +#define _ASM_LOONGARCH_KVM_PARA_H
>> +
>> +/*
>> + * LoongArch hypercall return code
>> + */
>> +#define KVM_HCALL_STATUS_SUCCESS 0
>> +#define KVM_HCALL_INVALID_CODE -1UL
>> +#define KVM_HCALL_INVALID_PARAMETER -2UL
>> +
>> +static inline unsigned int kvm_arch_para_features(void)
>> +{
>> + return 0;
>> +}
>> +
>> +static inline unsigned int kvm_arch_para_hints(void)
>> +{
>> + return 0;
>> +}
>> +
>> +static inline bool kvm_check_and_clear_guest_paused(void)
>> +{
>> + return false;
>> +}
>> +#endif /* _ASM_LOONGARCH_KVM_PARA_H */
>> diff --git a/arch/loongarch/include/uapi/asm/Kbuild b/arch/loongarch/include/uapi/asm/Kbuild
>> deleted file mode 100644
>> index 4aa680ca2e5f..000000000000
>> --- a/arch/loongarch/include/uapi/asm/Kbuild
>> +++ /dev/null
>> @@ -1,2 +0,0 @@
>> -# SPDX-License-Identifier: GPL-2.0
>> -generic-y += kvm_para.h
> This file shouldn't be removed.
Yes, the uapi kvm_para.h is needed for LoongArch, and removing it will
cause problems. It should be kept unchanged.

Regards
Bibo Mao
>
> Huacai
>
>> diff --git a/arch/loongarch/kvm/exit.c b/arch/loongarch/kvm/exit.c
>> index ed1d89d53e2e..923bbca9bd22 100644
>> --- a/arch/loongarch/kvm/exit.c
>> +++ b/arch/loongarch/kvm/exit.c
>> @@ -685,6 +685,15 @@ static int kvm_handle_lasx_disabled(struct kvm_vcpu *vcpu)
>> return RESUME_GUEST;
>> }
>>
>> +static int kvm_handle_hypercall(struct kvm_vcpu *vcpu)
>> +{
>> + update_pc(&vcpu->arch);
>> +
>> + /* Treat it as noop instruction, only set return value */
>> + vcpu->arch.gprs[LOONGARCH_GPR_A0] = KVM_HCALL_INVALID_CODE;
>> + return RESUME_GUEST;
>> +}
>> +
>> /*
>> * LoongArch KVM callback handling for unimplemented guest exiting
>> */
>> @@ -716,6 +725,7 @@ static exit_handle_fn kvm_fault_tables[EXCCODE_INT_START] = {
>> [EXCCODE_LSXDIS] = kvm_handle_lsx_disabled,
>> [EXCCODE_LASXDIS] = kvm_handle_lasx_disabled,
>> [EXCCODE_GSPR] = kvm_handle_gspr,
>> + [EXCCODE_HVC] = kvm_handle_hypercall,
>> };
>>
>> int kvm_handle_fault(struct kvm_vcpu *vcpu, int fault)
>> --
>> 2.39.3
>>
>>


2024-05-06 06:32:32

by Bibo Mao

Subject: Re: [PATCH v8 0/6] LoongArch: Add pv ipi support on LoongArch VM



On 2024/5/6 9:45 AM, Huacai Chen wrote:
> Hi, Bibo,
>
> I have done an off-list discussion with some KVM experts, and they
> think user-space have its right to know PV features, so cpucfg
> solution is acceptable.
>
> And I applied this series with some modifications at
> https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson.git/log/?h=loongarch-kvm
> You can test it now. But it seems the upstream qemu cannot enable PV IPI now.
A VM with 128/256 vcpus boots with this series in the loongarch-kvm branch,
and pv ipi works according to the output of "cat /proc/interrupts". QEMU
needs a small modification like the following, and we will submit that
patch to QEMU after this series is merged.

diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index 441d764843..9f7556cd93 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -15,6 +15,8 @@
#include "sysemu/runstate.h"
#include "sysemu/reset.h"
#include "sysemu/rtc.h"
+#include "sysemu/tcg.h"
+#include "sysemu/kvm.h"
#include "hw/loongarch/virt.h"
#include "exec/address-spaces.h"
#include "hw/irq.h"
@@ -786,12 +788,18 @@ static void loongarch_qemu_write(void *opaque, hwaddr addr,

static uint64_t loongarch_qemu_read(void *opaque, hwaddr addr, unsigned size)
{
+ uint64_t ret = 0;
+
switch (addr) {
case VERSION_REG:
return 0x11ULL;
case FEATURE_REG:
- return 1ULL << IOCSRF_MSI | 1ULL << IOCSRF_EXTIOI |
+ ret = 1ULL << IOCSRF_MSI | 1ULL << IOCSRF_EXTIOI |
1ULL << IOCSRF_CSRIPI;
+ if (kvm_enabled()) {
+ ret |= 1ULL << IOCSRF_VM;
+ }
+ return ret;
case VENDOR_REG:
return 0x6e6f73676e6f6f4cULL; /* "Loongson" */
case CPUNAME_REG:


Regards
Bibo Mao
>
> I will reply to other patches about my modifications.
>
> Huacai
>
> On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao <[email protected]> wrote:
>>
>> On physical machine, ipi HW uses IOCSR registers, however there is trap
>> into hypervisor when vcpu accesses IOCSR registers if system is in VM
>> mode. SWI is an interrupt mechanism like SGI on ARM, software can send
>> interrupt to CPU, only that on LoongArch SWI can only be sent to local CPU
>> now. So SWI can not be used for IPI on real HW system, however it can be used
>> on VM when combined with hypercall method. IPI can be sent with hypercall
>> method and SWI interrupt is injected to vcpu, vcpu can treat SWI
>> interrupt as IPI.
>>
>> With PV IPI supported, there is one trap with IPI sending, however with IPI
>> receiving there is no trap. With IOCSR HW ipi method, there will be one
>> trap with IPI sending and two traps with ipi receiving.
>>
>> Also IPI multicast support is added for VM, the idea comes from x86 PV ipi.
>> IPI can be sent to 128 vcpus in one time. With IPI multicast support, trap
>> will be reduced greatly.
>>
>> Here is the microbenchmark data with "perf bench futex wake" testcase on
>> 3C5000 single-way machine, there are 16 cpus on 3C5000 single-way machine,
>> VM has 16 vcpus also. The benchmark data is ms time unit to wakeup 16
>> threads, the performance is better if data is smaller.
>>
>> physical machine 0.0176 ms
>> VM original 0.1140 ms
>> VM with pv ipi patch 0.0481 ms
>>
>> It passes to boot with 128/256 vcpus, and passes to run runltp command
>> with package ltp-20230516.
>>
>> ---
>> v7 --- v8:
>> 1. Remove kernel PLV mode checking with cpucfg emulation for hypervisor
>> feature inquiry.
>> 2. Remove document about loongarch hypercall ABI per request of huacai,
>> will add English/Chinese doc at the same time in later.
>>
>> v6 --- v7:
>> 1. Refine LoongArch virt document by review comments.
>> 2. Add function kvm_read_reg()/kvm_write_reg() in hypercall emulation,
>> and later it can be used for other trap emulations.
>>
>> v5 --- v6:
>> 1. Add privilege checking when emulating cpucfg at index 0x40000000 --
>> 0x400000FF, return 0 if not executed at kernel mode.
>> 2. Add document about LoongArch pv ipi in the newly created directory
>> Documentation/virt/kvm/loongarch/
>> 3. Fix pv ipi handling in kvm backend function kvm_pv_send_ipi(),
>> where min should be increased by BITS_PER_LONG for the second bitmap,
>> otherwise a VM with more than 64 vcpus fails to boot.
>> 4. Adjust patch order and code refine with review comments.
>>
>> v4 --- v5:
>> 1. Refresh function/macro name from review comments.
>>
>> v3 --- v4:
>> 1. Modify pv ipi hook function names call_func_ipi() and
>> call_func_single_ipi() to send_ipi_mask()/send_ipi_single(), since pv
>> ipi is used for both remote function call and reschedule notification.
>> 2. Refresh changelog.
>>
>> v2 --- v3:
>> 1. Add 128 vcpu ipi multicast support like x86
>> 2. Change cpucfg base address from 0x10000000 to 0x40000000, in order
>> to avoid conflicts with future hw usage
>> 3. Adjust patch order in this patchset, move patch
>> Refine-ipi-ops-on-LoongArch-platform to the first one.
>>
>> v1 --- v2:
>> 1. Add hw cpuid map support since ipi routing uses hw cpuid
>> 2. Refine changelog description
>> 3. Add hypercall statistic support for vcpu
>> 4. Set percpu pv ipi message buffer aligned with cacheline
>> 5. Refine pv ipi send logic, do not send an ipi message if there is
>> already a pending ipi message.
>> ---
>> Bibo Mao (6):
>> LoongArch/smp: Refine some ipi functions on LoongArch platform
>> LoongArch: KVM: Add hypercall instruction emulation support
>> LoongArch: KVM: Add cpucfg area for kvm hypervisor
>> LoongArch: KVM: Add vcpu search support from physical cpuid
>> LoongArch: KVM: Add pv ipi support on kvm side
>> LoongArch: Add pv ipi support on guest kernel side
>>
>> arch/loongarch/Kconfig | 9 +
>> arch/loongarch/include/asm/Kbuild | 1 -
>> arch/loongarch/include/asm/hardirq.h | 5 +
>> arch/loongarch/include/asm/inst.h | 1 +
>> arch/loongarch/include/asm/irq.h | 10 +-
>> arch/loongarch/include/asm/kvm_host.h | 27 +++
>> arch/loongarch/include/asm/kvm_para.h | 155 ++++++++++++++++++
>> arch/loongarch/include/asm/kvm_vcpu.h | 11 ++
>> arch/loongarch/include/asm/loongarch.h | 11 ++
>> arch/loongarch/include/asm/paravirt.h | 27 +++
>> .../include/asm/paravirt_api_clock.h | 1 +
>> arch/loongarch/include/asm/smp.h | 31 ++--
>> arch/loongarch/include/uapi/asm/Kbuild | 2 -
>> arch/loongarch/kernel/Makefile | 1 +
>> arch/loongarch/kernel/irq.c | 24 +--
>> arch/loongarch/kernel/paravirt.c | 151 +++++++++++++++++
>> arch/loongarch/kernel/perf_event.c | 14 +-
>> arch/loongarch/kernel/smp.c | 62 ++++---
>> arch/loongarch/kernel/time.c | 12 +-
>> arch/loongarch/kvm/exit.c | 132 +++++++++++++--
>> arch/loongarch/kvm/vcpu.c | 94 ++++++++++-
>> arch/loongarch/kvm/vm.c | 11 ++
>> 22 files changed, 690 insertions(+), 102 deletions(-)
>> create mode 100644 arch/loongarch/include/asm/kvm_para.h
>> create mode 100644 arch/loongarch/include/asm/paravirt.h
>> create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
>> delete mode 100644 arch/loongarch/include/uapi/asm/Kbuild
>> create mode 100644 arch/loongarch/kernel/paravirt.c
>>
>>
>> base-commit: 5eb4573ea63d0c83bf58fb7c243fc2c2b6966c02
>> --
>> 2.39.3
>>
>>


2024-05-06 06:37:19

by Bibo Mao

[permalink] [raw]
Subject: Re: [PATCH v8 4/6] LoongArch: KVM: Add vcpu search support from physical cpuid



On 2024/5/6 9:49 AM, Huacai Chen wrote:
> Hi, Bibo,
>
> On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao <[email protected]> wrote:
>>
>> Physical cpuid is used for interrupt routing by irqchips such as the
>> ipi/msi/extioi interrupt controllers. The physical cpuid is stored in
>> CSR register LOONGARCH_CSR_CPUID; it cannot be changed once the vcpu
>> is created, and two vcpus cannot have the same physical cpuid.
>>
>> Different irqchips declare different sizes for the physical cpuid: the
>> max cpuid value for CSR LOONGARCH_CSR_CPUID on 3A5000 is 512, the max cpuid
>> supported by IPI hardware is 1024, 256 for the extioi irqchip, and 65536
>> for the MSI irqchip.
>>
>> The smallest value among all interrupt controllers is selected for now,
>> and the max cpuid size is defined as 256 by KVM, which comes from the
>> extioi irqchip.
>>
>> Signed-off-by: Bibo Mao <[email protected]>
>> ---
>> arch/loongarch/include/asm/kvm_host.h | 26 ++++++++
>> arch/loongarch/include/asm/kvm_vcpu.h | 1 +
>> arch/loongarch/kvm/vcpu.c | 93 ++++++++++++++++++++++++++-
>> arch/loongarch/kvm/vm.c | 11 ++++
>> 4 files changed, 130 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/loongarch/include/asm/kvm_host.h b/arch/loongarch/include/asm/kvm_host.h
>> index 2d62f7b0d377..3ba16ef1fe69 100644
>> --- a/arch/loongarch/include/asm/kvm_host.h
>> +++ b/arch/loongarch/include/asm/kvm_host.h
>> @@ -64,6 +64,30 @@ struct kvm_world_switch {
>>
>> #define MAX_PGTABLE_LEVELS 4
>>
>> +/*
>> + * Physical cpu id is used for interrupt routing, there are different
>> + * definitions about physical cpuid on different hardwares.
>> + * For LOONGARCH_CSR_CPUID register, max cpuid size if 512
>> + * For IPI HW, max dest CPUID size 1024
>> + * For extioi interrupt controller, max dest CPUID size is 256
>> + * For MSI interrupt controller, max supported CPUID size is 65536
>> + *
>> + * Currently max CPUID is defined as 256 for KVM hypervisor, in future
>> + * it will be expanded to 4096, including 16 packages at most. And every
>> + * package supports at most 256 vcpus
>> + */
>> +#define KVM_MAX_PHYID 256
>> +
>> +struct kvm_phyid_info {
>> + struct kvm_vcpu *vcpu;
>> + bool enabled;
>> +};
>> +
>> +struct kvm_phyid_map {
>> + int max_phyid;
>> + struct kvm_phyid_info phys_map[KVM_MAX_PHYID];
>> +};
>> +
>> struct kvm_arch {
>> /* Guest physical mm */
>> kvm_pte_t *pgd;
>> @@ -71,6 +95,8 @@ struct kvm_arch {
>> unsigned long invalid_ptes[MAX_PGTABLE_LEVELS];
>> unsigned int pte_shifts[MAX_PGTABLE_LEVELS];
>> unsigned int root_level;
>> + spinlock_t phyid_map_lock;
>> + struct kvm_phyid_map *phyid_map;
>>
>> s64 time_offset;
>> struct kvm_context __percpu *vmcs;
>> diff --git a/arch/loongarch/include/asm/kvm_vcpu.h b/arch/loongarch/include/asm/kvm_vcpu.h
>> index 0cb4fdb8a9b5..9f53950959da 100644
>> --- a/arch/loongarch/include/asm/kvm_vcpu.h
>> +++ b/arch/loongarch/include/asm/kvm_vcpu.h
>> @@ -81,6 +81,7 @@ void kvm_save_timer(struct kvm_vcpu *vcpu);
>> void kvm_restore_timer(struct kvm_vcpu *vcpu);
>>
>> int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq);
>> +struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid);
>>
>> /*
>> * Loongarch KVM guest interrupt handling
>> diff --git a/arch/loongarch/kvm/vcpu.c b/arch/loongarch/kvm/vcpu.c
>> index 3a8779065f73..b633fd28b8db 100644
>> --- a/arch/loongarch/kvm/vcpu.c
>> +++ b/arch/loongarch/kvm/vcpu.c
>> @@ -274,6 +274,95 @@ static int _kvm_getcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 *val)
>> return 0;
>> }
>>
>> +static inline int kvm_set_cpuid(struct kvm_vcpu *vcpu, u64 val)
>> +{
>> + int cpuid;
>> + struct loongarch_csrs *csr = vcpu->arch.csr;
>> + struct kvm_phyid_map *map;
>> +
>> + if (val >= KVM_MAX_PHYID)
>> + return -EINVAL;
>> +
>> + cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_CPUID);
>> + map = vcpu->kvm->arch.phyid_map;
>> + spin_lock(&vcpu->kvm->arch.phyid_map_lock);
>> + if (map->phys_map[cpuid].enabled) {
>> + /*
>> + * Cpuid is already set before
>> + * Forbid changing different cpuid at runtime
>> + */
>> + if (cpuid != val) {
>> + /*
>> + * Cpuid 0 is initial value for vcpu, maybe invalid
>> + * unset value for vcpu
>> + */
>> + if (cpuid) {
>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>> + return -EINVAL;
>> + }
>> + } else {
>> + /* Discard duplicated cpuid set */
>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>> + return 0;
>> + }
>> + }
> I have changed the logic and comments when I apply, you can double
> check whether it is correct.
I checked out the latest version; the modification in function
kvm_set_cpuid() looks good to me.
>
>> +
>> + if (map->phys_map[val].enabled) {
>> + /*
>> + * New cpuid is already set with other vcpu
>> + * Forbid sharing the same cpuid between different vcpus
>> + */
>> + if (map->phys_map[val].vcpu != vcpu) {
>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>> + return -EINVAL;
>> + }
>> +
>> + /* Discard duplicated cpuid set operation*/
>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>> + return 0;
>> + }
>> +
>> + kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, val);
>> + map->phys_map[val].enabled = true;
>> + map->phys_map[val].vcpu = vcpu;
>> + if (map->max_phyid < val)
>> + map->max_phyid = val;
>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>> + return 0;
>> +}
>> +
>> +struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid)
>> +{
>> + struct kvm_phyid_map *map;
>> +
>> + if (cpuid >= KVM_MAX_PHYID)
>> + return NULL;
>> +
>> + map = kvm->arch.phyid_map;
>> + if (map->phys_map[cpuid].enabled)
>> + return map->phys_map[cpuid].vcpu;
>> +
>> + return NULL;
>> +}
>> +
>> +static inline void kvm_drop_cpuid(struct kvm_vcpu *vcpu)
>> +{
>> + int cpuid;
>> + struct loongarch_csrs *csr = vcpu->arch.csr;
>> + struct kvm_phyid_map *map;
>> +
>> + map = vcpu->kvm->arch.phyid_map;
>> + cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_CPUID);
>> + if (cpuid >= KVM_MAX_PHYID)
>> + return;
>> +
>> + if (map->phys_map[cpuid].enabled) {
>> + map->phys_map[cpuid].vcpu = NULL;
>> + map->phys_map[cpuid].enabled = false;
>> + kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, 0);
>> + }
>> +}
> While kvm_set_cpuid() is protected by a spinlock, do kvm_drop_cpuid()
> and kvm_get_vcpu_by_cpuid() also need it?
>
Adding the spinlock in function kvm_drop_cpuid() looks good to me.
And thanks for the effort.

Regards
Bibo Mao
>> +
>> static int _kvm_setcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 val)
>> {
>> int ret = 0, gintc;
>> @@ -291,7 +380,8 @@ static int _kvm_setcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 val)
>> kvm_set_sw_gcsr(csr, LOONGARCH_CSR_ESTAT, gintc);
>>
>> return ret;
>> - }
>> + } else if (id == LOONGARCH_CSR_CPUID)
>> + return kvm_set_cpuid(vcpu, val);
>>
>> kvm_write_sw_gcsr(csr, id, val);
>>
>> @@ -943,6 +1033,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
>> hrtimer_cancel(&vcpu->arch.swtimer);
>> kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
>> kfree(vcpu->arch.csr);
>> + kvm_drop_cpuid(vcpu);
> I think this line should be before the above kfree(), otherwise you
> get a "use after free".
>
> Huacai
>
>>
>> /*
>> * If the vCPU is freed and reused as another vCPU, we don't want the
>> diff --git a/arch/loongarch/kvm/vm.c b/arch/loongarch/kvm/vm.c
>> index 0a37f6fa8f2d..6006a28653ad 100644
>> --- a/arch/loongarch/kvm/vm.c
>> +++ b/arch/loongarch/kvm/vm.c
>> @@ -30,6 +30,14 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>> if (!kvm->arch.pgd)
>> return -ENOMEM;
>>
>> + kvm->arch.phyid_map = kvzalloc(sizeof(struct kvm_phyid_map),
>> + GFP_KERNEL_ACCOUNT);
>> + if (!kvm->arch.phyid_map) {
>> + free_page((unsigned long)kvm->arch.pgd);
>> + kvm->arch.pgd = NULL;
>> + return -ENOMEM;
>> + }
>> +
>> kvm_init_vmcs(kvm);
>> kvm->arch.gpa_size = BIT(cpu_vabits - 1);
>> kvm->arch.root_level = CONFIG_PGTABLE_LEVELS - 1;
>> @@ -44,6 +52,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>> for (i = 0; i <= kvm->arch.root_level; i++)
>> kvm->arch.pte_shifts[i] = PAGE_SHIFT + i * (PAGE_SHIFT - 3);
>>
>> + spin_lock_init(&kvm->arch.phyid_map_lock);
>> return 0;
>> }
>>
>> @@ -51,7 +60,9 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
>> {
>> kvm_destroy_vcpus(kvm);
>> free_page((unsigned long)kvm->arch.pgd);
>> + kvfree(kvm->arch.phyid_map);
>> kvm->arch.pgd = NULL;
>> + kvm->arch.phyid_map = NULL;
>> }
>>
>> int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>> --
>> 2.39.3
>>
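
As an aside on the kfree() ordering raised above: kvm_drop_cpuid() reads the
cpuid through vcpu->arch.csr, so it has to run before that block is freed.
The sketch below is a simplified user-space model of that ordering, not the
kernel code. All names in it (set_cpuid(), drop_cpuid(), destroy_vcpu(),
struct csrs, MAX_PHYID) are hypothetical stand-ins, there is no locking, and
set_cpuid() omits the "same vcpu sets the same cpuid again" case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_PHYID 256	/* stand-in for KVM_MAX_PHYID */

struct vcpu;

struct phyid_info {
	struct vcpu *vcpu;
	bool enabled;
};

struct phyid_map {
	struct phyid_info phys_map[MAX_PHYID];
};

struct csrs {
	int cpuid;	/* stands in for the software copy of the CPUID CSR */
};

struct vcpu {
	struct csrs *csr;
	struct phyid_map *map;
};

/* Register the vcpu under cpuid; returns 0 on success, -1 on conflict. */
int set_cpuid(struct vcpu *v, int cpuid)
{
	if (cpuid < 0 || cpuid >= MAX_PHYID || v->map->phys_map[cpuid].enabled)
		return -1;
	v->csr->cpuid = cpuid;
	v->map->phys_map[cpuid].vcpu = v;
	v->map->phys_map[cpuid].enabled = true;
	return 0;
}

/* Reads v->csr, so it must run while the csr block is still allocated. */
void drop_cpuid(struct vcpu *v)
{
	int cpuid = v->csr->cpuid;

	assert(cpuid >= 0 && cpuid < MAX_PHYID);
	if (v->map->phys_map[cpuid].enabled) {
		v->map->phys_map[cpuid].vcpu = NULL;
		v->map->phys_map[cpuid].enabled = false;
		v->csr->cpuid = 0;
	}
}

void destroy_vcpu(struct vcpu *v)
{
	drop_cpuid(v);	/* first: this still dereferences v->csr */
	free(v->csr);	/* only then release the csr block */
	v->csr = NULL;
}
```

Swapping the first two statements of destroy_vcpu() would make drop_cpuid()
dereference freed memory, which is the "use after free" pointed out in the
review.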


2024-05-06 07:00:37

by Bibo Mao

[permalink] [raw]
Subject: Re: [PATCH v8 6/6] LoongArch: Add pv ipi support on guest kernel side



On 2024/5/6 9:53 AM, Huacai Chen wrote:
> Hi, Bibo,
>
> On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao <[email protected]> wrote:
>>
>> The PARAVIRT option and pv ipi are added on the guest kernel side; function
>> pv_ipi_init() adds the ipi sending and ipi receiving hooks. This function
>> first checks whether the system runs in VM mode. If the kernel runs in VM
>> mode, it calls function kvm_para_available() to detect the current hypervisor
>> type. Only KVM type detection is supported now; the paravirt function can
>> work only if the current hypervisor type is KVM, since KVM is the only
>> hypervisor supported on LoongArch now.
>>
>> PV IPI uses virtual IPI sender and virtual IPI receiver functions. With the
>> virtual IPI sender, the ipi message is stored in DDR memory rather than in
>> emulated HW. IPI multicast is supported, and 128 vcpus can receive IPIs
>> at the same time, like the X86 KVM method. The hypercall method is used for
>> IPI sending.
>>
>> With the virtual IPI receiver, HW SW0 is used rather than real IPI HW. Since
>> each VCPU has a separate HW SW0, like the HW timer, there is no trap on IPI
>> interrupt acknowledge. And since the IPI message is stored in DDR, there is
>> no trap when getting the IPI message.
>>
>> Signed-off-by: Bibo Mao <[email protected]>
>> ---
>> arch/loongarch/Kconfig | 9 ++
>> arch/loongarch/include/asm/hardirq.h | 1 +
>> arch/loongarch/include/asm/paravirt.h | 27 ++++
>> .../include/asm/paravirt_api_clock.h | 1 +
>> arch/loongarch/kernel/Makefile | 1 +
>> arch/loongarch/kernel/irq.c | 2 +-
>> arch/loongarch/kernel/paravirt.c | 151 ++++++++++++++++++
>> arch/loongarch/kernel/smp.c | 4 +-
>> 8 files changed, 194 insertions(+), 2 deletions(-)
>> create mode 100644 arch/loongarch/include/asm/paravirt.h
>> create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
>> create mode 100644 arch/loongarch/kernel/paravirt.c
>>
>> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
>> index 54ad04dacdee..0a1540a8853e 100644
>> --- a/arch/loongarch/Kconfig
>> +++ b/arch/loongarch/Kconfig
>> @@ -583,6 +583,15 @@ config CPU_HAS_PREFETCH
>> bool
>> default y
>>
>> +config PARAVIRT
>> + bool "Enable paravirtualization code"
>> + depends on AS_HAS_LVZ_EXTENSION
>> + help
>> + This changes the kernel so it can modify itself when it is run
>> + under a hypervisor, potentially improving performance significantly
>> + over full virtualization. However, when run without a hypervisor
>> + the kernel is theoretically slower and slightly larger.
>> +
>> config ARCH_SUPPORTS_KEXEC
>> def_bool y
>>
>> diff --git a/arch/loongarch/include/asm/hardirq.h b/arch/loongarch/include/asm/hardirq.h
>> index 9f0038e19c7f..b26d596a73aa 100644
>> --- a/arch/loongarch/include/asm/hardirq.h
>> +++ b/arch/loongarch/include/asm/hardirq.h
>> @@ -21,6 +21,7 @@ enum ipi_msg_type {
>> typedef struct {
>> unsigned int ipi_irqs[NR_IPI];
>> unsigned int __softirq_pending;
>> + atomic_t message ____cacheline_aligned_in_smp;
>> } ____cacheline_aligned irq_cpustat_t;
>>
>> DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
>> diff --git a/arch/loongarch/include/asm/paravirt.h b/arch/loongarch/include/asm/paravirt.h
>> new file mode 100644
>> index 000000000000..58f7b7b89f2c
>> --- /dev/null
>> +++ b/arch/loongarch/include/asm/paravirt.h
>> @@ -0,0 +1,27 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#ifndef _ASM_LOONGARCH_PARAVIRT_H
>> +#define _ASM_LOONGARCH_PARAVIRT_H
>> +
>> +#ifdef CONFIG_PARAVIRT
>> +#include <linux/static_call_types.h>
>> +struct static_key;
>> +extern struct static_key paravirt_steal_enabled;
>> +extern struct static_key paravirt_steal_rq_enabled;
>> +
>> +u64 dummy_steal_clock(int cpu);
>> +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
>> +
>> +static inline u64 paravirt_steal_clock(int cpu)
>> +{
>> + return static_call(pv_steal_clock)(cpu);
>> +}
>> +
>> +int pv_ipi_init(void);
>> +#else
>> +static inline int pv_ipi_init(void)
>> +{
>> + return 0;
>> +}
>> +
>> +#endif // CONFIG_PARAVIRT
>> +#endif
>> diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h b/arch/loongarch/include/asm/paravirt_api_clock.h
>> new file mode 100644
>> index 000000000000..65ac7cee0dad
>> --- /dev/null
>> +++ b/arch/loongarch/include/asm/paravirt_api_clock.h
>> @@ -0,0 +1 @@
>> +#include <asm/paravirt.h>
>> diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
>> index 3a7620b66bc6..c9bfeda89e40 100644
>> --- a/arch/loongarch/kernel/Makefile
>> +++ b/arch/loongarch/kernel/Makefile
>> @@ -51,6 +51,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
>> obj-$(CONFIG_STACKTRACE) += stacktrace.o
>>
>> obj-$(CONFIG_PROC_FS) += proc.o
>> +obj-$(CONFIG_PARAVIRT) += paravirt.o
>>
>> obj-$(CONFIG_SMP) += smp.o
>>
>> diff --git a/arch/loongarch/kernel/irq.c b/arch/loongarch/kernel/irq.c
>> index ce36897d1e5a..4863e6c1b739 100644
>> --- a/arch/loongarch/kernel/irq.c
>> +++ b/arch/loongarch/kernel/irq.c
>> @@ -113,5 +113,5 @@ void __init init_IRQ(void)
>> per_cpu(irq_stack, i), per_cpu(irq_stack, i) + IRQ_STACK_SIZE);
>> }
>>
>> - set_csr_ecfg(ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI | ECFGF_PMC);
>> + set_csr_ecfg(ECFGF_SIP0 | ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI | ECFGF_PMC);
>> }
>> diff --git a/arch/loongarch/kernel/paravirt.c b/arch/loongarch/kernel/paravirt.c
>> new file mode 100644
>> index 000000000000..9044ed62045c
>> --- /dev/null
>> +++ b/arch/loongarch/kernel/paravirt.c
>> @@ -0,0 +1,151 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +#include <linux/export.h>
>> +#include <linux/types.h>
>> +#include <linux/interrupt.h>
>> +#include <linux/jump_label.h>
>> +#include <linux/kvm_para.h>
>> +#include <asm/paravirt.h>
>> +#include <linux/static_call.h>
>> +
>> +struct static_key paravirt_steal_enabled;
>> +struct static_key paravirt_steal_rq_enabled;
>> +
>> +static u64 native_steal_clock(int cpu)
>> +{
>> + return 0;
>> +}
>> +
>> +DEFINE_STATIC_CALL(pv_steal_clock, native_steal_clock);
>> +
>> +#ifdef CONFIG_SMP
>> +static void pv_send_ipi_single(int cpu, unsigned int action)
>> +{
>> + unsigned int min, old;
>> + irq_cpustat_t *info = &per_cpu(irq_stat, cpu);
>> +
>> + old = atomic_fetch_or(BIT(action), &info->message);
>> + if (old)
>> + return;
>> +
>> + min = cpu_logical_map(cpu);
>> + kvm_hypercall3(KVM_HCALL_FUNC_PV_IPI, 1, 0, min);
>> +}
>> +
>> +#define KVM_IPI_CLUSTER_SIZE (2 * BITS_PER_LONG)
>> +static void pv_send_ipi_mask(const struct cpumask *mask, unsigned int action)
>> +{
>> + unsigned int cpu, i, min = 0, max = 0, old;
>> + __uint128_t bitmap = 0;
>> + irq_cpustat_t *info;
>> +
>> + if (cpumask_empty(mask))
>> + return;
>> +
>> + action = BIT(action);
>> + for_each_cpu(i, mask) {
>> + info = &per_cpu(irq_stat, i);
>> + old = atomic_fetch_or(action, &info->message);
>> + if (old)
>> + continue;
>> +
>> + cpu = cpu_logical_map(i);
>> + if (!bitmap) {
>> + min = max = cpu;
>> + } else if (cpu > min && cpu < min + KVM_IPI_CLUSTER_SIZE) {
>> + max = cpu > max ? cpu : max;
>> + } else if (cpu < min && (max - cpu) < KVM_IPI_CLUSTER_SIZE) {
>> + bitmap <<= min - cpu;
>> + min = cpu;
>> + } else {
>> + /*
>> + * Physical cpuid is sorted in ascending order ascend
>> + * for the next mask calculation, send IPI here
>> + * directly and skip the remainding cpus
>> + */
>> + kvm_hypercall3(KVM_HCALL_FUNC_PV_IPI,
>> + (unsigned long)bitmap,
>> + (unsigned long)(bitmap >> BITS_PER_LONG), min);
>> + min = max = cpu;
>> + bitmap = 0;
>> + }
> I have changed the logic and comments when I apply, you can double
> check whether it is correct.
There is a modification like this:
if (!bitmap) {
min = max = cpu;
} else if (cpu < min && cpu > (max -
KVM_IPI_CLUSTER_SIZE)) {
...

Testing shows there is a problem if the value of max is smaller than
KVM_IPI_CLUSTER_SIZE, since the type of cpu/max is "unsigned int" and
max - KVM_IPI_CLUSTER_SIZE then wraps around to a large value.

How about defining the variables as int? The patch is like this:
--- a/arch/loongarch/kernel/paravirt.c
+++ b/arch/loongarch/kernel/paravirt.c
@@ -35,7 +35,7 @@ static void pv_send_ipi_single(int cpu, unsigned int
action)

static void pv_send_ipi_mask(const struct cpumask *mask, unsigned int
action)
{
- unsigned int cpu, i, min = 0, max = 0, old;
+ int cpu, i, min = 0, max = 0, old;
__uint128_t bitmap = 0;
irq_cpustat_t *info;
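
For illustration, the wraparound can be reproduced in user space with the
sketch below. It is a stand-alone model, not kernel code: the function names
in_window_unsigned()/in_window_signed() and the sample values are made up for
this example, and BITS_PER_LONG is assumed to be 64.

```c
#include <assert.h>
#include <stdbool.h>

#define BITS_PER_LONG 64	/* assumption: 64-bit kernel */
#define KVM_IPI_CLUSTER_SIZE (2 * BITS_PER_LONG)

/* Modified check with unsigned operands: max - 128 wraps when max < 128. */
bool in_window_unsigned(unsigned int cpu, unsigned int min, unsigned int max)
{
	return cpu < min && cpu > (max - KVM_IPI_CLUSTER_SIZE);
}

/* Same check with int operands: max - 128 may go negative, no wraparound. */
bool in_window_signed(int cpu, int min, int max)
{
	return cpu < min && cpu > (max - KVM_IPI_CLUSTER_SIZE);
}

/* Hypothetical sample: the current window covers cpuids 5..10, new cpu is 3. */
void check_examples(void)
{
	assert(!in_window_unsigned(3, 5, 10));	/* wrongly rejected */
	assert(in_window_signed(3, 5, 10));	/* accepted as intended */
}
```

With unsigned operands, cpu 3 is not merged into the existing bitmap even
though it is well within 128 of max, so an extra hypercall would be issued.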


Regards
Bibo Mao
>
> Huacai
>
>> + __set_bit(cpu - min, (unsigned long *)&bitmap);
>> + }
>> +
>> + if (bitmap)
>> + kvm_hypercall3(KVM_HCALL_FUNC_PV_IPI, (unsigned long)bitmap,
>> + (unsigned long)(bitmap >> BITS_PER_LONG), min);
>> +}
>> +
>> +static irqreturn_t loongson_do_swi(int irq, void *dev)
>> +{
>> + irq_cpustat_t *info;
>> + long action;
>> +
>> + /* Clear swi interrupt */
>> + clear_csr_estat(1 << INT_SWI0);
>> + info = this_cpu_ptr(&irq_stat);
>> + action = atomic_xchg(&info->message, 0);
>> + if (action & SMP_CALL_FUNCTION) {
>> + generic_smp_call_function_interrupt();
>> + info->ipi_irqs[IPI_CALL_FUNCTION]++;
>> + }
>> +
>> + if (action & SMP_RESCHEDULE) {
>> + scheduler_ipi();
>> + info->ipi_irqs[IPI_RESCHEDULE]++;
>> + }
>> +
>> + return IRQ_HANDLED;
>> +}
>> +
>> +static void pv_init_ipi(void)
>> +{
>> + int r, swi0;
>> +
>> + swi0 = get_percpu_irq(INT_SWI0);
>> + if (swi0 < 0)
>> + panic("SWI0 IRQ mapping failed\n");
>> + irq_set_percpu_devid(swi0);
>> + r = request_percpu_irq(swi0, loongson_do_swi, "SWI0", &irq_stat);
>> + if (r < 0)
>> + panic("SWI0 IRQ request failed\n");
>> +}
>> +#endif
>> +
>> +static bool kvm_para_available(void)
>> +{
>> + static int hypervisor_type;
>> + int config;
>> +
>> + if (!hypervisor_type) {
>> + config = read_cpucfg(CPUCFG_KVM_SIG);
>> + if (!memcmp(&config, KVM_SIGNATURE, 4))
>> + hypervisor_type = HYPERVISOR_KVM;
>> + }
>> +
>> + return hypervisor_type == HYPERVISOR_KVM;
>> +}
>> +
>> +int __init pv_ipi_init(void)
>> +{
>> + int feature;
>> +
>> + if (!cpu_has_hypervisor)
>> + return 0;
>> + if (!kvm_para_available())
>> + return 0;
>> +
>> + /*
>> + * check whether KVM hypervisor supports pv_ipi or not
>> + */
>> + feature = read_cpucfg(CPUCFG_KVM_FEATURE);
>> +#ifdef CONFIG_SMP
>> + if (feature & KVM_FEATURE_PV_IPI) {
>> + smp_ops.init_ipi = pv_init_ipi;
>> + smp_ops.send_ipi_single = pv_send_ipi_single;
>> + smp_ops.send_ipi_mask = pv_send_ipi_mask;
>> + }
>> +#endif
>> +
>> + return 1;
>> +}
>> diff --git a/arch/loongarch/kernel/smp.c b/arch/loongarch/kernel/smp.c
>> index 1fce775be4f6..9eff7aa4c552 100644
>> --- a/arch/loongarch/kernel/smp.c
>> +++ b/arch/loongarch/kernel/smp.c
>> @@ -29,6 +29,7 @@
>> #include <asm/loongson.h>
>> #include <asm/mmu_context.h>
>> #include <asm/numa.h>
>> +#include <asm/paravirt.h>
>> #include <asm/processor.h>
>> #include <asm/setup.h>
>> #include <asm/time.h>
>> @@ -309,6 +310,7 @@ void __init loongson_smp_setup(void)
>> cpu_data[0].core = cpu_logical_map(0) % loongson_sysconf.cores_per_package;
>> cpu_data[0].package = cpu_logical_map(0) / loongson_sysconf.cores_per_package;
>>
>> + pv_ipi_init();
>> iocsr_write32(0xffffffff, LOONGARCH_IOCSR_IPI_EN);
>> pr_info("Detected %i available CPU(s)\n", loongson_sysconf.nr_cpus);
>> }
>> @@ -352,7 +354,7 @@ void loongson_boot_secondary(int cpu, struct task_struct *idle)
>> void loongson_init_secondary(void)
>> {
>> unsigned int cpu = smp_processor_id();
>> - unsigned int imask = ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 |
>> + unsigned int imask = ECFGF_SIP0 | ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 |
>> ECFGF_IPI | ECFGF_PMC | ECFGF_TIMER;
>>
>> change_csr_ecfg(ECFG0_IM, imask);
>> --
>> 2.39.3
>>
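
For readers following the bitmap discussion, the clustering loop of
pv_send_ipi_mask() as posted in this patch can be modeled in user space as
shown below. This is only a sketch under stated assumptions: the hypercall is
replaced by a hypothetical record() helper that logs its arguments, the
per-cpu message check is left out, struct ipi_call, struct ipi_log and
group_ipis() are names invented here, cpuids are assumed distinct, and
unsigned __int128 is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>

#define BITS_PER_LONG 64	/* assumption: 64-bit kernel */
#define KVM_IPI_CLUSTER_SIZE (2 * BITS_PER_LONG)
#define MAX_CALLS 16

/* One logged "hypercall": two 64-bit bitmap words plus the base cpuid. */
struct ipi_call {
	unsigned long long lo, hi;
	int min;
};

struct ipi_log {
	struct ipi_call calls[MAX_CALLS];
	size_t n;
};

/* Stand-in for kvm_hypercall3(): store the arguments instead of trapping. */
static void record(struct ipi_log *log, unsigned __int128 bitmap, int min)
{
	assert(log->n < MAX_CALLS);
	log->calls[log->n].lo = (unsigned long long)bitmap;
	log->calls[log->n].hi = (unsigned long long)(bitmap >> BITS_PER_LONG);
	log->calls[log->n].min = min;
	log->n++;
}

/* Group physical cpuids into windows of at most 128 ids, one call each. */
void group_ipis(const int *cpuids, size_t count, struct ipi_log *log)
{
	unsigned __int128 bitmap = 0;
	int min = 0, max = 0;
	size_t i;

	log->n = 0;
	for (i = 0; i < count; i++) {
		int cpu = cpuids[i];

		if (!bitmap) {
			min = max = cpu;
		} else if (cpu > min && cpu < min + KVM_IPI_CLUSTER_SIZE) {
			max = cpu > max ? cpu : max;
		} else if (cpu < min && (max - cpu) < KVM_IPI_CLUSTER_SIZE) {
			/* New lowest id: rebase the bitmap on it. */
			bitmap <<= min - cpu;
			min = cpu;
		} else {
			/* Does not fit the window: flush and start a new one. */
			record(log, bitmap, min);
			min = max = cpu;
			bitmap = 0;
		}
		bitmap |= (unsigned __int128)1 << (cpu - min);
	}

	if (bitmap)
		record(log, bitmap, min);
}
```

For example, cpuids {0, 1, 64, 127, 128, 300} produce three calls in this
model: one window based at 0 covering 0, 1, 64 and 127, one based at 128, and
one based at 300.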


2024-05-06 07:04:56

by Huacai Chen

[permalink] [raw]
Subject: Re: [PATCH v8 6/6] LoongArch: Add pv ipi support on guest kernel side

On Mon, May 6, 2024 at 3:00 PM maobibo <[email protected]> wrote:
>
>
>
> On 2024/5/6 9:53 AM, Huacai Chen wrote:
> > Hi, Bibo,
> >
> > On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao <[email protected]> wrote:
> >>
> >> The PARAVIRT option and pv ipi are added on the guest kernel side; function
> >> pv_ipi_init() adds the ipi sending and ipi receiving hooks. This function
> >> first checks whether the system runs in VM mode. If the kernel runs in VM
> >> mode, it calls function kvm_para_available() to detect the current hypervisor
> >> type. Only KVM type detection is supported now; the paravirt function can
> >> work only if the current hypervisor type is KVM, since KVM is the only
> >> hypervisor supported on LoongArch now.
> >>
> >> PV IPI uses virtual IPI sender and virtual IPI receiver functions. With the
> >> virtual IPI sender, the ipi message is stored in DDR memory rather than in
> >> emulated HW. IPI multicast is supported, and 128 vcpus can receive IPIs
> >> at the same time, like the X86 KVM method. The hypercall method is used for
> >> IPI sending.
> >>
> >> With the virtual IPI receiver, HW SW0 is used rather than real IPI HW. Since
> >> each VCPU has a separate HW SW0, like the HW timer, there is no trap on IPI
> >> interrupt acknowledge. And since the IPI message is stored in DDR, there is
> >> no trap when getting the IPI message.
> >>
> >> Signed-off-by: Bibo Mao <[email protected]>
> >> ---
> >> arch/loongarch/Kconfig | 9 ++
> >> arch/loongarch/include/asm/hardirq.h | 1 +
> >> arch/loongarch/include/asm/paravirt.h | 27 ++++
> >> .../include/asm/paravirt_api_clock.h | 1 +
> >> arch/loongarch/kernel/Makefile | 1 +
> >> arch/loongarch/kernel/irq.c | 2 +-
> >> arch/loongarch/kernel/paravirt.c | 151 ++++++++++++++++++
> >> arch/loongarch/kernel/smp.c | 4 +-
> >> 8 files changed, 194 insertions(+), 2 deletions(-)
> >> create mode 100644 arch/loongarch/include/asm/paravirt.h
> >> create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
> >> create mode 100644 arch/loongarch/kernel/paravirt.c
> >>
> >> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
> >> index 54ad04dacdee..0a1540a8853e 100644
> >> --- a/arch/loongarch/Kconfig
> >> +++ b/arch/loongarch/Kconfig
> >> @@ -583,6 +583,15 @@ config CPU_HAS_PREFETCH
> >> bool
> >> default y
> >>
> >> +config PARAVIRT
> >> + bool "Enable paravirtualization code"
> >> + depends on AS_HAS_LVZ_EXTENSION
> >> + help
> >> + This changes the kernel so it can modify itself when it is run
> >> + under a hypervisor, potentially improving performance significantly
> >> + over full virtualization. However, when run without a hypervisor
> >> + the kernel is theoretically slower and slightly larger.
> >> +
> >> config ARCH_SUPPORTS_KEXEC
> >> def_bool y
> >>
> >> diff --git a/arch/loongarch/include/asm/hardirq.h b/arch/loongarch/include/asm/hardirq.h
> >> index 9f0038e19c7f..b26d596a73aa 100644
> >> --- a/arch/loongarch/include/asm/hardirq.h
> >> +++ b/arch/loongarch/include/asm/hardirq.h
> >> @@ -21,6 +21,7 @@ enum ipi_msg_type {
> >> typedef struct {
> >> unsigned int ipi_irqs[NR_IPI];
> >> unsigned int __softirq_pending;
> >> + atomic_t message ____cacheline_aligned_in_smp;
> >> } ____cacheline_aligned irq_cpustat_t;
> >>
> >> DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
> >> diff --git a/arch/loongarch/include/asm/paravirt.h b/arch/loongarch/include/asm/paravirt.h
> >> new file mode 100644
> >> index 000000000000..58f7b7b89f2c
> >> --- /dev/null
> >> +++ b/arch/loongarch/include/asm/paravirt.h
> >> @@ -0,0 +1,27 @@
> >> +/* SPDX-License-Identifier: GPL-2.0 */
> >> +#ifndef _ASM_LOONGARCH_PARAVIRT_H
> >> +#define _ASM_LOONGARCH_PARAVIRT_H
> >> +
> >> +#ifdef CONFIG_PARAVIRT
> >> +#include <linux/static_call_types.h>
> >> +struct static_key;
> >> +extern struct static_key paravirt_steal_enabled;
> >> +extern struct static_key paravirt_steal_rq_enabled;
> >> +
> >> +u64 dummy_steal_clock(int cpu);
> >> +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
> >> +
> >> +static inline u64 paravirt_steal_clock(int cpu)
> >> +{
> >> + return static_call(pv_steal_clock)(cpu);
> >> +}
> >> +
> >> +int pv_ipi_init(void);
> >> +#else
> >> +static inline int pv_ipi_init(void)
> >> +{
> >> + return 0;
> >> +}
> >> +
> >> +#endif // CONFIG_PARAVIRT
> >> +#endif
> >> diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h b/arch/loongarch/include/asm/paravirt_api_clock.h
> >> new file mode 100644
> >> index 000000000000..65ac7cee0dad
> >> --- /dev/null
> >> +++ b/arch/loongarch/include/asm/paravirt_api_clock.h
> >> @@ -0,0 +1 @@
> >> +#include <asm/paravirt.h>
> >> diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
> >> index 3a7620b66bc6..c9bfeda89e40 100644
> >> --- a/arch/loongarch/kernel/Makefile
> >> +++ b/arch/loongarch/kernel/Makefile
> >> @@ -51,6 +51,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
> >> obj-$(CONFIG_STACKTRACE) += stacktrace.o
> >>
> >> obj-$(CONFIG_PROC_FS) += proc.o
> >> +obj-$(CONFIG_PARAVIRT) += paravirt.o
> >>
> >> obj-$(CONFIG_SMP) += smp.o
> >>
> >> diff --git a/arch/loongarch/kernel/irq.c b/arch/loongarch/kernel/irq.c
> >> index ce36897d1e5a..4863e6c1b739 100644
> >> --- a/arch/loongarch/kernel/irq.c
> >> +++ b/arch/loongarch/kernel/irq.c
> >> @@ -113,5 +113,5 @@ void __init init_IRQ(void)
> >> per_cpu(irq_stack, i), per_cpu(irq_stack, i) + IRQ_STACK_SIZE);
> >> }
> >>
> >> - set_csr_ecfg(ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI | ECFGF_PMC);
> >> + set_csr_ecfg(ECFGF_SIP0 | ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI | ECFGF_PMC);
> >> }
> >> diff --git a/arch/loongarch/kernel/paravirt.c b/arch/loongarch/kernel/paravirt.c
> >> new file mode 100644
> >> index 000000000000..9044ed62045c
> >> --- /dev/null
> >> +++ b/arch/loongarch/kernel/paravirt.c
> >> @@ -0,0 +1,151 @@
> >> +// SPDX-License-Identifier: GPL-2.0
> >> +#include <linux/export.h>
> >> +#include <linux/types.h>
> >> +#include <linux/interrupt.h>
> >> +#include <linux/jump_label.h>
> >> +#include <linux/kvm_para.h>
> >> +#include <asm/paravirt.h>
> >> +#include <linux/static_call.h>
> >> +
> >> +struct static_key paravirt_steal_enabled;
> >> +struct static_key paravirt_steal_rq_enabled;
> >> +
> >> +static u64 native_steal_clock(int cpu)
> >> +{
> >> + return 0;
> >> +}
> >> +
> >> +DEFINE_STATIC_CALL(pv_steal_clock, native_steal_clock);
> >> +
> >> +#ifdef CONFIG_SMP
> >> +static void pv_send_ipi_single(int cpu, unsigned int action)
> >> +{
> >> + unsigned int min, old;
> >> + irq_cpustat_t *info = &per_cpu(irq_stat, cpu);
> >> +
> >> + old = atomic_fetch_or(BIT(action), &info->message);
> >> + if (old)
> >> + return;
> >> +
> >> + min = cpu_logical_map(cpu);
> >> + kvm_hypercall3(KVM_HCALL_FUNC_PV_IPI, 1, 0, min);
> >> +}
> >> +
> >> +#define KVM_IPI_CLUSTER_SIZE (2 * BITS_PER_LONG)
> >> +static void pv_send_ipi_mask(const struct cpumask *mask, unsigned int action)
> >> +{
> >> + unsigned int cpu, i, min = 0, max = 0, old;
> >> + __uint128_t bitmap = 0;
> >> + irq_cpustat_t *info;
> >> +
> >> + if (cpumask_empty(mask))
> >> + return;
> >> +
> >> + action = BIT(action);
> >> + for_each_cpu(i, mask) {
> >> + info = &per_cpu(irq_stat, i);
> >> + old = atomic_fetch_or(action, &info->message);
> >> + if (old)
> >> + continue;
> >> +
> >> + cpu = cpu_logical_map(i);
> >> + if (!bitmap) {
> >> + min = max = cpu;
> >> + } else if (cpu > min && cpu < min + KVM_IPI_CLUSTER_SIZE) {
> >> + max = cpu > max ? cpu : max;
> >> + } else if (cpu < min && (max - cpu) < KVM_IPI_CLUSTER_SIZE) {
> >> + bitmap <<= min - cpu;
> >> + min = cpu;
> >> + } else {
> >> + /*
> >> + * Physical cpuid is sorted in ascending order ascend
> >> + * for the next mask calculation, send IPI here
> >> + * directly and skip the remainding cpus
> >> + */
> >> + kvm_hypercall3(KVM_HCALL_FUNC_PV_IPI,
> >> + (unsigned long)bitmap,
> >> + (unsigned long)(bitmap >> BITS_PER_LONG), min);
> >> + min = max = cpu;
> >> + bitmap = 0;
> >> + }
> > I have changed the logic and comments when I apply, you can double
> > check whether it is correct.
> There is a modification like this:
> if (!bitmap) {
> min = max = cpu;
> } else if (cpu < min && cpu > (max -
> KVM_IPI_CLUSTER_SIZE)) {
> ...
>
> Testing shows there is a problem if the value of max is smaller than
> KVM_IPI_CLUSTER_SIZE, since the type of cpu/max is "unsigned int" and
> max - KVM_IPI_CLUSTER_SIZE then wraps around to a large value.
>
> How about defining the variables as int? The patch is like this:
> --- a/arch/loongarch/kernel/paravirt.c
> +++ b/arch/loongarch/kernel/paravirt.c
> @@ -35,7 +35,7 @@ static void pv_send_ipi_single(int cpu, unsigned int
> action)
>
> static void pv_send_ipi_mask(const struct cpumask *mask, unsigned int
> action)
> {
> - unsigned int cpu, i, min = 0, max = 0, old;
> + int cpu, i, min = 0, max = 0, old;
> __uint128_t bitmap = 0;
> irq_cpustat_t *info;
Makes sense, I will update this line.

Huacai
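
The wraparound discussed above can be reproduced in a small user-space sketch. It is only an illustration: CLUSTER_SIZE stands in for KVM_IPI_CLUSTER_SIZE (assumed to be 128, i.e. 2 * BITS_PER_LONG on a 64-bit build), and the two helper names are hypothetical, not taken from the patch.

```c
#include <stdbool.h>

/* Stand-in for KVM_IPI_CLUSTER_SIZE, assuming BITS_PER_LONG == 64. */
#define CLUSTER_SIZE 128

/*
 * Window test with unsigned operands. When max < CLUSTER_SIZE the
 * subtraction wraps to a huge value, so the test is never true.
 */
static bool fits_below_unsigned(unsigned int cpu, unsigned int min,
				unsigned int max)
{
	return cpu < min && cpu > (max - CLUSTER_SIZE);
}

/*
 * Same test with signed operands. max - CLUSTER_SIZE may now be
 * negative, so small cpu numbers compare as expected.
 */
static bool fits_below_signed(int cpu, int min, int max)
{
	return cpu < min && cpu > (max - CLUSTER_SIZE);
}
```

With min = max = 10 and cpu = 3, the unsigned variant rejects a cpu that clearly fits into the 128-bit window, while the signed variant accepts it. Both variants agree once max is at least CLUSTER_SIZE.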

>
>
> Regards
> Bibo Mao
> >
> > Huacai
> >
> >> + __set_bit(cpu - min, (unsigned long *)&bitmap);
> >> + }
> >> +
> >> + if (bitmap)
> >> + kvm_hypercall3(KVM_HCALL_FUNC_PV_IPI, (unsigned long)bitmap,
> >> + (unsigned long)(bitmap >> BITS_PER_LONG), min);
> >> +}
> >> +
> >> +static irqreturn_t loongson_do_swi(int irq, void *dev)
> >> +{
> >> + irq_cpustat_t *info;
> >> + long action;
> >> +
> >> + /* Clear swi interrupt */
> >> + clear_csr_estat(1 << INT_SWI0);
> >> + info = this_cpu_ptr(&irq_stat);
> >> + action = atomic_xchg(&info->message, 0);
> >> + if (action & SMP_CALL_FUNCTION) {
> >> + generic_smp_call_function_interrupt();
> >> + info->ipi_irqs[IPI_CALL_FUNCTION]++;
> >> + }
> >> +
> >> + if (action & SMP_RESCHEDULE) {
> >> + scheduler_ipi();
> >> + info->ipi_irqs[IPI_RESCHEDULE]++;
> >> + }
> >> +
> >> + return IRQ_HANDLED;
> >> +}
> >> +
> >> +static void pv_init_ipi(void)
> >> +{
> >> + int r, swi0;
> >> +
> >> + swi0 = get_percpu_irq(INT_SWI0);
> >> + if (swi0 < 0)
> >> + panic("SWI0 IRQ mapping failed\n");
> >> + irq_set_percpu_devid(swi0);
> >> + r = request_percpu_irq(swi0, loongson_do_swi, "SWI0", &irq_stat);
> >> + if (r < 0)
> >> + panic("SWI0 IRQ request failed\n");
> >> +}
> >> +#endif
> >> +
> >> +static bool kvm_para_available(void)
> >> +{
> >> + static int hypervisor_type;
> >> + int config;
> >> +
> >> + if (!hypervisor_type) {
> >> + config = read_cpucfg(CPUCFG_KVM_SIG);
> >> + if (!memcmp(&config, KVM_SIGNATURE, 4))
> >> + hypervisor_type = HYPERVISOR_KVM;
> >> + }
> >> +
> >> + return hypervisor_type == HYPERVISOR_KVM;
> >> +}
> >> +
> >> +int __init pv_ipi_init(void)
> >> +{
> >> + int feature;
> >> +
> >> + if (!cpu_has_hypervisor)
> >> + return 0;
> >> + if (!kvm_para_available())
> >> + return 0;
> >> +
> >> + /*
> >> + * check whether KVM hypervisor supports pv_ipi or not
> >> + */
> >> + feature = read_cpucfg(CPUCFG_KVM_FEATURE);
> >> +#ifdef CONFIG_SMP
> >> + if (feature & KVM_FEATURE_PV_IPI) {
> >> + smp_ops.init_ipi = pv_init_ipi;
> >> + smp_ops.send_ipi_single = pv_send_ipi_single;
> >> + smp_ops.send_ipi_mask = pv_send_ipi_mask;
> >> + }
> >> +#endif
> >> +
> >> + return 1;
> >> +}
> >> diff --git a/arch/loongarch/kernel/smp.c b/arch/loongarch/kernel/smp.c
> >> index 1fce775be4f6..9eff7aa4c552 100644
> >> --- a/arch/loongarch/kernel/smp.c
> >> +++ b/arch/loongarch/kernel/smp.c
> >> @@ -29,6 +29,7 @@
> >> #include <asm/loongson.h>
> >> #include <asm/mmu_context.h>
> >> #include <asm/numa.h>
> >> +#include <asm/paravirt.h>
> >> #include <asm/processor.h>
> >> #include <asm/setup.h>
> >> #include <asm/time.h>
> >> @@ -309,6 +310,7 @@ void __init loongson_smp_setup(void)
> >> cpu_data[0].core = cpu_logical_map(0) % loongson_sysconf.cores_per_package;
> >> cpu_data[0].package = cpu_logical_map(0) / loongson_sysconf.cores_per_package;
> >>
> >> + pv_ipi_init();
> >> iocsr_write32(0xffffffff, LOONGARCH_IOCSR_IPI_EN);
> >> pr_info("Detected %i available CPU(s)\n", loongson_sysconf.nr_cpus);
> >> }
> >> @@ -352,7 +354,7 @@ void loongson_boot_secondary(int cpu, struct task_struct *idle)
> >> void loongson_init_secondary(void)
> >> {
> >> unsigned int cpu = smp_processor_id();
> >> - unsigned int imask = ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 |
> >> + unsigned int imask = ECFGF_SIP0 | ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 |
> >> ECFGF_IPI | ECFGF_PMC | ECFGF_TIMER;
> >>
> >> change_csr_ecfg(ECFG0_IM, imask);
> >> --
> >> 2.39.3
> >>
>
>

2024-05-06 07:07:16

by Huacai Chen

Subject: Re: [PATCH v8 4/6] LoongArch: KVM: Add vcpu search support from physical cpuid

Hi, Bibo,

On Mon, May 6, 2024 at 2:36 PM maobibo <[email protected]> wrote:
>
>
>
> On 2024/5/6 9:49 AM, Huacai Chen wrote:
> > Hi, Bibo,
> >
> > On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao <[email protected]> wrote:
> >>
> >> Physical cpuid is used for interrupt routing for irqchips such as
> >> ipi/msi/extioi interrupt controller. And physical cpuid is stored
> >> at CSR register LOONGARCH_CSR_CPUID, it can not be changed once vcpu
> >> is created and physical cpuid of two vcpus cannot be the same.
> >>
> >> Different irqchips have different size declaration about physical cpuid,
> >> max cpuid value for CSR LOONGARCH_CSR_CPUID on 3A5000 is 512, max cpuid
> >> supported by IPI hardware is 1024, 256 for extioi irqchip, and 65536
> >> for MSI irqchip.
> >>
> >> The smallest value from all interrupt controllers is selected now,
> >> and the max cpuid size is defined as 256 by KVM which comes from
> >> extioi irqchip.
> >>
> >> Signed-off-by: Bibo Mao <[email protected]>
> >> ---
> >> arch/loongarch/include/asm/kvm_host.h | 26 ++++++++
> >> arch/loongarch/include/asm/kvm_vcpu.h | 1 +
> >> arch/loongarch/kvm/vcpu.c | 93 ++++++++++++++++++++++++++-
> >> arch/loongarch/kvm/vm.c | 11 ++++
> >> 4 files changed, 130 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/arch/loongarch/include/asm/kvm_host.h b/arch/loongarch/include/asm/kvm_host.h
> >> index 2d62f7b0d377..3ba16ef1fe69 100644
> >> --- a/arch/loongarch/include/asm/kvm_host.h
> >> +++ b/arch/loongarch/include/asm/kvm_host.h
> >> @@ -64,6 +64,30 @@ struct kvm_world_switch {
> >>
> >> #define MAX_PGTABLE_LEVELS 4
> >>
> >> +/*
> >> + * Physical cpu id is used for interrupt routing, there are different
> >> + * definitions about physical cpuid on different hardwares.
> >> + * For LOONGARCH_CSR_CPUID register, max cpuid size is 512
> >> + * For IPI HW, max dest CPUID size is 1024
> >> + * For extioi interrupt controller, max dest CPUID size is 256
> >> + * For MSI interrupt controller, max supported CPUID size is 65536
> >> + *
> >> + * Currently max CPUID is defined as 256 for KVM hypervisor, in future
> >> + * it will be expanded to 4096, including 16 packages at most. And every
> >> + * package supports at most 256 vcpus
> >> + */
> >> +#define KVM_MAX_PHYID 256
> >> +
> >> +struct kvm_phyid_info {
> >> + struct kvm_vcpu *vcpu;
> >> + bool enabled;
> >> +};
> >> +
> >> +struct kvm_phyid_map {
> >> + int max_phyid;
> >> + struct kvm_phyid_info phys_map[KVM_MAX_PHYID];
> >> +};
> >> +
> >> struct kvm_arch {
> >> /* Guest physical mm */
> >> kvm_pte_t *pgd;
> >> @@ -71,6 +95,8 @@ struct kvm_arch {
> >> unsigned long invalid_ptes[MAX_PGTABLE_LEVELS];
> >> unsigned int pte_shifts[MAX_PGTABLE_LEVELS];
> >> unsigned int root_level;
> >> + spinlock_t phyid_map_lock;
> >> + struct kvm_phyid_map *phyid_map;
> >>
> >> s64 time_offset;
> >> struct kvm_context __percpu *vmcs;
> >> diff --git a/arch/loongarch/include/asm/kvm_vcpu.h b/arch/loongarch/include/asm/kvm_vcpu.h
> >> index 0cb4fdb8a9b5..9f53950959da 100644
> >> --- a/arch/loongarch/include/asm/kvm_vcpu.h
> >> +++ b/arch/loongarch/include/asm/kvm_vcpu.h
> >> @@ -81,6 +81,7 @@ void kvm_save_timer(struct kvm_vcpu *vcpu);
> >> void kvm_restore_timer(struct kvm_vcpu *vcpu);
> >>
> >> int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq);
> >> +struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid);
> >>
> >> /*
> >> * Loongarch KVM guest interrupt handling
> >> diff --git a/arch/loongarch/kvm/vcpu.c b/arch/loongarch/kvm/vcpu.c
> >> index 3a8779065f73..b633fd28b8db 100644
> >> --- a/arch/loongarch/kvm/vcpu.c
> >> +++ b/arch/loongarch/kvm/vcpu.c
> >> @@ -274,6 +274,95 @@ static int _kvm_getcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 *val)
> >> return 0;
> >> }
> >>
> >> +static inline int kvm_set_cpuid(struct kvm_vcpu *vcpu, u64 val)
> >> +{
> >> + int cpuid;
> >> + struct loongarch_csrs *csr = vcpu->arch.csr;
> >> + struct kvm_phyid_map *map;
> >> +
> >> + if (val >= KVM_MAX_PHYID)
> >> + return -EINVAL;
> >> +
> >> + cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_CPUID);
> >> + map = vcpu->kvm->arch.phyid_map;
> >> + spin_lock(&vcpu->kvm->arch.phyid_map_lock);
> >> + if (map->phys_map[cpuid].enabled) {
> >> + /*
> >> + * Cpuid is already set before
> >> + * Forbid changing different cpuid at runtime
> >> + */
> >> + if (cpuid != val) {
> >> + /*
> >> + * Cpuid 0 is initial value for vcpu, maybe invalid
> >> + * unset value for vcpu
> >> + */
> >> + if (cpuid) {
> >> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >> + return -EINVAL;
> >> + }
> >> + } else {
> >> + /* Discard duplicated cpuid set */
> >> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >> + return 0;
> >> + }
> >> + }
> > I changed the logic and comments when applying; you can double-check
> > whether they are correct.
> I checked out the latest version; the modification in function
> kvm_set_cpuid() looks good to me.
Now the modified version is like this:

+ if (map->phys_map[cpuid].enabled) {
+ /* Discard duplicated CPUID set operation */
+ if (cpuid == val) {
+ spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
+ return 0;
+ }
+
+ /*
+ * CPUID is already set before
+ * Forbid changing different CPUID at runtime
+ * But CPUID 0 is the initial value for vcpu, so allow
+ * changing from 0 to others
+ */
+ if (cpuid) {
+ spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
+ return -EINVAL;
+ }
+ }
But I still doubt whether we should allow changing from 0 to others
while map->phys_map[cpuid].enabled is 1.

Huacai

> >
> >> +
> >> + if (map->phys_map[val].enabled) {
> >> + /*
> >> + * New cpuid is already set with other vcpu
> >> + * Forbid sharing the same cpuid between different vcpus
> >> + */
> >> + if (map->phys_map[val].vcpu != vcpu) {
> >> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >> + return -EINVAL;
> >> + }
> >> +
> >> + /* Discard duplicated cpuid set operation*/
> >> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >> + return 0;
> >> + }
> >> +
> >> + kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, val);
> >> + map->phys_map[val].enabled = true;
> >> + map->phys_map[val].vcpu = vcpu;
> >> + if (map->max_phyid < val)
> >> + map->max_phyid = val;
> >> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >> + return 0;
> >> +}
> >> +
> >> +struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid)
> >> +{
> >> + struct kvm_phyid_map *map;
> >> +
> >> + if (cpuid >= KVM_MAX_PHYID)
> >> + return NULL;
> >> +
> >> + map = kvm->arch.phyid_map;
> >> + if (map->phys_map[cpuid].enabled)
> >> + return map->phys_map[cpuid].vcpu;
> >> +
> >> + return NULL;
> >> +}
> >> +
> >> +static inline void kvm_drop_cpuid(struct kvm_vcpu *vcpu)
> >> +{
> >> + int cpuid;
> >> + struct loongarch_csrs *csr = vcpu->arch.csr;
> >> + struct kvm_phyid_map *map;
> >> +
> >> + map = vcpu->kvm->arch.phyid_map;
> >> + cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_CPUID);
> >> + if (cpuid >= KVM_MAX_PHYID)
> >> + return;
> >> +
> >> + if (map->phys_map[cpuid].enabled) {
> >> + map->phys_map[cpuid].vcpu = NULL;
> >> + map->phys_map[cpuid].enabled = false;
> >> + kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, 0);
> >> + }
> >> +}
> > While kvm_set_cpuid() is protected by a spinlock, do kvm_drop_cpuid()
> > and kvm_get_vcpu_by_cpuid() also need it?
> >
> Adding the spinlock in function kvm_drop_cpuid() looks good to me.
> And thanks for the effort.
>
> Regards
> Bibo Mao
> >> +
> >> static int _kvm_setcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 val)
> >> {
> >> int ret = 0, gintc;
> >> @@ -291,7 +380,8 @@ static int _kvm_setcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 val)
> >> kvm_set_sw_gcsr(csr, LOONGARCH_CSR_ESTAT, gintc);
> >>
> >> return ret;
> >> - }
> >> + } else if (id == LOONGARCH_CSR_CPUID)
> >> + return kvm_set_cpuid(vcpu, val);
> >>
> >> kvm_write_sw_gcsr(csr, id, val);
> >>
> >> @@ -943,6 +1033,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
> >> hrtimer_cancel(&vcpu->arch.swtimer);
> >> kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
> >> kfree(vcpu->arch.csr);
> >> + kvm_drop_cpuid(vcpu);
> > I think this line should be before the above kfree(), otherwise you
> > get a "use after free".
> >
> > Huacai
> >
> >>
> >> /*
> >> * If the vCPU is freed and reused as another vCPU, we don't want the
> >> diff --git a/arch/loongarch/kvm/vm.c b/arch/loongarch/kvm/vm.c
> >> index 0a37f6fa8f2d..6006a28653ad 100644
> >> --- a/arch/loongarch/kvm/vm.c
> >> +++ b/arch/loongarch/kvm/vm.c
> >> @@ -30,6 +30,14 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> >> if (!kvm->arch.pgd)
> >> return -ENOMEM;
> >>
> >> + kvm->arch.phyid_map = kvzalloc(sizeof(struct kvm_phyid_map),
> >> + GFP_KERNEL_ACCOUNT);
> >> + if (!kvm->arch.phyid_map) {
> >> + free_page((unsigned long)kvm->arch.pgd);
> >> + kvm->arch.pgd = NULL;
> >> + return -ENOMEM;
> >> + }
> >> +
> >> kvm_init_vmcs(kvm);
> >> kvm->arch.gpa_size = BIT(cpu_vabits - 1);
> >> kvm->arch.root_level = CONFIG_PGTABLE_LEVELS - 1;
> >> @@ -44,6 +52,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> >> for (i = 0; i <= kvm->arch.root_level; i++)
> >> kvm->arch.pte_shifts[i] = PAGE_SHIFT + i * (PAGE_SHIFT - 3);
> >>
> >> + spin_lock_init(&kvm->arch.phyid_map_lock);
> >> return 0;
> >> }
> >>
> >> @@ -51,7 +60,9 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
> >> {
> >> kvm_destroy_vcpus(kvm);
> >> free_page((unsigned long)kvm->arch.pgd);
> >> + kvfree(kvm->arch.phyid_map);
> >> kvm->arch.pgd = NULL;
> >> + kvm->arch.phyid_map = NULL;
> >> }
> >>
> >> int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >> --
> >> 2.39.3
> >>
>

2024-05-06 08:24:46

by Bibo Mao

Subject: Re: [PATCH v8 4/6] LoongArch: KVM: Add vcpu search support from physical cpuid



On 2024/5/6 3:06 PM, Huacai Chen wrote:
> Hi, Bibo,
>
> On Mon, May 6, 2024 at 2:36 PM maobibo <[email protected]> wrote:
>>
>>
>>
>> On 2024/5/6 9:49 AM, Huacai Chen wrote:
>>> Hi, Bibo,
>>>
>>> On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao <[email protected]> wrote:
>>>>
>>>> Physical cpuid is used for interrupt routing for irqchips such as
>>>> ipi/msi/extioi interrupt controller. And physical cpuid is stored
>>>> at CSR register LOONGARCH_CSR_CPUID, it can not be changed once vcpu
>>>> is created and physical cpuid of two vcpus cannot be the same.
>>>>
>>>> Different irqchips have different size declaration about physical cpuid,
>>>> max cpuid value for CSR LOONGARCH_CSR_CPUID on 3A5000 is 512, max cpuid
>>>> supported by IPI hardware is 1024, 256 for extioi irqchip, and 65536
>>>> for MSI irqchip.
>>>>
>>>> The smallest value from all interrupt controllers is selected now,
>>>> and the max cpuid size is defined as 256 by KVM which comes from
>>>> extioi irqchip.
>>>>
>>>> Signed-off-by: Bibo Mao <[email protected]>
>>>> ---
>>>> arch/loongarch/include/asm/kvm_host.h | 26 ++++++++
>>>> arch/loongarch/include/asm/kvm_vcpu.h | 1 +
>>>> arch/loongarch/kvm/vcpu.c | 93 ++++++++++++++++++++++++++-
>>>> arch/loongarch/kvm/vm.c | 11 ++++
>>>> 4 files changed, 130 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/loongarch/include/asm/kvm_host.h b/arch/loongarch/include/asm/kvm_host.h
>>>> index 2d62f7b0d377..3ba16ef1fe69 100644
>>>> --- a/arch/loongarch/include/asm/kvm_host.h
>>>> +++ b/arch/loongarch/include/asm/kvm_host.h
>>>> @@ -64,6 +64,30 @@ struct kvm_world_switch {
>>>>
>>>> #define MAX_PGTABLE_LEVELS 4
>>>>
>>>> +/*
>>>> + * Physical cpu id is used for interrupt routing, there are different
>>>> + * definitions about physical cpuid on different hardwares.
>>>> + * For LOONGARCH_CSR_CPUID register, max cpuid size is 512
>>>> + * For IPI HW, max dest CPUID size is 1024
>>>> + * For extioi interrupt controller, max dest CPUID size is 256
>>>> + * For MSI interrupt controller, max supported CPUID size is 65536
>>>> + *
>>>> + * Currently max CPUID is defined as 256 for KVM hypervisor, in future
>>>> + * it will be expanded to 4096, including 16 packages at most. And every
>>>> + * package supports at most 256 vcpus
>>>> + */
>>>> +#define KVM_MAX_PHYID 256
>>>> +
>>>> +struct kvm_phyid_info {
>>>> + struct kvm_vcpu *vcpu;
>>>> + bool enabled;
>>>> +};
>>>> +
>>>> +struct kvm_phyid_map {
>>>> + int max_phyid;
>>>> + struct kvm_phyid_info phys_map[KVM_MAX_PHYID];
>>>> +};
>>>> +
>>>> struct kvm_arch {
>>>> /* Guest physical mm */
>>>> kvm_pte_t *pgd;
>>>> @@ -71,6 +95,8 @@ struct kvm_arch {
>>>> unsigned long invalid_ptes[MAX_PGTABLE_LEVELS];
>>>> unsigned int pte_shifts[MAX_PGTABLE_LEVELS];
>>>> unsigned int root_level;
>>>> + spinlock_t phyid_map_lock;
>>>> + struct kvm_phyid_map *phyid_map;
>>>>
>>>> s64 time_offset;
>>>> struct kvm_context __percpu *vmcs;
>>>> diff --git a/arch/loongarch/include/asm/kvm_vcpu.h b/arch/loongarch/include/asm/kvm_vcpu.h
>>>> index 0cb4fdb8a9b5..9f53950959da 100644
>>>> --- a/arch/loongarch/include/asm/kvm_vcpu.h
>>>> +++ b/arch/loongarch/include/asm/kvm_vcpu.h
>>>> @@ -81,6 +81,7 @@ void kvm_save_timer(struct kvm_vcpu *vcpu);
>>>> void kvm_restore_timer(struct kvm_vcpu *vcpu);
>>>>
>>>> int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq);
>>>> +struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid);
>>>>
>>>> /*
>>>> * Loongarch KVM guest interrupt handling
>>>> diff --git a/arch/loongarch/kvm/vcpu.c b/arch/loongarch/kvm/vcpu.c
>>>> index 3a8779065f73..b633fd28b8db 100644
>>>> --- a/arch/loongarch/kvm/vcpu.c
>>>> +++ b/arch/loongarch/kvm/vcpu.c
>>>> @@ -274,6 +274,95 @@ static int _kvm_getcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 *val)
>>>> return 0;
>>>> }
>>>>
>>>> +static inline int kvm_set_cpuid(struct kvm_vcpu *vcpu, u64 val)
>>>> +{
>>>> + int cpuid;
>>>> + struct loongarch_csrs *csr = vcpu->arch.csr;
>>>> + struct kvm_phyid_map *map;
>>>> +
>>>> + if (val >= KVM_MAX_PHYID)
>>>> + return -EINVAL;
>>>> +
>>>> + cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_CPUID);
>>>> + map = vcpu->kvm->arch.phyid_map;
>>>> + spin_lock(&vcpu->kvm->arch.phyid_map_lock);
>>>> + if (map->phys_map[cpuid].enabled) {
>>>> + /*
>>>> + * Cpuid is already set before
>>>> + * Forbid changing different cpuid at runtime
>>>> + */
>>>> + if (cpuid != val) {
>>>> + /*
>>>> + * Cpuid 0 is initial value for vcpu, maybe invalid
>>>> + * unset value for vcpu
>>>> + */
>>>> + if (cpuid) {
>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>> + return -EINVAL;
>>>> + }
>>>> + } else {
>>>> + /* Discard duplicated cpuid set */
>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>> + return 0;
>>>> + }
>>>> + }
>>> I changed the logic and comments when applying; you can double-check
>>> whether they are correct.
>> I checked out the latest version; the modification in function
>> kvm_set_cpuid() looks good to me.
> Now the modified version is like this:
>
> + if (map->phys_map[cpuid].enabled) {
> + /* Discard duplicated CPUID set operation */
> + if (cpuid == val) {
> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> + return 0;
> + }
> +
> + /*
> + * CPUID is already set before
> + * Forbid changing different CPUID at runtime
> + * But CPUID 0 is the initial value for vcpu, so allow
> + * changing from 0 to others
> + */
> + if (cpuid) {
> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> + return -EINVAL;
> + }
> + }
> But I still doubt whether we should allow changing from 0 to others
> while map->phys_map[cpuid].enabled is 1.
It is necessary, since the default sw cpuid is zero :-( We can
optimize it later, for example by setting an INVALID cpuid in
kvm_arch_vcpu_create(), which would simplify the logic in kvm_set_cpuid().

Regards
Bibo Mao
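
The sentinel idea mentioned above could look roughly like the following user-space model. It is a sketch under stated assumptions, not the kernel implementation: INVALID_PHYID, struct vcpu_model, struct phyid_map_model and set_phyid() are hypothetical names, locking is omitted, and MAX_PHYID simply mirrors KVM_MAX_PHYID from the patch.

```c
#include <stddef.h>

#define MAX_PHYID	256		/* mirrors KVM_MAX_PHYID */
#define INVALID_PHYID	MAX_PHYID	/* hypothetical "not set yet" marker */

struct vcpu_model {
	int phyid;			/* INVALID_PHYID until assigned */
};

struct phyid_map_model {
	struct vcpu_model *owner[MAX_PHYID];	/* NULL means the slot is free */
};

/* Plays the role of vcpu creation: start with the sentinel, not with 0. */
static void vcpu_model_init(struct vcpu_model *v)
{
	v->phyid = INVALID_PHYID;
}

/* Returns 0 on success or harmless repeat, -1 when the request is rejected. */
static int set_phyid(struct phyid_map_model *map, struct vcpu_model *v,
		     unsigned long val)
{
	if (val >= MAX_PHYID)
		return -1;

	/* Already assigned: accept only a repeat of the same value. */
	if (v->phyid != INVALID_PHYID)
		return v->phyid == (int)val ? 0 : -1;

	/* Another vcpu already owns this id. */
	if (map->owner[val])
		return -1;

	map->owner[val] = v;
	v->phyid = (int)val;
	return 0;
}
```

Because an unassigned vcpu never holds a valid id in this model, physical id 0 needs no special case: the first assignment always starts from the sentinel, and any later change is rejected.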

>
> Huacai
>
>>>
>>>> +
>>>> + if (map->phys_map[val].enabled) {
>>>> + /*
>>>> + * New cpuid is already set with other vcpu
>>>> + * Forbid sharing the same cpuid between different vcpus
>>>> + */
>>>> + if (map->phys_map[val].vcpu != vcpu) {
>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>> + return -EINVAL;
>>>> + }
>>>> +
>>>> + /* Discard duplicated cpuid set operation*/
>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>> + return 0;
>>>> + }
>>>> +
>>>> + kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, val);
>>>> + map->phys_map[val].enabled = true;
>>>> + map->phys_map[val].vcpu = vcpu;
>>>> + if (map->max_phyid < val)
>>>> + map->max_phyid = val;
>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>> + return 0;
>>>> +}
>>>> +
>>>> +struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid)
>>>> +{
>>>> + struct kvm_phyid_map *map;
>>>> +
>>>> + if (cpuid >= KVM_MAX_PHYID)
>>>> + return NULL;
>>>> +
>>>> + map = kvm->arch.phyid_map;
>>>> + if (map->phys_map[cpuid].enabled)
>>>> + return map->phys_map[cpuid].vcpu;
>>>> +
>>>> + return NULL;
>>>> +}
>>>> +
>>>> +static inline void kvm_drop_cpuid(struct kvm_vcpu *vcpu)
>>>> +{
>>>> + int cpuid;
>>>> + struct loongarch_csrs *csr = vcpu->arch.csr;
>>>> + struct kvm_phyid_map *map;
>>>> +
>>>> + map = vcpu->kvm->arch.phyid_map;
>>>> + cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_CPUID);
>>>> + if (cpuid >= KVM_MAX_PHYID)
>>>> + return;
>>>> +
>>>> + if (map->phys_map[cpuid].enabled) {
>>>> + map->phys_map[cpuid].vcpu = NULL;
>>>> + map->phys_map[cpuid].enabled = false;
>>>> + kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, 0);
>>>> + }
>>>> +}
>>> While kvm_set_cpuid() is protected by a spinlock, do kvm_drop_cpuid()
>>> and kvm_get_vcpu_by_cpuid() also need it?
>>>
>> Adding the spinlock in function kvm_drop_cpuid() looks good to me.
>> And thanks for the effort.
>>
>> Regards
>> Bibo Mao
>>>> +
>>>> static int _kvm_setcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 val)
>>>> {
>>>> int ret = 0, gintc;
>>>> @@ -291,7 +380,8 @@ static int _kvm_setcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 val)
>>>> kvm_set_sw_gcsr(csr, LOONGARCH_CSR_ESTAT, gintc);
>>>>
>>>> return ret;
>>>> - }
>>>> + } else if (id == LOONGARCH_CSR_CPUID)
>>>> + return kvm_set_cpuid(vcpu, val);
>>>>
>>>> kvm_write_sw_gcsr(csr, id, val);
>>>>
>>>> @@ -943,6 +1033,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
>>>> hrtimer_cancel(&vcpu->arch.swtimer);
>>>> kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
>>>> kfree(vcpu->arch.csr);
>>>> + kvm_drop_cpuid(vcpu);
>>> I think this line should be before the above kfree(), otherwise you
>>> get a "use after free".
>>>
>>> Huacai
>>>
>>>>
>>>> /*
>>>> * If the vCPU is freed and reused as another vCPU, we don't want the
>>>> diff --git a/arch/loongarch/kvm/vm.c b/arch/loongarch/kvm/vm.c
>>>> index 0a37f6fa8f2d..6006a28653ad 100644
>>>> --- a/arch/loongarch/kvm/vm.c
>>>> +++ b/arch/loongarch/kvm/vm.c
>>>> @@ -30,6 +30,14 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>>>> if (!kvm->arch.pgd)
>>>> return -ENOMEM;
>>>>
>>>> + kvm->arch.phyid_map = kvzalloc(sizeof(struct kvm_phyid_map),
>>>> + GFP_KERNEL_ACCOUNT);
>>>> + if (!kvm->arch.phyid_map) {
>>>> + free_page((unsigned long)kvm->arch.pgd);
>>>> + kvm->arch.pgd = NULL;
>>>> + return -ENOMEM;
>>>> + }
>>>> +
>>>> kvm_init_vmcs(kvm);
>>>> kvm->arch.gpa_size = BIT(cpu_vabits - 1);
>>>> kvm->arch.root_level = CONFIG_PGTABLE_LEVELS - 1;
>>>> @@ -44,6 +52,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>>>> for (i = 0; i <= kvm->arch.root_level; i++)
>>>> kvm->arch.pte_shifts[i] = PAGE_SHIFT + i * (PAGE_SHIFT - 3);
>>>>
>>>> + spin_lock_init(&kvm->arch.phyid_map_lock);
>>>> return 0;
>>>> }
>>>>
>>>> @@ -51,7 +60,9 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
>>>> {
>>>> kvm_destroy_vcpus(kvm);
>>>> free_page((unsigned long)kvm->arch.pgd);
>>>> + kvfree(kvm->arch.phyid_map);
>>>> kvm->arch.pgd = NULL;
>>>> + kvm->arch.phyid_map = NULL;
>>>> }
>>>>
>>>> int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>>> --
>>>> 2.39.3
>>>>
>>


2024-05-06 09:03:08

by Huacai Chen

Subject: Re: [PATCH v8 4/6] LoongArch: KVM: Add vcpu search support from physical cpuid

On Mon, May 6, 2024 at 4:18 PM maobibo <[email protected]> wrote:
>
>
>
> On 2024/5/6 3:06 PM, Huacai Chen wrote:
> > Hi, Bibo,
> >
> > On Mon, May 6, 2024 at 2:36 PM maobibo <[email protected]> wrote:
> >>
> >>
> >>
> >> On 2024/5/6 9:49 AM, Huacai Chen wrote:
> >>> Hi, Bibo,
> >>>
> >>> On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao <[email protected]> wrote:
> >>>>
> >>>> Physical cpuid is used for interrupt routing for irqchips such as
> >>>> ipi/msi/extioi interrupt controller. And physical cpuid is stored
> >>>> at CSR register LOONGARCH_CSR_CPUID, it can not be changed once vcpu
> >>>> is created and physical cpuid of two vcpus cannot be the same.
> >>>>
> >>>> Different irqchips have different size declaration about physical cpuid,
> >>>> max cpuid value for CSR LOONGARCH_CSR_CPUID on 3A5000 is 512, max cpuid
> >>>> supported by IPI hardware is 1024, 256 for extioi irqchip, and 65536
> >>>> for MSI irqchip.
> >>>>
> >>>> The smallest value from all interrupt controllers is selected now,
> >>>> and the max cpuid size is defined as 256 by KVM which comes from
> >>>> extioi irqchip.
> >>>>
> >>>> Signed-off-by: Bibo Mao <[email protected]>
> >>>> ---
> >>>> arch/loongarch/include/asm/kvm_host.h | 26 ++++++++
> >>>> arch/loongarch/include/asm/kvm_vcpu.h | 1 +
> >>>> arch/loongarch/kvm/vcpu.c | 93 ++++++++++++++++++++++++++-
> >>>> arch/loongarch/kvm/vm.c | 11 ++++
> >>>> 4 files changed, 130 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/arch/loongarch/include/asm/kvm_host.h b/arch/loongarch/include/asm/kvm_host.h
> >>>> index 2d62f7b0d377..3ba16ef1fe69 100644
> >>>> --- a/arch/loongarch/include/asm/kvm_host.h
> >>>> +++ b/arch/loongarch/include/asm/kvm_host.h
> >>>> @@ -64,6 +64,30 @@ struct kvm_world_switch {
> >>>>
> >>>> #define MAX_PGTABLE_LEVELS 4
> >>>>
> >>>> +/*
> >>>> + * Physical cpu id is used for interrupt routing, there are different
> >>>> + * definitions about physical cpuid on different hardwares.
> >>>> + * For LOONGARCH_CSR_CPUID register, max cpuid size is 512
> >>>> + * For IPI HW, max dest CPUID size is 1024
> >>>> + * For extioi interrupt controller, max dest CPUID size is 256
> >>>> + * For MSI interrupt controller, max supported CPUID size is 65536
> >>>> + *
> >>>> + * Currently max CPUID is defined as 256 for KVM hypervisor, in future
> >>>> + * it will be expanded to 4096, including 16 packages at most. And every
> >>>> + * package supports at most 256 vcpus
> >>>> + */
> >>>> +#define KVM_MAX_PHYID 256
> >>>> +
> >>>> +struct kvm_phyid_info {
> >>>> + struct kvm_vcpu *vcpu;
> >>>> + bool enabled;
> >>>> +};
> >>>> +
> >>>> +struct kvm_phyid_map {
> >>>> + int max_phyid;
> >>>> + struct kvm_phyid_info phys_map[KVM_MAX_PHYID];
> >>>> +};
> >>>> +
> >>>> struct kvm_arch {
> >>>> /* Guest physical mm */
> >>>> kvm_pte_t *pgd;
> >>>> @@ -71,6 +95,8 @@ struct kvm_arch {
> >>>> unsigned long invalid_ptes[MAX_PGTABLE_LEVELS];
> >>>> unsigned int pte_shifts[MAX_PGTABLE_LEVELS];
> >>>> unsigned int root_level;
> >>>> + spinlock_t phyid_map_lock;
> >>>> + struct kvm_phyid_map *phyid_map;
> >>>>
> >>>> s64 time_offset;
> >>>> struct kvm_context __percpu *vmcs;
> >>>> diff --git a/arch/loongarch/include/asm/kvm_vcpu.h b/arch/loongarch/include/asm/kvm_vcpu.h
> >>>> index 0cb4fdb8a9b5..9f53950959da 100644
> >>>> --- a/arch/loongarch/include/asm/kvm_vcpu.h
> >>>> +++ b/arch/loongarch/include/asm/kvm_vcpu.h
> >>>> @@ -81,6 +81,7 @@ void kvm_save_timer(struct kvm_vcpu *vcpu);
> >>>> void kvm_restore_timer(struct kvm_vcpu *vcpu);
> >>>>
> >>>> int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq);
> >>>> +struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid);
> >>>>
> >>>> /*
> >>>> * Loongarch KVM guest interrupt handling
> >>>> diff --git a/arch/loongarch/kvm/vcpu.c b/arch/loongarch/kvm/vcpu.c
> >>>> index 3a8779065f73..b633fd28b8db 100644
> >>>> --- a/arch/loongarch/kvm/vcpu.c
> >>>> +++ b/arch/loongarch/kvm/vcpu.c
> >>>> @@ -274,6 +274,95 @@ static int _kvm_getcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 *val)
> >>>> return 0;
> >>>> }
> >>>>
> >>>> +static inline int kvm_set_cpuid(struct kvm_vcpu *vcpu, u64 val)
> >>>> +{
> >>>> + int cpuid;
> >>>> + struct loongarch_csrs *csr = vcpu->arch.csr;
> >>>> + struct kvm_phyid_map *map;
> >>>> +
> >>>> + if (val >= KVM_MAX_PHYID)
> >>>> + return -EINVAL;
> >>>> +
> >>>> + cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_CPUID);
> >>>> + map = vcpu->kvm->arch.phyid_map;
> >>>> + spin_lock(&vcpu->kvm->arch.phyid_map_lock);
> >>>> + if (map->phys_map[cpuid].enabled) {
> >>>> + /*
> >>>> + * Cpuid is already set before
> >>>> + * Forbid changing different cpuid at runtime
> >>>> + */
> >>>> + if (cpuid != val) {
> >>>> + /*
> >>>> + * Cpuid 0 is initial value for vcpu, maybe invalid
> >>>> + * unset value for vcpu
> >>>> + */
> >>>> + if (cpuid) {
> >>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >>>> + return -EINVAL;
> >>>> + }
> >>>> + } else {
> >>>> + /* Discard duplicated cpuid set */
> >>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >>>> + return 0;
> >>>> + }
> >>>> + }
> >>> I changed the logic and comments when applying; you can double-check
> >>> whether they are correct.
> >> I checked out the latest version; the modification in function
> >> kvm_set_cpuid() looks good to me.
> > Now the modified version is like this:
> >
> > + if (map->phys_map[cpuid].enabled) {
> > + /* Discard duplicated CPUID set operation */
> > + if (cpuid == val) {
> > + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> > + return 0;
> > + }
> > +
> > + /*
> > + * CPUID is already set before
> > + * Forbid changing different CPUID at runtime
> > + * But CPUID 0 is the initial value for vcpu, so allow
> > + * changing from 0 to others
> > + */
> > + if (cpuid) {
> > + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> > + return -EINVAL;
> > + }
> > + }
> > But I still doubt whether we should allow changing from 0 to others
> > while map->phys_map[cpuid].enabled is 1.
> It is necessary, since the default sw cpuid is zero :-( We can
> optimize it later, for example by setting an INVALID cpuid in
> kvm_arch_vcpu_create(), which would simplify the logic in kvm_set_cpuid().
In my opinion, if a vcpu has the uninitialized default physid=0, then
map->phys_map[cpuid].enabled should be 0, so the code won't get here.
And if a vcpu has a real physid=0, then map->phys_map[cpuid].enabled
is 1, but we shouldn't allow it to change its physid in that case.

Huacai
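
The two cases described above can be told apart in a small user-space model by also comparing the slot owner with the calling vcpu. This is a hedged sketch only: struct vcpu_model, struct slot_model, phyid_is_assigned() and claim_phyid() are hypothetical names, the owner comparison is an assumption and not something the patch does, and locking is left out.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PHYID 256	/* mirrors KVM_MAX_PHYID */

struct vcpu_model {
	int csr_cpuid;		/* 0 right after creation, like the sw CSR */
};

struct slot_model {
	struct vcpu_model *vcpu;
	bool enabled;
};

/* True only if this vcpu itself registered the id its CSR currently holds. */
static bool phyid_is_assigned(const struct slot_model *map,
			      const struct vcpu_model *v)
{
	const struct slot_model *slot = &map[v->csr_cpuid];

	return slot->enabled && slot->vcpu == v;
}

/* Register id for v; returns 0 on success, -1 if out of range or taken. */
static int claim_phyid(struct slot_model *map, struct vcpu_model *v, int id)
{
	if (id < 0 || id >= MAX_PHYID || map[id].enabled)
		return -1;

	map[id].vcpu = v;
	map[id].enabled = true;
	v->csr_cpuid = id;
	return 0;
}
```

In this model a second vcpu whose CSR still holds the default 0 is reported as unassigned even after another vcpu has claimed id 0, because the enabled slot belongs to a different owner.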

>
> Regards
> Bibo Mao
>
> >
> > Huacai
> >
> >>>
> >>>> +
> >>>> + if (map->phys_map[val].enabled) {
> >>>> + /*
> >>>> + * New cpuid is already set with other vcpu
> >>>> + * Forbid sharing the same cpuid between different vcpus
> >>>> + */
> >>>> + if (map->phys_map[val].vcpu != vcpu) {
> >>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >>>> + return -EINVAL;
> >>>> + }
> >>>> +
> >>>> + /* Discard duplicated cpuid set operation*/
> >>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >>>> + return 0;
> >>>> + }
> >>>> +
> >>>> + kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, val);
> >>>> + map->phys_map[val].enabled = true;
> >>>> + map->phys_map[val].vcpu = vcpu;
> >>>> + if (map->max_phyid < val)
> >>>> + map->max_phyid = val;
> >>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >>>> + return 0;
> >>>> +}
> >>>> +
> >>>> +struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid)
> >>>> +{
> >>>> + struct kvm_phyid_map *map;
> >>>> +
> >>>> + if (cpuid >= KVM_MAX_PHYID)
> >>>> + return NULL;
> >>>> +
> >>>> + map = kvm->arch.phyid_map;
> >>>> + if (map->phys_map[cpuid].enabled)
> >>>> + return map->phys_map[cpuid].vcpu;
> >>>> +
> >>>> + return NULL;
> >>>> +}
> >>>> +
> >>>> +static inline void kvm_drop_cpuid(struct kvm_vcpu *vcpu)
> >>>> +{
> >>>> + int cpuid;
> >>>> + struct loongarch_csrs *csr = vcpu->arch.csr;
> >>>> + struct kvm_phyid_map *map;
> >>>> +
> >>>> + map = vcpu->kvm->arch.phyid_map;
> >>>> + cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_CPUID);
> >>>> + if (cpuid >= KVM_MAX_PHYID)
> >>>> + return;
> >>>> +
> >>>> + if (map->phys_map[cpuid].enabled) {
> >>>> + map->phys_map[cpuid].vcpu = NULL;
> >>>> + map->phys_map[cpuid].enabled = false;
> >>>> + kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, 0);
> >>>> + }
> >>>> +}
> >>> While kvm_set_cpuid() is protected by a spinlock, do kvm_drop_cpuid()
> >>> and kvm_get_vcpu_by_cpuid() also need it?
> >>>
> >> Adding the spinlock in function kvm_drop_cpuid() looks good to me.
> >> And thanks for the effort.
> >>
> >> Regards
> >> Bibo Mao
> >>>> +
> >>>> static int _kvm_setcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 val)
> >>>> {
> >>>> int ret = 0, gintc;
> >>>> @@ -291,7 +380,8 @@ static int _kvm_setcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 val)
> >>>> kvm_set_sw_gcsr(csr, LOONGARCH_CSR_ESTAT, gintc);
> >>>>
> >>>> return ret;
> >>>> - }
> >>>> + } else if (id == LOONGARCH_CSR_CPUID)
> >>>> + return kvm_set_cpuid(vcpu, val);
> >>>>
> >>>> kvm_write_sw_gcsr(csr, id, val);
> >>>>
> >>>> @@ -943,6 +1033,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
> >>>> hrtimer_cancel(&vcpu->arch.swtimer);
> >>>> kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
> >>>> kfree(vcpu->arch.csr);
> >>>> + kvm_drop_cpuid(vcpu);
> >>> I think this line should be before the above kfree(), otherwise you
> >>> get a "use after free".
> >>>
> >>> Huacai
> >>>
> >>>>
> >>>> /*
> >>>> * If the vCPU is freed and reused as another vCPU, we don't want the
> >>>> diff --git a/arch/loongarch/kvm/vm.c b/arch/loongarch/kvm/vm.c
> >>>> index 0a37f6fa8f2d..6006a28653ad 100644
> >>>> --- a/arch/loongarch/kvm/vm.c
> >>>> +++ b/arch/loongarch/kvm/vm.c
> >>>> @@ -30,6 +30,14 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> >>>> if (!kvm->arch.pgd)
> >>>> return -ENOMEM;
> >>>>
> >>>> + kvm->arch.phyid_map = kvzalloc(sizeof(struct kvm_phyid_map),
> >>>> + GFP_KERNEL_ACCOUNT);
> >>>> + if (!kvm->arch.phyid_map) {
> >>>> + free_page((unsigned long)kvm->arch.pgd);
> >>>> + kvm->arch.pgd = NULL;
> >>>> + return -ENOMEM;
> >>>> + }
> >>>> +
> >>>> kvm_init_vmcs(kvm);
> >>>> kvm->arch.gpa_size = BIT(cpu_vabits - 1);
> >>>> kvm->arch.root_level = CONFIG_PGTABLE_LEVELS - 1;
> >>>> @@ -44,6 +52,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> >>>> for (i = 0; i <= kvm->arch.root_level; i++)
> >>>> kvm->arch.pte_shifts[i] = PAGE_SHIFT + i * (PAGE_SHIFT - 3);
> >>>>
> >>>> + spin_lock_init(&kvm->arch.phyid_map_lock);
> >>>> return 0;
> >>>> }
> >>>>
> >>>> @@ -51,7 +60,9 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
> >>>> {
> >>>> kvm_destroy_vcpus(kvm);
> >>>> free_page((unsigned long)kvm->arch.pgd);
> >>>> + kvfree(kvm->arch.phyid_map);
> >>>> kvm->arch.pgd = NULL;
> >>>> + kvm->arch.phyid_map = NULL;
> >>>> }
> >>>>
> >>>> int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >>>> --
> >>>> 2.39.3
> >>>>
> >>
>

2024-05-06 09:36:02

by Bibo Mao

Subject: Re: [PATCH v8 4/6] LoongArch: KVM: Add vcpu search support from physical cpuid



On 2024/5/6 4:59 PM, Huacai Chen wrote:
> On Mon, May 6, 2024 at 4:18 PM maobibo <[email protected]> wrote:
>>
>>
>>
>> On 2024/5/6 3:06 PM, Huacai Chen wrote:
>>> Hi, Bibo,
>>>
>>> On Mon, May 6, 2024 at 2:36 PM maobibo <[email protected]> wrote:
>>>>
>>>>
>>>>
>>>> On 2024/5/6 9:49 AM, Huacai Chen wrote:
>>>>> Hi, Bibo,
>>>>>
>>>>> On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao <[email protected]> wrote:
>>>>>>
>>>>>> Physical cpuid is used for interrupt routing for irqchips such as
>>>>>> ipi/msi/extioi interrupt controller. And physical cpuid is stored
>>>>>> at CSR register LOONGARCH_CSR_CPUID, it can not be changed once vcpu
>>>>>> is created and physical cpuid of two vcpus cannot be the same.
>>>>>>
>>>>>> Different irqchips have different size declaration about physical cpuid,
>>>>>> max cpuid value for CSR LOONGARCH_CSR_CPUID on 3A5000 is 512, max cpuid
>>>>>> supported by IPI hardware is 1024, 256 for extioi irqchip, and 65536
>>>>>> for MSI irqchip.
>>>>>>
>>>>>> The smallest value from all interrupt controllers is selected now,
>>>>>> and the max cpuid size is defined as 256 by KVM which comes from
>>>>>> extioi irqchip.
>>>>>>
>>>>>> Signed-off-by: Bibo Mao <[email protected]>
>>>>>> ---
>>>>>> arch/loongarch/include/asm/kvm_host.h | 26 ++++++++
>>>>>> arch/loongarch/include/asm/kvm_vcpu.h | 1 +
>>>>>> arch/loongarch/kvm/vcpu.c | 93 ++++++++++++++++++++++++++-
>>>>>> arch/loongarch/kvm/vm.c | 11 ++++
>>>>>> 4 files changed, 130 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/arch/loongarch/include/asm/kvm_host.h b/arch/loongarch/include/asm/kvm_host.h
>>>>>> index 2d62f7b0d377..3ba16ef1fe69 100644
>>>>>> --- a/arch/loongarch/include/asm/kvm_host.h
>>>>>> +++ b/arch/loongarch/include/asm/kvm_host.h
>>>>>> @@ -64,6 +64,30 @@ struct kvm_world_switch {
>>>>>>
>>>>>> #define MAX_PGTABLE_LEVELS 4
>>>>>>
>>>>>> +/*
>>>>>> + * Physical cpu id is used for interrupt routing, there are different
>>>>>> + * definitions about physical cpuid on different hardwares.
>>>>>> + * For LOONGARCH_CSR_CPUID register, max cpuid size is 512
>>>>>> + * For IPI HW, max dest CPUID size is 1024
>>>>>> + * For extioi interrupt controller, max dest CPUID size is 256
>>>>>> + * For MSI interrupt controller, max supported CPUID size is 65536
>>>>>> + *
>>>>>> + * Currently max CPUID is defined as 256 for KVM hypervisor, in future
>>>>>> + * it will be expanded to 4096, including 16 packages at most. And every
>>>>>> + * package supports at most 256 vcpus
>>>>>> + */
>>>>>> +#define KVM_MAX_PHYID 256
>>>>>> +
>>>>>> +struct kvm_phyid_info {
>>>>>> + struct kvm_vcpu *vcpu;
>>>>>> + bool enabled;
>>>>>> +};
>>>>>> +
>>>>>> +struct kvm_phyid_map {
>>>>>> + int max_phyid;
>>>>>> + struct kvm_phyid_info phys_map[KVM_MAX_PHYID];
>>>>>> +};
>>>>>> +
>>>>>> struct kvm_arch {
>>>>>> /* Guest physical mm */
>>>>>> kvm_pte_t *pgd;
>>>>>> @@ -71,6 +95,8 @@ struct kvm_arch {
>>>>>> unsigned long invalid_ptes[MAX_PGTABLE_LEVELS];
>>>>>> unsigned int pte_shifts[MAX_PGTABLE_LEVELS];
>>>>>> unsigned int root_level;
>>>>>> + spinlock_t phyid_map_lock;
>>>>>> + struct kvm_phyid_map *phyid_map;
>>>>>>
>>>>>> s64 time_offset;
>>>>>> struct kvm_context __percpu *vmcs;
>>>>>> diff --git a/arch/loongarch/include/asm/kvm_vcpu.h b/arch/loongarch/include/asm/kvm_vcpu.h
>>>>>> index 0cb4fdb8a9b5..9f53950959da 100644
>>>>>> --- a/arch/loongarch/include/asm/kvm_vcpu.h
>>>>>> +++ b/arch/loongarch/include/asm/kvm_vcpu.h
>>>>>> @@ -81,6 +81,7 @@ void kvm_save_timer(struct kvm_vcpu *vcpu);
>>>>>> void kvm_restore_timer(struct kvm_vcpu *vcpu);
>>>>>>
>>>>>> int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq);
>>>>>> +struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid);
>>>>>>
>>>>>> /*
>>>>>> * Loongarch KVM guest interrupt handling
>>>>>> diff --git a/arch/loongarch/kvm/vcpu.c b/arch/loongarch/kvm/vcpu.c
>>>>>> index 3a8779065f73..b633fd28b8db 100644
>>>>>> --- a/arch/loongarch/kvm/vcpu.c
>>>>>> +++ b/arch/loongarch/kvm/vcpu.c
>>>>>> @@ -274,6 +274,95 @@ static int _kvm_getcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 *val)
>>>>>> return 0;
>>>>>> }
>>>>>>
>>>>>> +static inline int kvm_set_cpuid(struct kvm_vcpu *vcpu, u64 val)
>>>>>> +{
>>>>>> + int cpuid;
>>>>>> + struct loongarch_csrs *csr = vcpu->arch.csr;
>>>>>> + struct kvm_phyid_map *map;
>>>>>> +
>>>>>> + if (val >= KVM_MAX_PHYID)
>>>>>> + return -EINVAL;
>>>>>> +
>>>>>> + cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_CPUID);
>>>>>> + map = vcpu->kvm->arch.phyid_map;
>>>>>> + spin_lock(&vcpu->kvm->arch.phyid_map_lock);
>>>>>> + if (map->phys_map[cpuid].enabled) {
>>>>>> + /*
>>>>>> + * Cpuid is already set before
>>>>>> + * Forbid changing different cpuid at runtime
>>>>>> + */
>>>>>> + if (cpuid != val) {
>>>>>> + /*
>>>>>> + * Cpuid 0 is initial value for vcpu, maybe invalid
>>>>>> + * unset value for vcpu
>>>>>> + */
>>>>>> + if (cpuid) {
>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>>>> + return -EINVAL;
>>>>>> + }
>>>>>> + } else {
>>>>>> + /* Discard duplicated cpuid set */
>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>>>> + return 0;
>>>>>> + }
>>>>>> + }
>>>>> I have changed the logic and comments when I apply, you can double
>>>>> check whether it is correct.
>>>> I checked out the latest version; the modification in function
>>>> kvm_set_cpuid() is good for me.
>>> Now the modified version is like this:
>>>
>>> + if (map->phys_map[cpuid].enabled) {
>>> + /* Discard duplicated CPUID set operation */
>>> + if (cpuid == val) {
>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>> + return 0;
>>> + }
>>> +
>>> + /*
>>> + * CPUID is already set before
>>> + * Forbid changing different CPUID at runtime
>>> + * But CPUID 0 is the initial value for vcpu, so allow
>>> + * changing from 0 to others
>>> + */
>>> + if (cpuid) {
>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>> + return -EINVAL;
>>> + }
>>> + }
>>> But I still doubt whether we should allow changing from 0 to others
>>> while map->phys_map[cpuid].enabled is 1.
>> It is necessary since the default sw cpuid is zero :-( We can
>> optimize it later, e.g. by setting an INVALID cpuid in
>> kvm_arch_vcpu_create(), which would simplify the logic in kvm_set_cpuid().
> In my opinion, if a vcpu has an uninitialized default physid=0, then
> map->phys_map[cpuid].enabled should be 0, then code won't come here.
> And if a vcpu with a real physid=0, then map->phys_map[cpuid].enabled
> is 1, but we shouldn't allow it to change physid in this case.
Yes, that is actually a problem.

If vcpu0 first sets physid=0, then vcpu0 setting physid=1 afterwards is not allowed.
If vcpu0 first sets physid=0, then vcpu1 setting physid=1 is allowed.
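
To make the two cases concrete, here is a small user-space sketch of the check that still carries the "if (cpuid)" exception. It is only a model under stated assumptions: struct model_vcpu, struct model_map, model_set_cpuid() and the scenario helpers are hypothetical names, the spinlock is left out, and the software CSR is reduced to a plain int that defaults to 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAX_PHYID 256    /* mirrors KVM_MAX_PHYID */

/* Hypothetical stand-ins for struct kvm_vcpu and struct kvm_phyid_map. */
struct model_vcpu {
    int cpuid;           /* software copy of CSR.CPUID, 0 by default */
};

struct model_map {
    struct {
        struct model_vcpu *vcpu;
        bool enabled;
    } phys_map[MAX_PHYID];
};

/* Model of the check that still carries the "if (cpuid)" exception. */
static int model_set_cpuid(struct model_map *map, struct model_vcpu *vcpu,
                           unsigned long val)
{
    int cpuid = vcpu->cpuid;

    if (val >= MAX_PHYID)
        return -EINVAL;

    if (map->phys_map[cpuid].enabled) {
        if ((unsigned long)cpuid == val)
            return 0;            /* duplicated set operation */
        if (cpuid)
            return -EINVAL;      /* a non-zero id may not change */
    }

    if (map->phys_map[val].enabled)
        return map->phys_map[val].vcpu == vcpu ? 0 : -EINVAL;

    vcpu->cpuid = (int)val;
    map->phys_map[val].enabled = true;
    map->phys_map[val].vcpu = vcpu;
    return 0;
}

/* vcpu0 sets physid=0, then vcpu0 sets physid=1; returns the second result. */
static int scenario_same_vcpu(void)
{
    struct model_map map = { 0 };
    struct model_vcpu vcpu0 = { .cpuid = 0 };
    int first = model_set_cpuid(&map, &vcpu0, 0);

    assert(first == 0);
    (void)first;
    return model_set_cpuid(&map, &vcpu0, 1);
}

/* vcpu0 sets physid=0, then vcpu1 sets physid=1; returns the second result. */
static int scenario_other_vcpu(void)
{
    struct model_map map = { 0 };
    struct model_vcpu vcpu0 = { .cpuid = 0 }, vcpu1 = { .cpuid = 0 };
    int first = model_set_cpuid(&map, &vcpu0, 0);

    assert(first == 0);
    (void)first;
    return model_set_cpuid(&map, &vcpu1, 1);
}
```

In this model both helpers return 0: the vcpu1 case is accepted as wanted, but the vcpu0 case is accepted too, so vcpu0 ends up owning entries 0 and 1. The model cannot tell a real physid 0 from the unset default.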


>
> Huacai
>
>>
>> Regards
>> Bibo Mao
>>
>>>
>>> Huacai
>>>
>>>>>
>>>>>> +
>>>>>> + if (map->phys_map[val].enabled) {
>>>>>> + /*
>>>>>> + * New cpuid is already set with other vcpu
>>>>>> + * Forbid sharing the same cpuid between different vcpus
>>>>>> + */
>>>>>> + if (map->phys_map[val].vcpu != vcpu) {
>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>>>> + return -EINVAL;
>>>>>> + }
>>>>>> +
>>>>>> + /* Discard duplicated cpuid set operation*/
>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>>>> + return 0;
>>>>>> + }
>>>>>> +
>>>>>> + kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, val);
>>>>>> + map->phys_map[val].enabled = true;
>>>>>> + map->phys_map[val].vcpu = vcpu;
>>>>>> + if (map->max_phyid < val)
>>>>>> + map->max_phyid = val;
>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>>>> + return 0;
>>>>>> +}
>>>>>> +
>>>>>> +struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid)
>>>>>> +{
>>>>>> + struct kvm_phyid_map *map;
>>>>>> +
>>>>>> + if (cpuid >= KVM_MAX_PHYID)
>>>>>> + return NULL;
>>>>>> +
>>>>>> + map = kvm->arch.phyid_map;
>>>>>> + if (map->phys_map[cpuid].enabled)
>>>>>> + return map->phys_map[cpuid].vcpu;
>>>>>> +
>>>>>> + return NULL;
>>>>>> +}
>>>>>> +
>>>>>> +static inline void kvm_drop_cpuid(struct kvm_vcpu *vcpu)
>>>>>> +{
>>>>>> + int cpuid;
>>>>>> + struct loongarch_csrs *csr = vcpu->arch.csr;
>>>>>> + struct kvm_phyid_map *map;
>>>>>> +
>>>>>> + map = vcpu->kvm->arch.phyid_map;
>>>>>> + cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_CPUID);
>>>>>> + if (cpuid >= KVM_MAX_PHYID)
>>>>>> + return;
>>>>>> +
>>>>>> + if (map->phys_map[cpuid].enabled) {
>>>>>> + map->phys_map[cpuid].vcpu = NULL;
>>>>>> + map->phys_map[cpuid].enabled = false;
>>>>>> + kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, 0);
>>>>>> + }
>>>>>> +}
>>>>> While kvm_set_cpuid() is protected by a spinlock, do kvm_drop_cpuid()
>>>>> and kvm_get_vcpu_by_cpuid() also need it?
>>>>>
>>>> It is good to me that spinlock is added in function kvm_drop_cpuid().
>>>> And thanks for the efforts.
>>>>
>>>> Regards
>>>> Bibo Mao
>>>>>> +
>>>>>> static int _kvm_setcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 val)
>>>>>> {
>>>>>> int ret = 0, gintc;
>>>>>> @@ -291,7 +380,8 @@ static int _kvm_setcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 val)
>>>>>> kvm_set_sw_gcsr(csr, LOONGARCH_CSR_ESTAT, gintc);
>>>>>>
>>>>>> return ret;
>>>>>> - }
>>>>>> + } else if (id == LOONGARCH_CSR_CPUID)
>>>>>> + return kvm_set_cpuid(vcpu, val);
>>>>>>
>>>>>> kvm_write_sw_gcsr(csr, id, val);
>>>>>>
>>>>>> @@ -943,6 +1033,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
>>>>>> hrtimer_cancel(&vcpu->arch.swtimer);
>>>>>> kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
>>>>>> kfree(vcpu->arch.csr);
>>>>>> + kvm_drop_cpuid(vcpu);
>>>>> I think this line should be before the above kfree(), otherwise you
>>>>> get a "use after free".
>>>>>
>>>>> Huacai
>>>>>
>>>>>>
>>>>>> /*
>>>>>> * If the vCPU is freed and reused as another vCPU, we don't want the
>>>>>> diff --git a/arch/loongarch/kvm/vm.c b/arch/loongarch/kvm/vm.c
>>>>>> index 0a37f6fa8f2d..6006a28653ad 100644
>>>>>> --- a/arch/loongarch/kvm/vm.c
>>>>>> +++ b/arch/loongarch/kvm/vm.c
>>>>>> @@ -30,6 +30,14 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>>>>>> if (!kvm->arch.pgd)
>>>>>> return -ENOMEM;
>>>>>>
>>>>>> + kvm->arch.phyid_map = kvzalloc(sizeof(struct kvm_phyid_map),
>>>>>> + GFP_KERNEL_ACCOUNT);
>>>>>> + if (!kvm->arch.phyid_map) {
>>>>>> + free_page((unsigned long)kvm->arch.pgd);
>>>>>> + kvm->arch.pgd = NULL;
>>>>>> + return -ENOMEM;
>>>>>> + }
>>>>>> +
>>>>>> kvm_init_vmcs(kvm);
>>>>>> kvm->arch.gpa_size = BIT(cpu_vabits - 1);
>>>>>> kvm->arch.root_level = CONFIG_PGTABLE_LEVELS - 1;
>>>>>> @@ -44,6 +52,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>>>>>> for (i = 0; i <= kvm->arch.root_level; i++)
>>>>>> kvm->arch.pte_shifts[i] = PAGE_SHIFT + i * (PAGE_SHIFT - 3);
>>>>>>
>>>>>> + spin_lock_init(&kvm->arch.phyid_map_lock);
>>>>>> return 0;
>>>>>> }
>>>>>>
>>>>>> @@ -51,7 +60,9 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
>>>>>> {
>>>>>> kvm_destroy_vcpus(kvm);
>>>>>> free_page((unsigned long)kvm->arch.pgd);
>>>>>> + kvfree(kvm->arch.phyid_map);
>>>>>> kvm->arch.pgd = NULL;
>>>>>> + kvm->arch.phyid_map = NULL;
>>>>>> }
>>>>>>
>>>>>> int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>>>>> --
>>>>>> 2.39.3
>>>>>>
>>>>
>>


2024-05-06 09:41:43

by Huacai Chen

Subject: Re: [PATCH v8 4/6] LoongArch: KVM: Add vcpu search support from physical cpuid

On Mon, May 6, 2024 at 5:35 PM maobibo <[email protected]> wrote:
>
>
>
> On 2024/5/6 4:59 PM, Huacai Chen wrote:
> > On Mon, May 6, 2024 at 4:18 PM maobibo <[email protected]> wrote:
> >>
> >>
> >>
> >> On 2024/5/6 3:06 PM, Huacai Chen wrote:
> >>> Hi, Bibo,
> >>>
> >>> On Mon, May 6, 2024 at 2:36 PM maobibo <[email protected]> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 2024/5/6 9:49 AM, Huacai Chen wrote:
> >>>>> Hi, Bibo,
> >>>>>
> >>>>> On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao <[email protected]> wrote:
> >>>>>>
> >>>>>> Physical cpuid is used for interrupt routing for irqchips such as
> >>>>>> ipi/msi/extioi interrupt controller. And physical cpuid is stored
> >>>>>> at CSR register LOONGARCH_CSR_CPUID, it can not be changed once vcpu
> >>>>>> is created and physical cpuid of two vcpus cannot be the same.
> >>>>>>
> >>>>>> Different irqchips have different size declaration about physical cpuid,
> >>>>>> max cpuid value for CSR LOONGARCH_CSR_CPUID on 3A5000 is 512, max cpuid
> >>>>>> supported by IPI hardware is 1024, 256 for extioi irqchip, and 65536
> >>>>>> for MSI irqchip.
> >>>>>>
> >>>>>> The smallest value from all interrupt controllers is selected now,
> >>>>>> and the max cpuid size is defined as 256 by KVM which comes from
> >>>>>> extioi irqchip.
> >>>>>>
> >>>>>> Signed-off-by: Bibo Mao <[email protected]>
> >>>>>> ---
> >>>>>> arch/loongarch/include/asm/kvm_host.h | 26 ++++++++
> >>>>>> arch/loongarch/include/asm/kvm_vcpu.h | 1 +
> >>>>>> arch/loongarch/kvm/vcpu.c | 93 ++++++++++++++++++++++++++-
> >>>>>> arch/loongarch/kvm/vm.c | 11 ++++
> >>>>>> 4 files changed, 130 insertions(+), 1 deletion(-)
> >>>>>>
> >>>>>> diff --git a/arch/loongarch/include/asm/kvm_host.h b/arch/loongarch/include/asm/kvm_host.h
> >>>>>> index 2d62f7b0d377..3ba16ef1fe69 100644
> >>>>>> --- a/arch/loongarch/include/asm/kvm_host.h
> >>>>>> +++ b/arch/loongarch/include/asm/kvm_host.h
> >>>>>> @@ -64,6 +64,30 @@ struct kvm_world_switch {
> >>>>>>
> >>>>>> #define MAX_PGTABLE_LEVELS 4
> >>>>>>
> >>>>>> +/*
> >>>>>> + * Physical cpu id is used for interrupt routing, there are different
> >>>>>> + * definitions about physical cpuid on different hardwares.
> >>>>>> + * For LOONGARCH_CSR_CPUID register, max cpuid size is 512
> >>>>>> + * For IPI HW, max dest CPUID size is 1024
> >>>>>> + * For extioi interrupt controller, max dest CPUID size is 256
> >>>>>> + * For MSI interrupt controller, max supported CPUID size is 65536
> >>>>>> + *
> >>>>>> + * Currently max CPUID is defined as 256 for KVM hypervisor, in future
> >>>>>> + * it will be expanded to 4096, including 16 packages at most. And every
> >>>>>> + * package supports at most 256 vcpus
> >>>>>> + */
> >>>>>> +#define KVM_MAX_PHYID 256
> >>>>>> +
> >>>>>> +struct kvm_phyid_info {
> >>>>>> + struct kvm_vcpu *vcpu;
> >>>>>> + bool enabled;
> >>>>>> +};
> >>>>>> +
> >>>>>> +struct kvm_phyid_map {
> >>>>>> + int max_phyid;
> >>>>>> + struct kvm_phyid_info phys_map[KVM_MAX_PHYID];
> >>>>>> +};
> >>>>>> +
> >>>>>> struct kvm_arch {
> >>>>>> /* Guest physical mm */
> >>>>>> kvm_pte_t *pgd;
> >>>>>> @@ -71,6 +95,8 @@ struct kvm_arch {
> >>>>>> unsigned long invalid_ptes[MAX_PGTABLE_LEVELS];
> >>>>>> unsigned int pte_shifts[MAX_PGTABLE_LEVELS];
> >>>>>> unsigned int root_level;
> >>>>>> + spinlock_t phyid_map_lock;
> >>>>>> + struct kvm_phyid_map *phyid_map;
> >>>>>>
> >>>>>> s64 time_offset;
> >>>>>> struct kvm_context __percpu *vmcs;
> >>>>>> diff --git a/arch/loongarch/include/asm/kvm_vcpu.h b/arch/loongarch/include/asm/kvm_vcpu.h
> >>>>>> index 0cb4fdb8a9b5..9f53950959da 100644
> >>>>>> --- a/arch/loongarch/include/asm/kvm_vcpu.h
> >>>>>> +++ b/arch/loongarch/include/asm/kvm_vcpu.h
> >>>>>> @@ -81,6 +81,7 @@ void kvm_save_timer(struct kvm_vcpu *vcpu);
> >>>>>> void kvm_restore_timer(struct kvm_vcpu *vcpu);
> >>>>>>
> >>>>>> int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq);
> >>>>>> +struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid);
> >>>>>>
> >>>>>> /*
> >>>>>> * Loongarch KVM guest interrupt handling
> >>>>>> diff --git a/arch/loongarch/kvm/vcpu.c b/arch/loongarch/kvm/vcpu.c
> >>>>>> index 3a8779065f73..b633fd28b8db 100644
> >>>>>> --- a/arch/loongarch/kvm/vcpu.c
> >>>>>> +++ b/arch/loongarch/kvm/vcpu.c
> >>>>>> @@ -274,6 +274,95 @@ static int _kvm_getcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 *val)
> >>>>>> return 0;
> >>>>>> }
> >>>>>>
> >>>>>> +static inline int kvm_set_cpuid(struct kvm_vcpu *vcpu, u64 val)
> >>>>>> +{
> >>>>>> + int cpuid;
> >>>>>> + struct loongarch_csrs *csr = vcpu->arch.csr;
> >>>>>> + struct kvm_phyid_map *map;
> >>>>>> +
> >>>>>> + if (val >= KVM_MAX_PHYID)
> >>>>>> + return -EINVAL;
> >>>>>> +
> >>>>>> + cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_CPUID);
> >>>>>> + map = vcpu->kvm->arch.phyid_map;
> >>>>>> + spin_lock(&vcpu->kvm->arch.phyid_map_lock);
> >>>>>> + if (map->phys_map[cpuid].enabled) {
> >>>>>> + /*
> >>>>>> + * Cpuid is already set before
> >>>>>> + * Forbid changing different cpuid at runtime
> >>>>>> + */
> >>>>>> + if (cpuid != val) {
> >>>>>> + /*
> >>>>>> + * Cpuid 0 is initial value for vcpu, maybe invalid
> >>>>>> + * unset value for vcpu
> >>>>>> + */
> >>>>>> + if (cpuid) {
> >>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >>>>>> + return -EINVAL;
> >>>>>> + }
> >>>>>> + } else {
> >>>>>> + /* Discard duplicated cpuid set */
> >>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >>>>>> + return 0;
> >>>>>> + }
> >>>>>> + }
> >>>>> I have changed the logic and comments when I apply, you can double
> >>>>> check whether it is correct.
> >>>> I checked out the latest version; the modification in function
> >>>> kvm_set_cpuid() is good for me.
> >>> Now the modified version is like this:
> >>>
> >>> + if (map->phys_map[cpuid].enabled) {
> >>> + /* Discard duplicated CPUID set operation */
> >>> + if (cpuid == val) {
> >>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >>> + return 0;
> >>> + }
> >>> +
> >>> + /*
> >>> + * CPUID is already set before
> >>> + * Forbid changing different CPUID at runtime
> >>> + * But CPUID 0 is the initial value for vcpu, so allow
> >>> + * changing from 0 to others
> >>> + */
> >>> + if (cpuid) {
> >>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >>> + return -EINVAL;
> >>> + }
> >>> + }
> >>> But I still doubt whether we should allow changing from 0 to others
> >>> while map->phys_map[cpuid].enabled is 1.
> >> It is necessary since the default sw cpuid is zero :-( We can
> >> optimize it later, e.g. by setting an INVALID cpuid in
> >> kvm_arch_vcpu_create(), which would simplify the logic in kvm_set_cpuid().
> > In my opinion, if a vcpu has an uninitialized default physid=0, then
> > map->phys_map[cpuid].enabled should be 0, then code won't come here.
> > And if a vcpu with a real physid=0, then map->phys_map[cpuid].enabled
> > is 1, but we shouldn't allow it to change physid in this case.
> Yes, that is actually a problem.
>
> If vcpu0 first sets physid=0, then vcpu0 setting physid=1 afterwards is not allowed.
> If vcpu0 first sets physid=0, then vcpu1 setting physid=1 is allowed.

So can we simply drop the if (cpuid) check? That means:
+ if (map->phys_map[cpuid].enabled) {
+ /* Discard duplicated CPUID set operation */
+ if (cpuid == val) {
+ spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
+ return 0;
+ }
+
+ spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
+ return -EINVAL;
+ }

Huacai
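
As a rough way to reason about the simplified check, the sketch below models it in user space and runs the two scenarios from the discussion with two different initial cpuid values. Everything here is an assumption for illustration: the model_* names are hypothetical, locking is omitted, and the "INVALID" initial cpuid mentioned earlier in the thread is modeled as MAX_PHYID, with a guard so that the sentinel is never used as an array index.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAX_PHYID 256    /* mirrors KVM_MAX_PHYID; also used as the unset sentinel */

/* Hypothetical stand-ins for struct kvm_vcpu and struct kvm_phyid_map. */
struct model_vcpu {
    int cpuid;           /* software copy of CSR.CPUID */
};

struct model_map {
    struct {
        struct model_vcpu *vcpu;
        bool enabled;
    } phys_map[MAX_PHYID];
};

/* Simplified check: once a cpuid is registered, only a duplicate set passes. */
static int model_set_cpuid(struct model_map *map, struct model_vcpu *vcpu,
                           unsigned long val)
{
    int cpuid = vcpu->cpuid;

    if (val >= MAX_PHYID)
        return -EINVAL;

    /* The sentinel test comes first so phys_map[] is never indexed by it. */
    if (cpuid != MAX_PHYID && map->phys_map[cpuid].enabled)
        return (unsigned long)cpuid == val ? 0 : -EINVAL;

    if (map->phys_map[val].enabled)
        return map->phys_map[val].vcpu == vcpu ? 0 : -EINVAL;

    vcpu->cpuid = (int)val;
    map->phys_map[val].enabled = true;
    map->phys_map[val].vcpu = vcpu;
    return 0;
}

/* vcpu0 sets physid=0, then vcpu0 sets physid=1; returns the second result. */
static int scenario_same_vcpu(int init_cpuid)
{
    struct model_map map = { 0 };
    struct model_vcpu vcpu0 = { .cpuid = init_cpuid };
    int first = model_set_cpuid(&map, &vcpu0, 0);

    assert(first == 0);
    (void)first;
    return model_set_cpuid(&map, &vcpu0, 1);
}

/* vcpu0 sets physid=0, then vcpu1 sets physid=1; returns the second result. */
static int scenario_other_vcpu(int init_cpuid)
{
    struct model_map map = { 0 };
    struct model_vcpu vcpu0 = { .cpuid = init_cpuid };
    struct model_vcpu vcpu1 = { .cpuid = init_cpuid };
    int first = model_set_cpuid(&map, &vcpu0, 0);

    assert(first == 0);
    (void)first;
    return model_set_cpuid(&map, &vcpu1, 1);
}
```

With an initial cpuid of 0, the model rejects both scenarios with -EINVAL, because vcpu1's default value 0 aliases the entry that vcpu0 registered. With the sentinel as the initial value, the vcpu0 case is still rejected while the vcpu1 case returns 0.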

>
>
> >
> > Huacai
> >
> >>
> >> Regards
> >> Bibo Mao
> >>
> >>>
> >>> Huacai
> >>>
> >>>>>
> >>>>>> +
> >>>>>> + if (map->phys_map[val].enabled) {
> >>>>>> + /*
> >>>>>> + * New cpuid is already set with other vcpu
> >>>>>> + * Forbid sharing the same cpuid between different vcpus
> >>>>>> + */
> >>>>>> + if (map->phys_map[val].vcpu != vcpu) {
> >>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >>>>>> + return -EINVAL;
> >>>>>> + }
> >>>>>> +
> >>>>>> + /* Discard duplicated cpuid set operation*/
> >>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >>>>>> + return 0;
> >>>>>> + }
> >>>>>> +
> >>>>>> + kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, val);
> >>>>>> + map->phys_map[val].enabled = true;
> >>>>>> + map->phys_map[val].vcpu = vcpu;
> >>>>>> + if (map->max_phyid < val)
> >>>>>> + map->max_phyid = val;
> >>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >>>>>> + return 0;
> >>>>>> +}
> >>>>>> +
> >>>>>> +struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid)
> >>>>>> +{
> >>>>>> + struct kvm_phyid_map *map;
> >>>>>> +
> >>>>>> + if (cpuid >= KVM_MAX_PHYID)
> >>>>>> + return NULL;
> >>>>>> +
> >>>>>> + map = kvm->arch.phyid_map;
> >>>>>> + if (map->phys_map[cpuid].enabled)
> >>>>>> + return map->phys_map[cpuid].vcpu;
> >>>>>> +
> >>>>>> + return NULL;
> >>>>>> +}
> >>>>>> +
> >>>>>> +static inline void kvm_drop_cpuid(struct kvm_vcpu *vcpu)
> >>>>>> +{
> >>>>>> + int cpuid;
> >>>>>> + struct loongarch_csrs *csr = vcpu->arch.csr;
> >>>>>> + struct kvm_phyid_map *map;
> >>>>>> +
> >>>>>> + map = vcpu->kvm->arch.phyid_map;
> >>>>>> + cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_CPUID);
> >>>>>> + if (cpuid >= KVM_MAX_PHYID)
> >>>>>> + return;
> >>>>>> +
> >>>>>> + if (map->phys_map[cpuid].enabled) {
> >>>>>> + map->phys_map[cpuid].vcpu = NULL;
> >>>>>> + map->phys_map[cpuid].enabled = false;
> >>>>>> + kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, 0);
> >>>>>> + }
> >>>>>> +}
> >>>>> While kvm_set_cpuid() is protected by a spinlock, do kvm_drop_cpuid()
> >>>>> and kvm_get_vcpu_by_cpuid() also need it?
> >>>>>
> >>>> It is good to me that spinlock is added in function kvm_drop_cpuid().
> >>>> And thanks for the efforts.
> >>>>
> >>>> Regards
> >>>> Bibo Mao
> >>>>>> +
> >>>>>> static int _kvm_setcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 val)
> >>>>>> {
> >>>>>> int ret = 0, gintc;
> >>>>>> @@ -291,7 +380,8 @@ static int _kvm_setcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 val)
> >>>>>> kvm_set_sw_gcsr(csr, LOONGARCH_CSR_ESTAT, gintc);
> >>>>>>
> >>>>>> return ret;
> >>>>>> - }
> >>>>>> + } else if (id == LOONGARCH_CSR_CPUID)
> >>>>>> + return kvm_set_cpuid(vcpu, val);
> >>>>>>
> >>>>>> kvm_write_sw_gcsr(csr, id, val);
> >>>>>>
> >>>>>> @@ -943,6 +1033,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
> >>>>>> hrtimer_cancel(&vcpu->arch.swtimer);
> >>>>>> kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
> >>>>>> kfree(vcpu->arch.csr);
> >>>>>> + kvm_drop_cpuid(vcpu);
> >>>>> I think this line should be before the above kfree(), otherwise you
> >>>>> get a "use after free".
> >>>>>
> >>>>> Huacai
> >>>>>
> >>>>>>
> >>>>>> /*
> >>>>>> * If the vCPU is freed and reused as another vCPU, we don't want the
> >>>>>> diff --git a/arch/loongarch/kvm/vm.c b/arch/loongarch/kvm/vm.c
> >>>>>> index 0a37f6fa8f2d..6006a28653ad 100644
> >>>>>> --- a/arch/loongarch/kvm/vm.c
> >>>>>> +++ b/arch/loongarch/kvm/vm.c
> >>>>>> @@ -30,6 +30,14 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> >>>>>> if (!kvm->arch.pgd)
> >>>>>> return -ENOMEM;
> >>>>>>
> >>>>>> + kvm->arch.phyid_map = kvzalloc(sizeof(struct kvm_phyid_map),
> >>>>>> + GFP_KERNEL_ACCOUNT);
> >>>>>> + if (!kvm->arch.phyid_map) {
> >>>>>> + free_page((unsigned long)kvm->arch.pgd);
> >>>>>> + kvm->arch.pgd = NULL;
> >>>>>> + return -ENOMEM;
> >>>>>> + }
> >>>>>> +
> >>>>>> kvm_init_vmcs(kvm);
> >>>>>> kvm->arch.gpa_size = BIT(cpu_vabits - 1);
> >>>>>> kvm->arch.root_level = CONFIG_PGTABLE_LEVELS - 1;
> >>>>>> @@ -44,6 +52,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> >>>>>> for (i = 0; i <= kvm->arch.root_level; i++)
> >>>>>> kvm->arch.pte_shifts[i] = PAGE_SHIFT + i * (PAGE_SHIFT - 3);
> >>>>>>
> >>>>>> + spin_lock_init(&kvm->arch.phyid_map_lock);
> >>>>>> return 0;
> >>>>>> }
> >>>>>>
> >>>>>> @@ -51,7 +60,9 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
> >>>>>> {
> >>>>>> kvm_destroy_vcpus(kvm);
> >>>>>> free_page((unsigned long)kvm->arch.pgd);
> >>>>>> + kvfree(kvm->arch.phyid_map);
> >>>>>> kvm->arch.pgd = NULL;
> >>>>>> + kvm->arch.phyid_map = NULL;
> >>>>>> }
> >>>>>>
> >>>>>> int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >>>>>> --
> >>>>>> 2.39.3
> >>>>>>
> >>>>
> >>
>

2024-05-06 10:06:21

by Bibo Mao

Subject: Re: [PATCH v8 4/6] LoongArch: KVM: Add vcpu search support from physical cpuid



On 2024/5/6 5:40 PM, Huacai Chen wrote:
> On Mon, May 6, 2024 at 5:35 PM maobibo <[email protected]> wrote:
>>
>>
>>
>> On 2024/5/6 4:59 PM, Huacai Chen wrote:
>>> On Mon, May 6, 2024 at 4:18 PM maobibo <[email protected]> wrote:
>>>>
>>>>
>>>>
>>>> On 2024/5/6 3:06 PM, Huacai Chen wrote:
>>>>> Hi, Bibo,
>>>>>
>>>>> On Mon, May 6, 2024 at 2:36 PM maobibo <[email protected]> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2024/5/6 9:49 AM, Huacai Chen wrote:
>>>>>>> Hi, Bibo,
>>>>>>>
>>>>>>> On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Physical cpuid is used for interrupt routing for irqchips such as
>>>>>>>> ipi/msi/extioi interrupt controller. And physical cpuid is stored
>>>>>>>> at CSR register LOONGARCH_CSR_CPUID, it can not be changed once vcpu
>>>>>>>> is created and physical cpuid of two vcpus cannot be the same.
>>>>>>>>
>>>>>>>> Different irqchips have different size declaration about physical cpuid,
>>>>>>>> max cpuid value for CSR LOONGARCH_CSR_CPUID on 3A5000 is 512, max cpuid
>>>>>>>> supported by IPI hardware is 1024, 256 for extioi irqchip, and 65536
>>>>>>>> for MSI irqchip.
>>>>>>>>
>>>>>>>> The smallest value from all interrupt controllers is selected now,
>>>>>>>> and the max cpuid size is defined as 256 by KVM which comes from
>>>>>>>> extioi irqchip.
>>>>>>>>
>>>>>>>> Signed-off-by: Bibo Mao <[email protected]>
>>>>>>>> ---
>>>>>>>> arch/loongarch/include/asm/kvm_host.h | 26 ++++++++
>>>>>>>> arch/loongarch/include/asm/kvm_vcpu.h | 1 +
>>>>>>>> arch/loongarch/kvm/vcpu.c | 93 ++++++++++++++++++++++++++-
>>>>>>>> arch/loongarch/kvm/vm.c | 11 ++++
>>>>>>>> 4 files changed, 130 insertions(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/arch/loongarch/include/asm/kvm_host.h b/arch/loongarch/include/asm/kvm_host.h
>>>>>>>> index 2d62f7b0d377..3ba16ef1fe69 100644
>>>>>>>> --- a/arch/loongarch/include/asm/kvm_host.h
>>>>>>>> +++ b/arch/loongarch/include/asm/kvm_host.h
>>>>>>>> @@ -64,6 +64,30 @@ struct kvm_world_switch {
>>>>>>>>
>>>>>>>> #define MAX_PGTABLE_LEVELS 4
>>>>>>>>
>>>>>>>> +/*
>>>>>>>> + * Physical cpu id is used for interrupt routing, there are different
>>>>>>>> + * definitions about physical cpuid on different hardwares.
>>>>>>>> + * For LOONGARCH_CSR_CPUID register, max cpuid size is 512
>>>>>>>> + * For IPI HW, max dest CPUID size is 1024
>>>>>>>> + * For extioi interrupt controller, max dest CPUID size is 256
>>>>>>>> + * For MSI interrupt controller, max supported CPUID size is 65536
>>>>>>>> + *
>>>>>>>> + * Currently max CPUID is defined as 256 for KVM hypervisor, in future
>>>>>>>> + * it will be expanded to 4096, including 16 packages at most. And every
>>>>>>>> + * package supports at most 256 vcpus
>>>>>>>> + */
>>>>>>>> +#define KVM_MAX_PHYID 256
>>>>>>>> +
>>>>>>>> +struct kvm_phyid_info {
>>>>>>>> + struct kvm_vcpu *vcpu;
>>>>>>>> + bool enabled;
>>>>>>>> +};
>>>>>>>> +
>>>>>>>> +struct kvm_phyid_map {
>>>>>>>> + int max_phyid;
>>>>>>>> + struct kvm_phyid_info phys_map[KVM_MAX_PHYID];
>>>>>>>> +};
>>>>>>>> +
>>>>>>>> struct kvm_arch {
>>>>>>>> /* Guest physical mm */
>>>>>>>> kvm_pte_t *pgd;
>>>>>>>> @@ -71,6 +95,8 @@ struct kvm_arch {
>>>>>>>> unsigned long invalid_ptes[MAX_PGTABLE_LEVELS];
>>>>>>>> unsigned int pte_shifts[MAX_PGTABLE_LEVELS];
>>>>>>>> unsigned int root_level;
>>>>>>>> + spinlock_t phyid_map_lock;
>>>>>>>> + struct kvm_phyid_map *phyid_map;
>>>>>>>>
>>>>>>>> s64 time_offset;
>>>>>>>> struct kvm_context __percpu *vmcs;
>>>>>>>> diff --git a/arch/loongarch/include/asm/kvm_vcpu.h b/arch/loongarch/include/asm/kvm_vcpu.h
>>>>>>>> index 0cb4fdb8a9b5..9f53950959da 100644
>>>>>>>> --- a/arch/loongarch/include/asm/kvm_vcpu.h
>>>>>>>> +++ b/arch/loongarch/include/asm/kvm_vcpu.h
>>>>>>>> @@ -81,6 +81,7 @@ void kvm_save_timer(struct kvm_vcpu *vcpu);
>>>>>>>> void kvm_restore_timer(struct kvm_vcpu *vcpu);
>>>>>>>>
>>>>>>>> int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq);
>>>>>>>> +struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid);
>>>>>>>>
>>>>>>>> /*
>>>>>>>> * Loongarch KVM guest interrupt handling
>>>>>>>> diff --git a/arch/loongarch/kvm/vcpu.c b/arch/loongarch/kvm/vcpu.c
>>>>>>>> index 3a8779065f73..b633fd28b8db 100644
>>>>>>>> --- a/arch/loongarch/kvm/vcpu.c
>>>>>>>> +++ b/arch/loongarch/kvm/vcpu.c
>>>>>>>> @@ -274,6 +274,95 @@ static int _kvm_getcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 *val)
>>>>>>>> return 0;
>>>>>>>> }
>>>>>>>>
>>>>>>>> +static inline int kvm_set_cpuid(struct kvm_vcpu *vcpu, u64 val)
>>>>>>>> +{
>>>>>>>> + int cpuid;
>>>>>>>> + struct loongarch_csrs *csr = vcpu->arch.csr;
>>>>>>>> + struct kvm_phyid_map *map;
>>>>>>>> +
>>>>>>>> + if (val >= KVM_MAX_PHYID)
>>>>>>>> + return -EINVAL;
>>>>>>>> +
>>>>>>>> + cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_CPUID);
>>>>>>>> + map = vcpu->kvm->arch.phyid_map;
>>>>>>>> + spin_lock(&vcpu->kvm->arch.phyid_map_lock);
>>>>>>>> + if (map->phys_map[cpuid].enabled) {
>>>>>>>> + /*
>>>>>>>> + * Cpuid is already set before
>>>>>>>> + * Forbid changing different cpuid at runtime
>>>>>>>> + */
>>>>>>>> + if (cpuid != val) {
>>>>>>>> + /*
>>>>>>>> + * Cpuid 0 is initial value for vcpu, maybe invalid
>>>>>>>> + * unset value for vcpu
>>>>>>>> + */
>>>>>>>> + if (cpuid) {
>>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>>>>>> + return -EINVAL;
>>>>>>>> + }
>>>>>>>> + } else {
>>>>>>>> + /* Discard duplicated cpuid set */
>>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>>>>>> + return 0;
>>>>>>>> + }
>>>>>>>> + }
>>>>>>> I have changed the logic and comments when I apply, you can double
>>>>>>> check whether it is correct.
>>>>>> I checked out the latest version; the modification in function
>>>>>> kvm_set_cpuid() is good for me.
>>>>> Now the modified version is like this:
>>>>>
>>>>> + if (map->phys_map[cpuid].enabled) {
>>>>> + /* Discard duplicated CPUID set operation */
>>>>> + if (cpuid == val) {
>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>>> + return 0;
>>>>> + }
>>>>> +
>>>>> + /*
>>>>> + * CPUID is already set before
>>>>> + * Forbid changing different CPUID at runtime
>>>>> + * But CPUID 0 is the initial value for vcpu, so allow
>>>>> + * changing from 0 to others
>>>>> + */
>>>>> + if (cpuid) {
>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>>> + return -EINVAL;
>>>>> + }
>>>>> + }
>>>>> But I still doubt whether we should allow changing from 0 to others
>>>>> while map->phys_map[cpuid].enabled is 1.
>>>> It is necessary since the default sw cpuid is zero :-( And we can
>>>> optimize it in later, such as set INVALID cpuid in function
>>>> kvm_arch_vcpu_create() and logic will be simple in function kvm_set_cpuid().
>>> In my opinion, if a vcpu with a uninitialized default physid=0, then
>>> map->phys_map[cpuid].enabled should be 0, then code won't come here.
>>> And if a vcpu with a real physid=0, then map->phys_map[cpuid].enabled
>>> is 1, but we shouldn't allow it to change physid in this case.
>> yes, that is actually a problem.
>>
>> vcpu0 firstly set physid=0, and vcpu0 set physid=1 again is not allowed.
>> vcpu0 firstly set physid=0, and vcpu1 set physid=1 is allowed.
>
> So can we simply drop the if (cpuid) checking? That means:
> + if (map->phys_map[cpuid].enabled) {
> + /* Discard duplicated CPUID set operation */
> + if (cpuid == val) {
> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> + return 0;
> + }
> +
> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> + return -EINVAL;
> + }
Yes, with a similar modification such as the following, since the second
scenario should be allowed:
"vcpu0 first sets physid=0, and vcpu1 then setting physid=1 is allowed even
though the default sw cpuid is zero"

--- a/arch/loongarch/kvm/vcpu.c
+++ b/arch/loongarch/kvm/vcpu.c
@@ -272,7 +272,7 @@ static inline int kvm_set_cpuid(struct kvm_vcpu
*vcpu, u64 val)
cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_CPUID);

spin_lock(&vcpu->kvm->arch.phyid_map_lock);
- if (map->phys_map[cpuid].enabled) {
+ if ((cpuid != KVM_MAX_PHYID) && map->phys_map[cpuid].enabled) {
/* Discard duplicated CPUID set operation */
if (cpuid == val) {
spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
@@ -282,13 +282,9 @@ static inline int kvm_set_cpuid(struct kvm_vcpu
*vcpu, u64 val)
/*
* CPUID is already set before
* Forbid changing different CPUID at runtime
- * But CPUID 0 is the initial value for vcpu, so allow
- * changing from 0 to others
*/
- if (cpuid) {
- spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
- return -EINVAL;
- }
+ spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
+ return -EINVAL;
}

if (map->phys_map[val].enabled) {
@@ -1029,6 +1025,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)

/* Set cpuid */
kvm_write_sw_gcsr(csr, LOONGARCH_CSR_TMID, vcpu->vcpu_id);
+ kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, KVM_MAX_PHYID);

/* Start with no pending virtual guest interrupts */
csr->csrs[LOONGARCH_CSR_GINTC] = 0;
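
For readers following the thread, the effect of the diff above can be sketched as a small user-space model. This is only an illustration under stated assumptions, not the kernel code: `vcpu_model`, `phyid_map`, `vcpu_model_init()` and `model_set_cpuid()` are hypothetical stand-ins for the kernel structures and for kvm_set_cpuid(), `MAX_PHYID` mirrors KVM_MAX_PHYID, and the spinlock is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors KVM_MAX_PHYID; in this model it doubles as the "unset" sentinel. */
#define MAX_PHYID 256

/* Hypothetical stand-in for struct kvm_vcpu. */
struct vcpu_model {
	int cpuid;	/* models the software copy of CSR.CPUID */
};

struct phyid_info {
	struct vcpu_model *vcpu;
	bool enabled;
};

struct phyid_map {
	int max_phyid;
	struct phyid_info phys_map[MAX_PHYID];
};

/* Models the line added to kvm_arch_vcpu_create(): start with an invalid id. */
static void vcpu_model_init(struct vcpu_model *v)
{
	v->cpuid = MAX_PHYID;
}

/* Models kvm_set_cpuid() after the proposed change; locking is omitted. */
static int model_set_cpuid(struct phyid_map *map, struct vcpu_model *v,
			   uint64_t val)
{
	int cpuid;

	assert(map && v);
	cpuid = v->cpuid;

	if (val >= MAX_PHYID)
		return -EINVAL;

	/* The sentinel test also keeps the index inside phys_map[]. */
	if (cpuid != MAX_PHYID && map->phys_map[cpuid].enabled) {
		/* Same id again: nothing to do. A different id: refuse. */
		if ((uint64_t)cpuid == val)
			return 0;
		return -EINVAL;
	}

	if (map->phys_map[val].enabled) {
		/* Id already claimed: fine for its owner, an error for others. */
		if (map->phys_map[val].vcpu != v)
			return -EINVAL;
		return 0;
	}

	v->cpuid = (int)val;
	map->phys_map[val].enabled = true;
	map->phys_map[val].vcpu = v;
	if (map->max_phyid < (int)val)
		map->max_phyid = (int)val;
	return 0;
}
```

In this model, a vcpu that has claimed id 0 is refused when it later asks for id 1, while a second, freshly initialised vcpu can still claim id 1, which matches the two scenarios discussed above.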


>
> Huacai
>
>>
>>
>>>
>>> Huacai
>>>
>>>>
>>>> Regards
>>>> Bibo Mao
>>>>
>>>>>
>>>>> Huacai
>>>>>
>>>>>>>
>>>>>>>> +
>>>>>>>> + if (map->phys_map[val].enabled) {
>>>>>>>> + /*
>>>>>>>> + * New cpuid is already set with other vcpu
>>>>>>>> + * Forbid sharing the same cpuid between different vcpus
>>>>>>>> + */
>>>>>>>> + if (map->phys_map[val].vcpu != vcpu) {
>>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>>>>>> + return -EINVAL;
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + /* Discard duplicated cpuid set operation*/
>>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>>>>>> + return 0;
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, val);
>>>>>>>> + map->phys_map[val].enabled = true;
>>>>>>>> + map->phys_map[val].vcpu = vcpu;
>>>>>>>> + if (map->max_phyid < val)
>>>>>>>> + map->max_phyid = val;
>>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>>>>>> + return 0;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid)
>>>>>>>> +{
>>>>>>>> + struct kvm_phyid_map *map;
>>>>>>>> +
>>>>>>>> + if (cpuid >= KVM_MAX_PHYID)
>>>>>>>> + return NULL;
>>>>>>>> +
>>>>>>>> + map = kvm->arch.phyid_map;
>>>>>>>> + if (map->phys_map[cpuid].enabled)
>>>>>>>> + return map->phys_map[cpuid].vcpu;
>>>>>>>> +
>>>>>>>> + return NULL;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static inline void kvm_drop_cpuid(struct kvm_vcpu *vcpu)
>>>>>>>> +{
>>>>>>>> + int cpuid;
>>>>>>>> + struct loongarch_csrs *csr = vcpu->arch.csr;
>>>>>>>> + struct kvm_phyid_map *map;
>>>>>>>> +
>>>>>>>> + map = vcpu->kvm->arch.phyid_map;
>>>>>>>> + cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_CPUID);
>>>>>>>> + if (cpuid >= KVM_MAX_PHYID)
>>>>>>>> + return;
>>>>>>>> +
>>>>>>>> + if (map->phys_map[cpuid].enabled) {
>>>>>>>> + map->phys_map[cpuid].vcpu = NULL;
>>>>>>>> + map->phys_map[cpuid].enabled = false;
>>>>>>>> + kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, 0);
>>>>>>>> + }
>>>>>>>> +}
>>>>>>> While kvm_set_cpuid() is protected by a spinlock, do kvm_drop_cpuid()
>>>>>>> and kvm_get_vcpu_by_cpuid() also need it?
>>>>>>>
>>>>>> It is good to me that spinlock is added in function kvm_drop_cpuid().
>>>>>> And thanks for the efforts.
>>>>>>
>>>>>> Regards
>>>>>> Bibo Mao
>>>>>>>> +
>>>>>>>> static int _kvm_setcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 val)
>>>>>>>> {
>>>>>>>> int ret = 0, gintc;
>>>>>>>> @@ -291,7 +380,8 @@ static int _kvm_setcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 val)
>>>>>>>> kvm_set_sw_gcsr(csr, LOONGARCH_CSR_ESTAT, gintc);
>>>>>>>>
>>>>>>>> return ret;
>>>>>>>> - }
>>>>>>>> + } else if (id == LOONGARCH_CSR_CPUID)
>>>>>>>> + return kvm_set_cpuid(vcpu, val);
>>>>>>>>
>>>>>>>> kvm_write_sw_gcsr(csr, id, val);
>>>>>>>>
>>>>>>>> @@ -943,6 +1033,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
>>>>>>>> hrtimer_cancel(&vcpu->arch.swtimer);
>>>>>>>> kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
>>>>>>>> kfree(vcpu->arch.csr);
>>>>>>>> + kvm_drop_cpuid(vcpu);
>>>>>>> I think this line should be before the above kfree(), otherwise you
>>>>>>> get a "use after free".
>>>>>>>
>>>>>>> Huacai
>>>>>>>
>>>>>>>>
>>>>>>>> /*
>>>>>>>> * If the vCPU is freed and reused as another vCPU, we don't want the
>>>>>>>> diff --git a/arch/loongarch/kvm/vm.c b/arch/loongarch/kvm/vm.c
>>>>>>>> index 0a37f6fa8f2d..6006a28653ad 100644
>>>>>>>> --- a/arch/loongarch/kvm/vm.c
>>>>>>>> +++ b/arch/loongarch/kvm/vm.c
>>>>>>>> @@ -30,6 +30,14 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>>>>>>>> if (!kvm->arch.pgd)
>>>>>>>> return -ENOMEM;
>>>>>>>>
>>>>>>>> + kvm->arch.phyid_map = kvzalloc(sizeof(struct kvm_phyid_map),
>>>>>>>> + GFP_KERNEL_ACCOUNT);
>>>>>>>> + if (!kvm->arch.phyid_map) {
>>>>>>>> + free_page((unsigned long)kvm->arch.pgd);
>>>>>>>> + kvm->arch.pgd = NULL;
>>>>>>>> + return -ENOMEM;
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> kvm_init_vmcs(kvm);
>>>>>>>> kvm->arch.gpa_size = BIT(cpu_vabits - 1);
>>>>>>>> kvm->arch.root_level = CONFIG_PGTABLE_LEVELS - 1;
>>>>>>>> @@ -44,6 +52,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>>>>>>>> for (i = 0; i <= kvm->arch.root_level; i++)
>>>>>>>> kvm->arch.pte_shifts[i] = PAGE_SHIFT + i * (PAGE_SHIFT - 3);
>>>>>>>>
>>>>>>>> + spin_lock_init(&kvm->arch.phyid_map_lock);
>>>>>>>> return 0;
>>>>>>>> }
>>>>>>>>
>>>>>>>> @@ -51,7 +60,9 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
>>>>>>>> {
>>>>>>>> kvm_destroy_vcpus(kvm);
>>>>>>>> free_page((unsigned long)kvm->arch.pgd);
>>>>>>>> + kvfree(kvm->arch.phyid_map);
>>>>>>>> kvm->arch.pgd = NULL;
>>>>>>>> + kvm->arch.phyid_map = NULL;
>>>>>>>> }
>>>>>>>>
>>>>>>>> int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>>>>>>> --
>>>>>>>> 2.39.3
>>>>>>>>
>>>>>>
>>>>
>>


2024-05-06 14:18:34

by Huacai Chen

Subject: Re: [PATCH v8 4/6] LoongArch: KVM: Add vcpu search support from physical cpuid

On Mon, May 6, 2024 at 6:05 PM maobibo <[email protected]> wrote:
>
>
>
> On 2024/5/6 5:40 PM, Huacai Chen wrote:
> > On Mon, May 6, 2024 at 5:35 PM maobibo <[email protected]> wrote:
> >>
> >>
> >>
> >> On 2024/5/6 4:59 PM, Huacai Chen wrote:
> >>> On Mon, May 6, 2024 at 4:18 PM maobibo <[email protected]> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 2024/5/6 3:06 PM, Huacai Chen wrote:
> >>>>> Hi, Bibo,
> >>>>>
> >>>>> On Mon, May 6, 2024 at 2:36 PM maobibo <[email protected]> wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 2024/5/6 9:49 AM, Huacai Chen wrote:
> >>>>>>> Hi, Bibo,
> >>>>>>>
> >>>>>>> On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>> Physical cpuid is used for interrupt routing for irqchips such as
> >>>>>>>> ipi/msi/extioi interrupt controller. And physical cpuid is stored
> >>>>>>>> at CSR register LOONGARCH_CSR_CPUID, it can not be changed once vcpu
> >>>>>>>> is created and physical cpuid of two vcpus cannot be the same.
> >>>>>>>>
> >>>>>>>> Different irqchips have different size declaration about physical cpuid,
> >>>>>>>> max cpuid value for CSR LOONGARCH_CSR_CPUID on 3A5000 is 512, max cpuid
> >>>>>>>> supported by IPI hardware is 1024, 256 for extioi irqchip, and 65536
> >>>>>>>> for MSI irqchip.
> >>>>>>>>
> >>>>>>>> The smallest value from all interrupt controllers is selected now,
> >>>>>>>> and the max cpuid size is defined as 256 by KVM which comes from
> >>>>>>>> extioi irqchip.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Bibo Mao <[email protected]>
> >>>>>>>> ---
> >>>>>>>> arch/loongarch/include/asm/kvm_host.h | 26 ++++++++
> >>>>>>>> arch/loongarch/include/asm/kvm_vcpu.h | 1 +
> >>>>>>>> arch/loongarch/kvm/vcpu.c | 93 ++++++++++++++++++++++++++-
> >>>>>>>> arch/loongarch/kvm/vm.c | 11 ++++
> >>>>>>>> 4 files changed, 130 insertions(+), 1 deletion(-)
> >>>>>>>>
> >>>>>>>> diff --git a/arch/loongarch/include/asm/kvm_host.h b/arch/loongarch/include/asm/kvm_host.h
> >>>>>>>> index 2d62f7b0d377..3ba16ef1fe69 100644
> >>>>>>>> --- a/arch/loongarch/include/asm/kvm_host.h
> >>>>>>>> +++ b/arch/loongarch/include/asm/kvm_host.h
> >>>>>>>> @@ -64,6 +64,30 @@ struct kvm_world_switch {
> >>>>>>>>
> >>>>>>>> #define MAX_PGTABLE_LEVELS 4
> >>>>>>>>
> >>>>>>>> +/*
> >>>>>>>> + * Physical cpu id is used for interrupt routing, there are different
> >>>>>>>> + * definitions about physical cpuid on different hardwares.
> >>>>>>>> + * For LOONGARCH_CSR_CPUID register, max cpuid size is 512
> >>>>>>>> + * For IPI HW, max dest CPUID size 1024
> >>>>>>>> + * For extioi interrupt controller, max dest CPUID size is 256
> >>>>>>>> + * For MSI interrupt controller, max supported CPUID size is 65536
> >>>>>>>> + *
> >>>>>>>> + * Currently max CPUID is defined as 256 for KVM hypervisor, in future
> >>>>>>>> + * it will be expanded to 4096, including 16 packages at most. And every
> >>>>>>>> + * package supports at most 256 vcpus
> >>>>>>>> + */
> >>>>>>>> +#define KVM_MAX_PHYID 256
> >>>>>>>> +
> >>>>>>>> +struct kvm_phyid_info {
> >>>>>>>> + struct kvm_vcpu *vcpu;
> >>>>>>>> + bool enabled;
> >>>>>>>> +};
> >>>>>>>> +
> >>>>>>>> +struct kvm_phyid_map {
> >>>>>>>> + int max_phyid;
> >>>>>>>> + struct kvm_phyid_info phys_map[KVM_MAX_PHYID];
> >>>>>>>> +};
> >>>>>>>> +
> >>>>>>>> struct kvm_arch {
> >>>>>>>> /* Guest physical mm */
> >>>>>>>> kvm_pte_t *pgd;
> >>>>>>>> @@ -71,6 +95,8 @@ struct kvm_arch {
> >>>>>>>> unsigned long invalid_ptes[MAX_PGTABLE_LEVELS];
> >>>>>>>> unsigned int pte_shifts[MAX_PGTABLE_LEVELS];
> >>>>>>>> unsigned int root_level;
> >>>>>>>> + spinlock_t phyid_map_lock;
> >>>>>>>> + struct kvm_phyid_map *phyid_map;
> >>>>>>>>
> >>>>>>>> s64 time_offset;
> >>>>>>>> struct kvm_context __percpu *vmcs;
> >>>>>>>> diff --git a/arch/loongarch/include/asm/kvm_vcpu.h b/arch/loongarch/include/asm/kvm_vcpu.h
> >>>>>>>> index 0cb4fdb8a9b5..9f53950959da 100644
> >>>>>>>> --- a/arch/loongarch/include/asm/kvm_vcpu.h
> >>>>>>>> +++ b/arch/loongarch/include/asm/kvm_vcpu.h
> >>>>>>>> @@ -81,6 +81,7 @@ void kvm_save_timer(struct kvm_vcpu *vcpu);
> >>>>>>>> void kvm_restore_timer(struct kvm_vcpu *vcpu);
> >>>>>>>>
> >>>>>>>> int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq);
> >>>>>>>> +struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid);
> >>>>>>>>
> >>>>>>>> /*
> >>>>>>>> * Loongarch KVM guest interrupt handling
> >>>>>>>> diff --git a/arch/loongarch/kvm/vcpu.c b/arch/loongarch/kvm/vcpu.c
> >>>>>>>> index 3a8779065f73..b633fd28b8db 100644
> >>>>>>>> --- a/arch/loongarch/kvm/vcpu.c
> >>>>>>>> +++ b/arch/loongarch/kvm/vcpu.c
> >>>>>>>> @@ -274,6 +274,95 @@ static int _kvm_getcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 *val)
> >>>>>>>> return 0;
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> +static inline int kvm_set_cpuid(struct kvm_vcpu *vcpu, u64 val)
> >>>>>>>> +{
> >>>>>>>> + int cpuid;
> >>>>>>>> + struct loongarch_csrs *csr = vcpu->arch.csr;
> >>>>>>>> + struct kvm_phyid_map *map;
> >>>>>>>> +
> >>>>>>>> + if (val >= KVM_MAX_PHYID)
> >>>>>>>> + return -EINVAL;
> >>>>>>>> +
> >>>>>>>> + cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_CPUID);
> >>>>>>>> + map = vcpu->kvm->arch.phyid_map;
> >>>>>>>> + spin_lock(&vcpu->kvm->arch.phyid_map_lock);
> >>>>>>>> + if (map->phys_map[cpuid].enabled) {
> >>>>>>>> + /*
> >>>>>>>> + * Cpuid is already set before
> >>>>>>>> + * Forbid changing different cpuid at runtime
> >>>>>>>> + */
> >>>>>>>> + if (cpuid != val) {
> >>>>>>>> + /*
> >>>>>>>> + * Cpuid 0 is initial value for vcpu, maybe invalid
> >>>>>>>> + * unset value for vcpu
> >>>>>>>> + */
> >>>>>>>> + if (cpuid) {
> >>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >>>>>>>> + return -EINVAL;
> >>>>>>>> + }
> >>>>>>>> + } else {
> >>>>>>>> + /* Discard duplicated cpuid set */
> >>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >>>>>>>> + return 0;
> >>>>>>>> + }
> >>>>>>>> + }
> >>>>>>> I have changed the logic and comments when I apply, you can double
> >>>>>>> check whether it is correct.
> >>>>>> I checkout the latest version, the modification in function
> >>>>>> kvm_set_cpuid() is good for me.
> >>>>> Now the modified version is like this:
> >>>>>
> >>>>> + if (map->phys_map[cpuid].enabled) {
> >>>>> + /* Discard duplicated CPUID set operation */
> >>>>> + if (cpuid == val) {
> >>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >>>>> + return 0;
> >>>>> + }
> >>>>> +
> >>>>> + /*
> >>>>> + * CPUID is already set before
> >>>>> + * Forbid changing different CPUID at runtime
> >>>>> + * But CPUID 0 is the initial value for vcpu, so allow
> >>>>> + * changing from 0 to others
> >>>>> + */
> >>>>> + if (cpuid) {
> >>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >>>>> + return -EINVAL;
> >>>>> + }
> >>>>> + }
> >>>>> But I still doubt whether we should allow changing from 0 to others
> >>>>> while map->phys_map[cpuid].enabled is 1.
> >>>> It is necessary since the default sw cpuid is zero :-( And we can
> >>>> optimize it in later, such as set INVALID cpuid in function
> >>>> kvm_arch_vcpu_create() and logic will be simple in function kvm_set_cpuid().
> >>> In my opinion, if a vcpu with a uninitialized default physid=0, then
> >>> map->phys_map[cpuid].enabled should be 0, then code won't come here.
> >>> And if a vcpu with a real physid=0, then map->phys_map[cpuid].enabled
> >>> is 1, but we shouldn't allow it to change physid in this case.
> >> yes, that is actually a problem.
> >>
> >> vcpu0 firstly set physid=0, and vcpu0 set physid=1 again is not allowed.
> >> vcpu0 firstly set physid=0, and vcpu1 set physid=1 is allowed.
> >
> > So can we simply drop the if (cpuid) checking? That means:
> > + if (map->phys_map[cpuid].enabled) {
> > + /* Discard duplicated CPUID set operation */
> > + if (cpuid == val) {
> > + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> > + return 0;
> > + }
> > +
> > + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> > + return -EINVAL;
> > + }
> yes, the similar modification such as following, since the secondary
> scenario should be allowed.
> "vcpu0 firstly set physid=0, and vcpu1 set physid=1 is allowed though
> default sw cpuid is zero"
>
> --- a/arch/loongarch/kvm/vcpu.c
> +++ b/arch/loongarch/kvm/vcpu.c
> @@ -272,7 +272,7 @@ static inline int kvm_set_cpuid(struct kvm_vcpu
> *vcpu, u64 val)
> cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_CPUID);
>
> spin_lock(&vcpu->kvm->arch.phyid_map_lock);
> - if (map->phys_map[cpuid].enabled) {
> + if ((cpuid != KVM_MAX_PHYID) && map->phys_map[cpuid].enabled) {
> /* Discard duplicated CPUID set operation */
> if (cpuid == val) {
> spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> @@ -282,13 +282,9 @@ static inline int kvm_set_cpuid(struct kvm_vcpu
> *vcpu, u64 val)
> /*
> * CPUID is already set before
> * Forbid changing different CPUID at runtime
> - * But CPUID 0 is the initial value for vcpu, so allow
> - * changing from 0 to others
> */
> - if (cpuid) {
> - spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> - return -EINVAL;
> - }
> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> + return -EINVAL;
> }
>
> if (map->phys_map[val].enabled) {
> @@ -1029,6 +1025,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
>
> /* Set cpuid */
> kvm_write_sw_gcsr(csr, LOONGARCH_CSR_TMID, vcpu->vcpu_id);
> + kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, KVM_MAX_PHYID);
>
> /* Start with no pending virtual guest interrupts */
> csr->csrs[LOONGARCH_CSR_GINTC] = 0;
Very nice, but I think kvm_drop_cpuid() should also reset the cpuid to
KVM_MAX_PHYID. I have now updated my loongarch-kvm branch; you can test it
again, and I hope it is in good shape.

Huacai
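
The two teardown points raised in this thread (reset the cpuid to the invalid value when it is dropped, and drop it before the CSR storage is freed) can be illustrated with a user-space model. This is a sketch under assumptions rather than the code on the loongarch-kvm branch: `csr_model`, `vcpu_model`, `model_claim_cpuid()`, `model_drop_cpuid()` and `vcpu_model_destroy()` are hypothetical names, `MAX_PHYID` mirrors KVM_MAX_PHYID, and locking is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Mirrors KVM_MAX_PHYID; also used as the "no id assigned" sentinel. */
#define MAX_PHYID 256

/* Hypothetical stand-in for struct loongarch_csrs (heap allocated). */
struct csr_model {
	int cpuid;
};

/* Hypothetical stand-in for struct kvm_vcpu. */
struct vcpu_model {
	struct csr_model *csr;
};

struct phyid_info {
	struct vcpu_model *vcpu;
	bool enabled;
};

struct phyid_map {
	struct phyid_info phys_map[MAX_PHYID];
};

static int vcpu_model_create(struct vcpu_model *v)
{
	v->csr = malloc(sizeof(*v->csr));
	if (!v->csr)
		return -1;
	v->csr->cpuid = MAX_PHYID;	/* start unset */
	return 0;
}

/* Simplified claim with no conflict checks; only used to set up the demo. */
static void model_claim_cpuid(struct phyid_map *map, struct vcpu_model *v,
			      int id)
{
	assert(id >= 0 && id < MAX_PHYID);
	v->csr->cpuid = id;
	map->phys_map[id].vcpu = v;
	map->phys_map[id].enabled = true;
}

/* Models kvm_drop_cpuid(): release the slot and go back to the sentinel. */
static void model_drop_cpuid(struct phyid_map *map, struct vcpu_model *v)
{
	int cpuid = v->csr->cpuid;

	if (cpuid >= MAX_PHYID)
		return;		/* nothing was ever claimed */

	if (map->phys_map[cpuid].enabled) {
		map->phys_map[cpuid].vcpu = NULL;
		map->phys_map[cpuid].enabled = false;
		v->csr->cpuid = MAX_PHYID;
	}
}

/* The drop reads v->csr, so it has to run before that storage is freed. */
static void vcpu_model_destroy(struct phyid_map *map, struct vcpu_model *v)
{
	model_drop_cpuid(map, v);
	free(v->csr);
	v->csr = NULL;
}
```

Resetting to the sentinel instead of 0 keeps the "unset" state distinct from a vcpu that really owns physical id 0, which is the ambiguity the earlier `if (cpuid)` check was working around.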
>
>
> >
> > Huacai
> >
> >>
> >>
> >>>
> >>> Huacai
> >>>
> >>>>
> >>>> Regards
> >>>> Bibo Mao
> >>>>
> >>>>>
> >>>>> Huacai
> >>>>>
> >>>>>>>
> >>>>>>>> +
> >>>>>>>> + if (map->phys_map[val].enabled) {
> >>>>>>>> + /*
> >>>>>>>> + * New cpuid is already set with other vcpu
> >>>>>>>> + * Forbid sharing the same cpuid between different vcpus
> >>>>>>>> + */
> >>>>>>>> + if (map->phys_map[val].vcpu != vcpu) {
> >>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >>>>>>>> + return -EINVAL;
> >>>>>>>> + }
> >>>>>>>> +
> >>>>>>>> + /* Discard duplicated cpuid set operation*/
> >>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >>>>>>>> + return 0;
> >>>>>>>> + }
> >>>>>>>> +
> >>>>>>>> + kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, val);
> >>>>>>>> + map->phys_map[val].enabled = true;
> >>>>>>>> + map->phys_map[val].vcpu = vcpu;
> >>>>>>>> + if (map->max_phyid < val)
> >>>>>>>> + map->max_phyid = val;
> >>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >>>>>>>> + return 0;
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid)
> >>>>>>>> +{
> >>>>>>>> + struct kvm_phyid_map *map;
> >>>>>>>> +
> >>>>>>>> + if (cpuid >= KVM_MAX_PHYID)
> >>>>>>>> + return NULL;
> >>>>>>>> +
> >>>>>>>> + map = kvm->arch.phyid_map;
> >>>>>>>> + if (map->phys_map[cpuid].enabled)
> >>>>>>>> + return map->phys_map[cpuid].vcpu;
> >>>>>>>> +
> >>>>>>>> + return NULL;
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +static inline void kvm_drop_cpuid(struct kvm_vcpu *vcpu)
> >>>>>>>> +{
> >>>>>>>> + int cpuid;
> >>>>>>>> + struct loongarch_csrs *csr = vcpu->arch.csr;
> >>>>>>>> + struct kvm_phyid_map *map;
> >>>>>>>> +
> >>>>>>>> + map = vcpu->kvm->arch.phyid_map;
> >>>>>>>> + cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_CPUID);
> >>>>>>>> + if (cpuid >= KVM_MAX_PHYID)
> >>>>>>>> + return;
> >>>>>>>> +
> >>>>>>>> + if (map->phys_map[cpuid].enabled) {
> >>>>>>>> + map->phys_map[cpuid].vcpu = NULL;
> >>>>>>>> + map->phys_map[cpuid].enabled = false;
> >>>>>>>> + kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, 0);
> >>>>>>>> + }
> >>>>>>>> +}
> >>>>>>> While kvm_set_cpuid() is protected by a spinlock, do kvm_drop_cpuid()
> >>>>>>> and kvm_get_vcpu_by_cpuid() also need it?
> >>>>>>>
> >>>>>> It is good to me that spinlock is added in function kvm_drop_cpuid().
> >>>>>> And thanks for the efforts.
> >>>>>>
> >>>>>> Regards
> >>>>>> Bibo Mao
> >>>>>>>> +
> >>>>>>>> static int _kvm_setcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 val)
> >>>>>>>> {
> >>>>>>>> int ret = 0, gintc;
> >>>>>>>> @@ -291,7 +380,8 @@ static int _kvm_setcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 val)
> >>>>>>>> kvm_set_sw_gcsr(csr, LOONGARCH_CSR_ESTAT, gintc);
> >>>>>>>>
> >>>>>>>> return ret;
> >>>>>>>> - }
> >>>>>>>> + } else if (id == LOONGARCH_CSR_CPUID)
> >>>>>>>> + return kvm_set_cpuid(vcpu, val);
> >>>>>>>>
> >>>>>>>> kvm_write_sw_gcsr(csr, id, val);
> >>>>>>>>
> >>>>>>>> @@ -943,6 +1033,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
> >>>>>>>> hrtimer_cancel(&vcpu->arch.swtimer);
> >>>>>>>> kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
> >>>>>>>> kfree(vcpu->arch.csr);
> >>>>>>>> + kvm_drop_cpuid(vcpu);
> >>>>>>> I think this line should be before the above kfree(), otherwise you
> >>>>>>> get a "use after free".
> >>>>>>>
> >>>>>>> Huacai
> >>>>>>>
> >>>>>>>>
> >>>>>>>> /*
> >>>>>>>> * If the vCPU is freed and reused as another vCPU, we don't want the
> >>>>>>>> diff --git a/arch/loongarch/kvm/vm.c b/arch/loongarch/kvm/vm.c
> >>>>>>>> index 0a37f6fa8f2d..6006a28653ad 100644
> >>>>>>>> --- a/arch/loongarch/kvm/vm.c
> >>>>>>>> +++ b/arch/loongarch/kvm/vm.c
> >>>>>>>> @@ -30,6 +30,14 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> >>>>>>>> if (!kvm->arch.pgd)
> >>>>>>>> return -ENOMEM;
> >>>>>>>>
> >>>>>>>> + kvm->arch.phyid_map = kvzalloc(sizeof(struct kvm_phyid_map),
> >>>>>>>> + GFP_KERNEL_ACCOUNT);
> >>>>>>>> + if (!kvm->arch.phyid_map) {
> >>>>>>>> + free_page((unsigned long)kvm->arch.pgd);
> >>>>>>>> + kvm->arch.pgd = NULL;
> >>>>>>>> + return -ENOMEM;
> >>>>>>>> + }
> >>>>>>>> +
> >>>>>>>> kvm_init_vmcs(kvm);
> >>>>>>>> kvm->arch.gpa_size = BIT(cpu_vabits - 1);
> >>>>>>>> kvm->arch.root_level = CONFIG_PGTABLE_LEVELS - 1;
> >>>>>>>> @@ -44,6 +52,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> >>>>>>>> for (i = 0; i <= kvm->arch.root_level; i++)
> >>>>>>>> kvm->arch.pte_shifts[i] = PAGE_SHIFT + i * (PAGE_SHIFT - 3);
> >>>>>>>>
> >>>>>>>> + spin_lock_init(&kvm->arch.phyid_map_lock);
> >>>>>>>> return 0;
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> @@ -51,7 +60,9 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
> >>>>>>>> {
> >>>>>>>> kvm_destroy_vcpus(kvm);
> >>>>>>>> free_page((unsigned long)kvm->arch.pgd);
> >>>>>>>> + kvfree(kvm->arch.phyid_map);
> >>>>>>>> kvm->arch.pgd = NULL;
> >>>>>>>> + kvm->arch.phyid_map = NULL;
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >>>>>>>> --
> >>>>>>>> 2.39.3
> >>>>>>>>
> >>>>>>
> >>>>
> >>
>

2024-05-07 01:40:54

by Bibo Mao

Subject: Re: [PATCH v8 4/6] LoongArch: KVM: Add vcpu search support from physical cpuid



On 2024/5/6 10:17 PM, Huacai Chen wrote:
> On Mon, May 6, 2024 at 6:05 PM maobibo <[email protected]> wrote:
>>
>>
>>
>> On 2024/5/6 5:40 PM, Huacai Chen wrote:
>>> On Mon, May 6, 2024 at 5:35 PM maobibo <[email protected]> wrote:
>>>>
>>>>
>>>>
>>>> On 2024/5/6 4:59 PM, Huacai Chen wrote:
>>>>> On Mon, May 6, 2024 at 4:18 PM maobibo <[email protected]> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2024/5/6 3:06 PM, Huacai Chen wrote:
>>>>>>> Hi, Bibo,
>>>>>>>
>>>>>>> On Mon, May 6, 2024 at 2:36 PM maobibo <[email protected]> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2024/5/6 9:49 AM, Huacai Chen wrote:
>>>>>>>>> Hi, Bibo,
>>>>>>>>>
>>>>>>>>> On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>> Physical cpuid is used for interrupt routing for irqchips such as
>>>>>>>>>> ipi/msi/extioi interrupt controller. And physical cpuid is stored
>>>>>>>>>> at CSR register LOONGARCH_CSR_CPUID, it can not be changed once vcpu
>>>>>>>>>> is created and physical cpuid of two vcpus cannot be the same.
>>>>>>>>>>
>>>>>>>>>> Different irqchips have different size declaration about physical cpuid,
>>>>>>>>>> max cpuid value for CSR LOONGARCH_CSR_CPUID on 3A5000 is 512, max cpuid
>>>>>>>>>> supported by IPI hardware is 1024, 256 for extioi irqchip, and 65536
>>>>>>>>>> for MSI irqchip.
>>>>>>>>>>
>>>>>>>>>> The smallest value from all interrupt controllers is selected now,
>>>>>>>>>> and the max cpuid size is defined as 256 by KVM which comes from
>>>>>>>>>> extioi irqchip.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Bibo Mao <[email protected]>
>>>>>>>>>> ---
>>>>>>>>>> arch/loongarch/include/asm/kvm_host.h | 26 ++++++++
>>>>>>>>>> arch/loongarch/include/asm/kvm_vcpu.h | 1 +
>>>>>>>>>> arch/loongarch/kvm/vcpu.c | 93 ++++++++++++++++++++++++++-
>>>>>>>>>> arch/loongarch/kvm/vm.c | 11 ++++
>>>>>>>>>> 4 files changed, 130 insertions(+), 1 deletion(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/arch/loongarch/include/asm/kvm_host.h b/arch/loongarch/include/asm/kvm_host.h
>>>>>>>>>> index 2d62f7b0d377..3ba16ef1fe69 100644
>>>>>>>>>> --- a/arch/loongarch/include/asm/kvm_host.h
>>>>>>>>>> +++ b/arch/loongarch/include/asm/kvm_host.h
>>>>>>>>>> @@ -64,6 +64,30 @@ struct kvm_world_switch {
>>>>>>>>>>
>>>>>>>>>> #define MAX_PGTABLE_LEVELS 4
>>>>>>>>>>
>>>>>>>>>> +/*
>>>>>>>>>> + * Physical cpu id is used for interrupt routing, there are different
>>>>>>>>>> + * definitions about physical cpuid on different hardwares.
>>>>>>>>>> + * For LOONGARCH_CSR_CPUID register, max cpuid size is 512
>>>>>>>>>> + * For IPI HW, max dest CPUID size 1024
>>>>>>>>>> + * For extioi interrupt controller, max dest CPUID size is 256
>>>>>>>>>> + * For MSI interrupt controller, max supported CPUID size is 65536
>>>>>>>>>> + *
>>>>>>>>>> + * Currently max CPUID is defined as 256 for KVM hypervisor, in future
>>>>>>>>>> + * it will be expanded to 4096, including 16 packages at most. And every
>>>>>>>>>> + * package supports at most 256 vcpus
>>>>>>>>>> + */
>>>>>>>>>> +#define KVM_MAX_PHYID 256
>>>>>>>>>> +
>>>>>>>>>> +struct kvm_phyid_info {
>>>>>>>>>> + struct kvm_vcpu *vcpu;
>>>>>>>>>> + bool enabled;
>>>>>>>>>> +};
>>>>>>>>>> +
>>>>>>>>>> +struct kvm_phyid_map {
>>>>>>>>>> + int max_phyid;
>>>>>>>>>> + struct kvm_phyid_info phys_map[KVM_MAX_PHYID];
>>>>>>>>>> +};
>>>>>>>>>> +
>>>>>>>>>> struct kvm_arch {
>>>>>>>>>> /* Guest physical mm */
>>>>>>>>>> kvm_pte_t *pgd;
>>>>>>>>>> @@ -71,6 +95,8 @@ struct kvm_arch {
>>>>>>>>>> unsigned long invalid_ptes[MAX_PGTABLE_LEVELS];
>>>>>>>>>> unsigned int pte_shifts[MAX_PGTABLE_LEVELS];
>>>>>>>>>> unsigned int root_level;
>>>>>>>>>> + spinlock_t phyid_map_lock;
>>>>>>>>>> + struct kvm_phyid_map *phyid_map;
>>>>>>>>>>
>>>>>>>>>> s64 time_offset;
>>>>>>>>>> struct kvm_context __percpu *vmcs;
>>>>>>>>>> diff --git a/arch/loongarch/include/asm/kvm_vcpu.h b/arch/loongarch/include/asm/kvm_vcpu.h
>>>>>>>>>> index 0cb4fdb8a9b5..9f53950959da 100644
>>>>>>>>>> --- a/arch/loongarch/include/asm/kvm_vcpu.h
>>>>>>>>>> +++ b/arch/loongarch/include/asm/kvm_vcpu.h
>>>>>>>>>> @@ -81,6 +81,7 @@ void kvm_save_timer(struct kvm_vcpu *vcpu);
>>>>>>>>>> void kvm_restore_timer(struct kvm_vcpu *vcpu);
>>>>>>>>>>
>>>>>>>>>> int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq);
>>>>>>>>>> +struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid);
>>>>>>>>>>
>>>>>>>>>> /*
>>>>>>>>>> * Loongarch KVM guest interrupt handling
>>>>>>>>>> diff --git a/arch/loongarch/kvm/vcpu.c b/arch/loongarch/kvm/vcpu.c
>>>>>>>>>> index 3a8779065f73..b633fd28b8db 100644
>>>>>>>>>> --- a/arch/loongarch/kvm/vcpu.c
>>>>>>>>>> +++ b/arch/loongarch/kvm/vcpu.c
>>>>>>>>>> @@ -274,6 +274,95 @@ static int _kvm_getcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 *val)
>>>>>>>>>> return 0;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> +static inline int kvm_set_cpuid(struct kvm_vcpu *vcpu, u64 val)
>>>>>>>>>> +{
>>>>>>>>>> + int cpuid;
>>>>>>>>>> + struct loongarch_csrs *csr = vcpu->arch.csr;
>>>>>>>>>> + struct kvm_phyid_map *map;
>>>>>>>>>> +
>>>>>>>>>> + if (val >= KVM_MAX_PHYID)
>>>>>>>>>> + return -EINVAL;
>>>>>>>>>> +
>>>>>>>>>> + cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_CPUID);
>>>>>>>>>> + map = vcpu->kvm->arch.phyid_map;
>>>>>>>>>> + spin_lock(&vcpu->kvm->arch.phyid_map_lock);
>>>>>>>>>> + if (map->phys_map[cpuid].enabled) {
>>>>>>>>>> + /*
>>>>>>>>>> + * Cpuid is already set before
>>>>>>>>>> + * Forbid changing different cpuid at runtime
>>>>>>>>>> + */
>>>>>>>>>> + if (cpuid != val) {
>>>>>>>>>> + /*
>>>>>>>>>> + * Cpuid 0 is initial value for vcpu, maybe invalid
>>>>>>>>>> + * unset value for vcpu
>>>>>>>>>> + */
>>>>>>>>>> + if (cpuid) {
>>>>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>>>>>>>> + return -EINVAL;
>>>>>>>>>> + }
>>>>>>>>>> + } else {
>>>>>>>>>> + /* Discard duplicated cpuid set */
>>>>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>>>>>>>> + return 0;
>>>>>>>>>> + }
>>>>>>>>>> + }
>>>>>>>>> I have changed the logic and comments when I apply, you can double
>>>>>>>>> check whether it is correct.
>>>>>>>> I checkout the latest version, the modification in function
>>>>>>>> kvm_set_cpuid() is good for me.
>>>>>>> Now the modified version is like this:
>>>>>>>
>>>>>>> + if (map->phys_map[cpuid].enabled) {
>>>>>>> + /* Discard duplicated CPUID set operation */
>>>>>>> + if (cpuid == val) {
>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>>>>> + return 0;
>>>>>>> + }
>>>>>>> +
>>>>>>> + /*
>>>>>>> + * CPUID is already set before
>>>>>>> + * Forbid changing different CPUID at runtime
>>>>>>> + * But CPUID 0 is the initial value for vcpu, so allow
>>>>>>> + * changing from 0 to others
>>>>>>> + */
>>>>>>> + if (cpuid) {
>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>>>>> + return -EINVAL;
>>>>>>> + }
>>>>>>> + }
>>>>>>> But I still doubt whether we should allow changing from 0 to others
>>>>>>> while map->phys_map[cpuid].enabled is 1.
>>>>>> It is necessary since the default sw cpuid is zero :-( And we can
>>>>>> optimize it in later, such as set INVALID cpuid in function
>>>>>> kvm_arch_vcpu_create() and logic will be simple in function kvm_set_cpuid().
>>>>> In my opinion, if a vcpu with a uninitialized default physid=0, then
>>>>> map->phys_map[cpuid].enabled should be 0, then code won't come here.
>>>>> And if a vcpu with a real physid=0, then map->phys_map[cpuid].enabled
>>>>> is 1, but we shouldn't allow it to change physid in this case.
>>>> yes, that is actually a problem.
>>>>
>>>> vcpu0 firstly set physid=0, and vcpu0 set physid=1 again is not allowed.
>>>> vcpu0 firstly set physid=0, and vcpu1 set physid=1 is allowed.
>>>
>>> So can we simply drop the if (cpuid) checking? That means:
>>> + if (map->phys_map[cpuid].enabled) {
>>> + /* Discard duplicated CPUID set operation */
>>> + if (cpuid == val) {
>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>> + return 0;
>>> + }
>>> +
>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>> + return -EINVAL;
>>> + }
>> yes, the similar modification such as following, since the secondary
>> scenario should be allowed.
>> "vcpu0 firstly set physid=0, and vcpu1 set physid=1 is allowed though
>> default sw cpuid is zero"
>>
>> --- a/arch/loongarch/kvm/vcpu.c
>> +++ b/arch/loongarch/kvm/vcpu.c
>> @@ -272,7 +272,7 @@ static inline int kvm_set_cpuid(struct kvm_vcpu
>> *vcpu, u64 val)
>> cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_CPUID);
>>
>> spin_lock(&vcpu->kvm->arch.phyid_map_lock);
>> - if (map->phys_map[cpuid].enabled) {
>> + if ((cpuid != KVM_MAX_PHYID) && map->phys_map[cpuid].enabled) {
>> /* Discard duplicated CPUID set operation */
>> if (cpuid == val) {
>> spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>> @@ -282,13 +282,9 @@ static inline int kvm_set_cpuid(struct kvm_vcpu
>> *vcpu, u64 val)
>> /*
>> * CPUID is already set before
>> * Forbid changing different CPUID at runtime
>> - * But CPUID 0 is the initial value for vcpu, so allow
>> - * changing from 0 to others
>> */
>> - if (cpuid) {
>> - spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>> - return -EINVAL;
>> - }
>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>> + return -EINVAL;
>> }
>>
>> if (map->phys_map[val].enabled) {
>> @@ -1029,6 +1025,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
>>
>> /* Set cpuid */
>> kvm_write_sw_gcsr(csr, LOONGARCH_CSR_TMID, vcpu->vcpu_id);
>> + kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, KVM_MAX_PHYID);
>>
>> /* Start with no pending virtual guest interrupts */
>> csr->csrs[LOONGARCH_CSR_GINTC] = 0;
> Very nice, but I think kvm_drop_cpuid() should also set to KVM_MAX_PHYID.
> Now I update my loongarch-kvm branch, you can test it again, and hope
> it is in the perfect status.
I synced and tested the latest code from loongarch-kvm; pv ipi works well
with 256 vcpus. The code looks good to me. Thanks for your quick
review.

Regards
Bibo Mao
>
> Huacai
>>
>>
>>>
>>> Huacai
>>>
>>>>
>>>>
>>>>>
>>>>> Huacai
>>>>>
>>>>>>
>>>>>> Regards
>>>>>> Bibo Mao
>>>>>>
>>>>>>>
>>>>>>> Huacai
>>>>>>>
>>>>>>>>>
>>>>>>>>>> +
>>>>>>>>>> + if (map->phys_map[val].enabled) {
>>>>>>>>>> + /*
>>>>>>>>>> + * New cpuid is already set with other vcpu
>>>>>>>>>> + * Forbid sharing the same cpuid between different vcpus
>>>>>>>>>> + */
>>>>>>>>>> + if (map->phys_map[val].vcpu != vcpu) {
>>>>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>>>>>>>> + return -EINVAL;
>>>>>>>>>> + }
>>>>>>>>>> +
>>>>>>>>>> + /* Discard duplicated cpuid set operation*/
>>>>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>>>>>>>> + return 0;
>>>>>>>>>> + }
>>>>>>>>>> +
>>>>>>>>>> + kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, val);
>>>>>>>>>> + map->phys_map[val].enabled = true;
>>>>>>>>>> + map->phys_map[val].vcpu = vcpu;
>>>>>>>>>> + if (map->max_phyid < val)
>>>>>>>>>> + map->max_phyid = val;
>>>>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>>>>>>>> + return 0;
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>> +struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid)
>>>>>>>>>> +{
>>>>>>>>>> + struct kvm_phyid_map *map;
>>>>>>>>>> +
>>>>>>>>>> + if (cpuid >= KVM_MAX_PHYID)
>>>>>>>>>> + return NULL;
>>>>>>>>>> +
>>>>>>>>>> + map = kvm->arch.phyid_map;
>>>>>>>>>> + if (map->phys_map[cpuid].enabled)
>>>>>>>>>> + return map->phys_map[cpuid].vcpu;
>>>>>>>>>> +
>>>>>>>>>> + return NULL;
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>> +static inline void kvm_drop_cpuid(struct kvm_vcpu *vcpu)
>>>>>>>>>> +{
>>>>>>>>>> + int cpuid;
>>>>>>>>>> + struct loongarch_csrs *csr = vcpu->arch.csr;
>>>>>>>>>> + struct kvm_phyid_map *map;
>>>>>>>>>> +
>>>>>>>>>> + map = vcpu->kvm->arch.phyid_map;
>>>>>>>>>> + cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_CPUID);
>>>>>>>>>> + if (cpuid >= KVM_MAX_PHYID)
>>>>>>>>>> + return;
>>>>>>>>>> +
>>>>>>>>>> + if (map->phys_map[cpuid].enabled) {
>>>>>>>>>> + map->phys_map[cpuid].vcpu = NULL;
>>>>>>>>>> + map->phys_map[cpuid].enabled = false;
>>>>>>>>>> + kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, 0);
>>>>>>>>>> + }
>>>>>>>>>> +}
>>>>>>>>> While kvm_set_cpuid() is protected by a spinlock, do kvm_drop_cpuid()
>>>>>>>>> and kvm_get_vcpu_by_cpuid() also need it?
>>>>>>>>>
>>>>>>>> It looks good to me that a spinlock is added in kvm_drop_cpuid().
>>>>>>>> And thanks for the effort.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Bibo Mao
>>>>>>>>>> +
>>>>>>>>>> static int _kvm_setcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 val)
>>>>>>>>>> {
>>>>>>>>>> int ret = 0, gintc;
>>>>>>>>>> @@ -291,7 +380,8 @@ static int _kvm_setcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 val)
>>>>>>>>>> kvm_set_sw_gcsr(csr, LOONGARCH_CSR_ESTAT, gintc);
>>>>>>>>>>
>>>>>>>>>> return ret;
>>>>>>>>>> - }
>>>>>>>>>> + } else if (id == LOONGARCH_CSR_CPUID)
>>>>>>>>>> + return kvm_set_cpuid(vcpu, val);
>>>>>>>>>>
>>>>>>>>>> kvm_write_sw_gcsr(csr, id, val);
>>>>>>>>>>
>>>>>>>>>> @@ -943,6 +1033,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
>>>>>>>>>> hrtimer_cancel(&vcpu->arch.swtimer);
>>>>>>>>>> kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
>>>>>>>>>> kfree(vcpu->arch.csr);
>>>>>>>>>> + kvm_drop_cpuid(vcpu);
>>>>>>>>> I think this line should be before the above kfree(), otherwise you
>>>>>>>>> get a "use after free".
>>>>>>>>>
>>>>>>>>> Huacai
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> /*
>>>>>>>>>> * If the vCPU is freed and reused as another vCPU, we don't want the
>>>>>>>>>> diff --git a/arch/loongarch/kvm/vm.c b/arch/loongarch/kvm/vm.c
>>>>>>>>>> index 0a37f6fa8f2d..6006a28653ad 100644
>>>>>>>>>> --- a/arch/loongarch/kvm/vm.c
>>>>>>>>>> +++ b/arch/loongarch/kvm/vm.c
>>>>>>>>>> @@ -30,6 +30,14 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>>>>>>>>>> if (!kvm->arch.pgd)
>>>>>>>>>> return -ENOMEM;
>>>>>>>>>>
>>>>>>>>>> + kvm->arch.phyid_map = kvzalloc(sizeof(struct kvm_phyid_map),
>>>>>>>>>> + GFP_KERNEL_ACCOUNT);
>>>>>>>>>> + if (!kvm->arch.phyid_map) {
>>>>>>>>>> + free_page((unsigned long)kvm->arch.pgd);
>>>>>>>>>> + kvm->arch.pgd = NULL;
>>>>>>>>>> + return -ENOMEM;
>>>>>>>>>> + }
>>>>>>>>>> +
>>>>>>>>>> kvm_init_vmcs(kvm);
>>>>>>>>>> kvm->arch.gpa_size = BIT(cpu_vabits - 1);
>>>>>>>>>> kvm->arch.root_level = CONFIG_PGTABLE_LEVELS - 1;
>>>>>>>>>> @@ -44,6 +52,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>>>>>>>>>> for (i = 0; i <= kvm->arch.root_level; i++)
>>>>>>>>>> kvm->arch.pte_shifts[i] = PAGE_SHIFT + i * (PAGE_SHIFT - 3);
>>>>>>>>>>
>>>>>>>>>> + spin_lock_init(&kvm->arch.phyid_map_lock);
>>>>>>>>>> return 0;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> @@ -51,7 +60,9 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
>>>>>>>>>> {
>>>>>>>>>> kvm_destroy_vcpus(kvm);
>>>>>>>>>> free_page((unsigned long)kvm->arch.pgd);
>>>>>>>>>> + kvfree(kvm->arch.phyid_map);
>>>>>>>>>> kvm->arch.pgd = NULL;
>>>>>>>>>> + kvm->arch.phyid_map = NULL;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>>>>>>>>> --
>>>>>>>>>> 2.39.3
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>
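
Note for readers following the thread: the logic agreed on above (an "unset" sentinel equal to KVM_MAX_PHYID instead of the ambiguous default 0) can be sketched as a small user-space model. This is only an illustration under stated assumptions, not the kernel code: the `model_*` names are hypothetical, the software CSR is reduced to a plain `int`, and the spinlock is omitted because the model is single-threaded.

```c
#include <errno.h>
#include <stdbool.h>

#define MAX_PHYID 256	/* mirrors KVM_MAX_PHYID; doubles as the "unset" sentinel */

/* Hypothetical stand-in for struct kvm_vcpu: only the software CPUID CSR. */
struct model_vcpu {
	int cpuid;
};

struct model_phyid_info {
	struct model_vcpu *vcpu;
	bool enabled;
};

struct model_phyid_map {
	int max_phyid;
	struct model_phyid_info phys_map[MAX_PHYID];
};

/* Models kvm_arch_vcpu_create(): start with an invalid (unset) CPUID. */
static void model_vcpu_init(struct model_vcpu *v)
{
	v->cpuid = MAX_PHYID;
}

/* Models kvm_set_cpuid() after dropping the "if (cpuid)" special case. */
static int model_set_cpuid(struct model_phyid_map *map,
			   struct model_vcpu *v, unsigned long val)
{
	int cpuid = v->cpuid;

	if (val >= MAX_PHYID)
		return -EINVAL;

	/* This vcpu already owns a CPUID: accept only a duplicate set. */
	if (cpuid != MAX_PHYID && map->phys_map[cpuid].enabled)
		return ((unsigned long)cpuid == val) ? 0 : -EINVAL;

	/* The requested CPUID is taken: it must belong to this vcpu. */
	if (map->phys_map[val].enabled)
		return (map->phys_map[val].vcpu == v) ? 0 : -EINVAL;

	v->cpuid = (int)val;
	map->phys_map[val].enabled = true;
	map->phys_map[val].vcpu = v;
	if (map->max_phyid < (int)val)
		map->max_phyid = (int)val;
	return 0;
}
```

With this model, vcpu0 setting physid=0 and then physid=1 is rejected, while a fresh vcpu1 setting physid=1 is accepted. A fresh vcpu no longer looks like the owner of CPUID 0, so no special case for 0 is needed.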


2024-05-07 02:06:24

by Huacai Chen

Subject: Re: [PATCH v8 4/6] LoongArch: KVM: Add vcpu search support from physical cpuid

On Tue, May 7, 2024 at 9:40 AM maobibo <[email protected]> wrote:
>
>
>
> On 2024/5/6 10:17 PM, Huacai Chen wrote:
> > On Mon, May 6, 2024 at 6:05 PM maobibo <[email protected]> wrote:
> >>
> >>
> >>
> >> On 2024/5/6 5:40 PM, Huacai Chen wrote:
> >>> On Mon, May 6, 2024 at 5:35 PM maobibo <[email protected]> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 2024/5/6 4:59 PM, Huacai Chen wrote:
> >>>>> On Mon, May 6, 2024 at 4:18 PM maobibo <[email protected]> wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 2024/5/6 3:06 PM, Huacai Chen wrote:
> >>>>>>> Hi, Bibo,
> >>>>>>>
> >>>>>>> On Mon, May 6, 2024 at 2:36 PM maobibo <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 2024/5/6 9:49 AM, Huacai Chen wrote:
> >>>>>>>>> Hi, Bibo,
> >>>>>>>>>
> >>>>>>>>> On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao <[email protected]> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Physical cpuid is used for interrupt routing for irqchips such as
> >>>>>>>>>> ipi/msi/extioi interrupt controller. And physical cpuid is stored
> >>>>>>>>>> at CSR register LOONGARCH_CSR_CPUID, it can not be changed once vcpu
> >>>>>>>>>> is created and physical cpuid of two vcpus cannot be the same.
> >>>>>>>>>>
> >>>>>>>>>> Different irqchips have different size declaration about physical cpuid,
> >>>>>>>>>> max cpuid value for CSR LOONGARCH_CSR_CPUID on 3A5000 is 512, max cpuid
> >>>>>>>>>> supported by IPI hardware is 1024, 256 for extioi irqchip, and 65536
> >>>>>>>>>> for MSI irqchip.
> >>>>>>>>>>
> >>>>>>>>>> The smallest value from all interrupt controllers is selected now,
> >>>>>>>>>> and the max cpuid size is defined as 256 by KVM, which comes from
> >>>>>>>>>> the extioi irqchip.
> >>>>>>>>>>
> >>>>>>>>>> Signed-off-by: Bibo Mao <[email protected]>
> >>>>>>>>>> ---
> >>>>>>>>>> arch/loongarch/include/asm/kvm_host.h | 26 ++++++++
> >>>>>>>>>> arch/loongarch/include/asm/kvm_vcpu.h | 1 +
> >>>>>>>>>> arch/loongarch/kvm/vcpu.c | 93 ++++++++++++++++++++++++++-
> >>>>>>>>>> arch/loongarch/kvm/vm.c | 11 ++++
> >>>>>>>>>> 4 files changed, 130 insertions(+), 1 deletion(-)
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/arch/loongarch/include/asm/kvm_host.h b/arch/loongarch/include/asm/kvm_host.h
> >>>>>>>>>> index 2d62f7b0d377..3ba16ef1fe69 100644
> >>>>>>>>>> --- a/arch/loongarch/include/asm/kvm_host.h
> >>>>>>>>>> +++ b/arch/loongarch/include/asm/kvm_host.h
> >>>>>>>>>> @@ -64,6 +64,30 @@ struct kvm_world_switch {
> >>>>>>>>>>
> >>>>>>>>>> #define MAX_PGTABLE_LEVELS 4
> >>>>>>>>>>
> >>>>>>>>>> +/*
> >>>>>>>>>> + * Physical cpu id is used for interrupt routing, there are different
> >>>>>>>>>> + * definitions about physical cpuid on different hardware.
> >>>>>>>>>> + * For LOONGARCH_CSR_CPUID register, max cpuid size is 512
> >>>>>>>>>> + * For IPI HW, max dest CPUID size is 1024
> >>>>>>>>>> + * For extioi interrupt controller, max dest CPUID size is 256
> >>>>>>>>>> + * For MSI interrupt controller, max supported CPUID size is 65536
> >>>>>>>>>> + *
> >>>>>>>>>> + * Currently max CPUID is defined as 256 for KVM hypervisor, in future
> >>>>>>>>>> + * it will be expanded to 4096, including 16 packages at most. And every
> >>>>>>>>>> + * package supports at most 256 vcpus
> >>>>>>>>>> + */
> >>>>>>>>>> +#define KVM_MAX_PHYID 256
> >>>>>>>>>> +
> >>>>>>>>>> +struct kvm_phyid_info {
> >>>>>>>>>> + struct kvm_vcpu *vcpu;
> >>>>>>>>>> + bool enabled;
> >>>>>>>>>> +};
> >>>>>>>>>> +
> >>>>>>>>>> +struct kvm_phyid_map {
> >>>>>>>>>> + int max_phyid;
> >>>>>>>>>> + struct kvm_phyid_info phys_map[KVM_MAX_PHYID];
> >>>>>>>>>> +};
> >>>>>>>>>> +
> >>>>>>>>>> struct kvm_arch {
> >>>>>>>>>> /* Guest physical mm */
> >>>>>>>>>> kvm_pte_t *pgd;
> >>>>>>>>>> @@ -71,6 +95,8 @@ struct kvm_arch {
> >>>>>>>>>> unsigned long invalid_ptes[MAX_PGTABLE_LEVELS];
> >>>>>>>>>> unsigned int pte_shifts[MAX_PGTABLE_LEVELS];
> >>>>>>>>>> unsigned int root_level;
> >>>>>>>>>> + spinlock_t phyid_map_lock;
> >>>>>>>>>> + struct kvm_phyid_map *phyid_map;
> >>>>>>>>>>
> >>>>>>>>>> s64 time_offset;
> >>>>>>>>>> struct kvm_context __percpu *vmcs;
> >>>>>>>>>> diff --git a/arch/loongarch/include/asm/kvm_vcpu.h b/arch/loongarch/include/asm/kvm_vcpu.h
> >>>>>>>>>> index 0cb4fdb8a9b5..9f53950959da 100644
> >>>>>>>>>> --- a/arch/loongarch/include/asm/kvm_vcpu.h
> >>>>>>>>>> +++ b/arch/loongarch/include/asm/kvm_vcpu.h
> >>>>>>>>>> @@ -81,6 +81,7 @@ void kvm_save_timer(struct kvm_vcpu *vcpu);
> >>>>>>>>>> void kvm_restore_timer(struct kvm_vcpu *vcpu);
> >>>>>>>>>>
> >>>>>>>>>> int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq);
> >>>>>>>>>> +struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid);
> >>>>>>>>>>
> >>>>>>>>>> /*
> >>>>>>>>>> * Loongarch KVM guest interrupt handling
> >>>>>>>>>> diff --git a/arch/loongarch/kvm/vcpu.c b/arch/loongarch/kvm/vcpu.c
> >>>>>>>>>> index 3a8779065f73..b633fd28b8db 100644
> >>>>>>>>>> --- a/arch/loongarch/kvm/vcpu.c
> >>>>>>>>>> +++ b/arch/loongarch/kvm/vcpu.c
> >>>>>>>>>> @@ -274,6 +274,95 @@ static int _kvm_getcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 *val)
> >>>>>>>>>> return 0;
> >>>>>>>>>> }
> >>>>>>>>>>
> >>>>>>>>>> +static inline int kvm_set_cpuid(struct kvm_vcpu *vcpu, u64 val)
> >>>>>>>>>> +{
> >>>>>>>>>> + int cpuid;
> >>>>>>>>>> + struct loongarch_csrs *csr = vcpu->arch.csr;
> >>>>>>>>>> + struct kvm_phyid_map *map;
> >>>>>>>>>> +
> >>>>>>>>>> + if (val >= KVM_MAX_PHYID)
> >>>>>>>>>> + return -EINVAL;
> >>>>>>>>>> +
> >>>>>>>>>> + cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_CPUID);
> >>>>>>>>>> + map = vcpu->kvm->arch.phyid_map;
> >>>>>>>>>> + spin_lock(&vcpu->kvm->arch.phyid_map_lock);
> >>>>>>>>>> + if (map->phys_map[cpuid].enabled) {
> >>>>>>>>>> + /*
> >>>>>>>>>> + * Cpuid is already set before
> >>>>>>>>>> + * Forbid changing different cpuid at runtime
> >>>>>>>>>> + */
> >>>>>>>>>> + if (cpuid != val) {
> >>>>>>>>>> + /*
> >>>>>>>>>> + * Cpuid 0 is initial value for vcpu, maybe invalid
> >>>>>>>>>> + * unset value for vcpu
> >>>>>>>>>> + */
> >>>>>>>>>> + if (cpuid) {
> >>>>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >>>>>>>>>> + return -EINVAL;
> >>>>>>>>>> + }
> >>>>>>>>>> + } else {
> >>>>>>>>>> + /* Discard duplicated cpuid set */
> >>>>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >>>>>>>>>> + return 0;
> >>>>>>>>>> + }
> >>>>>>>>>> + }
> >>>>>>>>> I changed the logic and comments when applying the patch; please
> >>>>>>>>> double-check whether it is correct.
> >>>>>>>> I checked out the latest version; the modification in
> >>>>>>>> kvm_set_cpuid() looks good to me.
> >>>>>>> Now the modified version is like this:
> >>>>>>>
> >>>>>>> + if (map->phys_map[cpuid].enabled) {
> >>>>>>> + /* Discard duplicated CPUID set operation */
> >>>>>>> + if (cpuid == val) {
> >>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >>>>>>> + return 0;
> >>>>>>> + }
> >>>>>>> +
> >>>>>>> + /*
> >>>>>>> + * CPUID is already set before
> >>>>>>> + * Forbid changing different CPUID at runtime
> >>>>>>> + * But CPUID 0 is the initial value for vcpu, so allow
> >>>>>>> + * changing from 0 to others
> >>>>>>> + */
> >>>>>>> + if (cpuid) {
> >>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >>>>>>> + return -EINVAL;
> >>>>>>> + }
> >>>>>>> + }
> >>>>>>> But I still doubt whether we should allow changing from 0 to others
> >>>>>>> while map->phys_map[cpuid].enabled is 1.
> >>>>>> It is necessary since the default sw cpuid is zero :-( We can
> >>>>>> optimize it later, e.g. by setting an INVALID cpuid in
> >>>>>> kvm_arch_vcpu_create(), so that the logic in kvm_set_cpuid() becomes simpler.
> >>>>> In my opinion, if a vcpu has an uninitialized default physid=0, then
> >>>>> map->phys_map[cpuid].enabled should be 0, and the code won't come here.
> >>>>> And if a vcpu has a real physid=0, then map->phys_map[cpuid].enabled
> >>>>> is 1, but we shouldn't allow it to change physid in this case.
> >>>> Yes, that is actually a problem.
> >>>>
> >>>> vcpu0 first sets physid=0, then vcpu0 setting physid=1 is not allowed.
> >>>> vcpu0 first sets physid=0, then vcpu1 setting physid=1 is allowed.
> >>>
> >>> So can we simply drop the if (cpuid) checking? That means:
> >>> + if (map->phys_map[cpuid].enabled) {
> >>> + /* Discard duplicated CPUID set operation */
> >>> + if (cpuid == val) {
> >>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >>> + return 0;
> >>> + }
> >>> +
> >>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >>> + return -EINVAL;
> >>> + }
> >> Yes, a similar modification such as the following, since the second
> >> scenario should be allowed.
> >> "vcpu0 firstly set physid=0, and vcpu1 set physid=1 is allowed though
> >> default sw cpuid is zero"
> >>
> >> --- a/arch/loongarch/kvm/vcpu.c
> >> +++ b/arch/loongarch/kvm/vcpu.c
> >> @@ -272,7 +272,7 @@ static inline int kvm_set_cpuid(struct kvm_vcpu
> >> *vcpu, u64 val)
> >> cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_CPUID);
> >>
> >> spin_lock(&vcpu->kvm->arch.phyid_map_lock);
> >> - if (map->phys_map[cpuid].enabled) {
> >> + if ((cpuid != KVM_MAX_PHYID) && map->phys_map[cpuid].enabled) {
> >> /* Discard duplicated CPUID set operation */
> >> if (cpuid == val) {
> >> spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >> @@ -282,13 +282,9 @@ static inline int kvm_set_cpuid(struct kvm_vcpu
> >> *vcpu, u64 val)
> >> /*
> >> * CPUID is already set before
> >> * Forbid changing different CPUID at runtime
> >> - * But CPUID 0 is the initial value for vcpu, so allow
> >> - * changing from 0 to others
> >> */
> >> - if (cpuid) {
> >> - spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >> - return -EINVAL;
> >> - }
> >> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >> + return -EINVAL;
> >> }
> >>
> >> if (map->phys_map[val].enabled) {
> >> @@ -1029,6 +1025,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
> >>
> >> /* Set cpuid */
> >> kvm_write_sw_gcsr(csr, LOONGARCH_CSR_TMID, vcpu->vcpu_id);
> >> + kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, KVM_MAX_PHYID);
> >>
> >> /* Start with no pending virtual guest interrupts */
> >> csr->csrs[LOONGARCH_CSR_GINTC] = 0;
> > Very nice, but I think kvm_drop_cpuid() should also set the CPUID to
> > KVM_MAX_PHYID. I have now updated my loongarch-kvm branch; you can test
> > it again, and I hope it is now in good shape.
> I synced and tested the latest code from loongarch-kvm; PV IPI works well
> with 256 vcpus. The code looks good to me. Thanks for the quick review.
OK, if SWDBG also works well, I will send PR to Paolo tomorrow.

Huacai

>
> Regards
> Bibo Mao
> >
> > Huacai
> >>
> >>
> >>>
> >>> Huacai
> >>>
> >>>>
> >>>>
> >>>>>
> >>>>> Huacai
> >>>>>
> >>>>>>
> >>>>>> Regards
> >>>>>> Bibo Mao
> >>>>>>
> >>>>>>>
> >>>>>>> Huacai
> >>>>>>>
> >>>>>>>>>
> >>>>>>>>>> +
> >>>>>>>>>> + if (map->phys_map[val].enabled) {
> >>>>>>>>>> + /*
> >>>>>>>>>> + * New cpuid is already set with other vcpu
> >>>>>>>>>> + * Forbid sharing the same cpuid between different vcpus
> >>>>>>>>>> + */
> >>>>>>>>>> + if (map->phys_map[val].vcpu != vcpu) {
> >>>>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >>>>>>>>>> + return -EINVAL;
> >>>>>>>>>> + }
> >>>>>>>>>> +
> >>>>>>>>>> + /* Discard duplicated cpuid set operation*/
> >>>>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >>>>>>>>>> + return 0;
> >>>>>>>>>> + }
> >>>>>>>>>> +
> >>>>>>>>>> + kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, val);
> >>>>>>>>>> + map->phys_map[val].enabled = true;
> >>>>>>>>>> + map->phys_map[val].vcpu = vcpu;
> >>>>>>>>>> + if (map->max_phyid < val)
> >>>>>>>>>> + map->max_phyid = val;
> >>>>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >>>>>>>>>> + return 0;
> >>>>>>>>>> +}
> >>>>>>>>>> +
> >>>>>>>>>> +struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid)
> >>>>>>>>>> +{
> >>>>>>>>>> + struct kvm_phyid_map *map;
> >>>>>>>>>> +
> >>>>>>>>>> + if (cpuid >= KVM_MAX_PHYID)
> >>>>>>>>>> + return NULL;
> >>>>>>>>>> +
> >>>>>>>>>> + map = kvm->arch.phyid_map;
> >>>>>>>>>> + if (map->phys_map[cpuid].enabled)
> >>>>>>>>>> + return map->phys_map[cpuid].vcpu;
> >>>>>>>>>> +
> >>>>>>>>>> + return NULL;
> >>>>>>>>>> +}
> >>>>>>>>>> +
> >>>>>>>>>> +static inline void kvm_drop_cpuid(struct kvm_vcpu *vcpu)
> >>>>>>>>>> +{
> >>>>>>>>>> + int cpuid;
> >>>>>>>>>> + struct loongarch_csrs *csr = vcpu->arch.csr;
> >>>>>>>>>> + struct kvm_phyid_map *map;
> >>>>>>>>>> +
> >>>>>>>>>> + map = vcpu->kvm->arch.phyid_map;
> >>>>>>>>>> + cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_CPUID);
> >>>>>>>>>> + if (cpuid >= KVM_MAX_PHYID)
> >>>>>>>>>> + return;
> >>>>>>>>>> +
> >>>>>>>>>> + if (map->phys_map[cpuid].enabled) {
> >>>>>>>>>> + map->phys_map[cpuid].vcpu = NULL;
> >>>>>>>>>> + map->phys_map[cpuid].enabled = false;
> >>>>>>>>>> + kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, 0);
> >>>>>>>>>> + }
> >>>>>>>>>> +}
> >>>>>>>>> While kvm_set_cpuid() is protected by a spinlock, do kvm_drop_cpuid()
> >>>>>>>>> and kvm_get_vcpu_by_cpuid() also need it?
> >>>>>>>>>
> >>>>>>>> It looks good to me that a spinlock is added in kvm_drop_cpuid().
> >>>>>>>> And thanks for the effort.
> >>>>>>>>
> >>>>>>>> Regards
> >>>>>>>> Bibo Mao
> >>>>>>>>>> +
> >>>>>>>>>> static int _kvm_setcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 val)
> >>>>>>>>>> {
> >>>>>>>>>> int ret = 0, gintc;
> >>>>>>>>>> @@ -291,7 +380,8 @@ static int _kvm_setcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 val)
> >>>>>>>>>> kvm_set_sw_gcsr(csr, LOONGARCH_CSR_ESTAT, gintc);
> >>>>>>>>>>
> >>>>>>>>>> return ret;
> >>>>>>>>>> - }
> >>>>>>>>>> + } else if (id == LOONGARCH_CSR_CPUID)
> >>>>>>>>>> + return kvm_set_cpuid(vcpu, val);
> >>>>>>>>>>
> >>>>>>>>>> kvm_write_sw_gcsr(csr, id, val);
> >>>>>>>>>>
> >>>>>>>>>> @@ -943,6 +1033,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
> >>>>>>>>>> hrtimer_cancel(&vcpu->arch.swtimer);
> >>>>>>>>>> kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
> >>>>>>>>>> kfree(vcpu->arch.csr);
> >>>>>>>>>> + kvm_drop_cpuid(vcpu);
> >>>>>>>>> I think this line should be before the above kfree(), otherwise you
> >>>>>>>>> get a "use after free".
> >>>>>>>>>
> >>>>>>>>> Huacai
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> /*
> >>>>>>>>>> * If the vCPU is freed and reused as another vCPU, we don't want the
> >>>>>>>>>> diff --git a/arch/loongarch/kvm/vm.c b/arch/loongarch/kvm/vm.c
> >>>>>>>>>> index 0a37f6fa8f2d..6006a28653ad 100644
> >>>>>>>>>> --- a/arch/loongarch/kvm/vm.c
> >>>>>>>>>> +++ b/arch/loongarch/kvm/vm.c
> >>>>>>>>>> @@ -30,6 +30,14 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> >>>>>>>>>> if (!kvm->arch.pgd)
> >>>>>>>>>> return -ENOMEM;
> >>>>>>>>>>
> >>>>>>>>>> + kvm->arch.phyid_map = kvzalloc(sizeof(struct kvm_phyid_map),
> >>>>>>>>>> + GFP_KERNEL_ACCOUNT);
> >>>>>>>>>> + if (!kvm->arch.phyid_map) {
> >>>>>>>>>> + free_page((unsigned long)kvm->arch.pgd);
> >>>>>>>>>> + kvm->arch.pgd = NULL;
> >>>>>>>>>> + return -ENOMEM;
> >>>>>>>>>> + }
> >>>>>>>>>> +
> >>>>>>>>>> kvm_init_vmcs(kvm);
> >>>>>>>>>> kvm->arch.gpa_size = BIT(cpu_vabits - 1);
> >>>>>>>>>> kvm->arch.root_level = CONFIG_PGTABLE_LEVELS - 1;
> >>>>>>>>>> @@ -44,6 +52,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> >>>>>>>>>> for (i = 0; i <= kvm->arch.root_level; i++)
> >>>>>>>>>> kvm->arch.pte_shifts[i] = PAGE_SHIFT + i * (PAGE_SHIFT - 3);
> >>>>>>>>>>
> >>>>>>>>>> + spin_lock_init(&kvm->arch.phyid_map_lock);
> >>>>>>>>>> return 0;
> >>>>>>>>>> }
> >>>>>>>>>>
> >>>>>>>>>> @@ -51,7 +60,9 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
> >>>>>>>>>> {
> >>>>>>>>>> kvm_destroy_vcpus(kvm);
> >>>>>>>>>> free_page((unsigned long)kvm->arch.pgd);
> >>>>>>>>>> + kvfree(kvm->arch.phyid_map);
> >>>>>>>>>> kvm->arch.pgd = NULL;
> >>>>>>>>>> + kvm->arch.phyid_map = NULL;
> >>>>>>>>>> }
> >>>>>>>>>>
> >>>>>>>>>> int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >>>>>>>>>> --
> >>>>>>>>>> 2.39.3
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>
> >>
>
>
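
Note for readers following the thread: two review points quoted above concern teardown, namely that kvm_drop_cpuid() should take the map lock and that it has to run before kfree(vcpu->arch.csr), because it reads and writes the CSR block. The sketch below models both in user space. It is an assumption-laden illustration, not the kernel code: all `model_*` names are hypothetical, `malloc()`/`free()` stand in for the kernel allocators, and a C11 `atomic_flag` spin loop stands in for `spin_lock()`.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PHYID 256	/* mirrors KVM_MAX_PHYID; also the "unset" value */

struct model_csrs { int cpuid; };	/* stand-in for the software CSR block */
struct model_vcpu;

struct model_kvm {
	atomic_flag lock;		/* stand-in for phyid_map_lock */
	struct {
		struct model_vcpu *vcpu;
		bool enabled;
	} phys_map[MAX_PHYID];
};

struct model_vcpu {
	struct model_kvm *kvm;
	struct model_csrs *csr;		/* heap-allocated, like vcpu->arch.csr */
};

static void model_lock(struct model_kvm *k)
{
	while (atomic_flag_test_and_set_explicit(&k->lock, memory_order_acquire))
		;	/* spin until the flag is released */
}

static void model_unlock(struct model_kvm *k)
{
	atomic_flag_clear_explicit(&k->lock, memory_order_release);
}

static void model_kvm_init(struct model_kvm *k)
{
	memset(k->phys_map, 0, sizeof(k->phys_map));
	atomic_flag_clear(&k->lock);
}

/* Allocate the CSR block and register the vcpu under the given CPUID. */
static int model_vcpu_create(struct model_kvm *k, struct model_vcpu *v, int cpuid)
{
	if (cpuid < 0 || cpuid >= MAX_PHYID)
		return -1;
	v->csr = malloc(sizeof(*v->csr));
	if (!v->csr)
		return -1;
	v->kvm = k;

	model_lock(k);
	if (k->phys_map[cpuid].enabled) {
		model_unlock(k);
		free(v->csr);
		v->csr = NULL;
		return -1;
	}
	v->csr->cpuid = cpuid;
	k->phys_map[cpuid].vcpu = v;
	k->phys_map[cpuid].enabled = true;
	model_unlock(k);
	return 0;
}

/* Unregister the vcpu; dereferences v->csr, so csr must still be allocated. */
static void model_drop_cpuid(struct model_vcpu *v)
{
	struct model_kvm *k = v->kvm;
	int cpuid;

	model_lock(k);
	cpuid = v->csr->cpuid;
	if (cpuid < MAX_PHYID && k->phys_map[cpuid].enabled) {
		k->phys_map[cpuid].vcpu = NULL;
		k->phys_map[cpuid].enabled = false;
		v->csr->cpuid = MAX_PHYID;	/* reset to "unset", not to 0 */
	}
	model_unlock(k);
}

/* Teardown order from the review: drop the CPUID first, free the CSRs after. */
static void model_vcpu_destroy(struct model_vcpu *v)
{
	model_drop_cpuid(v);
	free(v->csr);
	v->csr = NULL;
}

/* Lookup by physical CPUID, done under the same lock in this model. */
static struct model_vcpu *model_get_vcpu_by_cpuid(struct model_kvm *k, int cpuid)
{
	struct model_vcpu *v = NULL;

	if (cpuid < 0 || cpuid >= MAX_PHYID)
		return NULL;
	model_lock(k);
	if (k->phys_map[cpuid].enabled)
		v = k->phys_map[cpuid].vcpu;
	model_unlock(k);
	return v;
}
```

Swapping the two calls in `model_vcpu_destroy()` would make `model_drop_cpuid()` dereference freed memory, which is the "use after free" pointed out in the review. Whether the lookup path needs the lock in the real code is the open question raised above; the model takes it only to keep the example simple.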

2024-05-07 03:06:35

by Bibo Mao

Subject: Re: [PATCH v8 4/6] LoongArch: KVM: Add vcpu search support from physical cpuid



On 2024/5/7 10:05 AM, Huacai Chen wrote:
> On Tue, May 7, 2024 at 9:40 AM maobibo <[email protected]> wrote:
>>
>>
>>
>> On 2024/5/6 10:17 PM, Huacai Chen wrote:
>>> On Mon, May 6, 2024 at 6:05 PM maobibo <[email protected]> wrote:
>>>>
>>>>
>>>>
>>>> On 2024/5/6 5:40 PM, Huacai Chen wrote:
>>>>> On Mon, May 6, 2024 at 5:35 PM maobibo <[email protected]> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2024/5/6 4:59 PM, Huacai Chen wrote:
>>>>>>> On Mon, May 6, 2024 at 4:18 PM maobibo <[email protected]> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2024/5/6 3:06 PM, Huacai Chen wrote:
>>>>>>>>> Hi, Bibo,
>>>>>>>>>
>>>>>>>>> On Mon, May 6, 2024 at 2:36 PM maobibo <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 2024/5/6 9:49 AM, Huacai Chen wrote:
>>>>>>>>>>> Hi, Bibo,
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Physical cpuid is used for interrupt routing for irqchips such as
>>>>>>>>>>>> ipi/msi/extioi interrupt controller. And physical cpuid is stored
>>>>>>>>>>>> at CSR register LOONGARCH_CSR_CPUID, it can not be changed once vcpu
>>>>>>>>>>>> is created and physical cpuid of two vcpus cannot be the same.
>>>>>>>>>>>>
>>>>>>>>>>>> Different irqchips have different size declaration about physical cpuid,
>>>>>>>>>>>> max cpuid value for CSR LOONGARCH_CSR_CPUID on 3A5000 is 512, max cpuid
>>>>>>>>>>>> supported by IPI hardware is 1024, 256 for extioi irqchip, and 65536
>>>>>>>>>>>> for MSI irqchip.
>>>>>>>>>>>>
>>>>>>>>>>>> The smallest value from all interrupt controllers is selected now,
>>>>>>>>>>>> and the max cpuid size is defined as 256 by KVM, which comes from
>>>>>>>>>>>> the extioi irqchip.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Bibo Mao <[email protected]>
>>>>>>>>>>>> ---
>>>>>>>>>>>> arch/loongarch/include/asm/kvm_host.h | 26 ++++++++
>>>>>>>>>>>> arch/loongarch/include/asm/kvm_vcpu.h | 1 +
>>>>>>>>>>>> arch/loongarch/kvm/vcpu.c | 93 ++++++++++++++++++++++++++-
>>>>>>>>>>>> arch/loongarch/kvm/vm.c | 11 ++++
>>>>>>>>>>>> 4 files changed, 130 insertions(+), 1 deletion(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/arch/loongarch/include/asm/kvm_host.h b/arch/loongarch/include/asm/kvm_host.h
>>>>>>>>>>>> index 2d62f7b0d377..3ba16ef1fe69 100644
>>>>>>>>>>>> --- a/arch/loongarch/include/asm/kvm_host.h
>>>>>>>>>>>> +++ b/arch/loongarch/include/asm/kvm_host.h
>>>>>>>>>>>> @@ -64,6 +64,30 @@ struct kvm_world_switch {
>>>>>>>>>>>>
>>>>>>>>>>>> #define MAX_PGTABLE_LEVELS 4
>>>>>>>>>>>>
>>>>>>>>>>>> +/*
>>>>>>>>>>>> + * Physical cpu id is used for interrupt routing, there are different
>>>>>>>>>>>> + * definitions about physical cpuid on different hardware.
>>>>>>>>>>>> + * For LOONGARCH_CSR_CPUID register, max cpuid size is 512
>>>>>>>>>>>> + * For IPI HW, max dest CPUID size is 1024
>>>>>>>>>>>> + * For extioi interrupt controller, max dest CPUID size is 256
>>>>>>>>>>>> + * For MSI interrupt controller, max supported CPUID size is 65536
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * Currently max CPUID is defined as 256 for KVM hypervisor, in future
>>>>>>>>>>>> + * it will be expanded to 4096, including 16 packages at most. And every
>>>>>>>>>>>> + * package supports at most 256 vcpus
>>>>>>>>>>>> + */
>>>>>>>>>>>> +#define KVM_MAX_PHYID 256
>>>>>>>>>>>> +
>>>>>>>>>>>> +struct kvm_phyid_info {
>>>>>>>>>>>> + struct kvm_vcpu *vcpu;
>>>>>>>>>>>> + bool enabled;
>>>>>>>>>>>> +};
>>>>>>>>>>>> +
>>>>>>>>>>>> +struct kvm_phyid_map {
>>>>>>>>>>>> + int max_phyid;
>>>>>>>>>>>> + struct kvm_phyid_info phys_map[KVM_MAX_PHYID];
>>>>>>>>>>>> +};
>>>>>>>>>>>> +
>>>>>>>>>>>> struct kvm_arch {
>>>>>>>>>>>> /* Guest physical mm */
>>>>>>>>>>>> kvm_pte_t *pgd;
>>>>>>>>>>>> @@ -71,6 +95,8 @@ struct kvm_arch {
>>>>>>>>>>>> unsigned long invalid_ptes[MAX_PGTABLE_LEVELS];
>>>>>>>>>>>> unsigned int pte_shifts[MAX_PGTABLE_LEVELS];
>>>>>>>>>>>> unsigned int root_level;
>>>>>>>>>>>> + spinlock_t phyid_map_lock;
>>>>>>>>>>>> + struct kvm_phyid_map *phyid_map;
>>>>>>>>>>>>
>>>>>>>>>>>> s64 time_offset;
>>>>>>>>>>>> struct kvm_context __percpu *vmcs;
>>>>>>>>>>>> diff --git a/arch/loongarch/include/asm/kvm_vcpu.h b/arch/loongarch/include/asm/kvm_vcpu.h
>>>>>>>>>>>> index 0cb4fdb8a9b5..9f53950959da 100644
>>>>>>>>>>>> --- a/arch/loongarch/include/asm/kvm_vcpu.h
>>>>>>>>>>>> +++ b/arch/loongarch/include/asm/kvm_vcpu.h
>>>>>>>>>>>> @@ -81,6 +81,7 @@ void kvm_save_timer(struct kvm_vcpu *vcpu);
>>>>>>>>>>>> void kvm_restore_timer(struct kvm_vcpu *vcpu);
>>>>>>>>>>>>
>>>>>>>>>>>> int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq);
>>>>>>>>>>>> +struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid);
>>>>>>>>>>>>
>>>>>>>>>>>> /*
>>>>>>>>>>>> * Loongarch KVM guest interrupt handling
>>>>>>>>>>>> diff --git a/arch/loongarch/kvm/vcpu.c b/arch/loongarch/kvm/vcpu.c
>>>>>>>>>>>> index 3a8779065f73..b633fd28b8db 100644
>>>>>>>>>>>> --- a/arch/loongarch/kvm/vcpu.c
>>>>>>>>>>>> +++ b/arch/loongarch/kvm/vcpu.c
>>>>>>>>>>>> @@ -274,6 +274,95 @@ static int _kvm_getcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 *val)
>>>>>>>>>>>> return 0;
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> +static inline int kvm_set_cpuid(struct kvm_vcpu *vcpu, u64 val)
>>>>>>>>>>>> +{
>>>>>>>>>>>> + int cpuid;
>>>>>>>>>>>> + struct loongarch_csrs *csr = vcpu->arch.csr;
>>>>>>>>>>>> + struct kvm_phyid_map *map;
>>>>>>>>>>>> +
>>>>>>>>>>>> + if (val >= KVM_MAX_PHYID)
>>>>>>>>>>>> + return -EINVAL;
>>>>>>>>>>>> +
>>>>>>>>>>>> + cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_CPUID);
>>>>>>>>>>>> + map = vcpu->kvm->arch.phyid_map;
>>>>>>>>>>>> + spin_lock(&vcpu->kvm->arch.phyid_map_lock);
>>>>>>>>>>>> + if (map->phys_map[cpuid].enabled) {
>>>>>>>>>>>> + /*
>>>>>>>>>>>> + * Cpuid is already set before
>>>>>>>>>>>> + * Forbid changing different cpuid at runtime
>>>>>>>>>>>> + */
>>>>>>>>>>>> + if (cpuid != val) {
>>>>>>>>>>>> + /*
>>>>>>>>>>>> + * Cpuid 0 is initial value for vcpu, maybe invalid
>>>>>>>>>>>> + * unset value for vcpu
>>>>>>>>>>>> + */
>>>>>>>>>>>> + if (cpuid) {
>>>>>>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>>>>>>>>>> + return -EINVAL;
>>>>>>>>>>>> + }
>>>>>>>>>>>> + } else {
>>>>>>>>>>>> + /* Discard duplicated cpuid set */
>>>>>>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>>>>>>>>>> + return 0;
>>>>>>>>>>>> + }
>>>>>>>>>>>> + }
>>>>>>>>>>> I changed the logic and comments when applying the patch; please
>>>>>>>>>>> double-check whether it is correct.
>>>>>>>>>> I checked out the latest version; the modification in
>>>>>>>>>> kvm_set_cpuid() looks good to me.
>>>>>>>>> Now the modified version is like this:
>>>>>>>>>
>>>>>>>>> + if (map->phys_map[cpuid].enabled) {
>>>>>>>>> + /* Discard duplicated CPUID set operation */
>>>>>>>>> + if (cpuid == val) {
>>>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>>>>>>> + return 0;
>>>>>>>>> + }
>>>>>>>>> +
>>>>>>>>> + /*
>>>>>>>>> + * CPUID is already set before
>>>>>>>>> + * Forbid changing different CPUID at runtime
>>>>>>>>> + * But CPUID 0 is the initial value for vcpu, so allow
>>>>>>>>> + * changing from 0 to others
>>>>>>>>> + */
>>>>>>>>> + if (cpuid) {
>>>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>>>>>>> + return -EINVAL;
>>>>>>>>> + }
>>>>>>>>> + }
>>>>>>>>> But I still doubt whether we should allow changing from 0 to others
>>>>>>>>> while map->phys_map[cpuid].enabled is 1.
>>>>>>>> It is necessary since the default sw cpuid is zero :-( We can
>>>>>>>> optimize it later, e.g. by setting an INVALID cpuid in
>>>>>>>> kvm_arch_vcpu_create(), so that the logic in kvm_set_cpuid() becomes simpler.
>>>>>>> In my opinion, if a vcpu has an uninitialized default physid=0, then
>>>>>>> map->phys_map[cpuid].enabled should be 0, and the code won't come here.
>>>>>>> And if a vcpu has a real physid=0, then map->phys_map[cpuid].enabled
>>>>>>> is 1, but we shouldn't allow it to change physid in this case.
>>>>>> Yes, that is actually a problem.
>>>>>>
>>>>>> vcpu0 first sets physid=0, then vcpu0 setting physid=1 is not allowed.
>>>>>> vcpu0 first sets physid=0, then vcpu1 setting physid=1 is allowed.
>>>>>
>>>>> So can we simply drop the if (cpuid) checking? That means:
>>>>> + if (map->phys_map[cpuid].enabled) {
>>>>> + /* Discard duplicated CPUID set operation */
>>>>> + if (cpuid == val) {
>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>>> + return 0;
>>>>> + }
>>>>> +
>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>>> + return -EINVAL;
>>>>> + }
>>>> Yes, a similar modification such as the following, since the second
>>>> scenario should be allowed.
>>>> "vcpu0 firstly set physid=0, and vcpu1 set physid=1 is allowed though
>>>> default sw cpuid is zero"
>>>>
>>>> --- a/arch/loongarch/kvm/vcpu.c
>>>> +++ b/arch/loongarch/kvm/vcpu.c
>>>> @@ -272,7 +272,7 @@ static inline int kvm_set_cpuid(struct kvm_vcpu
>>>> *vcpu, u64 val)
>>>> cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_CPUID);
>>>>
>>>> spin_lock(&vcpu->kvm->arch.phyid_map_lock);
>>>> - if (map->phys_map[cpuid].enabled) {
>>>> + if ((cpuid != KVM_MAX_PHYID) && map->phys_map[cpuid].enabled) {
>>>> /* Discard duplicated CPUID set operation */
>>>> if (cpuid == val) {
>>>> spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>> @@ -282,13 +282,9 @@ static inline int kvm_set_cpuid(struct kvm_vcpu
>>>> *vcpu, u64 val)
>>>> /*
>>>> * CPUID is already set before
>>>> * Forbid changing different CPUID at runtime
>>>> - * But CPUID 0 is the initial value for vcpu, so allow
>>>> - * changing from 0 to others
>>>> */
>>>> - if (cpuid) {
>>>> - spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>> - return -EINVAL;
>>>> - }
>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>> + return -EINVAL;
>>>> }
>>>>
>>>> if (map->phys_map[val].enabled) {
>>>> @@ -1029,6 +1025,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
>>>>
>>>> /* Set cpuid */
>>>> kvm_write_sw_gcsr(csr, LOONGARCH_CSR_TMID, vcpu->vcpu_id);
>>>> + kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, KVM_MAX_PHYID);
>>>>
>>>> /* Start with no pending virtual guest interrupts */
>>>> csr->csrs[LOONGARCH_CSR_GINTC] = 0;
>>> Very nice, but I think kvm_drop_cpuid() should also set the CPUID to
>>> KVM_MAX_PHYID. I have now updated my loongarch-kvm branch; you can test
>>> it again, and I hope it is now in good shape.
>> I synced and tested the latest code from loongarch-kvm; PV IPI works well
>> with 256 vcpus. The code looks good to me. Thanks for the quick review.
> OK, if SWDBG also works well, I will send PR to Paolo tomorrow.
Yes, sw debug works well with the patch from QEMU. I will refresh the
QEMU patch after it is merged.

https://lore.kernel.org/all/[email protected]/

--- a/configs/targets/loongarch64-softmmu.mak
+++ b/configs/targets/loongarch64-softmmu.mak
@@ -1,5 +1,6 @@
TARGET_ARCH=loongarch64
TARGET_BASE_ARCH=loongarch
TARGET_SUPPORTS_MTTCG=y
+TARGET_KVM_HAVE_GUEST_DEBUG=y
TARGET_XML_FILES= gdb-xml/loongarch-base32.xml gdb-xml/loongarch-base64.xml gdb-xml/loongarch-fpu.xml
TARGET_NEED_FDT=y

Regards
Bibo Mao
>
> Huacai
>
>>
>> Regards
>> Bibo Mao
>>>
>>> Huacai
>>>>
>>>>
>>>>>
>>>>> Huacai
>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Huacai
>>>>>>>
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Bibo Mao
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Huacai
>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> +
>>>>>>>>>>>> + if (map->phys_map[val].enabled) {
>>>>>>>>>>>> + /*
>>>>>>>>>>>> + * New cpuid is already set with other vcpu
>>>>>>>>>>>> + * Forbid sharing the same cpuid between different vcpus
>>>>>>>>>>>> + */
>>>>>>>>>>>> + if (map->phys_map[val].vcpu != vcpu) {
>>>>>>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>>>>>>>>>> + return -EINVAL;
>>>>>>>>>>>> + }
>>>>>>>>>>>> +
>>>>>>>>>>>> + /* Discard duplicated cpuid set operation*/
>>>>>>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>>>>>>>>>> + return 0;
>>>>>>>>>>>> + }
>>>>>>>>>>>> +
>>>>>>>>>>>> + kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, val);
>>>>>>>>>>>> + map->phys_map[val].enabled = true;
>>>>>>>>>>>> + map->phys_map[val].vcpu = vcpu;
>>>>>>>>>>>> + if (map->max_phyid < val)
>>>>>>>>>>>> + map->max_phyid = val;
>>>>>>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>>>>>>>>>> + return 0;
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> +struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid)
>>>>>>>>>>>> +{
>>>>>>>>>>>> + struct kvm_phyid_map *map;
>>>>>>>>>>>> +
>>>>>>>>>>>> + if (cpuid >= KVM_MAX_PHYID)
>>>>>>>>>>>> + return NULL;
>>>>>>>>>>>> +
>>>>>>>>>>>> + map = kvm->arch.phyid_map;
>>>>>>>>>>>> + if (map->phys_map[cpuid].enabled)
>>>>>>>>>>>> + return map->phys_map[cpuid].vcpu;
>>>>>>>>>>>> +
>>>>>>>>>>>> + return NULL;
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> +static inline void kvm_drop_cpuid(struct kvm_vcpu *vcpu)
>>>>>>>>>>>> +{
>>>>>>>>>>>> + int cpuid;
>>>>>>>>>>>> + struct loongarch_csrs *csr = vcpu->arch.csr;
>>>>>>>>>>>> + struct kvm_phyid_map *map;
>>>>>>>>>>>> +
>>>>>>>>>>>> + map = vcpu->kvm->arch.phyid_map;
>>>>>>>>>>>> + cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_ESTAT);
>>>>>>>>>>>> + if (cpuid >= KVM_MAX_PHYID)
>>>>>>>>>>>> + return;
>>>>>>>>>>>> +
>>>>>>>>>>>> + if (map->phys_map[cpuid].enabled) {
>>>>>>>>>>>> + map->phys_map[cpuid].vcpu = NULL;
>>>>>>>>>>>> + map->phys_map[cpuid].enabled = false;
>>>>>>>>>>>> + kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, 0);
>>>>>>>>>>>> + }
>>>>>>>>>>>> +}
>>>>>>>>>>> While kvm_set_cpuid() is protected by a spinlock, do kvm_drop_cpuid()
>>>>>>>>>>> and kvm_get_vcpu_by_cpuid() also need it?
>>>>>>>>>>>
>>>>>>>>>> It is good to me that spinlock is added in function kvm_drop_cpuid().
>>>>>>>>>> And thanks for the efforts.
>>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>> Bibo Mao
>>>>>>>>>>>> +
>>>>>>>>>>>> static int _kvm_setcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 val)
>>>>>>>>>>>> {
>>>>>>>>>>>> int ret = 0, gintc;
>>>>>>>>>>>> @@ -291,7 +380,8 @@ static int _kvm_setcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 val)
>>>>>>>>>>>> kvm_set_sw_gcsr(csr, LOONGARCH_CSR_ESTAT, gintc);
>>>>>>>>>>>>
>>>>>>>>>>>> return ret;
>>>>>>>>>>>> - }
>>>>>>>>>>>> + } else if (id == LOONGARCH_CSR_CPUID)
>>>>>>>>>>>> + return kvm_set_cpuid(vcpu, val);
>>>>>>>>>>>>
>>>>>>>>>>>> kvm_write_sw_gcsr(csr, id, val);
>>>>>>>>>>>>
>>>>>>>>>>>> @@ -943,6 +1033,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
>>>>>>>>>>>> hrtimer_cancel(&vcpu->arch.swtimer);
>>>>>>>>>>>> kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
>>>>>>>>>>>> kfree(vcpu->arch.csr);
>>>>>>>>>>>> + kvm_drop_cpuid(vcpu);
>>>>>>>>>>> I think this line should be before the above kfree(), otherwise you
>>>>>>>>>>> get a "use after free".
>>>>>>>>>>>
>>>>>>>>>>> Huacai
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> /*
>>>>>>>>>>>> * If the vCPU is freed and reused as another vCPU, we don't want the
>>>>>>>>>>>> diff --git a/arch/loongarch/kvm/vm.c b/arch/loongarch/kvm/vm.c
>>>>>>>>>>>> index 0a37f6fa8f2d..6006a28653ad 100644
>>>>>>>>>>>> --- a/arch/loongarch/kvm/vm.c
>>>>>>>>>>>> +++ b/arch/loongarch/kvm/vm.c
>>>>>>>>>>>> @@ -30,6 +30,14 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>>>>>>>>>>>> if (!kvm->arch.pgd)
>>>>>>>>>>>> return -ENOMEM;
>>>>>>>>>>>>
>>>>>>>>>>>> + kvm->arch.phyid_map = kvzalloc(sizeof(struct kvm_phyid_map),
>>>>>>>>>>>> + GFP_KERNEL_ACCOUNT);
>>>>>>>>>>>> + if (!kvm->arch.phyid_map) {
>>>>>>>>>>>> + free_page((unsigned long)kvm->arch.pgd);
>>>>>>>>>>>> + kvm->arch.pgd = NULL;
>>>>>>>>>>>> + return -ENOMEM;
>>>>>>>>>>>> + }
>>>>>>>>>>>> +
>>>>>>>>>>>> kvm_init_vmcs(kvm);
>>>>>>>>>>>> kvm->arch.gpa_size = BIT(cpu_vabits - 1);
>>>>>>>>>>>> kvm->arch.root_level = CONFIG_PGTABLE_LEVELS - 1;
>>>>>>>>>>>> @@ -44,6 +52,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>>>>>>>>>>>> for (i = 0; i <= kvm->arch.root_level; i++)
>>>>>>>>>>>> kvm->arch.pte_shifts[i] = PAGE_SHIFT + i * (PAGE_SHIFT - 3);
>>>>>>>>>>>>
>>>>>>>>>>>> + spin_lock_init(&kvm->arch.phyid_map_lock);
>>>>>>>>>>>> return 0;
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> @@ -51,7 +60,9 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
>>>>>>>>>>>> {
>>>>>>>>>>>> kvm_destroy_vcpus(kvm);
>>>>>>>>>>>> free_page((unsigned long)kvm->arch.pgd);
>>>>>>>>>>>> + kvfree(kvm->arch.phyid_map);
>>>>>>>>>>>> kvm->arch.pgd = NULL;
>>>>>>>>>>>> + kvm->arch.phyid_map = NULL;
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>>>>>>>>>>> --
>>>>>>>>>>>> 2.39.3
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>
>>
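
The CPUID rules agreed on in the quoted discussion can be modelled in user space. The sketch below is only an illustration under stated assumptions: the names vcpu_model, phyid_map_model, set_cpuid(), drop_cpuid() and phyid_model_demo() are hypothetical stand-ins rather than kernel symbols, the software CSR is reduced to a plain int field, and the spinlock is left out because the model is single-threaded. It shows why using the out-of-range value KVM_MAX_PHYID as the "unset" marker removes the ambiguity of an initial CPUID of 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PHYID 256	/* mirrors KVM_MAX_PHYID; also serves as the "unset" marker */

struct vcpu_model { int cpuid; };	/* hypothetical stand-in for the sw CPUID CSR */
struct phyid_slot { struct vcpu_model *vcpu; bool enabled; };
struct phyid_map_model { int max_phyid; struct phyid_slot slot[MAX_PHYID]; };

/* Models the extra write in kvm_arch_vcpu_create(): start with an invalid id. */
static void vcpu_model_init(struct vcpu_model *v)
{
	v->cpuid = MAX_PHYID;
}

static int set_cpuid(struct phyid_map_model *map, struct vcpu_model *v,
		     unsigned long val)
{
	if (val >= MAX_PHYID)
		return -EINVAL;

	/* This vcpu already owns an id: only a repeat of the same value passes. */
	if (v->cpuid != MAX_PHYID && map->slot[v->cpuid].enabled)
		return (unsigned long)v->cpuid == val ? 0 : -EINVAL;

	/* The requested id is taken: sharing between vcpus is forbidden. */
	if (map->slot[val].enabled)
		return map->slot[val].vcpu == v ? 0 : -EINVAL;

	v->cpuid = (int)val;
	map->slot[val].enabled = true;
	map->slot[val].vcpu = v;
	if (map->max_phyid < (int)val)
		map->max_phyid = (int)val;
	return 0;
}

static void drop_cpuid(struct phyid_map_model *map, struct vcpu_model *v)
{
	if (v->cpuid >= MAX_PHYID)
		return;

	if (map->slot[v->cpuid].enabled) {
		map->slot[v->cpuid].vcpu = NULL;
		map->slot[v->cpuid].enabled = false;
		v->cpuid = MAX_PHYID;	/* back to "unset", not to 0 */
	}
}

/* Walks through the two scenarios named in the thread. */
void phyid_model_demo(void)
{
	static struct phyid_map_model map;	/* zero-initialised: all slots free */
	struct vcpu_model v0, v1;

	vcpu_model_init(&v0);
	vcpu_model_init(&v1);

	assert(set_cpuid(&map, &v0, 0) == 0);		/* vcpu0 takes physid 0 */
	assert(set_cpuid(&map, &v0, 0) == 0);		/* duplicate set is discarded */
	assert(set_cpuid(&map, &v0, 1) == -EINVAL);	/* vcpu0 may not move 0 -> 1 */
	assert(set_cpuid(&map, &v1, 1) == 0);		/* vcpu1 was unset, so 1 is fine */
	assert(set_cpuid(&map, &v1, 0) == -EINVAL);	/* vcpu1 is already bound to 1 */

	drop_cpuid(&map, &v0);
	assert(v0.cpuid == MAX_PHYID);
	assert(set_cpuid(&map, &v0, 2) == 0);		/* a dropped vcpu can bind again */
}
```

With the sentinel in place the "if (cpuid)" special case is no longer needed: an unset vcpu never indexes the map, and a vcpu that really owns physid 0 is treated like any other bound vcpu.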


2024-05-08 05:01:08

by Huacai Chen

[permalink] [raw]
Subject: Re: [PATCH v8 4/6] LoongArch: KVM: Add vcpu search support from physical cpuid

On Tue, May 7, 2024 at 11:06 AM maobibo <[email protected]> wrote:
>
>
>
> On 2024/5/7 10:05 AM, Huacai Chen wrote:
> > On Tue, May 7, 2024 at 9:40 AM maobibo <[email protected]> wrote:
> >>
> >>
> >>
> >> On 2024/5/6 10:17 PM, Huacai Chen wrote:
> >>> On Mon, May 6, 2024 at 6:05 PM maobibo <[email protected]> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 2024/5/6 5:40 PM, Huacai Chen wrote:
> >>>>> On Mon, May 6, 2024 at 5:35 PM maobibo <[email protected]> wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 2024/5/6 4:59 PM, Huacai Chen wrote:
> >>>>>>> On Mon, May 6, 2024 at 4:18 PM maobibo <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 2024/5/6 3:06 PM, Huacai Chen wrote:
> >>>>>>>>> Hi, Bibo,
> >>>>>>>>>
> >>>>>>>>> On Mon, May 6, 2024 at 2:36 PM maobibo <[email protected]> wrote:
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On 2024/5/6 9:49 AM, Huacai Chen wrote:
> >>>>>>>>>>> Hi, Bibo,
> >>>>>>>>>>>
> >>>>>>>>>>> On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao <[email protected]> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Physical cpuid is used for interrupt routing for irqchips such as
> >>>>>>>>>>>> ipi/msi/extioi interrupt controller. And physical cpuid is stored
> >>>>>>>>>>>> at CSR register LOONGARCH_CSR_CPUID, it can not be changed once vcpu
> >>>>>>>>>>>> is created and physical cpuid of two vcpus cannot be the same.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Different irqchips have different size declaration about physical cpuid,
> >>>>>>>>>>>> max cpuid value for CSR LOONGARCH_CSR_CPUID on 3A5000 is 512, max cpuid
> >>>>>>>>>>>> supported by IPI hardware is 1024, 256 for extioi irqchip, and 65536
> >>>>>>>>>>>> for MSI irqchip.
> >>>>>>>>>>>>
> >>>>>>>>>>>> The smallest value from all interrupt controllers is selected now,
> >>>>>>>>>>>> and the max cpuid size is defines as 256 by KVM which comes from
> >>>>>>>>>>>> extioi irqchip.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Signed-off-by: Bibo Mao <[email protected]>
> >>>>>>>>>>>> ---
> >>>>>>>>>>>> arch/loongarch/include/asm/kvm_host.h | 26 ++++++++
> >>>>>>>>>>>> arch/loongarch/include/asm/kvm_vcpu.h | 1 +
> >>>>>>>>>>>> arch/loongarch/kvm/vcpu.c | 93 ++++++++++++++++++++++++++-
> >>>>>>>>>>>> arch/loongarch/kvm/vm.c | 11 ++++
> >>>>>>>>>>>> 4 files changed, 130 insertions(+), 1 deletion(-)
> >>>>>>>>>>>>
> >>>>>>>>>>>> diff --git a/arch/loongarch/include/asm/kvm_host.h b/arch/loongarch/include/asm/kvm_host.h
> >>>>>>>>>>>> index 2d62f7b0d377..3ba16ef1fe69 100644
> >>>>>>>>>>>> --- a/arch/loongarch/include/asm/kvm_host.h
> >>>>>>>>>>>> +++ b/arch/loongarch/include/asm/kvm_host.h
> >>>>>>>>>>>> @@ -64,6 +64,30 @@ struct kvm_world_switch {
> >>>>>>>>>>>>
> >>>>>>>>>>>> #define MAX_PGTABLE_LEVELS 4
> >>>>>>>>>>>>
> >>>>>>>>>>>> +/*
> >>>>>>>>>>>> + * Physical cpu id is used for interrupt routing, there are different
> >>>>>>>>>>>> + * definitions about physical cpuid on different hardwares.
> >>>>>>>>>>>> + * For LOONGARCH_CSR_CPUID register, max cpuid size if 512
> >>>>>>>>>>>> + * For IPI HW, max dest CPUID size 1024
> >>>>>>>>>>>> + * For extioi interrupt controller, max dest CPUID size is 256
> >>>>>>>>>>>> + * For MSI interrupt controller, max supported CPUID size is 65536
> >>>>>>>>>>>> + *
> >>>>>>>>>>>> + * Currently max CPUID is defined as 256 for KVM hypervisor, in future
> >>>>>>>>>>>> + * it will be expanded to 4096, including 16 packages at most. And every
> >>>>>>>>>>>> + * package supports at most 256 vcpus
> >>>>>>>>>>>> + */
> >>>>>>>>>>>> +#define KVM_MAX_PHYID 256
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +struct kvm_phyid_info {
> >>>>>>>>>>>> + struct kvm_vcpu *vcpu;
> >>>>>>>>>>>> + bool enabled;
> >>>>>>>>>>>> +};
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +struct kvm_phyid_map {
> >>>>>>>>>>>> + int max_phyid;
> >>>>>>>>>>>> + struct kvm_phyid_info phys_map[KVM_MAX_PHYID];
> >>>>>>>>>>>> +};
> >>>>>>>>>>>> +
> >>>>>>>>>>>> struct kvm_arch {
> >>>>>>>>>>>> /* Guest physical mm */
> >>>>>>>>>>>> kvm_pte_t *pgd;
> >>>>>>>>>>>> @@ -71,6 +95,8 @@ struct kvm_arch {
> >>>>>>>>>>>> unsigned long invalid_ptes[MAX_PGTABLE_LEVELS];
> >>>>>>>>>>>> unsigned int pte_shifts[MAX_PGTABLE_LEVELS];
> >>>>>>>>>>>> unsigned int root_level;
> >>>>>>>>>>>> + spinlock_t phyid_map_lock;
> >>>>>>>>>>>> + struct kvm_phyid_map *phyid_map;
> >>>>>>>>>>>>
> >>>>>>>>>>>> s64 time_offset;
> >>>>>>>>>>>> struct kvm_context __percpu *vmcs;
> >>>>>>>>>>>> diff --git a/arch/loongarch/include/asm/kvm_vcpu.h b/arch/loongarch/include/asm/kvm_vcpu.h
> >>>>>>>>>>>> index 0cb4fdb8a9b5..9f53950959da 100644
> >>>>>>>>>>>> --- a/arch/loongarch/include/asm/kvm_vcpu.h
> >>>>>>>>>>>> +++ b/arch/loongarch/include/asm/kvm_vcpu.h
> >>>>>>>>>>>> @@ -81,6 +81,7 @@ void kvm_save_timer(struct kvm_vcpu *vcpu);
> >>>>>>>>>>>> void kvm_restore_timer(struct kvm_vcpu *vcpu);
> >>>>>>>>>>>>
> >>>>>>>>>>>> int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq);
> >>>>>>>>>>>> +struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid);
> >>>>>>>>>>>>
> >>>>>>>>>>>> /*
> >>>>>>>>>>>> * Loongarch KVM guest interrupt handling
> >>>>>>>>>>>> diff --git a/arch/loongarch/kvm/vcpu.c b/arch/loongarch/kvm/vcpu.c
> >>>>>>>>>>>> index 3a8779065f73..b633fd28b8db 100644
> >>>>>>>>>>>> --- a/arch/loongarch/kvm/vcpu.c
> >>>>>>>>>>>> +++ b/arch/loongarch/kvm/vcpu.c
> >>>>>>>>>>>> @@ -274,6 +274,95 @@ static int _kvm_getcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 *val)
> >>>>>>>>>>>> return 0;
> >>>>>>>>>>>> }
> >>>>>>>>>>>>
> >>>>>>>>>>>> +static inline int kvm_set_cpuid(struct kvm_vcpu *vcpu, u64 val)
> >>>>>>>>>>>> +{
> >>>>>>>>>>>> + int cpuid;
> >>>>>>>>>>>> + struct loongarch_csrs *csr = vcpu->arch.csr;
> >>>>>>>>>>>> + struct kvm_phyid_map *map;
> >>>>>>>>>>>> +
> >>>>>>>>>>>> + if (val >= KVM_MAX_PHYID)
> >>>>>>>>>>>> + return -EINVAL;
> >>>>>>>>>>>> +
> >>>>>>>>>>>> + cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_ESTAT);
> >>>>>>>>>>>> + map = vcpu->kvm->arch.phyid_map;
> >>>>>>>>>>>> + spin_lock(&vcpu->kvm->arch.phyid_map_lock);
> >>>>>>>>>>>> + if (map->phys_map[cpuid].enabled) {
> >>>>>>>>>>>> + /*
> >>>>>>>>>>>> + * Cpuid is already set before
> >>>>>>>>>>>> + * Forbid changing different cpuid at runtime
> >>>>>>>>>>>> + */
> >>>>>>>>>>>> + if (cpuid != val) {
> >>>>>>>>>>>> + /*
> >>>>>>>>>>>> + * Cpuid 0 is initial value for vcpu, maybe invalid
> >>>>>>>>>>>> + * unset value for vcpu
> >>>>>>>>>>>> + */
> >>>>>>>>>>>> + if (cpuid) {
> >>>>>>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >>>>>>>>>>>> + return -EINVAL;
> >>>>>>>>>>>> + }
> >>>>>>>>>>>> + } else {
> >>>>>>>>>>>> + /* Discard duplicated cpuid set */
> >>>>>>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >>>>>>>>>>>> + return 0;
> >>>>>>>>>>>> + }
> >>>>>>>>>>>> + }
> >>>>>>>>>>> I have changed the logic and comments when I apply, you can double
> >>>>>>>>>>> check whether it is correct.
> >>>>>>>>>> I checkout the latest version, the modification in function
> >>>>>>>>>> kvm_set_cpuid() is good for me.
> >>>>>>>>> Now the modified version is like this:
> >>>>>>>>>
> >>>>>>>>> + if (map->phys_map[cpuid].enabled) {
> >>>>>>>>> + /* Discard duplicated CPUID set operation */
> >>>>>>>>> + if (cpuid == val) {
> >>>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >>>>>>>>> + return 0;
> >>>>>>>>> + }
> >>>>>>>>> +
> >>>>>>>>> + /*
> >>>>>>>>> + * CPUID is already set before
> >>>>>>>>> + * Forbid changing different CPUID at runtime
> >>>>>>>>> + * But CPUID 0 is the initial value for vcpu, so allow
> >>>>>>>>> + * changing from 0 to others
> >>>>>>>>> + */
> >>>>>>>>> + if (cpuid) {
> >>>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >>>>>>>>> + return -EINVAL;
> >>>>>>>>> + }
> >>>>>>>>> + }
> >>>>>>>>> But I still doubt whether we should allow changing from 0 to others
> >>>>>>>>> while map->phys_map[cpuid].enabled is 1.
> >>>>>>>> It is necessary since the default sw cpuid is zero :-( And we can
> >>>>>>>> optimize it in later, such as set INVALID cpuid in function
> >>>>>>>> kvm_arch_vcpu_create() and logic will be simple in function kvm_set_cpuid().
> >>>>>>> In my opinion, if a vcpu with a uninitialized default physid=0, then
> >>>>>>> map->phys_map[cpuid].enabled should be 0, then code won't come here.
> >>>>>>> And if a vcpu with a real physid=0, then map->phys_map[cpuid].enabled
> >>>>>>> is 1, but we shouldn't allow it to change physid in this case.
> >>>>>> yes, that is actually a problem.
> >>>>>>
> >>>>>> vcpu0 firstly set physid=0, and vcpu0 set physid=1 again is not allowed.
> >>>>>> vcpu0 firstly set physid=0, and vcpu1 set physid=1 is allowed.
> >>>>>
> >>>>> So can we simply drop the if (cpuid) checking? That means:
> >>>>> + if (map->phys_map[cpuid].enabled) {
> >>>>> + /* Discard duplicated CPUID set operation */
> >>>>> + if (cpuid == val) {
> >>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >>>>> + return 0;
> >>>>> + }
> >>>>> +
> >>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >>>>> + return -EINVAL;
> >>>>> + }
> >>>> yes, the similar modification such as following, since the secondary
> >>>> scenario should be allowed.
> >>>> "vcpu0 firstly set physid=0, and vcpu1 set physid=1 is allowed though
> >>>> default sw cpuid is zero"
> >>>>
> >>>> --- a/arch/loongarch/kvm/vcpu.c
> >>>> +++ b/arch/loongarch/kvm/vcpu.c
> >>>> @@ -272,7 +272,7 @@ static inline int kvm_set_cpuid(struct kvm_vcpu
> >>>> *vcpu, u64 val)
> >>>> cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_CPUID);
> >>>>
> >>>> spin_lock(&vcpu->kvm->arch.phyid_map_lock);
> >>>> - if (map->phys_map[cpuid].enabled) {
> >>>> + if ((cpuid != KVM_MAX_PHYID) && map->phys_map[cpuid].enabled) {
> >>>> /* Discard duplicated CPUID set operation */
> >>>> if (cpuid == val) {
> >>>> spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >>>> @@ -282,13 +282,9 @@ static inline int kvm_set_cpuid(struct kvm_vcpu
> >>>> *vcpu, u64 val)
> >>>> /*
> >>>> * CPUID is already set before
> >>>> * Forbid changing different CPUID at runtime
> >>>> - * But CPUID 0 is the initial value for vcpu, so allow
> >>>> - * changing from 0 to others
> >>>> */
> >>>> - if (cpuid) {
> >>>> - spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >>>> - return -EINVAL;
> >>>> - }
> >>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >>>> + return -EINVAL;
> >>>> }
> >>>>
> >>>> if (map->phys_map[val].enabled) {
> >>>> @@ -1029,6 +1025,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
> >>>>
> >>>> /* Set cpuid */
> >>>> kvm_write_sw_gcsr(csr, LOONGARCH_CSR_TMID, vcpu->vcpu_id);
> >>>> + kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, KVM_MAX_PHYID);
> >>>>
> >>>> /* Start with no pending virtual guest interrupts */
> >>>> csr->csrs[LOONGARCH_CSR_GINTC] = 0;
> >>> Very nice, but I think kvm_drop_cpuid() should also set to KVM_MAX_PHYID.
> >>> Now I update my loongarch-kvm branch, you can test it again, and hope
> >>> it is in the perfect status.
> >> I sync and test the latest code from loongarch-kvm, pv ipi works well
> >> with 256 vcpus. And the code looks good to me, thanks for your review in
> >> short time.
> > OK, if SWDBG also works well, I will send PR to Paolo tomorrow.
> yes, sw debug works well with patch from qemu. And I will refresh patch
> to qemu after it is merged.
>
> https://lore.kernel.org/all/[email protected]/
>
> --- a/configs/targets/loongarch64-softmmu.mak
> +++ b/configs/targets/loongarch64-softmmu.mak
> @@ -1,5 +1,6 @@
> TARGET_ARCH=loongarch64
> TARGET_BASE_ARCH=loongarch
> TARGET_SUPPORTS_MTTCG=y
> +TARGET_KVM_HAVE_GUEST_DEBUG=y
> TARGET_XML_FILES= gdb-xml/loongarch-base32.xml
> gdb-xml/loongarch-base64.xml gdb-xml/loongarch-fpu.xml
> TARGET_NEED_FDT=y
Not enough, we need kvm_arch_update_guest_debug() and some other functions.

Huacai

>
> Regards
> Bibo Mao
> >
> > Huacai
> >
> >>
> >> Regards
> >> Bibo Mao
> >>>
> >>> Huacai
> >>>>
> >>>>
> >>>>>
> >>>>> Huacai
> >>>>>
> >>>>>>
> >>>>>>
> >>>>>>>
> >>>>>>> Huacai
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Regards
> >>>>>>>> Bibo Mao
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Huacai
> >>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>> +
> >>>>>>>>>>>> + if (map->phys_map[val].enabled) {
> >>>>>>>>>>>> + /*
> >>>>>>>>>>>> + * New cpuid is already set with other vcpu
> >>>>>>>>>>>> + * Forbid sharing the same cpuid between different vcpus
> >>>>>>>>>>>> + */
> >>>>>>>>>>>> + if (map->phys_map[val].vcpu != vcpu) {
> >>>>>>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >>>>>>>>>>>> + return -EINVAL;
> >>>>>>>>>>>> + }
> >>>>>>>>>>>> +
> >>>>>>>>>>>> + /* Discard duplicated cpuid set operation*/
> >>>>>>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >>>>>>>>>>>> + return 0;
> >>>>>>>>>>>> + }
> >>>>>>>>>>>> +
> >>>>>>>>>>>> + kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, val);
> >>>>>>>>>>>> + map->phys_map[val].enabled = true;
> >>>>>>>>>>>> + map->phys_map[val].vcpu = vcpu;
> >>>>>>>>>>>> + if (map->max_phyid < val)
> >>>>>>>>>>>> + map->max_phyid = val;
> >>>>>>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> >>>>>>>>>>>> + return 0;
> >>>>>>>>>>>> +}
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid)
> >>>>>>>>>>>> +{
> >>>>>>>>>>>> + struct kvm_phyid_map *map;
> >>>>>>>>>>>> +
> >>>>>>>>>>>> + if (cpuid >= KVM_MAX_PHYID)
> >>>>>>>>>>>> + return NULL;
> >>>>>>>>>>>> +
> >>>>>>>>>>>> + map = kvm->arch.phyid_map;
> >>>>>>>>>>>> + if (map->phys_map[cpuid].enabled)
> >>>>>>>>>>>> + return map->phys_map[cpuid].vcpu;
> >>>>>>>>>>>> +
> >>>>>>>>>>>> + return NULL;
> >>>>>>>>>>>> +}
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +static inline void kvm_drop_cpuid(struct kvm_vcpu *vcpu)
> >>>>>>>>>>>> +{
> >>>>>>>>>>>> + int cpuid;
> >>>>>>>>>>>> + struct loongarch_csrs *csr = vcpu->arch.csr;
> >>>>>>>>>>>> + struct kvm_phyid_map *map;
> >>>>>>>>>>>> +
> >>>>>>>>>>>> + map = vcpu->kvm->arch.phyid_map;
> >>>>>>>>>>>> + cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_ESTAT);
> >>>>>>>>>>>> + if (cpuid >= KVM_MAX_PHYID)
> >>>>>>>>>>>> + return;
> >>>>>>>>>>>> +
> >>>>>>>>>>>> + if (map->phys_map[cpuid].enabled) {
> >>>>>>>>>>>> + map->phys_map[cpuid].vcpu = NULL;
> >>>>>>>>>>>> + map->phys_map[cpuid].enabled = false;
> >>>>>>>>>>>> + kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, 0);
> >>>>>>>>>>>> + }
> >>>>>>>>>>>> +}
> >>>>>>>>>>> While kvm_set_cpuid() is protected by a spinlock, do kvm_drop_cpuid()
> >>>>>>>>>>> and kvm_get_vcpu_by_cpuid() also need it?
> >>>>>>>>>>>
> >>>>>>>>>> It is good to me that spinlock is added in function kvm_drop_cpuid().
> >>>>>>>>>> And thanks for the efforts.
> >>>>>>>>>>
> >>>>>>>>>> Regards
> >>>>>>>>>> Bibo Mao
> >>>>>>>>>>>> +
> >>>>>>>>>>>> static int _kvm_setcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 val)
> >>>>>>>>>>>> {
> >>>>>>>>>>>> int ret = 0, gintc;
> >>>>>>>>>>>> @@ -291,7 +380,8 @@ static int _kvm_setcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 val)
> >>>>>>>>>>>> kvm_set_sw_gcsr(csr, LOONGARCH_CSR_ESTAT, gintc);
> >>>>>>>>>>>>
> >>>>>>>>>>>> return ret;
> >>>>>>>>>>>> - }
> >>>>>>>>>>>> + } else if (id == LOONGARCH_CSR_CPUID)
> >>>>>>>>>>>> + return kvm_set_cpuid(vcpu, val);
> >>>>>>>>>>>>
> >>>>>>>>>>>> kvm_write_sw_gcsr(csr, id, val);
> >>>>>>>>>>>>
> >>>>>>>>>>>> @@ -943,6 +1033,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
> >>>>>>>>>>>> hrtimer_cancel(&vcpu->arch.swtimer);
> >>>>>>>>>>>> kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
> >>>>>>>>>>>> kfree(vcpu->arch.csr);
> >>>>>>>>>>>> + kvm_drop_cpuid(vcpu);
> >>>>>>>>>>> I think this line should be before the above kfree(), otherwise you
> >>>>>>>>>>> get a "use after free".
> >>>>>>>>>>>
> >>>>>>>>>>> Huacai
> >>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> /*
> >>>>>>>>>>>> * If the vCPU is freed and reused as another vCPU, we don't want the
> >>>>>>>>>>>> diff --git a/arch/loongarch/kvm/vm.c b/arch/loongarch/kvm/vm.c
> >>>>>>>>>>>> index 0a37f6fa8f2d..6006a28653ad 100644
> >>>>>>>>>>>> --- a/arch/loongarch/kvm/vm.c
> >>>>>>>>>>>> +++ b/arch/loongarch/kvm/vm.c
> >>>>>>>>>>>> @@ -30,6 +30,14 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> >>>>>>>>>>>> if (!kvm->arch.pgd)
> >>>>>>>>>>>> return -ENOMEM;
> >>>>>>>>>>>>
> >>>>>>>>>>>> + kvm->arch.phyid_map = kvzalloc(sizeof(struct kvm_phyid_map),
> >>>>>>>>>>>> + GFP_KERNEL_ACCOUNT);
> >>>>>>>>>>>> + if (!kvm->arch.phyid_map) {
> >>>>>>>>>>>> + free_page((unsigned long)kvm->arch.pgd);
> >>>>>>>>>>>> + kvm->arch.pgd = NULL;
> >>>>>>>>>>>> + return -ENOMEM;
> >>>>>>>>>>>> + }
> >>>>>>>>>>>> +
> >>>>>>>>>>>> kvm_init_vmcs(kvm);
> >>>>>>>>>>>> kvm->arch.gpa_size = BIT(cpu_vabits - 1);
> >>>>>>>>>>>> kvm->arch.root_level = CONFIG_PGTABLE_LEVELS - 1;
> >>>>>>>>>>>> @@ -44,6 +52,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> >>>>>>>>>>>> for (i = 0; i <= kvm->arch.root_level; i++)
> >>>>>>>>>>>> kvm->arch.pte_shifts[i] = PAGE_SHIFT + i * (PAGE_SHIFT - 3);
> >>>>>>>>>>>>
> >>>>>>>>>>>> + spin_lock_init(&kvm->arch.phyid_map_lock);
> >>>>>>>>>>>> return 0;
> >>>>>>>>>>>> }
> >>>>>>>>>>>>
> >>>>>>>>>>>> @@ -51,7 +60,9 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
> >>>>>>>>>>>> {
> >>>>>>>>>>>> kvm_destroy_vcpus(kvm);
> >>>>>>>>>>>> free_page((unsigned long)kvm->arch.pgd);
> >>>>>>>>>>>> + kvfree(kvm->arch.phyid_map);
> >>>>>>>>>>>> kvm->arch.pgd = NULL;
> >>>>>>>>>>>> + kvm->arch.phyid_map = NULL;
> >>>>>>>>>>>> }
> >>>>>>>>>>>>
> >>>>>>>>>>>> int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >>>>>>>>>>>> --
> >>>>>>>>>>>> 2.39.3
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>
> >>
> >>
>
>
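
The "use after free" remark on kvm_arch_vcpu_destroy() in the quoted review comes down to ordering: the drop helper reads and writes the CSR block, so it has to run before that block is freed. A minimal user-space sketch of the ordering, assuming hypothetical names (csr_model, vcpu_model, slots, vcpu_model_create(), vcpu_model_destroy()) and malloc()/free() in place of the kernel allocators, with locking omitted:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_PHYID 256	/* mirrors KVM_MAX_PHYID */

struct csr_model { int cpuid; };		/* stand-in for struct loongarch_csrs */
struct vcpu_model { struct csr_model *csr; };	/* stand-in for struct kvm_vcpu */
struct slot_model { struct vcpu_model *vcpu; bool enabled; };

static struct slot_model slots[MAX_PHYID];	/* stand-in for the phyid map */

static int vcpu_model_create(struct vcpu_model *v, int cpuid)
{
	if (cpuid < 0 || cpuid >= MAX_PHYID || slots[cpuid].enabled)
		return -1;

	v->csr = malloc(sizeof(*v->csr));
	if (!v->csr)
		return -1;

	v->csr->cpuid = cpuid;
	slots[cpuid].vcpu = v;
	slots[cpuid].enabled = true;
	return 0;
}

static void drop_cpuid(struct vcpu_model *v)
{
	int cpuid;

	assert(v->csr != NULL);		/* the CSR block must still be allocated */
	cpuid = v->csr->cpuid;
	if (cpuid >= MAX_PHYID)
		return;

	if (slots[cpuid].enabled) {
		slots[cpuid].vcpu = NULL;
		slots[cpuid].enabled = false;
		v->csr->cpuid = MAX_PHYID;	/* mark as unset */
	}
}

static void vcpu_model_destroy(struct vcpu_model *v)
{
	drop_cpuid(v);		/* first: this still dereferences v->csr */
	free(v->csr);		/* only then release the CSR block */
	v->csr = NULL;
}
```

Swapping the first two statements of vcpu_model_destroy() would make drop_cpuid() touch freed memory, which is the problem pointed out for the original hunk.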

2024-05-08 06:10:46

by Bibo Mao

[permalink] [raw]
Subject: Re: [PATCH v8 4/6] LoongArch: KVM: Add vcpu search support from physical cpuid



On 2024/5/8 1:00 PM, Huacai Chen wrote:
> On Tue, May 7, 2024 at 11:06 AM maobibo <[email protected]> wrote:
>>
>>
>>
>> On 2024/5/7 10:05 AM, Huacai Chen wrote:
>>> On Tue, May 7, 2024 at 9:40 AM maobibo <[email protected]> wrote:
>>>>
>>>>
>>>>
>>>> On 2024/5/6 10:17 PM, Huacai Chen wrote:
>>>>> On Mon, May 6, 2024 at 6:05 PM maobibo <[email protected]> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2024/5/6 5:40 PM, Huacai Chen wrote:
>>>>>>> On Mon, May 6, 2024 at 5:35 PM maobibo <[email protected]> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2024/5/6 4:59 PM, Huacai Chen wrote:
>>>>>>>>> On Mon, May 6, 2024 at 4:18 PM maobibo <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 2024/5/6 3:06 PM, Huacai Chen wrote:
>>>>>>>>>>> Hi, Bibo,
>>>>>>>>>>>
>>>>>>>>>>> On Mon, May 6, 2024 at 2:36 PM maobibo <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 2024/5/6 9:49 AM, Huacai Chen wrote:
>>>>>>>>>>>>> Hi, Bibo,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Physical cpuid is used for interrupt routing for irqchips such as
>>>>>>>>>>>>>> ipi/msi/extioi interrupt controller. And physical cpuid is stored
>>>>>>>>>>>>>> at CSR register LOONGARCH_CSR_CPUID, it can not be changed once vcpu
>>>>>>>>>>>>>> is created and physical cpuid of two vcpus cannot be the same.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Different irqchips have different size declaration about physical cpuid,
>>>>>>>>>>>>>> max cpuid value for CSR LOONGARCH_CSR_CPUID on 3A5000 is 512, max cpuid
>>>>>>>>>>>>>> supported by IPI hardware is 1024, 256 for extioi irqchip, and 65536
>>>>>>>>>>>>>> for MSI irqchip.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The smallest value from all interrupt controllers is selected now,
>>>>>>>>>>>>>> and the max cpuid size is defines as 256 by KVM which comes from
>>>>>>>>>>>>>> extioi irqchip.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Signed-off-by: Bibo Mao <[email protected]>
>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>> arch/loongarch/include/asm/kvm_host.h | 26 ++++++++
>>>>>>>>>>>>>> arch/loongarch/include/asm/kvm_vcpu.h | 1 +
>>>>>>>>>>>>>> arch/loongarch/kvm/vcpu.c | 93 ++++++++++++++++++++++++++-
>>>>>>>>>>>>>> arch/loongarch/kvm/vm.c | 11 ++++
>>>>>>>>>>>>>> 4 files changed, 130 insertions(+), 1 deletion(-)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> diff --git a/arch/loongarch/include/asm/kvm_host.h b/arch/loongarch/include/asm/kvm_host.h
>>>>>>>>>>>>>> index 2d62f7b0d377..3ba16ef1fe69 100644
>>>>>>>>>>>>>> --- a/arch/loongarch/include/asm/kvm_host.h
>>>>>>>>>>>>>> +++ b/arch/loongarch/include/asm/kvm_host.h
>>>>>>>>>>>>>> @@ -64,6 +64,30 @@ struct kvm_world_switch {
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> #define MAX_PGTABLE_LEVELS 4
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +/*
>>>>>>>>>>>>>> + * Physical cpu id is used for interrupt routing, there are different
>>>>>>>>>>>>>> + * definitions about physical cpuid on different hardwares.
>>>>>>>>>>>>>> + * For LOONGARCH_CSR_CPUID register, max cpuid size if 512
>>>>>>>>>>>>>> + * For IPI HW, max dest CPUID size 1024
>>>>>>>>>>>>>> + * For extioi interrupt controller, max dest CPUID size is 256
>>>>>>>>>>>>>> + * For MSI interrupt controller, max supported CPUID size is 65536
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * Currently max CPUID is defined as 256 for KVM hypervisor, in future
>>>>>>>>>>>>>> + * it will be expanded to 4096, including 16 packages at most. And every
>>>>>>>>>>>>>> + * package supports at most 256 vcpus
>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>> +#define KVM_MAX_PHYID 256
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +struct kvm_phyid_info {
>>>>>>>>>>>>>> + struct kvm_vcpu *vcpu;
>>>>>>>>>>>>>> + bool enabled;
>>>>>>>>>>>>>> +};
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +struct kvm_phyid_map {
>>>>>>>>>>>>>> + int max_phyid;
>>>>>>>>>>>>>> + struct kvm_phyid_info phys_map[KVM_MAX_PHYID];
>>>>>>>>>>>>>> +};
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> struct kvm_arch {
>>>>>>>>>>>>>> /* Guest physical mm */
>>>>>>>>>>>>>> kvm_pte_t *pgd;
>>>>>>>>>>>>>> @@ -71,6 +95,8 @@ struct kvm_arch {
>>>>>>>>>>>>>> unsigned long invalid_ptes[MAX_PGTABLE_LEVELS];
>>>>>>>>>>>>>> unsigned int pte_shifts[MAX_PGTABLE_LEVELS];
>>>>>>>>>>>>>> unsigned int root_level;
>>>>>>>>>>>>>> + spinlock_t phyid_map_lock;
>>>>>>>>>>>>>> + struct kvm_phyid_map *phyid_map;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> s64 time_offset;
>>>>>>>>>>>>>> struct kvm_context __percpu *vmcs;
>>>>>>>>>>>>>> diff --git a/arch/loongarch/include/asm/kvm_vcpu.h b/arch/loongarch/include/asm/kvm_vcpu.h
>>>>>>>>>>>>>> index 0cb4fdb8a9b5..9f53950959da 100644
>>>>>>>>>>>>>> --- a/arch/loongarch/include/asm/kvm_vcpu.h
>>>>>>>>>>>>>> +++ b/arch/loongarch/include/asm/kvm_vcpu.h
>>>>>>>>>>>>>> @@ -81,6 +81,7 @@ void kvm_save_timer(struct kvm_vcpu *vcpu);
>>>>>>>>>>>>>> void kvm_restore_timer(struct kvm_vcpu *vcpu);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq);
>>>>>>>>>>>>>> +struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /*
>>>>>>>>>>>>>> * Loongarch KVM guest interrupt handling
>>>>>>>>>>>>>> diff --git a/arch/loongarch/kvm/vcpu.c b/arch/loongarch/kvm/vcpu.c
>>>>>>>>>>>>>> index 3a8779065f73..b633fd28b8db 100644
>>>>>>>>>>>>>> --- a/arch/loongarch/kvm/vcpu.c
>>>>>>>>>>>>>> +++ b/arch/loongarch/kvm/vcpu.c
>>>>>>>>>>>>>> @@ -274,6 +274,95 @@ static int _kvm_getcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 *val)
>>>>>>>>>>>>>> return 0;
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +static inline int kvm_set_cpuid(struct kvm_vcpu *vcpu, u64 val)
>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>> + int cpuid;
>>>>>>>>>>>>>> + struct loongarch_csrs *csr = vcpu->arch.csr;
>>>>>>>>>>>>>> + struct kvm_phyid_map *map;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> + if (val >= KVM_MAX_PHYID)
>>>>>>>>>>>>>> + return -EINVAL;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> + cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_ESTAT);
>>>>>>>>>>>>>> + map = vcpu->kvm->arch.phyid_map;
>>>>>>>>>>>>>> + spin_lock(&vcpu->kvm->arch.phyid_map_lock);
>>>>>>>>>>>>>> + if (map->phys_map[cpuid].enabled) {
>>>>>>>>>>>>>> + /*
>>>>>>>>>>>>>> + * Cpuid is already set before
>>>>>>>>>>>>>> + * Forbid changing different cpuid at runtime
>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>> + if (cpuid != val) {
>>>>>>>>>>>>>> + /*
>>>>>>>>>>>>>> + * Cpuid 0 is initial value for vcpu, maybe invalid
>>>>>>>>>>>>>> + * unset value for vcpu
>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>> + if (cpuid) {
>>>>>>>>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>>>>>>>>>>>> + return -EINVAL;
>>>>>>>>>>>>>> + }
>>>>>>>>>>>>>> + } else {
>>>>>>>>>>>>>> + /* Discard duplicated cpuid set */
>>>>>>>>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>>>>>>>>>>>> + return 0;
>>>>>>>>>>>>>> + }
>>>>>>>>>>>>>> + }
>>>>>>>>>>>>> I have changed the logic and comments when I apply, you can double
>>>>>>>>>>>>> check whether it is correct.
>>>>>>>>>>>> I checkout the latest version, the modification in function
>>>>>>>>>>>> kvm_set_cpuid() is good for me.
>>>>>>>>>>> Now the modified version is like this:
>>>>>>>>>>>
>>>>>>>>>>> + if (map->phys_map[cpuid].enabled) {
>>>>>>>>>>> + /* Discard duplicated CPUID set operation */
>>>>>>>>>>> + if (cpuid == val) {
>>>>>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>>>>>>>>> + return 0;
>>>>>>>>>>> + }
>>>>>>>>>>> +
>>>>>>>>>>> + /*
>>>>>>>>>>> + * CPUID is already set before
>>>>>>>>>>> + * Forbid changing different CPUID at runtime
>>>>>>>>>>> + * But CPUID 0 is the initial value for vcpu, so allow
>>>>>>>>>>> + * changing from 0 to others
>>>>>>>>>>> + */
>>>>>>>>>>> + if (cpuid) {
>>>>>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>>>>>>>>> + return -EINVAL;
>>>>>>>>>>> + }
>>>>>>>>>>> + }
>>>>>>>>>>> But I still doubt whether we should allow changing from 0 to others
>>>>>>>>>>> while map->phys_map[cpuid].enabled is 1.
>>>>>>>>>> It is necessary since the default sw cpuid is zero :-( And we can
>>>>>>>>>> optimize it in later, such as set INVALID cpuid in function
>>>>>>>>>> kvm_arch_vcpu_create() and logic will be simple in function kvm_set_cpuid().
>>>>>>>>> In my opinion, if a vcpu with a uninitialized default physid=0, then
>>>>>>>>> map->phys_map[cpuid].enabled should be 0, then code won't come here.
>>>>>>>>> And if a vcpu with a real physid=0, then map->phys_map[cpuid].enabled
>>>>>>>>> is 1, but we shouldn't allow it to change physid in this case.
>>>>>>>> yes, that is actually a problem.
>>>>>>>>
>>>>>>>> vcpu0 firstly set physid=0, and vcpu0 set physid=1 again is not allowed.
>>>>>>>> vcpu0 firstly set physid=0, and vcpu1 set physid=1 is allowed.
>>>>>>>
>>>>>>> So can we simply drop the if (cpuid) checking? That means:
>>>>>>> + if (map->phys_map[cpuid].enabled) {
>>>>>>> + /* Discard duplicated CPUID set operation */
>>>>>>> + if (cpuid == val) {
>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>>>>> + return 0;
>>>>>>> + }
>>>>>>> +
>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>>>>> + return -EINVAL;
>>>>>>> + }
>>>>>> Yes, a similar modification such as the following, since the second
>>>>>> scenario should be allowed:
>>>>>> "vcpu0 first sets physid=0, then vcpu1 setting physid=1 is allowed even
>>>>>> though the default sw cpuid is zero"
>>>>>>
>>>>>> --- a/arch/loongarch/kvm/vcpu.c
>>>>>> +++ b/arch/loongarch/kvm/vcpu.c
>>>>>> @@ -272,7 +272,7 @@ static inline int kvm_set_cpuid(struct kvm_vcpu
>>>>>> *vcpu, u64 val)
>>>>>> cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_CPUID);
>>>>>>
>>>>>> spin_lock(&vcpu->kvm->arch.phyid_map_lock);
>>>>>> - if (map->phys_map[cpuid].enabled) {
>>>>>> + if ((cpuid != KVM_MAX_PHYID) && map->phys_map[cpuid].enabled) {
>>>>>> /* Discard duplicated CPUID set operation */
>>>>>> if (cpuid == val) {
>>>>>> spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>>>> @@ -282,13 +282,9 @@ static inline int kvm_set_cpuid(struct kvm_vcpu
>>>>>> *vcpu, u64 val)
>>>>>> /*
>>>>>> * CPUID is already set before
>>>>>> * Forbid changing different CPUID at runtime
>>>>>> - * But CPUID 0 is the initial value for vcpu, so allow
>>>>>> - * changing from 0 to others
>>>>>> */
>>>>>> - if (cpuid) {
>>>>>> - spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>>>> - return -EINVAL;
>>>>>> - }
>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>>>> + return -EINVAL;
>>>>>> }
>>>>>>
>>>>>> if (map->phys_map[val].enabled) {
>>>>>> @@ -1029,6 +1025,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
>>>>>>
>>>>>> /* Set cpuid */
>>>>>> kvm_write_sw_gcsr(csr, LOONGARCH_CSR_TMID, vcpu->vcpu_id);
>>>>>> + kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, KVM_MAX_PHYID);
>>>>>>
>>>>>> /* Start with no pending virtual guest interrupts */
>>>>>> csr->csrs[LOONGARCH_CSR_GINTC] = 0;
>>>>> Very nice, but I think kvm_drop_cpuid() should also set the cpuid to
>>>>> KVM_MAX_PHYID. I have now updated my loongarch-kvm branch; you can test
>>>>> it again, and I hope it is in perfect shape.
>>>> I synced and tested the latest code from loongarch-kvm; pv ipi works well
>>>> with 256 vcpus. The code looks good to me, and thanks for your quick
>>>> review.
>>> OK, if SWDBG also works well, I will send PR to Paolo tomorrow.
>> yes, sw debug works well with patch from qemu. And I will refresh patch
>> to qemu after it is merged.
>>
>> https://lore.kernel.org/all/[email protected]/
>>
>> --- a/configs/targets/loongarch64-softmmu.mak
>> +++ b/configs/targets/loongarch64-softmmu.mak
>> @@ -1,5 +1,6 @@
>> TARGET_ARCH=loongarch64
>> TARGET_BASE_ARCH=loongarch
>> TARGET_SUPPORTS_MTTCG=y
>> +TARGET_KVM_HAVE_GUEST_DEBUG=y
>> TARGET_XML_FILES= gdb-xml/loongarch-base32.xml
>> gdb-xml/loongarch-base64.xml gdb-xml/loongarch-fpu.xml
>> TARGET_NEED_FDT=y
> Not enough, we need kvm_arch_update_guest_debug() and some other functions.
Yes, the RFC patch for the qemu side is posted at:
https://lore.kernel.org/all/[email protected]/

>
> Huacai
>
>>
>> Regards
>> Bibo Mao
>>>
>>> Huacai
>>>
>>>>
>>>> Regards
>>>> Bibo Mao
>>>>>
>>>>> Huacai
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Huacai
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Huacai
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>> Bibo Mao
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Huacai
>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> + if (map->phys_map[val].enabled) {
>>>>>>>>>>>>>> + /*
>>>>>>>>>>>>>> + * New cpuid is already set with other vcpu
>>>>>>>>>>>>>> + * Forbid sharing the same cpuid between different vcpus
>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>> + if (map->phys_map[val].vcpu != vcpu) {
>>>>>>>>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>>>>>>>>>>>> + return -EINVAL;
>>>>>>>>>>>>>> + }
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> + /* Discard duplicated cpuid set operation*/
>>>>>>>>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>>>>>>>>>>>> + return 0;
>>>>>>>>>>>>>> + }
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> + kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, val);
>>>>>>>>>>>>>> + map->phys_map[val].enabled = true;
>>>>>>>>>>>>>> + map->phys_map[val].vcpu = vcpu;
>>>>>>>>>>>>>> + if (map->max_phyid < val)
>>>>>>>>>>>>>> + map->max_phyid = val;
>>>>>>>>>>>>>> + spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
>>>>>>>>>>>>>> + return 0;
>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid)
>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>> + struct kvm_phyid_map *map;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> + if (cpuid >= KVM_MAX_PHYID)
>>>>>>>>>>>>>> + return NULL;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> + map = kvm->arch.phyid_map;
>>>>>>>>>>>>>> + if (map->phys_map[cpuid].enabled)
>>>>>>>>>>>>>> + return map->phys_map[cpuid].vcpu;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> + return NULL;
>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +static inline void kvm_drop_cpuid(struct kvm_vcpu *vcpu)
>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>> + int cpuid;
>>>>>>>>>>>>>> + struct loongarch_csrs *csr = vcpu->arch.csr;
>>>>>>>>>>>>>> + struct kvm_phyid_map *map;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> + map = vcpu->kvm->arch.phyid_map;
>>>>>>>>>>>>>> + cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_CPUID);
>>>>>>>>>>>>>> + if (cpuid >= KVM_MAX_PHYID)
>>>>>>>>>>>>>> + return;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> + if (map->phys_map[cpuid].enabled) {
>>>>>>>>>>>>>> + map->phys_map[cpuid].vcpu = NULL;
>>>>>>>>>>>>>> + map->phys_map[cpuid].enabled = false;
>>>>>>>>>>>>>> + kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, 0);
>>>>>>>>>>>>>> + }
>>>>>>>>>>>>>> +}
>>>>>>>>>>>>> While kvm_set_cpuid() is protected by a spinlock, do kvm_drop_cpuid()
>>>>>>>>>>>>> and kvm_get_vcpu_by_cpuid() also need it?
>>>>>>>>>>>>>
>>>>>>>>>>>> It looks good to me that a spinlock is added in function kvm_drop_cpuid().
>>>>>>>>>>>> And thanks for the effort.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards
>>>>>>>>>>>> Bibo Mao
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> static int _kvm_setcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 val)
>>>>>>>>>>>>>> {
>>>>>>>>>>>>>> int ret = 0, gintc;
>>>>>>>>>>>>>> @@ -291,7 +380,8 @@ static int _kvm_setcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 val)
>>>>>>>>>>>>>> kvm_set_sw_gcsr(csr, LOONGARCH_CSR_ESTAT, gintc);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> return ret;
>>>>>>>>>>>>>> - }
>>>>>>>>>>>>>> + } else if (id == LOONGARCH_CSR_CPUID)
>>>>>>>>>>>>>> + return kvm_set_cpuid(vcpu, val);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> kvm_write_sw_gcsr(csr, id, val);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> @@ -943,6 +1033,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
>>>>>>>>>>>>>> hrtimer_cancel(&vcpu->arch.swtimer);
>>>>>>>>>>>>>> kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
>>>>>>>>>>>>>> kfree(vcpu->arch.csr);
>>>>>>>>>>>>>> + kvm_drop_cpuid(vcpu);
>>>>>>>>>>>>> I think this line should be before the above kfree(), otherwise you
>>>>>>>>>>>>> get a "use after free".
>>>>>>>>>>>>>
>>>>>>>>>>>>> Huacai
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /*
>>>>>>>>>>>>>> * If the vCPU is freed and reused as another vCPU, we don't want the
>>>>>>>>>>>>>> diff --git a/arch/loongarch/kvm/vm.c b/arch/loongarch/kvm/vm.c
>>>>>>>>>>>>>> index 0a37f6fa8f2d..6006a28653ad 100644
>>>>>>>>>>>>>> --- a/arch/loongarch/kvm/vm.c
>>>>>>>>>>>>>> +++ b/arch/loongarch/kvm/vm.c
>>>>>>>>>>>>>> @@ -30,6 +30,14 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>>>>>>>>>>>>>> if (!kvm->arch.pgd)
>>>>>>>>>>>>>> return -ENOMEM;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> + kvm->arch.phyid_map = kvzalloc(sizeof(struct kvm_phyid_map),
>>>>>>>>>>>>>> + GFP_KERNEL_ACCOUNT);
>>>>>>>>>>>>>> + if (!kvm->arch.phyid_map) {
>>>>>>>>>>>>>> + free_page((unsigned long)kvm->arch.pgd);
>>>>>>>>>>>>>> + kvm->arch.pgd = NULL;
>>>>>>>>>>>>>> + return -ENOMEM;
>>>>>>>>>>>>>> + }
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> kvm_init_vmcs(kvm);
>>>>>>>>>>>>>> kvm->arch.gpa_size = BIT(cpu_vabits - 1);
>>>>>>>>>>>>>> kvm->arch.root_level = CONFIG_PGTABLE_LEVELS - 1;
>>>>>>>>>>>>>> @@ -44,6 +52,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>>>>>>>>>>>>>> for (i = 0; i <= kvm->arch.root_level; i++)
>>>>>>>>>>>>>> kvm->arch.pte_shifts[i] = PAGE_SHIFT + i * (PAGE_SHIFT - 3);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> + spin_lock_init(&kvm->arch.phyid_map_lock);
>>>>>>>>>>>>>> return 0;
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> @@ -51,7 +60,9 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
>>>>>>>>>>>>>> {
>>>>>>>>>>>>>> kvm_destroy_vcpus(kvm);
>>>>>>>>>>>>>> free_page((unsigned long)kvm->arch.pgd);
>>>>>>>>>>>>>> + kvfree(kvm->arch.phyid_map);
>>>>>>>>>>>>>> kvm->arch.pgd = NULL;
>>>>>>>>>>>>>> + kvm->arch.phyid_map = NULL;
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> 2.39.3
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>>>
>>
>>
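
For readers following the thread, the rule the discussion converges on can be
modelled in plain user-space C: every vcpu starts with an out-of-range CPUID,
so that 0 is no longer a special "unset" value, and dropping the id resets it
to that same out-of-range value. This is only a sketch under assumptions, not
the kernel code. The names vcpu_model, phyid_map_model, set_cpuid_model() and
drop_cpuid_model() are hypothetical, MAX_PHYID = 256 is a stand-in for
KVM_MAX_PHYID, and the phyid_map_lock spinlock is left out because the model
is single-threaded.

```c
#include <stdbool.h>
#include <stddef.h>
#include <errno.h>

/* Stand-in for KVM_MAX_PHYID (assumed value); doubles as the "unset" sentinel. */
#define MAX_PHYID 256

/* Hypothetical reduced vcpu: only the software copy of the CPUID CSR. */
struct vcpu_model {
	int cpuid;
};

struct phyid_entry {
	bool enabled;
	struct vcpu_model *vcpu;
};

struct phyid_map_model {
	int max_phyid;
	struct phyid_entry phys_map[MAX_PHYID];
};

/* Mirrors the idea of writing KVM_MAX_PHYID at vcpu creation time. */
static void vcpu_model_init(struct vcpu_model *v)
{
	v->cpuid = MAX_PHYID;
}

static int set_cpuid_model(struct phyid_map_model *map,
			   struct vcpu_model *v, int val)
{
	if (val < 0 || val >= MAX_PHYID)
		return -EINVAL;

	/* This vcpu already owns an id: only a duplicated set is accepted. */
	if (v->cpuid != MAX_PHYID && map->phys_map[v->cpuid].enabled)
		return (v->cpuid == val) ? 0 : -EINVAL;

	/* The requested id is taken: forbid sharing between different vcpus. */
	if (map->phys_map[val].enabled)
		return (map->phys_map[val].vcpu == v) ? 0 : -EINVAL;

	v->cpuid = val;
	map->phys_map[val].enabled = true;
	map->phys_map[val].vcpu = v;
	if (map->max_phyid < val)
		map->max_phyid = val;
	return 0;
}

static void drop_cpuid_model(struct phyid_map_model *map,
			     struct vcpu_model *v)
{
	int cpuid = v->cpuid;

	if (cpuid >= MAX_PHYID)
		return;

	if (map->phys_map[cpuid].enabled) {
		map->phys_map[cpuid].vcpu = NULL;
		map->phys_map[cpuid].enabled = false;
		/* Reset to the sentinel rather than 0, as suggested above. */
		v->cpuid = MAX_PHYID;
	}
}
```

With this model, "vcpu0 sets physid=0, then vcpu0 sets physid=1" is rejected,
while "vcpu0 sets physid=0, then vcpu1 sets physid=1" succeeds, which matches
the two scenarios discussed above.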