2019-07-29 11:57:16

by Anup Patel

Subject: [RFC PATCH 00/16] KVM RISC-V Support

This series adds initial KVM RISC-V support. Currently, we are able to boot
64-bit RISC-V Linux guests with multiple VCPUs.

A few key aspects of KVM RISC-V added by this series are:
1. Minimal possible KVM world-switch which touches only GPRs and a few CSRs.
2. Full Guest/VM switch is done via vcpu_get/vcpu_put infrastructure.
3. KVM ONE_REG interface for VCPU register access from user-space.
4. PLIC emulation is done in user-space. In-kernel PLIC emulation will
be added in the future.
5. Timer and IPI emulation is done in-kernel.
6. MMU notifiers supported.
7. FP lazy save/restore supported.
8. SBI v0.1 emulation for KVM Guest available.

More feature additions and enhancements will follow after this series and
eventually KVM RISC-V will be on par with other architectures.

This series is based upon KVM pre-patches sent by Atish earlier
(https://lkml.org/lkml/2019/7/26/1271) and it can be found in
riscv_kvm_v1 branch at:
https://github.com/avpatel/linux.git

Our work-in-progress KVMTOOL RISC-V port can be found in riscv_v1 branch at:
https://github.com/avpatel/kvmtool.git

We need OpenSBI with RISC-V hypervisor extension support which can be
found in hyp_ext_changes_v1 branch at:
https://github.com/riscv/opensbi.git

The QEMU RISC-V hypervisor emulation is done by Alistair and is available
in riscv-hyp-work.next branch at:
https://github.com/alistair23/qemu.git

To play around with KVM RISC-V, here are a few reference commands:
1) To cross-compile KVMTOOL:
$ make lkvm-static
2) To launch RISC-V Host Linux:
$ qemu-system-riscv64 -monitor null -cpu rv64,h=true -M virt \
-m 512M -display none -serial mon:stdio \
-kernel opensbi/build/platform/qemu/virt/firmware/fw_jump.elf \
-device loader,file=build-riscv64/arch/riscv/boot/Image,addr=0x80200000 \
-initrd ./rootfs_kvm_riscv64.img \
-append "root=/dev/ram rw console=ttyS0 earlycon=sbi"
3) To launch RISC-V Guest Linux with 9P rootfs:
$ ./apps/lkvm-static run -m 128 -c2 --console serial \
-p "console=ttyS0 earlycon=uart8250,mmio,0x3f8" -k ./apps/Image --debug
4) To launch RISC-V Guest Linux with initrd:
$ ./apps/lkvm-static run -m 128 -c2 --console serial \
-p "console=ttyS0 earlycon=uart8250,mmio,0x3f8" -k ./apps/Image \
-i ./apps/rootfs.img --debug

Anup Patel (13):
KVM: RISC-V: Add KVM_REG_RISCV for ONE_REG interface
RISC-V: Add hypervisor extension related CSR defines
RISC-V: Add initial skeletal KVM support
RISC-V: KVM: Implement VCPU create, init and destroy functions
RISC-V: KVM: Implement VCPU interrupts and requests handling
RISC-V: KVM: Implement KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls
RISC-V: KVM: Implement VCPU world-switch
RISC-V: KVM: Handle MMIO exits for VCPU
RISC-V: KVM: Handle WFI exits for VCPU
RISC-V: KVM: Implement VMID allocator
RISC-V: KVM: Implement stage2 page table programming
RISC-V: KVM: Implement MMU notifiers
RISC-V: Enable VIRTIO drivers in RV64 and RV32 defconfig

Atish Patra (3):
RISC-V: KVM: Add timer functionality
RISC-V: KVM: FP lazy save/restore
RISC-V: KVM: Add SBI v0.1 support

arch/riscv/Kconfig | 2 +
arch/riscv/Makefile | 2 +
arch/riscv/configs/defconfig | 23 +-
arch/riscv/configs/rv32_defconfig | 13 +
arch/riscv/include/asm/csr.h | 58 ++
arch/riscv/include/asm/kvm_host.h | 232 ++++++
arch/riscv/include/asm/kvm_vcpu_timer.h | 32 +
arch/riscv/include/asm/pgtable-bits.h | 1 +
arch/riscv/include/uapi/asm/kvm.h | 74 ++
arch/riscv/kernel/asm-offsets.c | 148 ++++
arch/riscv/kvm/Kconfig | 34 +
arch/riscv/kvm/Makefile | 14 +
arch/riscv/kvm/main.c | 64 ++
arch/riscv/kvm/mmu.c | 904 ++++++++++++++++++++++++
arch/riscv/kvm/tlb.S | 42 ++
arch/riscv/kvm/vcpu.c | 817 +++++++++++++++++++++
arch/riscv/kvm/vcpu_exit.c | 553 +++++++++++++++
arch/riscv/kvm/vcpu_sbi.c | 118 ++++
arch/riscv/kvm/vcpu_switch.S | 367 ++++++++++
arch/riscv/kvm/vcpu_timer.c | 106 +++
arch/riscv/kvm/vm.c | 107 +++
arch/riscv/kvm/vmid.c | 130 ++++
drivers/clocksource/timer-riscv.c | 6 +
include/clocksource/timer-riscv.h | 14 +
include/uapi/linux/kvm.h | 1 +
25 files changed, 3857 insertions(+), 5 deletions(-)
create mode 100644 arch/riscv/include/asm/kvm_host.h
create mode 100644 arch/riscv/include/asm/kvm_vcpu_timer.h
create mode 100644 arch/riscv/include/uapi/asm/kvm.h
create mode 100644 arch/riscv/kvm/Kconfig
create mode 100644 arch/riscv/kvm/Makefile
create mode 100644 arch/riscv/kvm/main.c
create mode 100644 arch/riscv/kvm/mmu.c
create mode 100644 arch/riscv/kvm/tlb.S
create mode 100644 arch/riscv/kvm/vcpu.c
create mode 100644 arch/riscv/kvm/vcpu_exit.c
create mode 100644 arch/riscv/kvm/vcpu_sbi.c
create mode 100644 arch/riscv/kvm/vcpu_switch.S
create mode 100644 arch/riscv/kvm/vcpu_timer.c
create mode 100644 arch/riscv/kvm/vm.c
create mode 100644 arch/riscv/kvm/vmid.c
create mode 100644 include/clocksource/timer-riscv.h

--
2.17.1


2019-07-29 11:57:34

by Anup Patel

Subject: [RFC PATCH 02/16] RISC-V: Add hypervisor extension related CSR defines

This patch extends asm/csr.h by adding RISC-V hypervisor extension
related defines.
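
As an illustration only (this is not part of the patch), the HGATP
defines can be combined into a register value roughly as sketched below.
The sketch assumes RV64, Sv39x4 and 4 KiB pages, and it copies the
relevant constants locally so that it builds outside the kernel tree.
The example_* names and EXAMPLE_PAGE_SHIFT are hypothetical helpers
introduced for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Local copies of the RV64 values added by the patch, written as plain
 * 64-bit constants instead of _AC() so the sketch builds in user space.
 */
#define HGATP_MODE_SV39X4	UINT64_C(8)
#define HGATP64_MODE_SHIFT	60
#define HGATP64_VMID_SHIFT	44
#define HGATP64_VMID_MASK	UINT64_C(0x03FFF00000000000)
#define HGATP64_PPN		UINT64_C(0x00000FFFFFFFFFFF)

/* Assumption for this sketch: 4 KiB pages. */
#define EXAMPLE_PAGE_SHIFT	12

/* Hypothetical helper: pack mode, VMID and stage2 PGD address. */
static uint64_t example_make_hgatp(uint64_t vmid, uint64_t pgd_phys)
{
	uint64_t hgatp = HGATP_MODE_SV39X4 << HGATP64_MODE_SHIFT;

	/* VMID bits that do not fit the field are dropped by the mask */
	hgatp |= (vmid << HGATP64_VMID_SHIFT) & HGATP64_VMID_MASK;

	/* The PPN field holds the physical page number of the PGD */
	hgatp |= (pgd_phys >> EXAMPLE_PAGE_SHIFT) & HGATP64_PPN;

	return hgatp;
}

/* Hypothetical helper: extract the VMID field again. */
static uint64_t example_hgatp_vmid(uint64_t hgatp)
{
	return (hgatp & HGATP64_VMID_MASK) >> HGATP64_VMID_SHIFT;
}
```

For example, example_make_hgatp(1, 0x80000000) places mode 8 in the top
nibble, VMID 1 at bit 44 and PPN 0x80000 in the low bits. How the series
itself programs HGATP is up to the later stage2 and VMID patches.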

Signed-off-by: Anup Patel <[email protected]>
---
arch/riscv/include/asm/csr.h | 58 ++++++++++++++++++++++++++++++++++++
1 file changed, 58 insertions(+)

diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
index a18923fa23c8..059c5cb22aaf 100644
--- a/arch/riscv/include/asm/csr.h
+++ b/arch/riscv/include/asm/csr.h
@@ -27,6 +27,8 @@
#define SR_XS_CLEAN _AC(0x00010000, UL)
#define SR_XS_DIRTY _AC(0x00018000, UL)

+#define SR_MXR _AC(0x00080000, UL)
+
#ifndef CONFIG_64BIT
#define SR_SD _AC(0x80000000, UL) /* FS/XS dirty */
#else
@@ -59,10 +61,13 @@

#define EXC_INST_MISALIGNED 0
#define EXC_INST_ACCESS 1
+#define EXC_INST_ILLEGAL 2
#define EXC_BREAKPOINT 3
#define EXC_LOAD_ACCESS 5
#define EXC_STORE_ACCESS 7
#define EXC_SYSCALL 8
+#define EXC_HYPERVISOR_SYSCALL 9
+#define EXC_SUPERVISOR_SYSCALL 10
#define EXC_INST_PAGE_FAULT 12
#define EXC_LOAD_PAGE_FAULT 13
#define EXC_STORE_PAGE_FAULT 15
@@ -72,6 +77,43 @@
#define SIE_STIE (_AC(0x1, UL) << IRQ_S_TIMER)
#define SIE_SEIE (_AC(0x1, UL) << IRQ_S_EXT)

+/* HSTATUS flags */
+#define HSTATUS_VTSR _AC(0x00400000, UL)
+#define HSTATUS_VTVM _AC(0x00100000, UL)
+#define HSTATUS_SP2V _AC(0x00000200, UL)
+#define HSTATUS_SP2P _AC(0x00000100, UL)
+#define HSTATUS_SPV _AC(0x00000080, UL)
+#define HSTATUS_STL _AC(0x00000040, UL)
+#define HSTATUS_SPRV _AC(0x00000001, UL)
+
+/* HGATP flags */
+#define HGATP_MODE_OFF _AC(0, UL)
+#define HGATP_MODE_SV32X4 _AC(1, UL)
+#define HGATP_MODE_SV39X4 _AC(8, UL)
+#define HGATP_MODE_SV48X4 _AC(9, UL)
+
+#define HGATP32_MODE_SHIFT 31
+#define HGATP32_VMID_SHIFT 22
+#define HGATP32_VMID_MASK _AC(0x1FC00000, UL)
+#define HGATP32_PPN _AC(0x003FFFFF, UL)
+
+#define HGATP64_MODE_SHIFT 60
+#define HGATP64_VMID_SHIFT 44
+#define HGATP64_VMID_MASK _AC(0x03FFF00000000000, UL)
+#define HGATP64_PPN _AC(0x00000FFFFFFFFFFF, UL)
+
+#ifdef CONFIG_64BIT
+#define HGATP_PPN HGATP64_PPN
+#define HGATP_VMID_SHIFT HGATP64_VMID_SHIFT
+#define HGATP_VMID_MASK HGATP64_VMID_MASK
+#define HGATP_MODE (HGATP_MODE_SV39X4 << HGATP64_MODE_SHIFT)
+#else
+#define HGATP_PPN HGATP32_PPN
+#define HGATP_VMID_SHIFT HGATP32_VMID_SHIFT
+#define HGATP_VMID_MASK HGATP32_VMID_MASK
+#define HGATP_MODE (HGATP_MODE_SV32X4 << HGATP32_MODE_SHIFT)
+#endif
+
#define CSR_CYCLE 0xc00
#define CSR_TIME 0xc01
#define CSR_INSTRET 0xc02
@@ -85,6 +127,22 @@
#define CSR_STVAL 0x143
#define CSR_SIP 0x144
#define CSR_SATP 0x180
+
+#define CSR_VSSTATUS 0x200
+#define CSR_VSIE 0x204
+#define CSR_VSTVEC 0x205
+#define CSR_VSSCRATCH 0x240
+#define CSR_VSEPC 0x241
+#define CSR_VSCAUSE 0x242
+#define CSR_VSTVAL 0x243
+#define CSR_VSIP 0x244
+#define CSR_VSATP 0x280
+
+#define CSR_HSTATUS 0x600
+#define CSR_HEDELEG 0x602
+#define CSR_HIDELEG 0x603
+#define CSR_HGATP 0x680
+
#define CSR_CYCLEH 0xc80
#define CSR_TIMEH 0xc81
#define CSR_INSTRETH 0xc82
--
2.17.1

2019-07-29 11:57:46

by Anup Patel

Subject: [RFC PATCH 03/16] RISC-V: Add initial skeletal KVM support

This patch adds initial skeletal KVM RISC-V support which has:
1. A simple implementation of arch specific VM functions
except kvm_vm_ioctl_get_dirty_log() which will be implemented
in the future as part of stage2 page logging.
2. Stubs of required arch specific VCPU functions except
kvm_arch_vcpu_ioctl_run() which is semi-complete and
extended by subsequent patches.
3. Stubs for required arch specific stage2 MMU functions.
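
As a rough user-space model (not kernel code), the return value
convention that the run loop in kvm_arch_vcpu_ioctl_run() relies on can
be sketched as below. It only models the convention documented above
kvm_riscv_vcpu_exit(): > 0 re-enters the guest, 0 exits to user-space and
< 0 is an error. The example_* names are hypothetical, and the real loop
additionally handles signals, requests, preemption and IRQ state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for "enter the guest, then handle the exit". */
typedef int (*example_exit_fn)(int pass);

/*
 * Keep re-entering the "guest" while the exit handler returns > 0.
 * The number of passes is reported through @passes.
 */
static int example_run_loop(example_exit_fn handle_exit, int *passes)
{
	int ret = 1;
	int n = 0;

	assert(handle_exit != NULL && passes != NULL);

	while (ret > 0)
		ret = handle_exit(n++);

	*passes = n;
	return ret;
}

/* Example handler: resume the guest twice, then exit to user-space. */
static int example_exit_after_two(int pass)
{
	return pass < 2 ? 1 : 0;
}
```

With example_exit_after_two() the loop makes three passes and returns 0,
which corresponds to a proper exit to user-space with exit_reason set.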

Signed-off-by: Anup Patel <[email protected]>
---
arch/riscv/Kconfig | 2 +
arch/riscv/Makefile | 2 +
arch/riscv/include/asm/kvm_host.h | 82 ++++++++
arch/riscv/include/uapi/asm/kvm.h | 47 +++++
arch/riscv/kvm/Kconfig | 33 ++++
arch/riscv/kvm/Makefile | 13 ++
arch/riscv/kvm/main.c | 60 ++++++
arch/riscv/kvm/mmu.c | 83 ++++++++
arch/riscv/kvm/vcpu.c | 305 ++++++++++++++++++++++++++++++
arch/riscv/kvm/vcpu_exit.c | 35 ++++
arch/riscv/kvm/vm.c | 101 ++++++++++
11 files changed, 763 insertions(+)
create mode 100644 arch/riscv/include/asm/kvm_host.h
create mode 100644 arch/riscv/include/uapi/asm/kvm.h
create mode 100644 arch/riscv/kvm/Kconfig
create mode 100644 arch/riscv/kvm/Makefile
create mode 100644 arch/riscv/kvm/main.c
create mode 100644 arch/riscv/kvm/mmu.c
create mode 100644 arch/riscv/kvm/vcpu.c
create mode 100644 arch/riscv/kvm/vcpu_exit.c
create mode 100644 arch/riscv/kvm/vm.c

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 59a4727ecd6c..906104b8dc74 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -289,3 +289,5 @@ menu "Power management options"
source "kernel/power/Kconfig"

endmenu
+
+source "arch/riscv/kvm/Kconfig"
diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
index 7a117be8297c..9f4f418978b1 100644
--- a/arch/riscv/Makefile
+++ b/arch/riscv/Makefile
@@ -74,6 +74,8 @@ head-y := arch/riscv/kernel/head.o

core-y += arch/riscv/kernel/ arch/riscv/mm/ arch/riscv/net/

+core-$(CONFIG_KVM) += arch/riscv/kvm/
+
libs-y += arch/riscv/lib/

PHONY += vdso_install
diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
new file mode 100644
index 000000000000..81acfb307d5c
--- /dev/null
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -0,0 +1,82 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ * Anup Patel <[email protected]>
+ */
+
+#ifndef __RISCV_KVM_HOST_H__
+#define __RISCV_KVM_HOST_H__
+
+#include <linux/types.h>
+#include <linux/kvm.h>
+#include <linux/kvm_types.h>
+
+#ifdef CONFIG_64BIT
+#define KVM_MAX_VCPUS (1U << 16)
+#else
+#define KVM_MAX_VCPUS (1U << 9)
+#endif
+
+#define KVM_USER_MEM_SLOTS 512
+#define KVM_HALT_POLL_NS_DEFAULT 500000
+
+#define KVM_VCPU_MAX_FEATURES 0
+
+#define KVM_REQ_SLEEP \
+ KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_IRQ_PENDING KVM_ARCH_REQ(1)
+#define KVM_REQ_VCPU_RESET KVM_ARCH_REQ(2)
+
+struct kvm_vm_stat {
+ ulong remote_tlb_flush;
+};
+
+struct kvm_vcpu_stat {
+ u64 halt_successful_poll;
+ u64 halt_attempted_poll;
+ u64 halt_poll_invalid;
+ u64 halt_wakeup;
+ u64 ecall_exit_stat;
+ u64 wfi_exit_stat;
+ u64 mmio_exit_user;
+ u64 mmio_exit_kernel;
+ u64 exits;
+};
+
+struct kvm_arch_memory_slot {
+};
+
+struct kvm_arch {
+ /* stage2 page table */
+ pgd_t *pgd;
+ phys_addr_t pgd_phys;
+};
+
+struct kvm_vcpu_arch {
+ /* Don't run the VCPU (blocked) */
+ bool pause;
+};
+
+static inline void kvm_arch_hardware_unsetup(void) {}
+static inline void kvm_arch_sync_events(struct kvm *kvm) {}
+static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
+static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
+static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
+
+void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu);
+int kvm_riscv_stage2_alloc_pgd(struct kvm *kvm);
+void kvm_riscv_stage2_free_pgd(struct kvm *kvm);
+void kvm_riscv_stage2_update_pgtbl(struct kvm_vcpu *vcpu);
+
+int kvm_riscv_vcpu_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
+ unsigned long scause, unsigned long stval);
+
+static inline void __kvm_riscv_switch_to(struct kvm_vcpu_arch *vcpu_arch) {}
+
+void kvm_riscv_halt_guest(struct kvm *kvm);
+void kvm_riscv_resume_guest(struct kvm *kvm);
+
+#endif /* __RISCV_KVM_HOST_H__ */
diff --git a/arch/riscv/include/uapi/asm/kvm.h b/arch/riscv/include/uapi/asm/kvm.h
new file mode 100644
index 000000000000..d15875818b6e
--- /dev/null
+++ b/arch/riscv/include/uapi/asm/kvm.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ * Anup Patel <[email protected]>
+ */
+
+#ifndef __LINUX_KVM_RISCV_H
+#define __LINUX_KVM_RISCV_H
+
+#ifndef __ASSEMBLY__
+
+#include <linux/types.h>
+#include <asm/ptrace.h>
+
+#define __KVM_HAVE_READONLY_MEM
+
+#define KVM_COALESCED_MMIO_PAGE_OFFSET 1
+
+/* for KVM_GET_REGS and KVM_SET_REGS */
+struct kvm_regs {
+};
+
+/* for KVM_GET_FPU and KVM_SET_FPU */
+struct kvm_fpu {
+};
+
+/* KVM Debug exit structure */
+struct kvm_debug_exit_arch {
+};
+
+/* for KVM_SET_GUEST_DEBUG */
+struct kvm_guest_debug_arch {
+};
+
+/* definition of registers in kvm_run */
+struct kvm_sync_regs {
+};
+
+/* dummy definition */
+struct kvm_sregs {
+};
+
+#endif
+
+#endif /* __LINUX_KVM_RISCV_H */
diff --git a/arch/riscv/kvm/Kconfig b/arch/riscv/kvm/Kconfig
new file mode 100644
index 000000000000..35fd30d0e432
--- /dev/null
+++ b/arch/riscv/kvm/Kconfig
@@ -0,0 +1,33 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# KVM configuration
+#
+
+source "virt/kvm/Kconfig"
+
+menuconfig VIRTUALIZATION
+ bool "Virtualization"
+ help
+ Say Y here to get to see options for using your Linux host to run
+ other operating systems inside virtual machines (guests).
+ This option alone does not add any kernel code.
+
+ If you say N, all options in this submenu will be skipped and
+ disabled.
+
+if VIRTUALIZATION
+
+config KVM
+ tristate "Kernel-based Virtual Machine (KVM) support"
+ depends on OF
+ select PREEMPT_NOTIFIERS
+ select ANON_INODES
+ select KVM_MMIO
+ select HAVE_KVM_VCPU_ASYNC_IOCTL
+ select SRCU
+ help
+ Support hosting virtualized guest machines.
+
+ If unsure, say N.
+
+endif # VIRTUALIZATION
diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
new file mode 100644
index 000000000000..37b5a59d4f4f
--- /dev/null
+++ b/arch/riscv/kvm/Makefile
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: GPL-2.0
+# Makefile for RISC-V KVM support
+#
+
+common-objs-y = $(addprefix ../../../virt/kvm/, kvm_main.o coalesced_mmio.o)
+
+ccflags-y := -Ivirt/kvm -Iarch/riscv/kvm
+
+kvm-objs := $(common-objs-y)
+
+kvm-objs += main.o vm.o mmu.o vcpu.o vcpu_exit.o
+
+obj-$(CONFIG_KVM) += kvm.o
diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c
new file mode 100644
index 000000000000..8cac0571a264
--- /dev/null
+++ b/arch/riscv/kvm/main.c
@@ -0,0 +1,60 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ * Anup Patel <[email protected]>
+ */
+
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/module.h>
+#include <linux/kvm_host.h>
+#include <asm/hwcap.h>
+
+long kvm_arch_dev_ioctl(struct file *filp,
+ unsigned int ioctl, unsigned long arg)
+{
+ return -EINVAL;
+}
+
+int kvm_arch_check_processor_compat(void)
+{
+ return 0;
+}
+
+int kvm_arch_hardware_setup(void)
+{
+ return 0;
+}
+
+int kvm_arch_hardware_enable(void)
+{
+ return 0;
+}
+
+void kvm_arch_hardware_disable(void)
+{
+}
+
+int kvm_arch_init(void *opaque)
+{
+ if (!riscv_isa_extension_available(H)) {
+ kvm_info("hypervisor extension not available\n");
+ return -ENODEV;
+ }
+
+ kvm_info("hypervisor extension available\n");
+
+ return 0;
+}
+
+void kvm_arch_exit(void)
+{
+}
+
+static int riscv_kvm_init(void)
+{
+ return kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
+}
+module_init(riscv_kvm_init);
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
new file mode 100644
index 000000000000..cead012a8399
--- /dev/null
+++ b/arch/riscv/kvm/mmu.c
@@ -0,0 +1,83 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ * Anup Patel <[email protected]>
+ */
+
+#include <linux/bitops.h>
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/hugetlb.h>
+#include <linux/module.h>
+#include <linux/uaccess.h>
+#include <linux/vmalloc.h>
+#include <linux/kvm_host.h>
+#include <linux/sched/signal.h>
+#include <asm/page.h>
+#include <asm/pgtable.h>
+
+void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
+ struct kvm_memory_slot *dont)
+{
+}
+
+int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
+ unsigned long npages)
+{
+ return 0;
+}
+
+void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen)
+{
+}
+
+void kvm_arch_flush_shadow_all(struct kvm *kvm)
+{
+ /* TODO: */
+}
+
+void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
+ struct kvm_memory_slot *slot)
+{
+}
+
+void kvm_arch_commit_memory_region(struct kvm *kvm,
+ const struct kvm_userspace_memory_region *mem,
+ const struct kvm_memory_slot *old,
+ const struct kvm_memory_slot *new,
+ enum kvm_mr_change change)
+{
+ /* TODO: */
+}
+
+int kvm_arch_prepare_memory_region(struct kvm *kvm,
+ struct kvm_memory_slot *memslot,
+ const struct kvm_userspace_memory_region *mem,
+ enum kvm_mr_change change)
+{
+ /* TODO: */
+ return 0;
+}
+
+void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu)
+{
+ /* TODO: */
+}
+
+int kvm_riscv_stage2_alloc_pgd(struct kvm *kvm)
+{
+ /* TODO: */
+ return 0;
+}
+
+void kvm_riscv_stage2_free_pgd(struct kvm *kvm)
+{
+ /* TODO: */
+}
+
+void kvm_riscv_stage2_update_pgtbl(struct kvm_vcpu *vcpu)
+{
+ /* TODO: */
+}
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
new file mode 100644
index 000000000000..9fea9128d964
--- /dev/null
+++ b/arch/riscv/kvm/vcpu.c
@@ -0,0 +1,305 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ * Anup Patel <[email protected]>
+ */
+
+#include <linux/bitops.h>
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/kdebug.h>
+#include <linux/module.h>
+#include <linux/uaccess.h>
+#include <linux/vmalloc.h>
+#include <linux/sched/signal.h>
+#include <linux/fs.h>
+#include <linux/kvm_host.h>
+#include <asm/csr.h>
+#include <asm/delay.h>
+#include <asm/hwcap.h>
+
+#define VCPU_STAT(x) { #x, offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU }
+
+struct kvm_stats_debugfs_item debugfs_entries[] = {
+ VCPU_STAT(ecall_exit_stat),
+ VCPU_STAT(wfi_exit_stat),
+ VCPU_STAT(mmio_exit_user),
+ VCPU_STAT(mmio_exit_kernel),
+ VCPU_STAT(exits),
+ { NULL }
+};
+
+struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
+{
+ /* TODO: */
+ return NULL;
+}
+
+int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
+{
+ return 0;
+}
+
+void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
+{
+}
+
+int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
+{
+ /* TODO: */
+ return 0;
+}
+
+void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
+{
+ /* TODO: */
+}
+
+int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
+{
+ /* TODO: */
+ return 0;
+}
+
+void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
+{
+}
+
+void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
+{
+}
+
+int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
+{
+ /* TODO: */
+ return 0;
+}
+
+int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
+{
+ /* TODO: */
+ return 0;
+}
+
+bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
+{
+ /* TODO: */
+ return false;
+}
+
+bool kvm_arch_has_vcpu_debugfs(void)
+{
+ return false;
+}
+
+int kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu)
+{
+ return 0;
+}
+
+vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
+{
+ return VM_FAULT_SIGBUS;
+}
+
+long kvm_arch_vcpu_async_ioctl(struct file *filp,
+ unsigned int ioctl, unsigned long arg)
+{
+ /* TODO; */
+ return -ENOIOCTLCMD;
+}
+
+long kvm_arch_vcpu_ioctl(struct file *filp,
+ unsigned int ioctl, unsigned long arg)
+{
+ /* TODO: */
+ return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
+ struct kvm_sregs *sregs)
+{
+ return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
+ struct kvm_sregs *sregs)
+{
+ return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
+{
+ return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
+{
+ return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
+ struct kvm_translation *tr)
+{
+ return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
+{
+ return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
+{
+ return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
+ struct kvm_mp_state *mp_state)
+{
+ /* TODO: */
+ return 0;
+}
+
+int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
+ struct kvm_mp_state *mp_state)
+{
+ /* TODO: */
+ return 0;
+}
+
+int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
+ struct kvm_guest_debug *dbg)
+{
+ /* TODO; To be implemented later. */
+ return -EINVAL;
+}
+
+void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
+{
+ /* TODO: */
+
+ kvm_riscv_stage2_update_pgtbl(vcpu);
+}
+
+void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
+{
+ /* TODO: */
+}
+
+static void kvm_riscv_check_vcpu_requests(struct kvm_vcpu *vcpu)
+{
+ if (kvm_request_pending(vcpu)) {
+ /* TODO: */
+
+ /*
+ * Clear IRQ_PENDING requests that were made to guarantee
+ * that a VCPU sees new virtual interrupts.
+ */
+ kvm_check_request(KVM_REQ_IRQ_PENDING, vcpu);
+ }
+}
+
+int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+ int ret;
+ unsigned long scause, stval;
+
+ /* Process MMIO value returned from user-space */
+ if (run->exit_reason == KVM_EXIT_MMIO) {
+ ret = kvm_riscv_vcpu_mmio_return(vcpu, vcpu->run);
+ if (ret)
+ return ret;
+ }
+
+ if (run->immediate_exit)
+ return -EINTR;
+
+ vcpu_load(vcpu);
+
+ kvm_sigset_activate(vcpu);
+
+ ret = 1;
+ run->exit_reason = KVM_EXIT_UNKNOWN;
+ while (ret > 0) {
+ /* Check conditions before entering the guest */
+ cond_resched();
+
+ kvm_riscv_check_vcpu_requests(vcpu);
+
+ preempt_disable();
+
+ local_irq_disable();
+
+ /*
+ * Exit if we have a signal pending so that we can deliver
+ * the signal to user space.
+ */
+ if (signal_pending(current)) {
+ ret = -EINTR;
+ run->exit_reason = KVM_EXIT_INTR;
+ }
+
+ /*
+ * Ensure we set mode to IN_GUEST_MODE after we disable
+ * interrupts and before the final VCPU requests check.
+ * See the comment in kvm_vcpu_exiting_guest_mode() and
+ * Documentation/virtual/kvm/vcpu-requests.rst
+ */
+ smp_store_mb(vcpu->mode, IN_GUEST_MODE);
+
+ if (ret <= 0 ||
+ kvm_request_pending(vcpu)) {
+ vcpu->mode = OUTSIDE_GUEST_MODE;
+ local_irq_enable();
+ preempt_enable();
+ continue;
+ }
+
+ guest_enter_irqoff();
+
+ __kvm_riscv_switch_to(&vcpu->arch);
+
+ vcpu->mode = OUTSIDE_GUEST_MODE;
+ vcpu->stat.exits++;
+
+ /* Save SCAUSE and STVAL because we might get an interrupt
+ * between __kvm_riscv_switch_to() and local_irq_enable()
+ * which can potentially overwrite SCAUSE and STVAL.
+ */
+ scause = csr_read(CSR_SCAUSE);
+ stval = csr_read(CSR_STVAL);
+
+ /*
+ * We may have taken a host interrupt in VS/VU-mode (i.e.
+ * while executing the guest). This interrupt is still
+ * pending, as we haven't serviced it yet!
+ *
+ * We're now back in HS-mode with interrupts disabled
+ * so enabling the interrupts now will have the effect
+ * of taking the interrupt again, in HS-mode this time.
+ */
+ local_irq_enable();
+
+ /*
+ * We do local_irq_enable() before calling guest_exit() so
+ * that if a timer interrupt hits while running the guest
+ * we account that tick as being spent in the guest. We
+ * enable preemption after calling guest_exit() so that if
+ * we get preempted we make sure ticks after that is not
+ * counted as guest time.
+ */
+ guest_exit();
+
+ preempt_enable();
+
+ ret = kvm_riscv_vcpu_exit(vcpu, run, scause, stval);
+ }
+
+ kvm_sigset_deactivate(vcpu);
+
+ vcpu_put(vcpu);
+ return ret;
+}
diff --git a/arch/riscv/kvm/vcpu_exit.c b/arch/riscv/kvm/vcpu_exit.c
new file mode 100644
index 000000000000..e4d7c8f0807a
--- /dev/null
+++ b/arch/riscv/kvm/vcpu_exit.c
@@ -0,0 +1,35 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ * Anup Patel <[email protected]>
+ */
+
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/kvm_host.h>
+
+/**
+ * kvm_riscv_vcpu_mmio_return -- Handle MMIO loads after user space emulation
+ * or in-kernel IO emulation
+ *
+ * @vcpu: The VCPU pointer
+ * @run: The VCPU run struct containing the mmio data
+ */
+int kvm_riscv_vcpu_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+ /* TODO: */
+ return 0;
+}
+
+/*
+ * Return > 0 to return to guest, < 0 on error, 0 (and set exit_reason) on
+ * proper exit to userspace.
+ */
+int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
+ unsigned long scause, unsigned long stval)
+{
+ /* TODO: */
+ return 0;
+}
diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
new file mode 100644
index 000000000000..66904def2f93
--- /dev/null
+++ b/arch/riscv/kvm/vm.c
@@ -0,0 +1,101 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ * Anup Patel <[email protected]>
+ */
+
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/module.h>
+#include <linux/uaccess.h>
+#include <linux/kvm_host.h>
+
+int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
+{
+ /* TODO: To be added later. */
+ return -ENOTSUPP;
+}
+
+int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
+{
+ int r;
+
+ r = kvm_riscv_stage2_alloc_pgd(kvm);
+ if (r)
+ return r;
+
+ return 0;
+}
+
+void kvm_arch_destroy_vm(struct kvm *kvm)
+{
+ int i;
+
+ for (i = 0; i < KVM_MAX_VCPUS; ++i) {
+ if (kvm->vcpus[i]) {
+ kvm_arch_vcpu_destroy(kvm->vcpus[i]);
+ kvm->vcpus[i] = NULL;
+ }
+ }
+}
+
+int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
+{
+ int r;
+
+ switch (ext) {
+ case KVM_CAP_DEVICE_CTRL:
+ case KVM_CAP_USER_MEMORY:
+ case KVM_CAP_SYNC_MMU:
+ case KVM_CAP_DESTROY_MEMORY_REGION_WORKS:
+ case KVM_CAP_ONE_REG:
+ case KVM_CAP_READONLY_MEM:
+ case KVM_CAP_MP_STATE:
+ case KVM_CAP_IMMEDIATE_EXIT:
+ r = 1;
+ break;
+ case KVM_CAP_NR_VCPUS:
+ r = num_online_cpus();
+ break;
+ case KVM_CAP_MAX_VCPUS:
+ r = KVM_MAX_VCPUS;
+ break;
+ case KVM_CAP_NR_MEMSLOTS:
+ r = KVM_USER_MEM_SLOTS;
+ break;
+ default:
+ r = 0;
+ break;
+ }
+
+ return r;
+}
+
+long kvm_arch_vm_ioctl(struct file *filp,
+ unsigned int ioctl, unsigned long arg)
+{
+ return -EINVAL;
+}
+
+void kvm_riscv_halt_guest(struct kvm *kvm)
+{
+ int i;
+ struct kvm_vcpu *vcpu;
+
+ kvm_for_each_vcpu(i, vcpu, kvm)
+ vcpu->arch.pause = true;
+ kvm_make_all_cpus_request(kvm, KVM_REQ_SLEEP);
+}
+
+void kvm_riscv_resume_guest(struct kvm *kvm)
+{
+ int i;
+ struct kvm_vcpu *vcpu;
+
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ vcpu->arch.pause = false;
+ swake_up_one(kvm_arch_vcpu_wq(vcpu));
+ }
+}
--
2.17.1

2019-07-29 11:58:15

by Anup Patel

Subject: [RFC PATCH 05/16] RISC-V: KVM: Implement VCPU interrupts and requests handling

This patch implements VCPU interrupts and requests which are both
asynchronous events.

The VCPU interrupts can be set/unset using KVM_INTERRUPT ioctl from
user-space. In the future, the in-kernel IRQCHIP emulation will use
kvm_riscv_vcpu_set_interrupt() and kvm_riscv_vcpu_unset_interrupt()
functions to set/unset VCPU interrupts.

Important VCPU requests implemented by this patch are:
KVM_REQ_IRQ_PENDING - set whenever some VCPU interrupt is pending
KVM_REQ_SLEEP - set whenever the VCPU itself goes to sleep state
KVM_REQ_VCPU_RESET - set whenever a VCPU reset is requested

The WFI trap-n-emulate (added later) will use KVM_REQ_SLEEP request
and kvm_riscv_vcpu_has_interrupt() function.

The KVM_REQ_VCPU_RESET request will be used by SBI emulation (added
later) to power-up a VCPU in power-off state. The user-space can use
the GET_MPSTATE/SET_MPSTATE ioctls to get/set power state of a VCPU.
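
To illustrate the pending-interrupt bookkeeping, here is a minimal
user-space model (not the kernel code). It assumes that a VCPU "has an
interrupt" when a pending bit is also enabled in VSIE, which mirrors the
masking done in kvm_cpu_has_pending_timer(). It leaves out the irqs_lock,
the KVM_REQ_IRQ_PENDING request and the VCPU kick. The example_* names
are hypothetical; the IRQ numbers are the supervisor interrupt numbers
from the privileged spec.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Supervisor-level interrupt numbers (privileged spec) */
#define IRQ_S_SOFT	1
#define IRQ_S_TIMER	5
#define IRQ_S_EXT	9

/* Hypothetical, lock-free stand-in for the per-VCPU interrupt state */
struct example_vcpu_irqs {
	unsigned long irqs_pending;	/* bitmap of pending interrupts */
	unsigned long vsie;		/* guest interrupt enable bits */
};

static bool example_irq_valid(unsigned int irq)
{
	return irq == IRQ_S_SOFT || irq == IRQ_S_TIMER || irq == IRQ_S_EXT;
}

/* Mark @irq as pending; only the three S-level interrupts are accepted */
static int example_set_interrupt(struct example_vcpu_irqs *v,
				 unsigned int irq)
{
	if (!example_irq_valid(irq))
		return -EINVAL;

	v->irqs_pending |= 1UL << irq;
	return 0;
}

/* Clear the pending bit of @irq (assumed counterpart of the setter) */
static int example_unset_interrupt(struct example_vcpu_irqs *v,
				   unsigned int irq)
{
	if (!example_irq_valid(irq))
		return -EINVAL;

	v->irqs_pending &= ~(1UL << irq);
	return 0;
}

/* Assumption: deliverable means both pending and enabled in VSIE */
static bool example_has_interrupt(const struct example_vcpu_irqs *v)
{
	return (v->irqs_pending & v->vsie) != 0;
}
```

In this model, setting IRQ_S_EXT only makes the VCPU runnable once the
guest has enabled the matching bit in VSIE.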

Signed-off-by: Anup Patel <[email protected]>
---
arch/riscv/include/asm/kvm_host.h | 13 +++
arch/riscv/include/uapi/asm/kvm.h | 3 +
arch/riscv/kvm/vcpu.c | 174 +++++++++++++++++++++++++++---
3 files changed, 177 insertions(+), 13 deletions(-)

diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index 244eabe62710..aa89f1922da1 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -125,6 +125,13 @@ struct kvm_vcpu_arch {
/* CPU CSR context upon Guest VCPU reset */
struct kvm_vcpu_csr guest_reset_csr;

+ /* VCPU interrupts */
+ raw_spinlock_t irqs_lock;
+ unsigned long irqs_pending;
+
+ /* VCPU power-off state */
+ bool power_off;
+
/* Don't run the VCPU (blocked) */
bool pause;
};
@@ -146,6 +153,12 @@ int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,

static inline void __kvm_riscv_switch_to(struct kvm_vcpu_arch *vcpu_arch) {}

+int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq);
+int kvm_riscv_vcpu_unset_interrupt(struct kvm_vcpu *vcpu, unsigned int irq);
+bool kvm_riscv_vcpu_has_interrupt(struct kvm_vcpu *vcpu);
+void kvm_riscv_vcpu_power_off(struct kvm_vcpu *vcpu);
+void kvm_riscv_vcpu_power_on(struct kvm_vcpu *vcpu);
+
void kvm_riscv_halt_guest(struct kvm *kvm);
void kvm_riscv_resume_guest(struct kvm *kvm);

diff --git a/arch/riscv/include/uapi/asm/kvm.h b/arch/riscv/include/uapi/asm/kvm.h
index d15875818b6e..6dbc056d58ba 100644
--- a/arch/riscv/include/uapi/asm/kvm.h
+++ b/arch/riscv/include/uapi/asm/kvm.h
@@ -18,6 +18,9 @@

#define KVM_COALESCED_MMIO_PAGE_OFFSET 1

+#define KVM_INTERRUPT_SET -1U
+#define KVM_INTERRUPT_UNSET -2U
+
/* for KVM_GET_REGS and KVM_SET_REGS */
struct kvm_regs {
};
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index 1ae806f28c0e..c6f57caa95f0 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -42,6 +42,7 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {

static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
{
+ unsigned long f;
struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
struct kvm_vcpu_csr *reset_csr = &vcpu->arch.guest_reset_csr;
struct kvm_cpu_context *cntx = &vcpu->arch.guest_context;
@@ -50,6 +51,10 @@ static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
memcpy(csr, reset_csr, sizeof(*csr));

memcpy(cntx, reset_cntx, sizeof(*cntx));
+
+ raw_spin_lock_irqsave(&vcpu->arch.irqs_lock, f);
+ vcpu->arch.irqs_pending = 0;
+ raw_spin_unlock_irqrestore(&vcpu->arch.irqs_lock, f);
}

struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
@@ -103,6 +108,9 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
cntx->hstatus |= HSTATUS_SP2P;
cntx->hstatus |= HSTATUS_SPV;

+ /* Setup VCPU irqs lock */
+ raw_spin_lock_init(&vcpu->arch.irqs_lock);
+
/* Setup reset state of HEDELEG and HIDELEG CSRs */
csr = &vcpu->arch.guest_reset_csr;
csr->hedeleg = 0;
@@ -131,8 +139,15 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)

int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
{
- /* TODO: */
- return 0;
+ int ret;
+ unsigned long f, irqs;
+
+ raw_spin_lock_irqsave(&vcpu->arch.irqs_lock, f);
+ irqs = vcpu->arch.irqs_pending & vcpu->arch.guest_csr.vsie;
+ ret = (irqs & (1UL << IRQ_S_TIMER)) ? 1 : 0;
+ raw_spin_unlock_irqrestore(&vcpu->arch.irqs_lock, f);
+
+ return ret;
}

void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
@@ -145,20 +160,18 @@ void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)

int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
{
- /* TODO: */
- return 0;
+ return (kvm_riscv_vcpu_has_interrupt(vcpu) &&
+ !vcpu->arch.power_off && !vcpu->arch.pause);
}

int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
{
- /* TODO: */
- return 0;
+ return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
}

bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
{
- /* TODO: */
- return false;
+ return (vcpu->arch.guest_context.sstatus & SR_SPP) ? true : false;
}

bool kvm_arch_has_vcpu_debugfs(void)
@@ -179,7 +192,21 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
long kvm_arch_vcpu_async_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
{
- /* TODO; */
+ struct kvm_vcpu *vcpu = filp->private_data;
+ void __user *argp = (void __user *)arg;
+
+ if (ioctl == KVM_INTERRUPT) {
+ struct kvm_interrupt irq;
+
+ if (copy_from_user(&irq, argp, sizeof(irq)))
+ return -EFAULT;
+
+ if (irq.irq == KVM_INTERRUPT_SET)
+ return kvm_riscv_vcpu_set_interrupt(vcpu, IRQ_S_EXT);
+ else
+ return kvm_riscv_vcpu_unset_interrupt(vcpu, IRQ_S_EXT);
+ }
+
return -ENOIOCTLCMD;
}

@@ -228,18 +255,113 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
return -EINVAL;
}

+static void kvm_riscv_vcpu_flush_interrupts(struct kvm_vcpu *vcpu)
+{
+ unsigned long f;
+
+ raw_spin_lock_irqsave(&vcpu->arch.irqs_lock, f);
+ if (vcpu->arch.irqs_pending ^ vcpu->arch.guest_csr.vsip) {
+ csr_write(CSR_VSIP, vcpu->arch.irqs_pending);
+ vcpu->arch.guest_csr.vsip = vcpu->arch.irqs_pending;
+ }
+ raw_spin_unlock_irqrestore(&vcpu->arch.irqs_lock, f);
+}
+
+static void kvm_riscv_vcpu_sync_interrupts(struct kvm_vcpu *vcpu)
+{
+ vcpu->arch.guest_csr.vsip = csr_read(CSR_VSIP);
+ vcpu->arch.guest_csr.vsie = csr_read(CSR_VSIE);
+}
+
+int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq)
+{
+ unsigned long f;
+
+ if (irq != IRQ_S_SOFT &&
+ irq != IRQ_S_TIMER &&
+ irq != IRQ_S_EXT)
+ return -EINVAL;
+
+ raw_spin_lock_irqsave(&vcpu->arch.irqs_lock, f);
+ vcpu->arch.irqs_pending |= (1UL << irq);
+ raw_spin_unlock_irqrestore(&vcpu->arch.irqs_lock, f);
+
+ kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
+ kvm_vcpu_kick(vcpu);
+
+ return 0;
+}
+
+int kvm_riscv_vcpu_unset_interrupt(struct kvm_vcpu *vcpu, unsigned int irq)
+{
+ unsigned long f;
+
+ if (irq != IRQ_S_SOFT &&
+ irq != IRQ_S_TIMER &&
+ irq != IRQ_S_EXT)
+ return -EINVAL;
+
+ raw_spin_lock_irqsave(&vcpu->arch.irqs_lock, f);
+ vcpu->arch.irqs_pending &= ~(1UL << irq);
+ raw_spin_unlock_irqrestore(&vcpu->arch.irqs_lock, f);
+
+ return 0;
+}
+
+bool kvm_riscv_vcpu_has_interrupt(struct kvm_vcpu *vcpu)
+{
+ bool ret = false;
+ unsigned long f;
+
+ raw_spin_lock_irqsave(&vcpu->arch.irqs_lock, f);
+ if (vcpu->arch.irqs_pending & vcpu->arch.guest_csr.vsie)
+ ret = true;
+ raw_spin_unlock_irqrestore(&vcpu->arch.irqs_lock, f);
+
+ return ret;
+}
+
+void kvm_riscv_vcpu_power_off(struct kvm_vcpu *vcpu)
+{
+ vcpu->arch.power_off = true;
+ kvm_make_request(KVM_REQ_SLEEP, vcpu);
+ kvm_vcpu_kick(vcpu);
+}
+
+void kvm_riscv_vcpu_power_on(struct kvm_vcpu *vcpu)
+{
+ vcpu->arch.power_off = false;
+ kvm_vcpu_wake_up(vcpu);
+}
+
int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
struct kvm_mp_state *mp_state)
{
- /* TODO: */
+ if (vcpu->arch.power_off)
+ mp_state->mp_state = KVM_MP_STATE_STOPPED;
+ else
+ mp_state->mp_state = KVM_MP_STATE_RUNNABLE;
+
return 0;
}

int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
struct kvm_mp_state *mp_state)
{
- /* TODO: */
- return 0;
+ int ret = 0;
+
+ switch (mp_state->mp_state) {
+ case KVM_MP_STATE_RUNNABLE:
+ vcpu->arch.power_off = false;
+ break;
+ case KVM_MP_STATE_STOPPED:
+ kvm_riscv_vcpu_power_off(vcpu);
+ break;
+ default:
+ ret = -EINVAL;
+ }
+
+ return ret;
}

int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
@@ -263,8 +385,25 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)

static void kvm_riscv_check_vcpu_requests(struct kvm_vcpu *vcpu)
{
+ struct swait_queue_head *wq = kvm_arch_vcpu_wq(vcpu);
+
if (kvm_request_pending(vcpu)) {
- /* TODO: */
+ if (kvm_check_request(KVM_REQ_SLEEP, vcpu)) {
+ swait_event_interruptible_exclusive(*wq,
+ ((!vcpu->arch.power_off) &&
+ (!vcpu->arch.pause)));
+
+ if (vcpu->arch.power_off || vcpu->arch.pause) {
+ /*
+ * Awaken to handle a signal, request to
+ * sleep again later.
+ */
+ kvm_make_request(KVM_REQ_SLEEP, vcpu);
+ }
+ }
+
+ if (kvm_check_request(KVM_REQ_VCPU_RESET, vcpu))
+ kvm_riscv_reset_vcpu(vcpu);

/*
* Clear IRQ_PENDING requests that were made to guarantee
@@ -317,6 +456,12 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
run->exit_reason = KVM_EXIT_INTR;
}

+ /*
+ * VCPU interrupts may have been updated asynchronously,
+ * so flush them to HW.
+ */
+ kvm_riscv_vcpu_flush_interrupts(vcpu);
+
/*
* Ensure we set mode to IN_GUEST_MODE after we disable
* interrupts and before the final VCPU requests check.
@@ -347,6 +492,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
scause = csr_read(CSR_SCAUSE);
stval = csr_read(CSR_STVAL);

+ /* Sync up interrupt state with HW */
+ kvm_riscv_vcpu_sync_interrupts(vcpu);
+
/*
* We may have taken a host interrupt in VS/VU-mode (i.e.
* while executing the guest). This interrupt is still
--
2.17.1

2019-07-29 11:58:24

by Anup Patel

Subject: [RFC PATCH 06/16] RISC-V: KVM: Implement KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls

For KVM RISC-V, we use KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls to access
VCPU config and registers from user-space.

We have two types of VCPU registers:
1. CONFIG - these are VCPU config and capabilities
2. CORE - these are VCPU general purpose registers

The CONFIG registers available to user-space are ISA and TIMEBASE. Out
of these, TIMEBASE is a read-only register which informs user-space about
the VCPU timer base frequency. The ISA register is a read-write register,
but user-space can only write the desired VCPU ISA capabilities before
the VCPU runs for the first time.
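
As a rough user-space illustration of the ISA write rule described above
(not the kernel code itself), the sketch below masks a requested ISA value
against the host ISA and an allowed set, and refuses writes once the VCPU
has run. The names demo_vcpu, demo_set_isa and ISA_BIT are hypothetical;
host_isa and allowed stand in for riscv_isa and KVM_RISCV_ISA_ALLOWED.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: misa-style bit for a lowercase extension letter. */
#define ISA_BIT(c) (1UL << ((c) - 'a'))

/* Hypothetical, minimal stand-in for the VCPU state touched here. */
struct demo_vcpu {
	unsigned long isa;
	bool ran_atleast_once;
};

/*
 * Accept an ISA write only before the first run, and keep only the bits
 * that the host supports AND that the hypervisor allows.
 * Returns 0 on success, -1 if the VCPU has already run.
 */
static int demo_set_isa(struct demo_vcpu *vcpu, unsigned long requested,
			unsigned long host_isa, unsigned long allowed)
{
	if (vcpu->ran_atleast_once)
		return -1;

	vcpu->isa = requested & host_isa & allowed;
	return 0;
}
```

With this shape, a request for an extension the host lacks is dropped
silently, so user-space would need to read the register back to learn what
was actually granted.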

The CORE registers available to user-space are PC, RA, SP, GP, TP, A0-A7,
T0-T6, S0-S11 and MODE. All of these are RISC-V general-purpose registers
except PC and MODE. The PC register represents the program counter whereas
the MODE register represents the VCPU privilege mode (i.e. S-mode or U-mode).

In the future, more VCPU register types such as FP and CSRs will be added
to the KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls.
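
To make the register id layout concrete, here is a small user-space sketch
of how an id could be composed and decoded. The size and arch mask values
follow the generic ONE_REG layout in include/uapi/linux/kvm.h; the type
field matches this patch. DEMO_REG_RISCV is an assumption (the arch id is
not shown in this patch), and demo_regs is a hypothetical, shortened
stand-in for struct kvm_regs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Generic ONE_REG layout (include/uapi/linux/kvm.h). */
#define DEMO_REG_ARCH_MASK	0xff00000000000000ULL
#define DEMO_REG_SIZE_SHIFT	52
#define DEMO_REG_SIZE_MASK	0x00f0000000000000ULL
#define DEMO_REG_SIZE_U64	0x0030000000000000ULL
/* ASSUMPTION: RISC-V arch id, not defined in this patch. */
#define DEMO_REG_RISCV		0x8000000000000000ULL

/* Type field as defined by this patch. */
#define DEMO_TYPE_MASK		0x00000000FF000000ULL
#define DEMO_TYPE_CONFIG	(0x01ULL << 24)
#define DEMO_TYPE_CORE		(0x02ULL << 24)
#define DEMO_CONFIG_TIMEBASE	0x1

/* Hypothetical, shortened stand-in for struct kvm_regs. */
struct demo_regs {
	struct { uint64_t pc, ra, sp; } regs;
	uint64_t mode;
};

/* Core register index = word offset of the field inside the struct. */
#define DEMO_CORE_REG(name) \
	(offsetof(struct demo_regs, name) / sizeof(uint64_t))

/* Compose a 64-bit wide register id from a type and an index. */
static uint64_t demo_reg_id(uint64_t type, uint64_t index)
{
	return DEMO_REG_RISCV | DEMO_REG_SIZE_U64 | type | index;
}

/* Size in bytes encoded in the id (same formula as KVM_REG_SIZE). */
static unsigned int demo_reg_size(uint64_t id)
{
	return 1U << ((id & DEMO_REG_SIZE_MASK) >> DEMO_REG_SIZE_SHIFT);
}

/* Strip arch, size and type bits to recover the register index. */
static uint64_t demo_reg_num(uint64_t id, uint64_t type)
{
	return id & ~(DEMO_REG_ARCH_MASK | DEMO_REG_SIZE_MASK | type);
}
```

For example, demo_reg_id(DEMO_TYPE_CORE, DEMO_CORE_REG(regs.sp)) yields an
id whose decoded index is 2 in this shortened layout; a real caller would
pass such an id in struct kvm_one_reg together with a user buffer address.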

Signed-off-by: Anup Patel <[email protected]>
---
arch/riscv/include/uapi/asm/kvm.h | 24 ++++
arch/riscv/kvm/vcpu.c | 177 +++++++++++++++++++++++++++++-
2 files changed, 199 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/include/uapi/asm/kvm.h b/arch/riscv/include/uapi/asm/kvm.h
index 6dbc056d58ba..6c28a1b6e9be 100644
--- a/arch/riscv/include/uapi/asm/kvm.h
+++ b/arch/riscv/include/uapi/asm/kvm.h
@@ -23,8 +23,15 @@

/* for KVM_GET_REGS and KVM_SET_REGS */
struct kvm_regs {
+ /* out (KVM_GET_REGS) / in (KVM_SET_REGS) */
+ struct user_regs_struct regs;
+ unsigned long mode;
};

+/* Possible privilege modes for kvm_regs */
+#define KVM_RISCV_MODE_S 1
+#define KVM_RISCV_MODE_U 0
+
/* for KVM_GET_FPU and KVM_SET_FPU */
struct kvm_fpu {
};
@@ -45,6 +52,23 @@ struct kvm_sync_regs {
struct kvm_sregs {
};

+#define KVM_REG_SIZE(id) \
+ (1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
+
+/* If you need to interpret the index values, here is the key: */
+#define KVM_REG_RISCV_TYPE_MASK 0x00000000FF000000
+#define KVM_REG_RISCV_TYPE_SHIFT 24
+
+/* Config registers are mapped as type 1 */
+#define KVM_REG_RISCV_CONFIG (0x01 << KVM_REG_RISCV_TYPE_SHIFT)
+#define KVM_REG_RISCV_CONFIG_ISA 0x0
+#define KVM_REG_RISCV_CONFIG_TIMEBASE 0x1
+
+/* Core registers are mapped as type 2 */
+#define KVM_REG_RISCV_CORE (0x02 << KVM_REG_RISCV_TYPE_SHIFT)
+#define KVM_REG_RISCV_CORE_REG(name) \
+ (offsetof(struct kvm_regs, name) / sizeof(unsigned long))
+
#endif

#endif /* __LINUX_KVM_RISCV_H */
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index c6f57caa95f0..37368eeb6c41 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -189,6 +189,157 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
return VM_FAULT_SIGBUS;
}

+static int kvm_riscv_vcpu_get_reg_config(struct kvm_vcpu *vcpu,
+ const struct kvm_one_reg *reg)
+{
+ unsigned long __user *uaddr =
+ (unsigned long __user *)(unsigned long)reg->addr;
+ unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
+ KVM_REG_SIZE_MASK |
+ KVM_REG_RISCV_CONFIG);
+ unsigned long reg_val;
+
+ if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
+ return -EINVAL;
+
+ switch (reg_num) {
+ case KVM_REG_RISCV_CONFIG_ISA:
+ reg_val = vcpu->arch.isa;
+ break;
+ case KVM_REG_RISCV_CONFIG_TIMEBASE:
+ reg_val = riscv_timebase;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ if (copy_to_user(uaddr, &reg_val, KVM_REG_SIZE(reg->id)))
+ return -EFAULT;
+
+ return 0;
+}
+
+static int kvm_riscv_vcpu_set_reg_config(struct kvm_vcpu *vcpu,
+ const struct kvm_one_reg *reg)
+{
+ unsigned long __user *uaddr =
+ (unsigned long __user *)(unsigned long)reg->addr;
+ unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
+ KVM_REG_SIZE_MASK |
+ KVM_REG_RISCV_CONFIG);
+ unsigned long reg_val;
+
+ if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
+ return -EINVAL;
+
+ if (copy_from_user(&reg_val, uaddr, KVM_REG_SIZE(reg->id)))
+ return -EFAULT;
+
+ switch (reg_num) {
+ case KVM_REG_RISCV_CONFIG_ISA:
+ if (!vcpu->arch.ran_atleast_once) {
+ vcpu->arch.isa = reg_val;
+ vcpu->arch.isa &= riscv_isa;
+ vcpu->arch.isa &= KVM_RISCV_ISA_ALLOWED;
+ } else {
+ return -ENOTSUPP;
+ }
+ break;
+ case KVM_REG_RISCV_CONFIG_TIMEBASE:
+ return -ENOTSUPP;
+ default:
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int kvm_riscv_vcpu_get_reg_core(struct kvm_vcpu *vcpu,
+ const struct kvm_one_reg *reg)
+{
+ struct kvm_cpu_context *cntx = &vcpu->arch.guest_context;
+ unsigned long __user *uaddr =
+ (unsigned long __user *)(unsigned long)reg->addr;
+ unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
+ KVM_REG_SIZE_MASK |
+ KVM_REG_RISCV_CORE);
+ unsigned long reg_val;
+
+ if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
+ return -EINVAL;
+
+ if (reg_num == KVM_REG_RISCV_CORE_REG(regs.pc))
+ reg_val = cntx->sepc;
+ else if (KVM_REG_RISCV_CORE_REG(regs.pc) < reg_num &&
+ reg_num <= KVM_REG_RISCV_CORE_REG(regs.t6))
+ reg_val = ((unsigned long *)cntx)[reg_num];
+ else if (reg_num == KVM_REG_RISCV_CORE_REG(mode))
+ reg_val = (cntx->sstatus & SR_SPP) ?
+ KVM_RISCV_MODE_S : KVM_RISCV_MODE_U;
+ else
+ return -EINVAL;
+
+ if (copy_to_user(uaddr, &reg_val, KVM_REG_SIZE(reg->id)))
+ return -EFAULT;
+
+ return 0;
+}
+
+static int kvm_riscv_vcpu_set_reg_core(struct kvm_vcpu *vcpu,
+ const struct kvm_one_reg *reg)
+{
+ struct kvm_cpu_context *cntx = &vcpu->arch.guest_context;
+ unsigned long __user *uaddr =
+ (unsigned long __user *)(unsigned long)reg->addr;
+ unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
+ KVM_REG_SIZE_MASK |
+ KVM_REG_RISCV_CORE);
+ unsigned long reg_val;
+
+ if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
+ return -EINVAL;
+
+ if (copy_from_user(&reg_val, uaddr, KVM_REG_SIZE(reg->id)))
+ return -EFAULT;
+
+ if (reg_num == KVM_REG_RISCV_CORE_REG(regs.pc))
+ cntx->sepc = reg_val;
+ else if (KVM_REG_RISCV_CORE_REG(regs.pc) < reg_num &&
+ reg_num <= KVM_REG_RISCV_CORE_REG(regs.t6))
+ ((unsigned long *)cntx)[reg_num] = reg_val;
+ else if (reg_num == KVM_REG_RISCV_CORE_REG(mode)) {
+ if (reg_val == KVM_RISCV_MODE_S)
+ cntx->sstatus |= SR_SPP;
+ else
+ cntx->sstatus &= ~SR_SPP;
+ } else
+ return -EINVAL;
+
+ return 0;
+}
+
+static int kvm_riscv_vcpu_set_reg(struct kvm_vcpu *vcpu,
+ const struct kvm_one_reg *reg)
+{
+ if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_CONFIG)
+ return kvm_riscv_vcpu_set_reg_config(vcpu, reg);
+ else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_CORE)
+ return kvm_riscv_vcpu_set_reg_core(vcpu, reg);
+
+ return -EINVAL;
+}
+
+static int kvm_riscv_vcpu_get_reg(struct kvm_vcpu *vcpu,
+ const struct kvm_one_reg *reg)
+{
+ if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_CONFIG)
+ return kvm_riscv_vcpu_get_reg_config(vcpu, reg);
+ else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_CORE)
+ return kvm_riscv_vcpu_get_reg_core(vcpu, reg);
+
+ return -EINVAL;
+}
+
long kvm_arch_vcpu_async_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
{
@@ -213,8 +364,30 @@ long kvm_arch_vcpu_async_ioctl(struct file *filp,
long kvm_arch_vcpu_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
{
- /* TODO: */
- return -EINVAL;
+ struct kvm_vcpu *vcpu = filp->private_data;
+ void __user *argp = (void __user *)arg;
+ long r = -EINVAL;
+
+ switch (ioctl) {
+ case KVM_SET_ONE_REG:
+ case KVM_GET_ONE_REG: {
+ struct kvm_one_reg reg;
+
+ r = -EFAULT;
+ if (copy_from_user(&reg, argp, sizeof(reg)))
+ break;
+
+ if (ioctl == KVM_SET_ONE_REG)
+ r = kvm_riscv_vcpu_set_reg(vcpu, &reg);
+ else
+ r = kvm_riscv_vcpu_get_reg(vcpu, &reg);
+ break;
+ }
+ default:
+ break;
+ }
+
+ return r;
}

int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
--
2.17.1

2019-07-29 11:58:38

by Anup Patel

Subject: [RFC PATCH 10/16] RISC-V: KVM: Implement VMID allocator

We implement a simple VMID allocator for Guests/VMs which:
1. Detects the number of VMID bits at boot-time
2. Uses an atomic counter to track the VMID version and increments
the VMID version whenever we run out of VMIDs
3. Flushes Guest TLBs on all host CPUs whenever we run out
of VMIDs
4. Forces an update of the HW Stage2 VMID for each Guest VCPU whenever
the VMID changes, using VCPU request KVM_REQ_UPDATE_PGTBL
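
The version/rollover scheme in the list above can be sketched as a
single-threaded user-space model. This is only an illustration: locking,
memory barriers and the real TLB flush are left out (a counter stands in
for the flush), and the demo_* names are hypothetical.

```c
#include <assert.h>

/* Per-VM state: a VMID is valid only if its version is current. */
struct demo_vmid {
	unsigned long version;	/* 0 means "never allocated" */
	unsigned long vmid;
};

/* Global allocator state (the patch keeps this in static variables). */
struct demo_allocator {
	unsigned long version;	/* starts at 1 */
	unsigned long next;	/* 0 means "current version exhausted" */
	unsigned long bits;	/* number of VMID bits in HW */
	unsigned long flushes;	/* stand-in for the all-CPU TLB flush */
};

static void demo_vmid_update(struct demo_allocator *a, struct demo_vmid *v)
{
	/* Nothing to do while the VM's VMID belongs to this version. */
	if (v->version == a->version)
		return;

	/* Out of VMIDs: start a new version and flush guest TLBs. */
	if (a->next == 0) {
		a->version++;
		a->next = 1;
		a->flushes++;
	}

	/* Hand out the next VMID; wrapping to 0 marks exhaustion. */
	v->vmid = a->next;
	a->next = (a->next + 1) & ((1UL << a->bits) - 1);
	v->version = a->version;
}
```

With bits = 2 the model hands out VMIDs 1, 2 and 3, then bumps the version
on the fourth request, which invalidates every VMID issued earlier. VMID 0
is never handed out in this scheme.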

Signed-off-by: Anup Patel <[email protected]>
---
arch/riscv/include/asm/kvm_host.h | 21 +++++
arch/riscv/kvm/Makefile | 3 +-
arch/riscv/kvm/main.c | 4 +
arch/riscv/kvm/tlb.S | 42 ++++++++++
arch/riscv/kvm/vcpu.c | 6 ++
arch/riscv/kvm/vm.c | 6 ++
arch/riscv/kvm/vmid.c | 130 ++++++++++++++++++++++++++++++
7 files changed, 211 insertions(+), 1 deletion(-)
create mode 100644 arch/riscv/kvm/tlb.S
create mode 100644 arch/riscv/kvm/vmid.c

diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index 82e568ae0260..dcc31f9ca13d 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -28,6 +28,7 @@
KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
#define KVM_REQ_IRQ_PENDING KVM_ARCH_REQ(1)
#define KVM_REQ_VCPU_RESET KVM_ARCH_REQ(2)
+#define KVM_REQ_UPDATE_PGTBL KVM_ARCH_REQ(3)

struct kvm_vm_stat {
ulong remote_tlb_flush;
@@ -48,7 +49,15 @@ struct kvm_vcpu_stat {
struct kvm_arch_memory_slot {
};

+struct kvm_vmid {
+ unsigned long vmid_version;
+ unsigned long vmid;
+};
+
struct kvm_arch {
+ /* stage2 vmid */
+ struct kvm_vmid vmid;
+
/* stage2 page table */
pgd_t *pgd;
phys_addr_t pgd_phys;
@@ -158,6 +167,12 @@ static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}

+extern void __kvm_riscv_hfence_gvma_vmid_gpa(unsigned long vmid,
+ unsigned long gpa);
+extern void __kvm_riscv_hfence_gvma_vmid(unsigned long vmid);
+extern void __kvm_riscv_hfence_gvma_gpa(unsigned long gpa);
+extern void __kvm_riscv_hfence_gvma_all(void);
+
int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned long hva,
bool is_write);
void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu);
@@ -165,6 +180,12 @@ int kvm_riscv_stage2_alloc_pgd(struct kvm *kvm);
void kvm_riscv_stage2_free_pgd(struct kvm *kvm);
void kvm_riscv_stage2_update_pgtbl(struct kvm_vcpu *vcpu);

+void kvm_riscv_stage2_vmid_detect(void);
+unsigned long kvm_riscv_stage2_vmid_bits(void);
+int kvm_riscv_stage2_vmid_init(struct kvm *kvm);
+bool kvm_riscv_stage2_vmid_ver_changed(struct kvm_vmid *vmid);
+void kvm_riscv_stage2_vmid_update(struct kvm_vcpu *vcpu);
+
int kvm_riscv_vcpu_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
unsigned long scause, unsigned long stval);
diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
index 845579273727..c0f57f26c13d 100644
--- a/arch/riscv/kvm/Makefile
+++ b/arch/riscv/kvm/Makefile
@@ -8,6 +8,7 @@ ccflags-y := -Ivirt/kvm -Iarch/riscv/kvm

kvm-objs := $(common-objs-y)

-kvm-objs += main.o vm.o mmu.o vcpu.o vcpu_exit.o vcpu_switch.o
+kvm-objs += main.o vm.o vmid.o tlb.o mmu.o
+kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o

obj-$(CONFIG_KVM) += kvm.o
diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c
index 8cac0571a264..c029686100e4 100644
--- a/arch/riscv/kvm/main.c
+++ b/arch/riscv/kvm/main.c
@@ -44,8 +44,12 @@ int kvm_arch_init(void *opaque)
return -ENODEV;
}

+ kvm_riscv_stage2_vmid_detect();
+
kvm_info("hypervisor extension available\n");

+ kvm_info("host has %ld VMID bits\n", kvm_riscv_stage2_vmid_bits());
+
return 0;
}

diff --git a/arch/riscv/kvm/tlb.S b/arch/riscv/kvm/tlb.S
new file mode 100644
index 000000000000..13740d8020f5
--- /dev/null
+++ b/arch/riscv/kvm/tlb.S
@@ -0,0 +1,42 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ * Anup Patel <[email protected]>
+ */
+
+#include <linux/linkage.h>
+#include <asm/asm.h>
+
+ .text
+ .altmacro
+
+ /*
+ * Instruction encoding of hfence.gvma is:
+ * 0110001 rs2(5) rs1(5) 000 00000 1110011
+ */
+
+ENTRY(__kvm_riscv_hfence_gvma_vmid_gpa)
+ /* hfence.gvma a1, a0 */
+ .word 0x62a58073
+ ret
+ENDPROC(__kvm_riscv_hfence_gvma_vmid_gpa)
+
+ENTRY(__kvm_riscv_hfence_gvma_vmid)
+ /* hfence.gvma zero, a0 */
+ .word 0x62a00073
+ ret
+ENDPROC(__kvm_riscv_hfence_gvma_vmid)
+
+ENTRY(__kvm_riscv_hfence_gvma_gpa)
+ /* hfence.gvma a0 */
+ .word 0x62050073
+ ret
+ENDPROC(__kvm_riscv_hfence_gvma_gpa)
+
+ENTRY(__kvm_riscv_hfence_gvma_all)
+ /* hfence.gvma */
+ .word 0x62000073
+ ret
+ENDPROC(__kvm_riscv_hfence_gvma_all)
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index 4ab9f803536e..f3b0cadc1973 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -607,6 +607,9 @@ static void kvm_riscv_check_vcpu_requests(struct kvm_vcpu *vcpu)
if (kvm_check_request(KVM_REQ_VCPU_RESET, vcpu))
kvm_riscv_reset_vcpu(vcpu);

+ if (kvm_check_request(KVM_REQ_UPDATE_PGTBL, vcpu))
+ kvm_riscv_stage2_update_pgtbl(vcpu);
+
/*
* Clear IRQ_PENDING requests that were made to guarantee
* that a VCPU sees new virtual interrupts.
@@ -643,6 +646,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
/* Check conditions before entering the guest */
cond_resched();

+ kvm_riscv_stage2_vmid_update(vcpu);
+
kvm_riscv_check_vcpu_requests(vcpu);

preempt_disable();
@@ -673,6 +678,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
smp_store_mb(vcpu->mode, IN_GUEST_MODE);

if (ret <= 0 ||
+ kvm_riscv_stage2_vmid_ver_changed(&vcpu->kvm->arch.vmid) ||
kvm_request_pending(vcpu)) {
vcpu->mode = OUTSIDE_GUEST_MODE;
local_irq_enable();
diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
index 66904def2f93..4bc97ebc4b6e 100644
--- a/arch/riscv/kvm/vm.c
+++ b/arch/riscv/kvm/vm.c
@@ -26,6 +26,12 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
if (r)
return r;

+ r = kvm_riscv_stage2_vmid_init(kvm);
+ if (r) {
+ kvm_riscv_stage2_free_pgd(kvm);
+ return r;
+ }
+
return 0;
}

diff --git a/arch/riscv/kvm/vmid.c b/arch/riscv/kvm/vmid.c
new file mode 100644
index 000000000000..a2b026fad1bd
--- /dev/null
+++ b/arch/riscv/kvm/vmid.c
@@ -0,0 +1,130 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ * Anup Patel <[email protected]>
+ */
+
+#include <linux/bitops.h>
+#include <linux/cpumask.h>
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/module.h>
+#include <linux/kvm_host.h>
+#include <asm/csr.h>
+
+static atomic_long_t vmid_version = ATOMIC_LONG_INIT(1);
+static unsigned long vmid_next;
+static unsigned long vmid_bits;
+static DEFINE_SPINLOCK(vmid_lock);
+
+void kvm_riscv_stage2_vmid_detect(void)
+{
+ unsigned long old;
+
+ /* Figure out the number of VMID bits in HW */
+ old = csr_read(CSR_HGATP);
+ csr_write(CSR_HGATP, old | HGATP_VMID_MASK);
+ vmid_bits = csr_read(CSR_HGATP);
+ vmid_bits = (vmid_bits & HGATP_VMID_MASK) >> HGATP_VMID_SHIFT;
+ vmid_bits = fls_long(vmid_bits);
+ csr_write(CSR_HGATP, old);
+
+ /* We polluted local TLB so flush all guest TLB */
+ __kvm_riscv_hfence_gvma_all();
+
+ /* We don't use VMID bits if they are not sufficient */
+ if ((1UL << vmid_bits) < num_possible_cpus())
+ vmid_bits = 0;
+}
+
+unsigned long kvm_riscv_stage2_vmid_bits(void)
+{
+ return vmid_bits;
+}
+
+int kvm_riscv_stage2_vmid_init(struct kvm *kvm)
+{
+ /* Mark the initial VMID and VMID version invalid */
+ kvm->arch.vmid.vmid_version = 0;
+ kvm->arch.vmid.vmid = 0;
+
+ return 0;
+}
+
+static void local_guest_tlb_flush(void *info)
+{
+ __kvm_riscv_hfence_gvma_all();
+}
+
+static void force_exit_and_guest_tlb_flush(const cpumask_t *mask)
+{
+ preempt_disable();
+ smp_call_function_many(mask, local_guest_tlb_flush, NULL, true);
+ preempt_enable();
+}
+
+bool kvm_riscv_stage2_vmid_ver_changed(struct kvm_vmid *vmid)
+{
+ ulong cur_vmid_version;
+
+ if (!vmid_bits)
+ return false;
+
+ cur_vmid_version = atomic_long_read(&vmid_version);
+
+ /* Ensure atomic read to VMID version is completed */
+ smp_rmb();
+
+ return unlikely(READ_ONCE(vmid->vmid_version) != cur_vmid_version);
+}
+
+void kvm_riscv_stage2_vmid_update(struct kvm_vcpu *vcpu)
+{
+ int i;
+ struct kvm_vcpu *v;
+ struct kvm_vmid *vmid = &vcpu->kvm->arch.vmid;
+
+ if (!kvm_riscv_stage2_vmid_ver_changed(vmid))
+ return;
+
+ spin_lock(&vmid_lock);
+
+ /*
+ * We need to re-check the vmid_version here in case another
+ * vcpu has already allocated a valid vmid for this vm.
+ */
+ if (!kvm_riscv_stage2_vmid_ver_changed(vmid)) {
+ spin_unlock(&vmid_lock);
+ return;
+ }
+
+ /* First user of a new VMID version? */
+ if (unlikely(vmid_next == 0)) {
+ atomic_long_inc(&vmid_version);
+ vmid_next = 1;
+
+ /*
+ * On SMP we know no other CPUs can use this CPU's or
+ * each other's VMID after forced exit returns since the
+ * vmid_lock blocks them from re-entry to the guest.
+ */
+ force_exit_and_guest_tlb_flush(cpu_all_mask);
+ }
+
+ vmid->vmid = vmid_next;
+ vmid_next++;
+ vmid_next &= (1 << vmid_bits) - 1;
+
+ /* Ensure VMID next update is completed */
+ smp_wmb();
+
+ WRITE_ONCE(vmid->vmid_version, atomic_long_read(&vmid_version));
+
+ spin_unlock(&vmid_lock);
+
+ /* Request stage2 page table update for all VCPUs */
+ kvm_for_each_vcpu(i, v, vcpu->kvm)
+ kvm_make_request(KVM_REQ_UPDATE_PGTBL, v);
+}
--
2.17.1

2019-07-29 11:58:44

by Anup Patel

Subject: [RFC PATCH 11/16] RISC-V: KVM: Implement stage2 page table programming

This patch implements all required functions for programming
the stage2 page table for each Guest/VM.

At a high level, the flow of stage2 related functions is similar
to the KVM ARM/ARM64 implementation, but the stage2 page table
format is quite different for KVM RISC-V.
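
One pattern shared with the ARM code is the per-VCPU page cache: pages are
allocated up front, where sleeping is allowed, and consumed later while
the MMU spinlock is held. Below is a minimal user-space sketch of that
pattern, using calloc()/free() in place of the kernel page allocator; the
demo_* names and DEMO_PAGE_SIZE are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_CACHE_NR_OBJS	32
#define DEMO_PAGE_SIZE		4096

struct demo_page_cache {
	int nobjs;
	void *objects[DEMO_CACHE_NR_OBJS];
};

/*
 * Refill the cache up to max zeroed pages, but only when fewer than
 * min pages remain. Meant to be called before taking any spinlock.
 */
static int demo_cache_topup(struct demo_page_cache *pc, int min, int max)
{
	if (max > DEMO_CACHE_NR_OBJS)
		return -1;
	if (pc->nobjs >= min)
		return 0;

	while (pc->nobjs < max) {
		void *page = calloc(1, DEMO_PAGE_SIZE);

		if (!page)
			return -1;
		pc->objects[pc->nobjs++] = page;
	}

	return 0;
}

/* Take one pre-allocated page; never allocates, so safe under a lock. */
static void *demo_cache_alloc(struct demo_page_cache *pc)
{
	if (!pc || !pc->nobjs)
		return NULL;

	return pc->objects[--pc->nobjs];
}

/* Release any pages that were not consumed. */
static void demo_cache_flush(struct demo_page_cache *pc)
{
	while (pc && pc->nobjs)
		free(pc->objects[--pc->nobjs]);
}
```

Unlike the kernel helper, which uses BUG_ON() on an empty cache, this
sketch returns NULL so the caller can report an error instead.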

Signed-off-by: Anup Patel <[email protected]>
---
arch/riscv/include/asm/kvm_host.h | 10 +
arch/riscv/include/asm/pgtable-bits.h | 1 +
arch/riscv/kvm/mmu.c | 636 +++++++++++++++++++++++++-
3 files changed, 637 insertions(+), 10 deletions(-)

diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index dcc31f9ca13d..354d179c43cf 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -69,6 +69,13 @@ struct kvm_mmio_decode {
int shift;
};

+#define KVM_MMU_PAGE_CACHE_NR_OBJS 32
+
+struct kvm_mmu_page_cache {
+ int nobjs;
+ void *objects[KVM_MMU_PAGE_CACHE_NR_OBJS];
+};
+
struct kvm_cpu_context {
unsigned long zero;
unsigned long ra;
@@ -154,6 +161,9 @@ struct kvm_vcpu_arch {
/* MMIO instruction details */
struct kvm_mmio_decode mmio_decode;

+ /* Cache pages needed to program page tables with spinlock held */
+ struct kvm_mmu_page_cache mmu_page_cache;
+
/* VCPU power-off state */
bool power_off;

diff --git a/arch/riscv/include/asm/pgtable-bits.h b/arch/riscv/include/asm/pgtable-bits.h
index bbaeb5d35842..be49d62fcc2b 100644
--- a/arch/riscv/include/asm/pgtable-bits.h
+++ b/arch/riscv/include/asm/pgtable-bits.h
@@ -26,6 +26,7 @@

#define _PAGE_SPECIAL _PAGE_SOFT
#define _PAGE_TABLE _PAGE_PRESENT
+#define _PAGE_LEAF (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC)

/*
* _PAGE_PROT_NONE is set on not-present pages (and ignored by the hardware) to
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index 963f3c373781..9561c5e85f75 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -18,6 +18,432 @@
#include <asm/page.h>
#include <asm/pgtable.h>

+#ifdef CONFIG_64BIT
+#define stage2_have_pmd true
+#define stage2_gpa_size ((phys_addr_t)(1ULL << 39))
+#define stage2_cache_min_pages 2
+#else
+#define pmd_index(x) 0
+#define pfn_pmd(x, y) ({ pmd_t __x = { 0 }; __x; })
+#define stage2_have_pmd false
+#define stage2_gpa_size ((phys_addr_t)(1ULL << 32))
+#define stage2_cache_min_pages 1
+#endif
+
+static int stage2_cache_topup(struct kvm_mmu_page_cache *pcache,
+ int min, int max)
+{
+ void *page;
+
+ BUG_ON(max > KVM_MMU_PAGE_CACHE_NR_OBJS);
+ if (pcache->nobjs >= min)
+ return 0;
+ while (pcache->nobjs < max) {
+ page = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+ if (!page)
+ return -ENOMEM;
+ pcache->objects[pcache->nobjs++] = page;
+ }
+
+ return 0;
+}
+
+static void stage2_cache_flush(struct kvm_mmu_page_cache *pcache)
+{
+ while (pcache && pcache->nobjs)
+ free_page((unsigned long)pcache->objects[--pcache->nobjs]);
+}
+
+static void *stage2_cache_alloc(struct kvm_mmu_page_cache *pcache)
+{
+ void *p;
+
+ if (!pcache)
+ return NULL;
+
+ BUG_ON(!pcache->nobjs);
+ p = pcache->objects[--pcache->nobjs];
+
+ return p;
+}
+
+struct local_guest_tlb_info {
+ struct kvm_vmid *vmid;
+ gpa_t addr;
+};
+
+static void local_guest_tlb_flush_vmid_gpa(void *info)
+{
+ struct local_guest_tlb_info *infop = info;
+
+ __kvm_riscv_hfence_gvma_vmid_gpa(READ_ONCE(infop->vmid->vmid),
+ infop->addr);
+}
+
+static void stage2_remote_tlb_flush(struct kvm *kvm, gpa_t addr)
+{
+ struct local_guest_tlb_info info;
+ struct kvm_vmid *vmid = &kvm->arch.vmid;
+
+ /* TODO: This should be SBI call */
+ info.vmid = vmid;
+ info.addr = addr;
+ preempt_disable();
+ smp_call_function_many(cpu_all_mask, local_guest_tlb_flush_vmid_gpa,
+ &info, true);
+ preempt_enable();
+}
+
+static int stage2_set_pgd(struct kvm *kvm, gpa_t addr, const pgd_t *new_pgd)
+{
+ pgd_t *pgdp = &kvm->arch.pgd[pgd_index(addr)];
+
+ *pgdp = *new_pgd;
+ if (pgd_val(*pgdp) & _PAGE_LEAF)
+ stage2_remote_tlb_flush(kvm, addr);
+
+ return 0;
+}
+
+static int stage2_set_pmd(struct kvm *kvm, struct kvm_mmu_page_cache *pcache,
+ gpa_t addr, const pmd_t *new_pmd)
+{
+ int rc;
+ pmd_t *pmdp;
+ pgd_t new_pgd;
+ pgd_t *pgdp = &kvm->arch.pgd[pgd_index(addr)];
+
+ if (!pgd_val(*pgdp)) {
+ pmdp = stage2_cache_alloc(pcache);
+ if (!pmdp)
+ return -ENOMEM;
+ new_pgd = pfn_pgd(PFN_DOWN(__pa(pmdp)), __pgprot(_PAGE_TABLE));
+ rc = stage2_set_pgd(kvm, addr, &new_pgd);
+ if (rc)
+ return rc;
+ }
+
+ if (pgd_val(*pgdp) & _PAGE_LEAF)
+ return -EEXIST;
+
+ pmdp = (void *)pgd_page_vaddr(*pgdp);
+ pmdp = &pmdp[pmd_index(addr)];
+
+ *pmdp = *new_pmd;
+ if (pmd_val(*pmdp) & _PAGE_LEAF)
+ stage2_remote_tlb_flush(kvm, addr);
+
+ return 0;
+}
+
+static int stage2_set_pte(struct kvm *kvm,
+ struct kvm_mmu_page_cache *pcache,
+ gpa_t addr, const pte_t *new_pte)
+{
+ int rc;
+ pte_t *ptep;
+ pmd_t new_pmd;
+ pmd_t *pmdp;
+ pgd_t new_pgd;
+ pgd_t *pgdp = &kvm->arch.pgd[pgd_index(addr)];
+
+ if (!pgd_val(*pgdp)) {
+ pmdp = stage2_cache_alloc(pcache);
+ if (!pmdp)
+ return -ENOMEM;
+ new_pgd = pfn_pgd(PFN_DOWN(__pa(pmdp)), __pgprot(_PAGE_TABLE));
+ rc = stage2_set_pgd(kvm, addr, &new_pgd);
+ if (rc)
+ return rc;
+ }
+
+ if (pgd_val(*pgdp) & _PAGE_LEAF)
+ return -EEXIST;
+
+ if (stage2_have_pmd) {
+ pmdp = (void *)pgd_page_vaddr(*pgdp);
+ pmdp = &pmdp[pmd_index(addr)];
+ if (!pmd_present(*pmdp)) {
+ ptep = stage2_cache_alloc(pcache);
+ if (!ptep)
+ return -ENOMEM;
+ new_pmd = pfn_pmd(PFN_DOWN(__pa(ptep)),
+ __pgprot(_PAGE_TABLE));
+ rc = stage2_set_pmd(kvm, pcache, addr, &new_pmd);
+ if (rc)
+ return rc;
+ }
+
+ if (pmd_val(*pmdp) & _PAGE_LEAF)
+ return -EEXIST;
+
+ ptep = (void *)pmd_page_vaddr(*pmdp);
+ } else {
+ ptep = (void *)pgd_page_vaddr(*pgdp);
+ }
+
+ ptep = &ptep[pte_index(addr)];
+
+ *ptep = *new_pte;
+ if (pte_val(*ptep) & _PAGE_LEAF)
+ stage2_remote_tlb_flush(kvm, addr);
+
+ return 0;
+}
+
+static int stage2_map_page(struct kvm *kvm,
+ struct kvm_mmu_page_cache *pcache,
+ gpa_t gpa, phys_addr_t hpa,
+ unsigned long page_size, pgprot_t prot)
+{
+ pte_t new_pte;
+ pmd_t new_pmd;
+ pgd_t new_pgd;
+
+ if (page_size == PAGE_SIZE) {
+ new_pte = pfn_pte(PFN_DOWN(hpa), prot);
+ return stage2_set_pte(kvm, pcache, gpa, &new_pte);
+ }
+
+ if (stage2_have_pmd && page_size == PMD_SIZE) {
+ new_pmd = pfn_pmd(PFN_DOWN(hpa), prot);
+ return stage2_set_pmd(kvm, pcache, gpa, &new_pmd);
+ }
+
+ if (page_size == PGDIR_SIZE) {
+ new_pgd = pfn_pgd(PFN_DOWN(hpa), prot);
+ return stage2_set_pgd(kvm, gpa, &new_pgd);
+ }
+
+ return -EINVAL;
+}
+
+enum stage2_op {
+ STAGE2_OP_NOP = 0, /* Nothing */
+ STAGE2_OP_CLEAR, /* Clear/Unmap */
+ STAGE2_OP_WP, /* Write-protect */
+};
+
+static void stage2_op_pte(struct kvm *kvm, gpa_t addr, pte_t *ptep,
+ enum stage2_op op)
+{
+ BUG_ON(addr & (PAGE_SIZE - 1));
+
+ if (!pte_present(*ptep))
+ return;
+
+ if (op == STAGE2_OP_CLEAR)
+ set_pte(ptep, __pte(0));
+ else if (op == STAGE2_OP_WP)
+ set_pte(ptep, __pte(pte_val(*ptep) & ~_PAGE_WRITE));
+ stage2_remote_tlb_flush(kvm, addr);
+}
+
+static void stage2_op_pmd(struct kvm *kvm, gpa_t addr, pmd_t *pmdp,
+ enum stage2_op op)
+{
+ int i;
+ pte_t *ptep;
+
+ BUG_ON(addr & (PMD_SIZE - 1));
+
+ if (!pmd_present(*pmdp))
+ return;
+
+ if (pmd_val(*pmdp) & _PAGE_LEAF)
+ ptep = NULL;
+ else
+ ptep = (pte_t *)pmd_page_vaddr(*pmdp);
+
+ if (op == STAGE2_OP_CLEAR)
+ set_pmd(pmdp, __pmd(0));
+
+ if (ptep) {
+ for (i = 0; i < PTRS_PER_PTE; i++)
+ stage2_op_pte(kvm, addr + i * PAGE_SIZE, &ptep[i], op);
+ if (op == STAGE2_OP_CLEAR)
+ put_page(virt_to_page(ptep));
+ } else {
+ if (op == STAGE2_OP_WP)
+ set_pmd(pmdp, __pmd(pmd_val(*pmdp) & ~_PAGE_WRITE));
+ stage2_remote_tlb_flush(kvm, addr);
+ }
+}
+
+static void stage2_op_pgd(struct kvm *kvm, gpa_t addr, pgd_t *pgdp,
+ enum stage2_op op)
+{
+ int i;
+ pte_t *ptep;
+ pmd_t *pmdp;
+
+ BUG_ON(addr & (PGDIR_SIZE - 1));
+
+ if (!pgd_val(*pgdp))
+ return;
+
+ ptep = NULL;
+ pmdp = NULL;
+ if (!(pgd_val(*pgdp) & _PAGE_LEAF)) {
+ if (stage2_have_pmd)
+ pmdp = (pmd_t *)pgd_page_vaddr(*pgdp);
+ else
+ ptep = (pte_t *)pgd_page_vaddr(*pgdp);
+ }
+
+ if (op == STAGE2_OP_CLEAR)
+ set_pgd(pgdp, __pgd(0));
+
+ if (pmdp) {
+ for (i = 0; i < PTRS_PER_PMD; i++)
+ stage2_op_pmd(kvm, addr + i * PMD_SIZE, &pmdp[i], op);
+ if (op == STAGE2_OP_CLEAR)
+ put_page(virt_to_page(pmdp));
+ } else if (ptep) {
+ for (i = 0; i < PTRS_PER_PTE; i++)
+ stage2_op_pte(kvm, addr + i * PAGE_SIZE, &ptep[i], op);
+ if (op == STAGE2_OP_CLEAR)
+ put_page(virt_to_page(ptep));
+ } else {
+ if (op == STAGE2_OP_WP)
+ set_pgd(pgdp, __pgd(pgd_val(*pgdp) & ~_PAGE_WRITE));
+ stage2_remote_tlb_flush(kvm, addr);
+ }
+}
+
+static void stage2_unmap_range(struct kvm *kvm, gpa_t start, gpa_t size)
+{
+ pmd_t *pmdp;
+ pte_t *ptep;
+ pgd_t *pgdp;
+ gpa_t addr = start, end = start + size;
+
+ while (addr < end) {
+ pgdp = &kvm->arch.pgd[pgd_index(addr)];
+ if (!pgd_val(*pgdp)) {
+ addr += PGDIR_SIZE;
+ continue;
+ } else if (!(addr & (PGDIR_SIZE - 1)) &&
+ ((end - addr) >= PGDIR_SIZE)) {
+ stage2_op_pgd(kvm, addr, pgdp, STAGE2_OP_CLEAR);
+ addr += PGDIR_SIZE;
+ continue;
+ }
+
+ if (stage2_have_pmd) {
+ pmdp = (pmd_t *)pgd_page_vaddr(*pgdp);
+ if (!pmd_present(*pmdp)) {
+ addr += PMD_SIZE;
+ continue;
+ } else if (!(addr & (PMD_SIZE - 1)) &&
+ ((end - addr) >= PMD_SIZE)) {
+ stage2_op_pmd(kvm, addr, pmdp,
+ STAGE2_OP_CLEAR);
+ addr += PMD_SIZE;
+ continue;
+ }
+ ptep = (pte_t *)pmd_page_vaddr(*pmdp);
+ } else {
+ ptep = (pte_t *)pgd_page_vaddr(*pgdp);
+ }
+
+ stage2_op_pte(kvm, addr, ptep, STAGE2_OP_CLEAR);
+ addr += PAGE_SIZE;
+ }
+}
+
+static void stage2_wp_range(struct kvm *kvm, gpa_t start, gpa_t end)
+{
+ pmd_t *pmdp;
+ pte_t *ptep;
+ pgd_t *pgdp;
+ gpa_t addr = start;
+
+ while (addr < end) {
+ pgdp = &kvm->arch.pgd[pgd_index(addr)];
+ if (!pgd_val(*pgdp)) {
+ addr += PGDIR_SIZE;
+ continue;
+ } else if (!(addr & (PGDIR_SIZE - 1)) &&
+ ((end - addr) >= PGDIR_SIZE)) {
+ stage2_op_pgd(kvm, addr, pgdp, STAGE2_OP_WP);
+ addr += PGDIR_SIZE;
+ continue;
+ }
+
+ if (stage2_have_pmd) {
+ pmdp = (pmd_t *)pgd_page_vaddr(*pgdp);
+ if (!pmd_present(*pmdp)) {
+ addr += PMD_SIZE;
+ continue;
+ } else if (!(addr & (PMD_SIZE - 1)) &&
+ ((end - addr) >= PMD_SIZE)) {
+ stage2_op_pmd(kvm, addr, pmdp, STAGE2_OP_WP);
+ addr += PMD_SIZE;
+ continue;
+ }
+ ptep = (pte_t *)pmd_page_vaddr(*pmdp);
+ } else {
+ ptep = (pte_t *)pgd_page_vaddr(*pgdp);
+ }
+
+ stage2_op_pte(kvm, addr, ptep, STAGE2_OP_WP);
+ addr += PAGE_SIZE;
+ }
+}
+
+void stage2_wp_memory_region(struct kvm *kvm, int slot)
+{
+ struct kvm_memslots *slots = kvm_memslots(kvm);
+ struct kvm_memory_slot *memslot = id_to_memslot(slots, slot);
+ phys_addr_t start = memslot->base_gfn << PAGE_SHIFT;
+ phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
+
+ spin_lock(&kvm->mmu_lock);
+ stage2_wp_range(kvm, start, end);
+ spin_unlock(&kvm->mmu_lock);
+ kvm_flush_remote_tlbs(kvm);
+}
+
+int stage2_ioremap(struct kvm *kvm, gpa_t gpa, phys_addr_t hpa,
+ unsigned long size, bool writable)
+{
+ pte_t pte;
+ int ret = 0;
+ unsigned long pfn;
+ phys_addr_t addr, end;
+ struct kvm_mmu_page_cache pcache = { 0, };
+
+ end = (gpa + size + PAGE_SIZE - 1) & PAGE_MASK;
+ pfn = __phys_to_pfn(hpa);
+
+ for (addr = gpa; addr < end; addr += PAGE_SIZE) {
+ pte = pfn_pte(pfn, PAGE_KERNEL);
+
+ if (!writable)
+ pte = pte_wrprotect(pte);
+
+ ret = stage2_cache_topup(&pcache,
+ stage2_cache_min_pages,
+ KVM_MMU_PAGE_CACHE_NR_OBJS);
+ if (ret)
+ goto out;
+
+ spin_lock(&kvm->mmu_lock);
+ ret = stage2_set_pte(kvm, &pcache, addr, &pte);
+ spin_unlock(&kvm->mmu_lock);
+ if (ret)
+ goto out;
+
+ pfn++;
+ }
+
+out:
+ stage2_cache_flush(&pcache);
+ return ret;
+
+}
+
void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
struct kvm_memory_slot *dont)
{
@@ -35,7 +461,7 @@ void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen)

void kvm_arch_flush_shadow_all(struct kvm *kvm)
{
- /* TODO: */
+ kvm_riscv_stage2_free_pgd(kvm);
}

void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
@@ -49,7 +475,13 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
const struct kvm_memory_slot *new,
enum kvm_mr_change change)
{
- /* TODO: */
+ /*
+ * At this point memslot has been committed and there is an
+ * allocated dirty_bitmap[], dirty pages will be tracked while the
+ * memory slot is write protected.
+ */
+ if (change != KVM_MR_DELETE && mem->flags & KVM_MEM_LOG_DIRTY_PAGES)
+ stage2_wp_memory_region(kvm, mem->slot);
}

int kvm_arch_prepare_memory_region(struct kvm *kvm,
@@ -57,34 +489,218 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
const struct kvm_userspace_memory_region *mem,
enum kvm_mr_change change)
{
- /* TODO: */
- return 0;
+ hva_t hva = mem->userspace_addr;
+ hva_t reg_end = hva + mem->memory_size;
+ bool writable = !(mem->flags & KVM_MEM_READONLY);
+ int ret = 0;
+
+ if (change != KVM_MR_CREATE && change != KVM_MR_MOVE &&
+ change != KVM_MR_FLAGS_ONLY)
+ return 0;
+
+ /*
+ * Prevent userspace from creating a memory region outside of the
+ * GPA space addressable by the KVM guest.
+ */
+ if ((memslot->base_gfn + memslot->npages) >=
+ (stage2_gpa_size >> PAGE_SHIFT))
+ return -EFAULT;
+
+ down_read(&current->mm->mmap_sem);
+
+ /*
+ * A memory region could potentially cover multiple VMAs, and
+ * any holes between them, so iterate over all of them to find
+ * out if we can map any of them right now.
+ *
+ * +--------------------------------------------+
+ * +---------------+----------------+ +----------------+
+ * | : VMA 1 | VMA 2 | | VMA 3 : |
+ * +---------------+----------------+ +----------------+
+ * | memory region |
+ * +--------------------------------------------+
+ */
+ do {
+ struct vm_area_struct *vma = find_vma(current->mm, hva);
+ hva_t vm_start, vm_end;
+
+ if (!vma || vma->vm_start >= reg_end)
+ break;
+
+ /*
+ * Mapping a read-only VMA is only allowed if the
+ * memory region is configured as read-only.
+ */
+ if (writable && !(vma->vm_flags & VM_WRITE)) {
+ ret = -EPERM;
+ break;
+ }
+
+ /* Take the intersection of this VMA with the memory region */
+ vm_start = max(hva, vma->vm_start);
+ vm_end = min(reg_end, vma->vm_end);
+
+ if (vma->vm_flags & VM_PFNMAP) {
+ gpa_t gpa = mem->guest_phys_addr +
+ (vm_start - mem->userspace_addr);
+ phys_addr_t pa;
+
+ pa = (phys_addr_t)vma->vm_pgoff << PAGE_SHIFT;
+ pa += vm_start - vma->vm_start;
+
+ /* IO region dirty page logging not allowed */
+ if (memslot->flags & KVM_MEM_LOG_DIRTY_PAGES) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ ret = stage2_ioremap(kvm, gpa, pa,
+ vm_end - vm_start, writable);
+ if (ret)
+ break;
+ }
+ hva = vm_end;
+ } while (hva < reg_end);
+
+ if (change == KVM_MR_FLAGS_ONLY)
+ goto out;
+
+ spin_lock(&kvm->mmu_lock);
+ if (ret)
+ stage2_unmap_range(kvm, mem->guest_phys_addr,
+ mem->memory_size);
+ spin_unlock(&kvm->mmu_lock);
+
+out:
+ up_read(&current->mm->mmap_sem);
+ return ret;
}

int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned long hva,
bool is_write)
{
- /* TODO: */
- return 0;
+ int ret;
+ short lsb;
+ kvm_pfn_t hfn;
+ bool writeable;
+ gfn_t gfn = gpa >> PAGE_SHIFT;
+ struct vm_area_struct *vma;
+ struct kvm *kvm = vcpu->kvm;
+ struct kvm_mmu_page_cache *pcache = &vcpu->arch.mmu_page_cache;
+ unsigned long vma_pagesize;
+
+ down_read(&current->mm->mmap_sem);
+
+ vma = find_vma_intersection(current->mm, hva, hva + 1);
+ if (unlikely(!vma)) {
+ kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
+ up_read(&current->mm->mmap_sem);
+ return -EFAULT;
+ }
+
+ vma_pagesize = vma_kernel_pagesize(vma);
+
+ up_read(&current->mm->mmap_sem);
+
+ if (vma_pagesize != PGDIR_SIZE &&
+ vma_pagesize != PMD_SIZE &&
+ vma_pagesize != PAGE_SIZE) {
+ kvm_err("Invalid VMA page size 0x%lx\n", vma_pagesize);
+ return -EFAULT;
+ }
+
+ /* We need minimum second+third level pages */
+ ret = stage2_cache_topup(pcache, stage2_cache_min_pages,
+ KVM_MMU_PAGE_CACHE_NR_OBJS);
+ if (ret) {
+ kvm_err("Failed to topup stage2 cache\n");
+ return ret;
+ }
+
+ hfn = gfn_to_pfn_prot(kvm, gfn, is_write, &writeable);
+ if (hfn == KVM_PFN_ERR_HWPOISON) {
+ if (is_vm_hugetlb_page(vma))
+ lsb = huge_page_shift(hstate_vma(vma));
+ else
+ lsb = PAGE_SHIFT;
+
+ send_sig_mceerr(BUS_MCEERR_AR, (void __user *)hva,
+ lsb, current);
+ return 0;
+ }
+ if (is_error_noslot_pfn(hfn))
+ return -EFAULT;
+ if (!writeable && is_write)
+ return -EPERM;
+
+ spin_lock(&kvm->mmu_lock);
+
+ if (writeable) {
+ kvm_set_pfn_dirty(hfn);
+ ret = stage2_map_page(kvm, pcache, gpa, hfn << PAGE_SHIFT,
+ vma_pagesize, PAGE_WRITE_EXEC);
+ } else {
+ ret = stage2_map_page(kvm, pcache, gpa, hfn << PAGE_SHIFT,
+ vma_pagesize, PAGE_READ_EXEC);
+ }
+
+ if (ret)
+ kvm_err("Failed to map in stage2\n");
+
+ spin_unlock(&kvm->mmu_lock);
+ kvm_set_pfn_accessed(hfn);
+ kvm_release_pfn_clean(hfn);
+ return ret;
}

void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu)
{
- /* TODO: */
+ stage2_cache_flush(&vcpu->arch.mmu_page_cache);
}

int kvm_riscv_stage2_alloc_pgd(struct kvm *kvm)
{
- /* TODO: */
+ if (kvm->arch.pgd != NULL) {
+ kvm_err("kvm_arch already initialized?\n");
+ return -EINVAL;
+ }
+
+ kvm->arch.pgd = alloc_pages_exact(PAGE_SIZE, GFP_KERNEL | __GFP_ZERO);
+ if (!kvm->arch.pgd)
+ return -ENOMEM;
+ kvm->arch.pgd_phys = virt_to_phys(kvm->arch.pgd);
+
return 0;
}

void kvm_riscv_stage2_free_pgd(struct kvm *kvm)
{
- /* TODO: */
+ void *pgd = NULL;
+
+ spin_lock(&kvm->mmu_lock);
+ if (kvm->arch.pgd) {
+ stage2_unmap_range(kvm, 0UL, stage2_gpa_size);
+ pgd = READ_ONCE(kvm->arch.pgd);
+ kvm->arch.pgd = NULL;
+ kvm->arch.pgd_phys = 0;
+ }
+ spin_unlock(&kvm->mmu_lock);
+
+ /* Free the HW pgd, one page at a time */
+ if (pgd)
+ free_pages_exact(pgd, PAGE_SIZE);
}

void kvm_riscv_stage2_update_pgtbl(struct kvm_vcpu *vcpu)
{
- /* TODO: */
+ unsigned long hgatp = HGATP_MODE;
+ struct kvm_arch *k = &vcpu->kvm->arch;
+
+ hgatp |= (k->vmid.vmid << HGATP_VMID_SHIFT) & HGATP_VMID_MASK;
+ hgatp |= (k->pgd_phys >> PAGE_SHIFT) & HGATP_PPN;
+
+ csr_write(CSR_HGATP, hgatp);
+
+ if (!kvm_riscv_stage2_vmid_bits())
+ __kvm_riscv_hfence_gvma_all();
}
--
2.17.1
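
The range walkers in this patch (stage2_unmap_range() and stage2_wp_range())
pick the largest aligned block that still fits in the remaining range before
falling back to single pages. A minimal user-space sketch of just that
chunking decision follows. The sk_* names and the Sv39-style sizes are
assumptions made for illustration, not part of the patch, and the sketch
treats every entry as present instead of consulting real page tables.

```c
#include <stdint.h>

/* Hypothetical Sv39-style sizes; the patch takes these from kernel headers. */
#define SK_PAGE_SIZE  (1ULL << 12)
#define SK_PMD_SIZE   (1ULL << 21)
#define SK_PGDIR_SIZE (1ULL << 30)

/* Counts how many operations were issued at each page-table level. */
struct sk_walk_stats {
	unsigned int pgd_ops;
	unsigned int pmd_ops;
	unsigned int pte_ops;
};

/*
 * Split [start, end) into the largest aligned blocks, mirroring the
 * alignment and remaining-size checks used by the range walkers.
 * Assumes start and end are page aligned.
 */
static struct sk_walk_stats sk_walk_range(uint64_t start, uint64_t end)
{
	struct sk_walk_stats st = { 0, 0, 0 };
	uint64_t addr = start;

	while (addr < end) {
		if (!(addr & (SK_PGDIR_SIZE - 1)) &&
		    (end - addr) >= SK_PGDIR_SIZE) {
			st.pgd_ops++;		/* whole top-level entry */
			addr += SK_PGDIR_SIZE;
		} else if (!(addr & (SK_PMD_SIZE - 1)) &&
			   (end - addr) >= SK_PMD_SIZE) {
			st.pmd_ops++;		/* whole middle-level entry */
			addr += SK_PMD_SIZE;
		} else {
			st.pte_ops++;		/* single page */
			addr += SK_PAGE_SIZE;
		}
	}

	return st;
}
```

For example, a range starting one page below a 2M boundary and ending one
page above the next boundary is handled as one page, one PMD-sized block,
and one more page.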

2019-07-29 11:59:28

by Anup Patel

Subject: [RFC PATCH 15/16] RISC-V: KVM: Add SBI v0.1 support

From: Atish Patra <[email protected]>

The KVM host kernel running in HS-mode needs to handle SBI calls coming
from the guest kernel running in VS-mode.

This patch adds SBI v0.1 support to KVM RISC-V. The SBI calls are
implemented, with two exceptions: the console putchar/getchar calls return
-ENOTSUPP, and remote TLB flushes currently do a full TLB flush, which
will be optimized in the future.

Signed-off-by: Atish Patra <[email protected]>
Signed-off-by: Anup Patel <[email protected]>
---
arch/riscv/include/asm/kvm_host.h | 2 +
arch/riscv/kvm/Makefile | 2 +-
arch/riscv/kvm/vcpu_exit.c | 3 +
arch/riscv/kvm/vcpu_sbi.c | 118 ++++++++++++++++++++++++++++++
4 files changed, 124 insertions(+), 1 deletion(-)
create mode 100644 arch/riscv/kvm/vcpu_sbi.c

diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index 1bb4befa89da..22a62ffc09f5 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -227,4 +227,6 @@ void kvm_riscv_vcpu_power_on(struct kvm_vcpu *vcpu);
void kvm_riscv_halt_guest(struct kvm *kvm);
void kvm_riscv_resume_guest(struct kvm *kvm);

+int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu);
+
#endif /* __RISCV_KVM_HOST_H__ */
diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
index 3e0c7558320d..b56dc1650d2c 100644
--- a/arch/riscv/kvm/Makefile
+++ b/arch/riscv/kvm/Makefile
@@ -9,6 +9,6 @@ ccflags-y := -Ivirt/kvm -Iarch/riscv/kvm
kvm-objs := $(common-objs-y)

kvm-objs += main.o vm.o vmid.o tlb.o mmu.o
-kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o vcpu_timer.o
+kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o vcpu_timer.o vcpu_sbi.o

obj-$(CONFIG_KVM) += kvm.o
diff --git a/arch/riscv/kvm/vcpu_exit.c b/arch/riscv/kvm/vcpu_exit.c
index 2d09640c98b2..003e43facdfc 100644
--- a/arch/riscv/kvm/vcpu_exit.c
+++ b/arch/riscv/kvm/vcpu_exit.c
@@ -531,6 +531,9 @@ int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
(vcpu->arch.guest_context.hstatus & HSTATUS_STL))
ret = stage2_page_fault(vcpu, run, scause, stval);
break;
+ case EXC_SUPERVISOR_SYSCALL:
+ if (vcpu->arch.guest_context.hstatus & HSTATUS_SPV)
+ ret = kvm_riscv_vcpu_sbi_ecall(vcpu);
default:
break;
};
diff --git a/arch/riscv/kvm/vcpu_sbi.c b/arch/riscv/kvm/vcpu_sbi.c
new file mode 100644
index 000000000000..8dfdbf744378
--- /dev/null
+++ b/arch/riscv/kvm/vcpu_sbi.c
@@ -0,0 +1,118 @@
+// SPDX-License-Identifier: GPL-2.0
+/**
+ * Copyright (c) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ * Atish Patra <[email protected]>
+ */
+
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/kvm_host.h>
+#include <asm/csr.h>
+#include <asm/kvm_vcpu_timer.h>
+
+#define SBI_VERSION_MAJOR 0
+#define SBI_VERSION_MINOR 1
+
+static unsigned long kvm_sbi_unpriv_load(const unsigned long *addr,
+ struct kvm_vcpu *vcpu)
+{
+ unsigned long flags, val;
+ unsigned long __hstatus, __sstatus;
+
+ local_irq_save(flags);
+ __hstatus = csr_read(CSR_HSTATUS);
+ __sstatus = csr_read(CSR_SSTATUS);
+ csr_write(CSR_HSTATUS, vcpu->arch.guest_context.hstatus | HSTATUS_SPRV);
+ csr_write(CSR_SSTATUS, vcpu->arch.guest_context.sstatus);
+ val = *addr;
+ csr_write(CSR_HSTATUS, __hstatus);
+ csr_write(CSR_SSTATUS, __sstatus);
+ local_irq_restore(flags);
+
+ return val;
+}
+
+static void kvm_sbi_system_shutdown(struct kvm_vcpu *vcpu, u32 type)
+{
+ int i;
+ struct kvm_vcpu *tmp;
+
+ kvm_for_each_vcpu(i, tmp, vcpu->kvm)
+ tmp->arch.power_off = true;
+ kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_SLEEP);
+
+ memset(&vcpu->run->system_event, 0, sizeof(vcpu->run->system_event));
+ vcpu->run->system_event.type = type;
+ vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
+}
+
+int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu)
+{
+ int ret = 1;
+ u64 next_cycle;
+ int vcpuid;
+ struct kvm_vcpu *remote_vcpu;
+ ulong dhart_mask;
+ struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
+
+ if (!cp)
+ return -EINVAL;
+ switch (cp->a7) {
+ case SBI_SET_TIMER:
+#if __riscv_xlen == 32
+ next_cycle = ((u64)cp->a1 << 32) | (u64)cp->a0;
+#else
+ next_cycle = (u64)cp->a0;
+#endif
+ kvm_riscv_vcpu_timer_next_event(vcpu, next_cycle);
+ break;
+ case SBI_CONSOLE_PUTCHAR:
+ /* Not implemented */
+ cp->a0 = -ENOTSUPP;
+ break;
+ case SBI_CONSOLE_GETCHAR:
+ /* Not implemented */
+ cp->a0 = -ENOTSUPP;
+ break;
+ case SBI_CLEAR_IPI:
+ kvm_riscv_vcpu_unset_interrupt(vcpu, IRQ_S_SOFT);
+ break;
+ case SBI_SEND_IPI:
+ dhart_mask = kvm_sbi_unpriv_load((unsigned long *)cp->a0, vcpu);
+ for_each_set_bit(vcpuid, &dhart_mask, BITS_PER_LONG) {
+ remote_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, vcpuid);
+ kvm_riscv_vcpu_set_interrupt(remote_vcpu, IRQ_S_SOFT);
+ }
+ break;
+ case SBI_SHUTDOWN:
+ kvm_sbi_system_shutdown(vcpu, KVM_SYSTEM_EVENT_SHUTDOWN);
+ ret = 0;
+ break;
+ case SBI_REMOTE_FENCE_I:
+ sbi_remote_fence_i(NULL);
+ break;
+
+ /* TODO: There should be a way to call remote hfence.bvma.
+ * The preferred method is now an SBI call. Until then, just flush
+ * all TLBs.
+ */
+ case SBI_REMOTE_SFENCE_VMA:
+ /* TODO: Parse vma range. */
+ sbi_remote_sfence_vma(NULL, 0, 0);
+ break;
+ case SBI_REMOTE_SFENCE_VMA_ASID:
+ /* TODO: Parse vma range for given ASID. */
+ sbi_remote_sfence_vma(NULL, 0, 0);
+ break;
+ default:
+ cp->a0 = -ENOTSUPP;
+ break;
+ };
+
+ if (ret >= 0)
+ cp->sepc += 4;
+
+ return ret;
+}
--
2.17.1
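
The dispatch in kvm_riscv_vcpu_sbi_ecall() can be illustrated with a small
user-space sketch: the guest passes the SBI function ID in a7, arguments in
a0/a1, and the handler steps sepc past the 4-byte ecall. The sk_* names, the
fixed VCPU count, and passing the hart mask by value (the patch loads it
from guest memory through the pointer in a0) are simplifications assumed
here, so treat this as a model of the control flow only.

```c
#include <stdint.h>

/* SBI v0.1 (legacy) function IDs used in this sketch, passed in a7. */
enum {
	SK_SBI_SET_TIMER = 0,
	SK_SBI_CLEAR_IPI = 3,
	SK_SBI_SEND_IPI  = 4,
};

#define SK_MAX_VCPUS 8		/* hypothetical VM size */
#define SK_ENOTSUPP  524	/* same value as the kernel's ENOTSUPP */

/* Cut-down stand-in for the guest context plus per-VCPU state. */
struct sk_vcpu {
	uint64_t a0, a1, a7, sepc;
	uint64_t next_cycle;
	int soft_irq_pending;
};

/* Handle one ecall from vcpus[self]; returns 1 to resume the guest. */
static int sk_sbi_ecall(struct sk_vcpu *vcpus, unsigned int self,
			uint64_t hart_mask)
{
	struct sk_vcpu *cp = &vcpus[self];
	unsigned int id;

	switch (cp->a7) {
	case SK_SBI_SET_TIMER:
		cp->next_cycle = cp->a0;	/* RV64: full value in a0 */
		break;
	case SK_SBI_CLEAR_IPI:
		cp->soft_irq_pending = 0;
		break;
	case SK_SBI_SEND_IPI:
		/* Raise a soft interrupt on every VCPU named in the mask. */
		for (id = 0; id < SK_MAX_VCPUS; id++)
			if (hart_mask & (1ULL << id))
				vcpus[id].soft_irq_pending = 1;
		break;
	default:
		/* Unknown call: report a negative error code in a0. */
		cp->a0 = (uint64_t)(-(int64_t)SK_ENOTSUPP);
		break;
	}

	cp->sepc += 4;	/* step over the ecall instruction */
	return 1;
}
```

Returning the error as a negative value keeps the unknown-call path
consistent with the console calls, which also report -ENOTSUPP.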

2019-07-29 11:59:28

by Anup Patel

Subject: [RFC PATCH 13/16] RISC-V: KVM: Add timer functionality

From: Atish Patra <[email protected]>

The RISC-V hypervisor specification doesn't have any virtual timer
feature.

Due to this, the guest VCPU timer will be programmed via SBI calls.
The host will use a separate hrtimer event for each guest VCPU to
provide timer functionality. We inject a virtual timer interrupt to
the guest VCPU whenever the guest VCPU hrtimer event expires.

The following features are not supported yet and will be added in the
future:
1. A time offset to adjust guest time from host time
2. A saved next event in guest vcpu for vm migration

Signed-off-by: Atish Patra <[email protected]>
Signed-off-by: Anup Patel <[email protected]>
---
arch/riscv/include/asm/kvm_host.h | 4 +
arch/riscv/include/asm/kvm_vcpu_timer.h | 32 +++++++
arch/riscv/kvm/Makefile | 2 +-
arch/riscv/kvm/vcpu.c | 6 ++
arch/riscv/kvm/vcpu_timer.c | 106 ++++++++++++++++++++++++
drivers/clocksource/timer-riscv.c | 6 ++
include/clocksource/timer-riscv.h | 14 ++++
7 files changed, 169 insertions(+), 1 deletion(-)
create mode 100644 arch/riscv/include/asm/kvm_vcpu_timer.h
create mode 100644 arch/riscv/kvm/vcpu_timer.c
create mode 100644 include/clocksource/timer-riscv.h

diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index 58f61ce28461..193a7ff0eb31 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -12,6 +12,7 @@
#include <linux/types.h>
#include <linux/kvm.h>
#include <linux/kvm_types.h>
+#include <asm/kvm_vcpu_timer.h>

#ifdef CONFIG_64BIT
#define KVM_MAX_VCPUS (1U << 16)
@@ -158,6 +159,9 @@ struct kvm_vcpu_arch {
raw_spinlock_t irqs_lock;
unsigned long irqs_pending;

+ /* VCPU Timer */
+ struct kvm_vcpu_timer timer;
+
/* MMIO instruction details */
struct kvm_mmio_decode mmio_decode;

diff --git a/arch/riscv/include/asm/kvm_vcpu_timer.h b/arch/riscv/include/asm/kvm_vcpu_timer.h
new file mode 100644
index 000000000000..df67ea86988e
--- /dev/null
+++ b/arch/riscv/include/asm/kvm_vcpu_timer.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ * Atish Patra <[email protected]>
+ */
+
+#ifndef __KVM_VCPU_RISCV_TIMER_H
+#define __KVM_VCPU_RISCV_TIMER_H
+
+#include <linux/hrtimer.h>
+
+#define VCPU_TIMER_PROGRAM_THRESHOLD_NS 1000
+
+struct kvm_vcpu_timer {
+ bool init_done;
+ /* Check if the timer is programmed */
+ bool is_set;
+ struct hrtimer hrt;
+ /* Mult & Shift values to get nanosec from cycles */
+ u32 mult;
+ u32 shift;
+};
+
+int kvm_riscv_vcpu_timer_init(struct kvm_vcpu *vcpu);
+int kvm_riscv_vcpu_timer_deinit(struct kvm_vcpu *vcpu);
+int kvm_riscv_vcpu_timer_reset(struct kvm_vcpu *vcpu);
+int kvm_riscv_vcpu_timer_next_event(struct kvm_vcpu *vcpu,
+ unsigned long ncycles);
+
+#endif
diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
index c0f57f26c13d..3e0c7558320d 100644
--- a/arch/riscv/kvm/Makefile
+++ b/arch/riscv/kvm/Makefile
@@ -9,6 +9,6 @@ ccflags-y := -Ivirt/kvm -Iarch/riscv/kvm
kvm-objs := $(common-objs-y)

kvm-objs += main.o vm.o vmid.o tlb.o mmu.o
-kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o
+kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o vcpu_timer.o

obj-$(CONFIG_KVM) += kvm.o
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index f3b0cadc1973..ed1f06b17953 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -52,6 +52,8 @@ static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)

memcpy(cntx, reset_cntx, sizeof(*cntx));

+ kvm_riscv_vcpu_timer_reset(vcpu);
+
raw_spin_lock_irqsave(&vcpu->arch.irqs_lock, f);
vcpu->arch.irqs_pending = 0;
raw_spin_unlock_irqrestore(&vcpu->arch.irqs_lock, f);
@@ -125,6 +127,9 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
csr->hideleg |= SIE_STIE;
csr->hideleg |= SIE_SEIE;

+ /* Setup VCPU timer */
+ kvm_riscv_vcpu_timer_init(vcpu);
+
/* Reset VCPU */
kvm_riscv_reset_vcpu(vcpu);

@@ -133,6 +138,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)

void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
{
+ kvm_riscv_vcpu_timer_deinit(vcpu);
kvm_riscv_stage2_flush_cache(vcpu);
kmem_cache_free(kvm_vcpu_cache, vcpu);
}
diff --git a/arch/riscv/kvm/vcpu_timer.c b/arch/riscv/kvm/vcpu_timer.c
new file mode 100644
index 000000000000..a45ca06e1aa6
--- /dev/null
+++ b/arch/riscv/kvm/vcpu_timer.c
@@ -0,0 +1,106 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ * Atish Patra <[email protected]>
+ */
+
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/kvm_host.h>
+#include <clocksource/timer-riscv.h>
+#include <asm/csr.h>
+#include <asm/kvm_vcpu_timer.h>
+
+static enum hrtimer_restart kvm_riscv_vcpu_hrtimer_expired(struct hrtimer *h)
+{
+ struct kvm_vcpu_timer *t = container_of(h, struct kvm_vcpu_timer, hrt);
+ struct kvm_vcpu *vcpu = container_of(t, struct kvm_vcpu, arch.timer);
+
+ t->is_set = false;
+ kvm_riscv_vcpu_set_interrupt(vcpu, IRQ_S_TIMER);
+
+ return HRTIMER_NORESTART;
+}
+
+static u64 kvm_riscv_delta_cycles2ns(u64 cycles, struct kvm_vcpu_timer *t)
+{
+ unsigned long flags;
+ u64 cycles_now, cycles_delta, delta_ns;
+
+ local_irq_save(flags);
+ cycles_now = get_cycles64();
+ if (cycles_now < cycles)
+ cycles_delta = cycles - cycles_now;
+ else
+ cycles_delta = 0;
+ delta_ns = (cycles_delta * t->mult) >> t->shift;
+ local_irq_restore(flags);
+
+ return delta_ns;
+}
+
+static int kvm_riscv_vcpu_timer_cancel(struct kvm_vcpu_timer *t)
+{
+ if (!t->init_done || !t->is_set)
+ return -EINVAL;
+
+ hrtimer_cancel(&t->hrt);
+ t->is_set = false;
+
+ return 0;
+}
+
+int kvm_riscv_vcpu_timer_next_event(struct kvm_vcpu *vcpu,
+ unsigned long ncycles)
+{
+ struct kvm_vcpu_timer *t = &vcpu->arch.timer;
+ u64 delta_ns = kvm_riscv_delta_cycles2ns(ncycles, t);
+
+ if (!t->init_done)
+ return -EINVAL;
+
+ kvm_riscv_vcpu_unset_interrupt(vcpu, IRQ_S_TIMER);
+
+ if (delta_ns > VCPU_TIMER_PROGRAM_THRESHOLD_NS) {
+ hrtimer_start(&t->hrt, ktime_add_ns(ktime_get(), delta_ns),
+ HRTIMER_MODE_ABS);
+ t->is_set = true;
+ } else
+ kvm_riscv_vcpu_set_interrupt(vcpu, IRQ_S_TIMER);
+
+ return 0;
+}
+
+int kvm_riscv_vcpu_timer_init(struct kvm_vcpu *vcpu)
+{
+ struct kvm_vcpu_timer *t = &vcpu->arch.timer;
+
+ if (t->init_done)
+ return -EINVAL;
+
+ hrtimer_init(&t->hrt, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
+ t->hrt.function = kvm_riscv_vcpu_hrtimer_expired;
+ t->init_done = true;
+ t->is_set = false;
+
+ riscv_cs_get_mult_shift(&t->mult, &t->shift);
+
+ return 0;
+}
+
+int kvm_riscv_vcpu_timer_deinit(struct kvm_vcpu *vcpu)
+{
+ int ret;
+
+ ret = kvm_riscv_vcpu_timer_cancel(&vcpu->arch.timer);
+ vcpu->arch.timer.init_done = false;
+
+ return ret;
+}
+
+int kvm_riscv_vcpu_timer_reset(struct kvm_vcpu *vcpu)
+{
+ return kvm_riscv_vcpu_timer_cancel(&vcpu->arch.timer);
+}
diff --git a/drivers/clocksource/timer-riscv.c b/drivers/clocksource/timer-riscv.c
index 09e031176bc6..749b25876cad 100644
--- a/drivers/clocksource/timer-riscv.c
+++ b/drivers/clocksource/timer-riscv.c
@@ -80,6 +80,12 @@ static int riscv_timer_dying_cpu(unsigned int cpu)
return 0;
}

+void riscv_cs_get_mult_shift(u32 *mult, u32 *shift)
+{
+ *mult = riscv_clocksource.mult;
+ *shift = riscv_clocksource.shift;
+}
+
/* called directly from the low-level interrupt handler */
void riscv_timer_interrupt(void)
{
diff --git a/include/clocksource/timer-riscv.h b/include/clocksource/timer-riscv.h
new file mode 100644
index 000000000000..ecb9f70e2f98
--- /dev/null
+++ b/include/clocksource/timer-riscv.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ * Atish Patra <[email protected]>
+ */
+
+#ifndef __KVM_TIMER_RISCV_H
+#define __KVM_TIMER_RISCV_H
+
+void riscv_cs_get_mult_shift(u32 *mult, u32 *shift);
+
+#endif
--
2.17.1
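
The conversion done by kvm_riscv_delta_cycles2ns() is the usual clocksource
mult/shift arithmetic: the distance from the current cycle count to the
requested one is scaled to nanoseconds, and a target already in the past
yields zero. Below is a minimal user-space sketch. The sk_* names are made
up for illustration, the current cycle count is passed in instead of being
read with get_cycles64(), and the 10 MHz example values in the note are
assumptions rather than anything stated in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors VCPU_TIMER_PROGRAM_THRESHOLD_NS from the patch. */
#define SK_PROGRAM_THRESHOLD_NS 1000ULL

/* Nanoseconds from 'now' until 'target' (both in timer cycles). */
static uint64_t sk_delta_cycles_to_ns(uint64_t now, uint64_t target,
				      uint32_t mult, uint32_t shift)
{
	uint64_t delta = (target > now) ? target - now : 0;

	assert(shift < 64);	/* a larger shift would be undefined in C */
	return (delta * mult) >> shift;
}

/*
 * Non-zero when the delta is large enough to arm an hrtimer; otherwise
 * the timer interrupt would be injected right away.
 */
static int sk_needs_hrtimer(uint64_t delta_ns)
{
	return delta_ns > SK_PROGRAM_THRESHOLD_NS;
}
```

With a hypothetical 10 MHz timer (100 ns per cycle), shift = 24 and
mult = 100 << 24 would turn a delta of 50 cycles into 5000 ns, which is
above the threshold, while a delta of 5 cycles gives 500 ns and takes the
immediate-injection path.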

2019-07-29 11:59:31

by Anup Patel

Subject: [RFC PATCH 09/16] RISC-V: KVM: Handle WFI exits for VCPU

We get an illegal instruction trap whenever the Guest/VM executes a WFI
instruction.

This patch handles the WFI trap by blocking the trapped VCPU using the
kvm_vcpu_block() API. The blocked VCPU is automatically resumed
whenever a VCPU interrupt is injected from user-space or from in-kernel
IRQCHIP emulation.

Signed-off-by: Anup Patel <[email protected]>
---
arch/riscv/kvm/vcpu_exit.c | 86 ++++++++++++++++++++++++++++++++++++++
1 file changed, 86 insertions(+)

diff --git a/arch/riscv/kvm/vcpu_exit.c b/arch/riscv/kvm/vcpu_exit.c
index 4dafefa59338..2d09640c98b2 100644
--- a/arch/riscv/kvm/vcpu_exit.c
+++ b/arch/riscv/kvm/vcpu_exit.c
@@ -12,6 +12,9 @@
#include <linux/kvm_host.h>
#include <asm/csr.h>

+#define INSN_MASK_WFI 0xffffff00
+#define INSN_MATCH_WFI 0x10500000
+
#define INSN_MATCH_LB 0x3
#define INSN_MASK_LB 0x707f
#define INSN_MATCH_LH 0x1003
@@ -178,6 +181,85 @@ static ulong get_insn(struct kvm_vcpu *vcpu)
return val;
}

+typedef int (*illegal_insn_func)(struct kvm_vcpu *vcpu,
+ struct kvm_run *run,
+ ulong insn);
+
+static int truly_illegal_insn(struct kvm_vcpu *vcpu,
+ struct kvm_run *run,
+ ulong insn)
+{
+ /* TODO: Redirect trap to Guest VCPU */
+ return -ENOTSUPP;
+}
+
+static int system_opcode_insn(struct kvm_vcpu *vcpu,
+ struct kvm_run *run,
+ ulong insn)
+{
+ if ((insn & INSN_MASK_WFI) == INSN_MATCH_WFI) {
+ vcpu->stat.wfi_exit_stat++;
+ if (!kvm_riscv_vcpu_has_interrupt(vcpu)) {
+ kvm_vcpu_block(vcpu);
+ kvm_clear_request(KVM_REQ_UNHALT, vcpu);
+ }
+ vcpu->arch.guest_context.sepc += INSN_LEN(insn);
+ return 1;
+ }
+
+ return truly_illegal_insn(vcpu, run, insn);
+}
+
+static illegal_insn_func illegal_insn_table[32] = {
+ truly_illegal_insn, /* 0 */
+ truly_illegal_insn, /* 1 */
+ truly_illegal_insn, /* 2 */
+ truly_illegal_insn, /* 3 */
+ truly_illegal_insn, /* 4 */
+ truly_illegal_insn, /* 5 */
+ truly_illegal_insn, /* 6 */
+ truly_illegal_insn, /* 7 */
+ truly_illegal_insn, /* 8 */
+ truly_illegal_insn, /* 9 */
+ truly_illegal_insn, /* 10 */
+ truly_illegal_insn, /* 11 */
+ truly_illegal_insn, /* 12 */
+ truly_illegal_insn, /* 13 */
+ truly_illegal_insn, /* 14 */
+ truly_illegal_insn, /* 15 */
+ truly_illegal_insn, /* 16 */
+ truly_illegal_insn, /* 17 */
+ truly_illegal_insn, /* 18 */
+ truly_illegal_insn, /* 19 */
+ truly_illegal_insn, /* 20 */
+ truly_illegal_insn, /* 21 */
+ truly_illegal_insn, /* 22 */
+ truly_illegal_insn, /* 23 */
+ truly_illegal_insn, /* 24 */
+ truly_illegal_insn, /* 25 */
+ truly_illegal_insn, /* 26 */
+ truly_illegal_insn, /* 27 */
+ system_opcode_insn, /* 28 */
+ truly_illegal_insn, /* 29 */
+ truly_illegal_insn, /* 30 */
+ truly_illegal_insn /* 31 */
+};
+
+static int illegal_inst_fault(struct kvm_vcpu *vcpu, struct kvm_run *run,
+ unsigned long stval)
+{
+ ulong insn = stval;
+
+ if (unlikely((insn & 3) != 3)) {
+ if (insn == 0)
+ insn = get_insn(vcpu);
+ if ((insn & 3) != 3)
+ return truly_illegal_insn(vcpu, run, insn);
+ }
+
+ return illegal_insn_table[(insn & 0x7c) >> 2](vcpu, run, insn);
+}
+
static int emulate_load(struct kvm_vcpu *vcpu, struct kvm_run *run,
unsigned long fault_addr)
{
@@ -438,6 +520,10 @@ int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
ret = -EFAULT;
run->exit_reason = KVM_EXIT_UNKNOWN;
switch (scause) {
+ case EXC_INST_ILLEGAL:
+ if (vcpu->arch.guest_context.hstatus & HSTATUS_SPV)
+ ret = illegal_inst_fault(vcpu, run, stval);
+ break;
case EXC_INST_PAGE_FAULT:
case EXC_LOAD_PAGE_FAULT:
case EXC_STORE_PAGE_FAULT:
--
2.17.1
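
The decode path in illegal_inst_fault() relies on three bit tests: the low
two bits distinguish 32-bit from compressed encodings, bits 6:2 index the
32-entry handler table, and a mask/match pair identifies WFI. The sketch
below reproduces those tests in user space. The mask and match constants
are the ones from the patch, while the sk_* helper names are invented for
illustration.

```c
#include <stdint.h>

/* Same values as INSN_MASK_WFI / INSN_MATCH_WFI in the patch. */
#define SK_INSN_MASK_WFI  0xffffff00u
#define SK_INSN_MATCH_WFI 0x10500000u

/* 32-bit RISC-V encodings have both low bits set; others are compressed. */
static int sk_is_32bit_insn(uint32_t insn)
{
	return (insn & 3) == 3;
}

/* Bits 6:2 select one of 32 major opcodes (the handler table index). */
static unsigned int sk_major_opcode(uint32_t insn)
{
	return (insn & 0x7c) >> 2;
}

/* Everything above the low byte must match the WFI encoding. */
static int sk_is_wfi(uint32_t insn)
{
	return (insn & SK_INSN_MASK_WFI) == SK_INSN_MATCH_WFI;
}
```

WFI is encoded as 0x10500073, so it lands in slot 28 of the table, the only
slot that routes to system_opcode_insn(). An ecall (0x00000073) reaches the
same slot but fails the WFI match.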

2019-07-29 11:59:33

by Anup Patel

Subject: [RFC PATCH 12/16] RISC-V: KVM: Implement MMU notifiers

This patch implements MMU notifiers for KVM RISC-V so that the Guest
physical address space stays in sync with the Host physical address space.

This allows swapping, page migration, etc. to work transparently
with KVM RISC-V.

Signed-off-by: Anup Patel <[email protected]>
---
arch/riscv/include/asm/kvm_host.h | 7 ++
arch/riscv/kvm/Kconfig | 1 +
arch/riscv/kvm/mmu.c | 200 +++++++++++++++++++++++++++++-
3 files changed, 207 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index 354d179c43cf..58f61ce28461 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -177,6 +177,13 @@ static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}

+#define KVM_ARCH_WANT_MMU_NOTIFIER
+int kvm_unmap_hva_range(struct kvm *kvm,
+ unsigned long start, unsigned long end);
+int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
+int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
+int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
+
extern void __kvm_riscv_hfence_gvma_vmid_gpa(unsigned long vmid,
unsigned long gpa);
extern void __kvm_riscv_hfence_gvma_vmid(unsigned long vmid);
diff --git a/arch/riscv/kvm/Kconfig b/arch/riscv/kvm/Kconfig
index 35fd30d0e432..002e14ee37f6 100644
--- a/arch/riscv/kvm/Kconfig
+++ b/arch/riscv/kvm/Kconfig
@@ -20,6 +20,7 @@ if VIRTUALIZATION
config KVM
tristate "Kernel-based Virtual Machine (KVM) support"
depends on OF
+ select MMU_NOTIFIER
select PREEMPT_NOTIFIERS
select ANON_INODES
select KVM_MMIO
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index 9561c5e85f75..5c992d4b4317 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -67,6 +67,66 @@ static void *stage2_cache_alloc(struct kvm_mmu_page_cache *pcache)
return p;
}

+static int stage2_pgdp_test_and_clear_young(pgd_t *pgd)
+{
+ return ptep_test_and_clear_young(NULL, 0, (pte_t *)pgd);
+}
+
+static int stage2_pmdp_test_and_clear_young(pmd_t *pmd)
+{
+ return ptep_test_and_clear_young(NULL, 0, (pte_t *)pmd);
+}
+
+static int stage2_ptep_test_and_clear_young(pte_t *pte)
+{
+ return ptep_test_and_clear_young(NULL, 0, pte);
+}
+
+static bool stage2_get_leaf_entry(struct kvm *kvm, gpa_t addr,
+ pgd_t **pgdpp, pmd_t **pmdpp, pte_t **ptepp)
+{
+ pgd_t *pgdp;
+ pmd_t *pmdp;
+ pte_t *ptep;
+
+ *pgdpp = NULL;
+ *pmdpp = NULL;
+ *ptepp = NULL;
+
+ pgdp = &kvm->arch.pgd[pgd_index(addr)];
+ if (!pgd_val(*pgdp))
+ return false;
+ if (pgd_val(*pgdp) & _PAGE_LEAF) {
+ *pgdpp = pgdp;
+ return true;
+ }
+
+ if (stage2_have_pmd) {
+ pmdp = (void *)pgd_page_vaddr(*pgdp);
+ pmdp = &pmdp[pmd_index(addr)];
+ if (!pmd_present(*pmdp))
+ return false;
+ if (pmd_val(*pmdp) & _PAGE_LEAF) {
+ *pmdpp = pmdp;
+ return true;
+ }
+
+ ptep = (void *)pmd_page_vaddr(*pmdp);
+ } else {
+ ptep = (void *)pgd_page_vaddr(*pgdp);
+ }
+
+ ptep = &ptep[pte_index(addr)];
+ if (!pte_present(*ptep))
+ return false;
+ if (pte_val(*ptep) & _PAGE_LEAF) {
+ *ptepp = ptep;
+ return true;
+ }
+
+ return false;
+}
+
struct local_guest_tlb_info {
struct kvm_vmid *vmid;
gpa_t addr;
@@ -444,6 +504,38 @@ int stage2_ioremap(struct kvm *kvm, gpa_t gpa, phys_addr_t hpa,

}

+static int handle_hva_to_gpa(struct kvm *kvm,
+ unsigned long start,
+ unsigned long end,
+ int (*handler)(struct kvm *kvm,
+ gpa_t gpa, u64 size,
+ void *data),
+ void *data)
+{
+ struct kvm_memslots *slots;
+ struct kvm_memory_slot *memslot;
+ int ret = 0;
+
+ slots = kvm_memslots(kvm);
+
+ /* we only care about the pages that the guest sees */
+ kvm_for_each_memslot(memslot, slots) {
+ unsigned long hva_start, hva_end;
+ gpa_t gpa;
+
+ hva_start = max(start, memslot->userspace_addr);
+ hva_end = min(end, memslot->userspace_addr +
+ (memslot->npages << PAGE_SHIFT));
+ if (hva_start >= hva_end)
+ continue;
+
+ gpa = hva_to_gfn_memslot(hva_start, memslot) << PAGE_SHIFT;
+ ret |= handler(kvm, gpa, (u64)(hva_end - hva_start), data);
+ }
+
+ return ret;
+}
+
void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
struct kvm_memory_slot *dont)
{
@@ -576,6 +668,106 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
return ret;
}

+static int kvm_unmap_hva_handler(struct kvm *kvm,
+ gpa_t gpa, u64 size, void *data)
+{
+ stage2_unmap_range(kvm, gpa, size);
+ return 0;
+}
+
+int kvm_unmap_hva_range(struct kvm *kvm,
+ unsigned long start, unsigned long end)
+{
+ if (!kvm->arch.pgd)
+ return 0;
+
+ handle_hva_to_gpa(kvm, start, end,
+ &kvm_unmap_hva_handler, NULL);
+ return 0;
+}
+
+static int kvm_set_spte_handler(struct kvm *kvm,
+ gpa_t gpa, u64 size, void *data)
+{
+ pte_t *pte = (pte_t *)data;
+
+ WARN_ON(size != PAGE_SIZE);
+ stage2_set_pte(kvm, NULL, gpa, pte);
+
+ return 0;
+}
+
+int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
+{
+ unsigned long end = hva + PAGE_SIZE;
+ kvm_pfn_t pfn = pte_pfn(pte);
+ pte_t stage2_pte;
+
+ if (!kvm->arch.pgd)
+ return 0;
+
+ stage2_pte = pfn_pte(pfn, PAGE_WRITE_EXEC);
+ handle_hva_to_gpa(kvm, hva, end,
+ &kvm_set_spte_handler, &stage2_pte);
+
+ return 0;
+}
+
+static int kvm_age_hva_handler(struct kvm *kvm,
+ gpa_t gpa, u64 size, void *data)
+{
+ pgd_t *pgd;
+ pmd_t *pmd;
+ pte_t *pte;
+
+ WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PGDIR_SIZE);
+ if (!stage2_get_leaf_entry(kvm, gpa, &pgd, &pmd, &pte))
+ return 0;
+
+ if (pgd)
+ return stage2_pgdp_test_and_clear_young(pgd);
+ else if (pmd)
+ return stage2_pmdp_test_and_clear_young(pmd);
+ else
+ return stage2_ptep_test_and_clear_young(pte);
+}
+
+int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end)
+{
+ if (!kvm->arch.pgd)
+ return 0;
+
+ return handle_hva_to_gpa(kvm, start, end, kvm_age_hva_handler, NULL);
+}
+
+static int kvm_test_age_hva_handler(struct kvm *kvm,
+ gpa_t gpa, u64 size, void *data)
+{
+ pgd_t *pgd;
+ pmd_t *pmd;
+ pte_t *pte;
+
+ WARN_ON(size != PAGE_SIZE && size != PMD_SIZE);
+ if (!stage2_get_leaf_entry(kvm, gpa, &pgd, &pmd, &pte))
+ return 0;
+
+ if (pgd)
+ return pte_young(*((pte_t *)pgd));
+ else if (pmd)
+ return pte_young(*((pte_t *)pmd));
+ else
+ return pte_young(*pte);
+}
+
+int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
+{
+ if (!kvm->arch.pgd)
+ return 0;
+
+ return handle_hva_to_gpa(kvm, hva, hva + PAGE_SIZE,
+ kvm_test_age_hva_handler, NULL);
+}
+
int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned long hva,
bool is_write)
{
@@ -587,7 +779,7 @@ int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned long hva,
struct vm_area_struct *vma;
struct kvm *kvm = vcpu->kvm;
struct kvm_mmu_page_cache *pcache = &vcpu->arch.mmu_page_cache;
- unsigned long vma_pagesize;
+ unsigned long vma_pagesize, mmu_seq;

down_read(&current->mm->mmap_sem);

@@ -617,6 +809,8 @@ int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned long hva,
return ret;
}

+ mmu_seq = kvm->mmu_notifier_seq;
+
hfn = gfn_to_pfn_prot(kvm, gfn, is_write, &writeable);
if (hfn == KVM_PFN_ERR_HWPOISON) {
if (is_vm_hugetlb_page(vma))
@@ -635,6 +829,9 @@ int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned long hva,

spin_lock(&kvm->mmu_lock);

+ if (mmu_notifier_retry(kvm, mmu_seq))
+ goto out_unlock;
+
if (writeable) {
kvm_set_pfn_dirty(hfn);
ret = stage2_map_page(kvm, pcache, gpa, hfn << PAGE_SHIFT,
@@ -647,6 +844,7 @@ int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned long hva,
if (ret)
kvm_err("Failed to map in stage2\n");

+out_unlock:
spin_unlock(&kvm->mmu_lock);
kvm_set_pfn_accessed(hfn);
kvm_release_pfn_clean(hfn);
--
2.17.1

2019-07-29 11:59:38

by Anup Patel

Subject: [RFC PATCH 14/16] RISC-V: KVM: FP lazy save/restore

From: Atish Patra <[email protected]>

This patch adds floating point (F and D extension) context save/restore
for guest VCPUs. The FP context is saved and restored lazily, only when
the kernel enters or exits the in-kernel run loop, and not during the KVM
world switch. This way FP save/restore has minimal impact on KVM
performance.
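
The lazy policy described above can be modelled in plain C as a rough
sketch. The SR_FS_* values follow the sstatus.FS encoding; struct fp_regs,
struct ctx, hw_fp and save_count are hypothetical stand-ins for the real
register file and kvm_cpu_context, and memcpy() stands in for the
fsd/fld sequences in vcpu_switch.S.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* sstatus.FS field encoding (Off / Initial / Clean / Dirty). */
#define SR_FS		0x6000UL
#define SR_FS_OFF	0x0000UL
#define SR_FS_INITIAL	0x2000UL
#define SR_FS_CLEAN	0x4000UL
#define SR_FS_DIRTY	0x6000UL

/* Hypothetical model of the FP register file and a saved context. */
struct fp_regs { uint64_t f[32]; uint32_t fcsr; };
struct ctx { unsigned long sstatus; struct fp_regs fp; };

static struct fp_regs hw_fp;	/* models the physical FP registers */
static unsigned int save_count;	/* how many times state was copied out */

static void ctx_fp_clean(struct ctx *c)
{
	c->sstatus = (c->sstatus & ~SR_FS) | SR_FS_CLEAN;
}

/* vcpu_put path: copy out only if the guest dirtied its FP state. */
static void guest_fp_save(struct ctx *c)
{
	if ((c->sstatus & SR_FS) == SR_FS_DIRTY) {
		memcpy(&c->fp, &hw_fp, sizeof(hw_fp));
		save_count++;
		ctx_fp_clean(c);
	}
}

/* vcpu_load path: reload unless FP is turned off for the guest. */
static void guest_fp_restore(struct ctx *c)
{
	if ((c->sstatus & SR_FS) != SR_FS_OFF) {
		memcpy(&hw_fp, &c->fp, sizeof(hw_fp));
		ctx_fp_clean(c);
	}
}
```

In this model a guest that never touches FP between load and put costs
no copy on the put side, because its FS field stays Clean.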

Signed-off-by: Atish Patra <[email protected]>
Signed-off-by: Anup Patel <[email protected]>
---
arch/riscv/include/asm/kvm_host.h | 5 +
arch/riscv/kernel/asm-offsets.c | 72 +++++++++++++
arch/riscv/kvm/vcpu.c | 75 +++++++++++++
arch/riscv/kvm/vcpu_switch.S | 174 ++++++++++++++++++++++++++++++
4 files changed, 326 insertions(+)

diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index 193a7ff0eb31..1bb4befa89da 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -113,6 +113,7 @@ struct kvm_cpu_context {
unsigned long sepc;
unsigned long sstatus;
unsigned long hstatus;
+ union __riscv_fp_state fp;
};

struct kvm_vcpu_csr {
@@ -212,6 +213,10 @@ int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
unsigned long scause, unsigned long stval);

void __kvm_riscv_switch_to(struct kvm_vcpu_arch *vcpu_arch);
+void __kvm_riscv_vcpu_fp_f_save(struct kvm_cpu_context *context);
+void __kvm_riscv_vcpu_fp_f_restore(struct kvm_cpu_context *context);
+void __kvm_riscv_vcpu_fp_d_save(struct kvm_cpu_context *context);
+void __kvm_riscv_vcpu_fp_d_restore(struct kvm_cpu_context *context);

int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq);
int kvm_riscv_vcpu_unset_interrupt(struct kvm_vcpu *vcpu, unsigned int irq);
diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
index 711656710190..9980069a1acf 100644
--- a/arch/riscv/kernel/asm-offsets.c
+++ b/arch/riscv/kernel/asm-offsets.c
@@ -185,6 +185,78 @@ void asm_offsets(void)
OFFSET(KVM_ARCH_HOST_SSCRATCH, kvm_vcpu_arch, host_sscratch);
OFFSET(KVM_ARCH_HOST_STVEC, kvm_vcpu_arch, host_stvec);

+ /* F extension */
+
+ OFFSET(KVM_ARCH_FP_F_F0, kvm_cpu_context, fp.f.f[0]);
+ OFFSET(KVM_ARCH_FP_F_F1, kvm_cpu_context, fp.f.f[1]);
+ OFFSET(KVM_ARCH_FP_F_F2, kvm_cpu_context, fp.f.f[2]);
+ OFFSET(KVM_ARCH_FP_F_F3, kvm_cpu_context, fp.f.f[3]);
+ OFFSET(KVM_ARCH_FP_F_F4, kvm_cpu_context, fp.f.f[4]);
+ OFFSET(KVM_ARCH_FP_F_F5, kvm_cpu_context, fp.f.f[5]);
+ OFFSET(KVM_ARCH_FP_F_F6, kvm_cpu_context, fp.f.f[6]);
+ OFFSET(KVM_ARCH_FP_F_F7, kvm_cpu_context, fp.f.f[7]);
+ OFFSET(KVM_ARCH_FP_F_F8, kvm_cpu_context, fp.f.f[8]);
+ OFFSET(KVM_ARCH_FP_F_F9, kvm_cpu_context, fp.f.f[9]);
+ OFFSET(KVM_ARCH_FP_F_F10, kvm_cpu_context, fp.f.f[10]);
+ OFFSET(KVM_ARCH_FP_F_F11, kvm_cpu_context, fp.f.f[11]);
+ OFFSET(KVM_ARCH_FP_F_F12, kvm_cpu_context, fp.f.f[12]);
+ OFFSET(KVM_ARCH_FP_F_F13, kvm_cpu_context, fp.f.f[13]);
+ OFFSET(KVM_ARCH_FP_F_F14, kvm_cpu_context, fp.f.f[14]);
+ OFFSET(KVM_ARCH_FP_F_F15, kvm_cpu_context, fp.f.f[15]);
+ OFFSET(KVM_ARCH_FP_F_F16, kvm_cpu_context, fp.f.f[16]);
+ OFFSET(KVM_ARCH_FP_F_F17, kvm_cpu_context, fp.f.f[17]);
+ OFFSET(KVM_ARCH_FP_F_F18, kvm_cpu_context, fp.f.f[18]);
+ OFFSET(KVM_ARCH_FP_F_F19, kvm_cpu_context, fp.f.f[19]);
+ OFFSET(KVM_ARCH_FP_F_F20, kvm_cpu_context, fp.f.f[20]);
+ OFFSET(KVM_ARCH_FP_F_F21, kvm_cpu_context, fp.f.f[21]);
+ OFFSET(KVM_ARCH_FP_F_F22, kvm_cpu_context, fp.f.f[22]);
+ OFFSET(KVM_ARCH_FP_F_F23, kvm_cpu_context, fp.f.f[23]);
+ OFFSET(KVM_ARCH_FP_F_F24, kvm_cpu_context, fp.f.f[24]);
+ OFFSET(KVM_ARCH_FP_F_F25, kvm_cpu_context, fp.f.f[25]);
+ OFFSET(KVM_ARCH_FP_F_F26, kvm_cpu_context, fp.f.f[26]);
+ OFFSET(KVM_ARCH_FP_F_F27, kvm_cpu_context, fp.f.f[27]);
+ OFFSET(KVM_ARCH_FP_F_F28, kvm_cpu_context, fp.f.f[28]);
+ OFFSET(KVM_ARCH_FP_F_F29, kvm_cpu_context, fp.f.f[29]);
+ OFFSET(KVM_ARCH_FP_F_F30, kvm_cpu_context, fp.f.f[30]);
+ OFFSET(KVM_ARCH_FP_F_F31, kvm_cpu_context, fp.f.f[31]);
+ OFFSET(KVM_ARCH_FP_F_FCSR, kvm_cpu_context, fp.f.fcsr);
+
+ /* D extension */
+
+ OFFSET(KVM_ARCH_FP_D_F0, kvm_cpu_context, fp.d.f[0]);
+ OFFSET(KVM_ARCH_FP_D_F1, kvm_cpu_context, fp.d.f[1]);
+ OFFSET(KVM_ARCH_FP_D_F2, kvm_cpu_context, fp.d.f[2]);
+ OFFSET(KVM_ARCH_FP_D_F3, kvm_cpu_context, fp.d.f[3]);
+ OFFSET(KVM_ARCH_FP_D_F4, kvm_cpu_context, fp.d.f[4]);
+ OFFSET(KVM_ARCH_FP_D_F5, kvm_cpu_context, fp.d.f[5]);
+ OFFSET(KVM_ARCH_FP_D_F6, kvm_cpu_context, fp.d.f[6]);
+ OFFSET(KVM_ARCH_FP_D_F7, kvm_cpu_context, fp.d.f[7]);
+ OFFSET(KVM_ARCH_FP_D_F8, kvm_cpu_context, fp.d.f[8]);
+ OFFSET(KVM_ARCH_FP_D_F9, kvm_cpu_context, fp.d.f[9]);
+ OFFSET(KVM_ARCH_FP_D_F10, kvm_cpu_context, fp.d.f[10]);
+ OFFSET(KVM_ARCH_FP_D_F11, kvm_cpu_context, fp.d.f[11]);
+ OFFSET(KVM_ARCH_FP_D_F12, kvm_cpu_context, fp.d.f[12]);
+ OFFSET(KVM_ARCH_FP_D_F13, kvm_cpu_context, fp.d.f[13]);
+ OFFSET(KVM_ARCH_FP_D_F14, kvm_cpu_context, fp.d.f[14]);
+ OFFSET(KVM_ARCH_FP_D_F15, kvm_cpu_context, fp.d.f[15]);
+ OFFSET(KVM_ARCH_FP_D_F16, kvm_cpu_context, fp.d.f[16]);
+ OFFSET(KVM_ARCH_FP_D_F17, kvm_cpu_context, fp.d.f[17]);
+ OFFSET(KVM_ARCH_FP_D_F18, kvm_cpu_context, fp.d.f[18]);
+ OFFSET(KVM_ARCH_FP_D_F19, kvm_cpu_context, fp.d.f[19]);
+ OFFSET(KVM_ARCH_FP_D_F20, kvm_cpu_context, fp.d.f[20]);
+ OFFSET(KVM_ARCH_FP_D_F21, kvm_cpu_context, fp.d.f[21]);
+ OFFSET(KVM_ARCH_FP_D_F22, kvm_cpu_context, fp.d.f[22]);
+ OFFSET(KVM_ARCH_FP_D_F23, kvm_cpu_context, fp.d.f[23]);
+ OFFSET(KVM_ARCH_FP_D_F24, kvm_cpu_context, fp.d.f[24]);
+ OFFSET(KVM_ARCH_FP_D_F25, kvm_cpu_context, fp.d.f[25]);
+ OFFSET(KVM_ARCH_FP_D_F26, kvm_cpu_context, fp.d.f[26]);
+ OFFSET(KVM_ARCH_FP_D_F27, kvm_cpu_context, fp.d.f[27]);
+ OFFSET(KVM_ARCH_FP_D_F28, kvm_cpu_context, fp.d.f[28]);
+ OFFSET(KVM_ARCH_FP_D_F29, kvm_cpu_context, fp.d.f[29]);
+ OFFSET(KVM_ARCH_FP_D_F30, kvm_cpu_context, fp.d.f[30]);
+ OFFSET(KVM_ARCH_FP_D_F31, kvm_cpu_context, fp.d.f[31]);
+ OFFSET(KVM_ARCH_FP_D_FCSR, kvm_cpu_context, fp.d.fcsr);
+
/*
* THREAD_{F,X}* might be larger than a S-type offset can handle, but
* these are used in performance-sensitive assembly so we can't resort
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index ed1f06b17953..82719ada3baa 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -31,6 +31,72 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
{ NULL }
};

+#ifdef CONFIG_FPU
+static void kvm_riscv_vcpu_fp_reset(struct kvm_vcpu *vcpu)
+{
+ unsigned long isa = vcpu->arch.isa;
+ struct kvm_cpu_context *cntx = &vcpu->arch.guest_context;
+
+ cntx->sstatus &= ~SR_FS;
+ if ((riscv_isa_extension_available(F) && (isa & RISCV_ISA_EXT_F)) ||
+ (riscv_isa_extension_available(D) && (isa & RISCV_ISA_EXT_D)))
+ cntx->sstatus |= SR_FS_INITIAL;
+ else
+ cntx->sstatus |= SR_FS_OFF;
+}
+
+static void kvm_riscv_vcpu_fp_clean(struct kvm_cpu_context *cntx)
+{
+ cntx->sstatus &= ~SR_FS;
+ cntx->sstatus |= SR_FS_CLEAN;
+}
+
+static void kvm_riscv_vcpu_guest_fp_save(struct kvm_cpu_context *cntx)
+{
+ if ((cntx->sstatus & SR_FS) == SR_FS_DIRTY) {
+ if (riscv_isa_extension_available(D))
+ __kvm_riscv_vcpu_fp_d_save(cntx);
+ else if (riscv_isa_extension_available(F))
+ __kvm_riscv_vcpu_fp_f_save(cntx);
+ kvm_riscv_vcpu_fp_clean(cntx);
+ }
+}
+
+static void kvm_riscv_vcpu_guest_fp_restore(struct kvm_cpu_context *cntx)
+{
+ if ((cntx->sstatus & SR_FS) != SR_FS_OFF) {
+ if (riscv_isa_extension_available(D))
+ __kvm_riscv_vcpu_fp_d_restore(cntx);
+ else if (riscv_isa_extension_available(F))
+ __kvm_riscv_vcpu_fp_f_restore(cntx);
+ kvm_riscv_vcpu_fp_clean(cntx);
+ }
+}
+
+static void kvm_riscv_vcpu_host_fp_save(struct kvm_cpu_context *cntx)
+{
+ /* No need to check host sstatus as it can be modified outside */
+ if (riscv_isa_extension_available(D))
+ __kvm_riscv_vcpu_fp_d_save(cntx);
+ else if (riscv_isa_extension_available(F))
+ __kvm_riscv_vcpu_fp_f_save(cntx);
+}
+
+static void kvm_riscv_vcpu_host_fp_restore(struct kvm_cpu_context *cntx)
+{
+ if (riscv_isa_extension_available(D))
+ __kvm_riscv_vcpu_fp_d_restore(cntx);
+ else if (riscv_isa_extension_available(F))
+ __kvm_riscv_vcpu_fp_f_restore(cntx);
+}
+#else
+static void kvm_riscv_vcpu_fp_reset(struct kvm_vcpu *vcpu) {}
+static void kvm_riscv_vcpu_guest_fp_save(struct kvm_cpu_context *cntx) {}
+static void kvm_riscv_vcpu_guest_fp_restore(struct kvm_cpu_context *cntx) {}
+static void kvm_riscv_vcpu_host_fp_save(struct kvm_cpu_context *cntx) {}
+static void kvm_riscv_vcpu_host_fp_restore(struct kvm_cpu_context *cntx) {}
+#endif
+
#define KVM_RISCV_ISA_ALLOWED (RISCV_ISA_EXT_A | \
RISCV_ISA_EXT_C | \
RISCV_ISA_EXT_D | \
@@ -52,6 +118,8 @@ static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)

memcpy(cntx, reset_cntx, sizeof(*cntx));

+ kvm_riscv_vcpu_fp_reset(vcpu);
+
kvm_riscv_vcpu_timer_reset(vcpu);

raw_spin_lock_irqsave(&vcpu->arch.irqs_lock, f);
@@ -247,6 +315,7 @@ static int kvm_riscv_vcpu_set_reg_config(struct kvm_vcpu *vcpu,
vcpu->arch.isa = reg_val;
vcpu->arch.isa &= riscv_isa;
vcpu->arch.isa &= KVM_RISCV_ISA_ALLOWED;
+ kvm_riscv_vcpu_fp_reset(vcpu);
} else {
return -ENOTSUPP;
}
@@ -566,6 +635,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
csr_write(CSR_VSIP, csr->vsip);
csr_write(CSR_VSATP, csr->vsatp);

+ kvm_riscv_vcpu_host_fp_save(&vcpu->arch.host_context);
+ kvm_riscv_vcpu_guest_fp_restore(&vcpu->arch.guest_context);
+
kvm_riscv_stage2_update_pgtbl(vcpu);

vcpu->cpu = cpu;
@@ -577,6 +649,9 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)

vcpu->cpu = -1;

+ kvm_riscv_vcpu_guest_fp_save(&vcpu->arch.guest_context);
+ kvm_riscv_vcpu_host_fp_restore(&vcpu->arch.host_context);
+
csr_write(CSR_HGATP, 0);
csr_write(CSR_HIDELEG, 0);
csr_write(CSR_HEDELEG, 0);
diff --git a/arch/riscv/kvm/vcpu_switch.S b/arch/riscv/kvm/vcpu_switch.S
index c5b85605bf73..4ad337ea34c2 100644
--- a/arch/riscv/kvm/vcpu_switch.S
+++ b/arch/riscv/kvm/vcpu_switch.S
@@ -191,3 +191,177 @@ __kvm_switch_return:
/* Return to C code */
ret
ENDPROC(__kvm_riscv_switch_to)
+
+#ifdef CONFIG_FPU
+ .align 3
+ .global __kvm_riscv_vcpu_fp_f_save
+__kvm_riscv_vcpu_fp_f_save:
+ csrr t2, CSR_SSTATUS
+ li t1, SR_FS
+ csrs CSR_SSTATUS, t1
+ frcsr t0
+ fsw f0, KVM_ARCH_FP_F_F0(a0)
+ fsw f1, KVM_ARCH_FP_F_F1(a0)
+ fsw f2, KVM_ARCH_FP_F_F2(a0)
+ fsw f3, KVM_ARCH_FP_F_F3(a0)
+ fsw f4, KVM_ARCH_FP_F_F4(a0)
+ fsw f5, KVM_ARCH_FP_F_F5(a0)
+ fsw f6, KVM_ARCH_FP_F_F6(a0)
+ fsw f7, KVM_ARCH_FP_F_F7(a0)
+ fsw f8, KVM_ARCH_FP_F_F8(a0)
+ fsw f9, KVM_ARCH_FP_F_F9(a0)
+ fsw f10, KVM_ARCH_FP_F_F10(a0)
+ fsw f11, KVM_ARCH_FP_F_F11(a0)
+ fsw f12, KVM_ARCH_FP_F_F12(a0)
+ fsw f13, KVM_ARCH_FP_F_F13(a0)
+ fsw f14, KVM_ARCH_FP_F_F14(a0)
+ fsw f15, KVM_ARCH_FP_F_F15(a0)
+ fsw f16, KVM_ARCH_FP_F_F16(a0)
+ fsw f17, KVM_ARCH_FP_F_F17(a0)
+ fsw f18, KVM_ARCH_FP_F_F18(a0)
+ fsw f19, KVM_ARCH_FP_F_F19(a0)
+ fsw f20, KVM_ARCH_FP_F_F20(a0)
+ fsw f21, KVM_ARCH_FP_F_F21(a0)
+ fsw f22, KVM_ARCH_FP_F_F22(a0)
+ fsw f23, KVM_ARCH_FP_F_F23(a0)
+ fsw f24, KVM_ARCH_FP_F_F24(a0)
+ fsw f25, KVM_ARCH_FP_F_F25(a0)
+ fsw f26, KVM_ARCH_FP_F_F26(a0)
+ fsw f27, KVM_ARCH_FP_F_F27(a0)
+ fsw f28, KVM_ARCH_FP_F_F28(a0)
+ fsw f29, KVM_ARCH_FP_F_F29(a0)
+ fsw f30, KVM_ARCH_FP_F_F30(a0)
+ fsw f31, KVM_ARCH_FP_F_F31(a0)
+ sw t0, KVM_ARCH_FP_F_FCSR(a0)
+ csrw CSR_SSTATUS, t2
+ ret
+
+ .align 3
+ .global __kvm_riscv_vcpu_fp_d_save
+__kvm_riscv_vcpu_fp_d_save:
+ csrr t2, CSR_SSTATUS
+ li t1, SR_FS
+ csrs CSR_SSTATUS, t1
+ frcsr t0
+ fsd f0, KVM_ARCH_FP_D_F0(a0)
+ fsd f1, KVM_ARCH_FP_D_F1(a0)
+ fsd f2, KVM_ARCH_FP_D_F2(a0)
+ fsd f3, KVM_ARCH_FP_D_F3(a0)
+ fsd f4, KVM_ARCH_FP_D_F4(a0)
+ fsd f5, KVM_ARCH_FP_D_F5(a0)
+ fsd f6, KVM_ARCH_FP_D_F6(a0)
+ fsd f7, KVM_ARCH_FP_D_F7(a0)
+ fsd f8, KVM_ARCH_FP_D_F8(a0)
+ fsd f9, KVM_ARCH_FP_D_F9(a0)
+ fsd f10, KVM_ARCH_FP_D_F10(a0)
+ fsd f11, KVM_ARCH_FP_D_F11(a0)
+ fsd f12, KVM_ARCH_FP_D_F12(a0)
+ fsd f13, KVM_ARCH_FP_D_F13(a0)
+ fsd f14, KVM_ARCH_FP_D_F14(a0)
+ fsd f15, KVM_ARCH_FP_D_F15(a0)
+ fsd f16, KVM_ARCH_FP_D_F16(a0)
+ fsd f17, KVM_ARCH_FP_D_F17(a0)
+ fsd f18, KVM_ARCH_FP_D_F18(a0)
+ fsd f19, KVM_ARCH_FP_D_F19(a0)
+ fsd f20, KVM_ARCH_FP_D_F20(a0)
+ fsd f21, KVM_ARCH_FP_D_F21(a0)
+ fsd f22, KVM_ARCH_FP_D_F22(a0)
+ fsd f23, KVM_ARCH_FP_D_F23(a0)
+ fsd f24, KVM_ARCH_FP_D_F24(a0)
+ fsd f25, KVM_ARCH_FP_D_F25(a0)
+ fsd f26, KVM_ARCH_FP_D_F26(a0)
+ fsd f27, KVM_ARCH_FP_D_F27(a0)
+ fsd f28, KVM_ARCH_FP_D_F28(a0)
+ fsd f29, KVM_ARCH_FP_D_F29(a0)
+ fsd f30, KVM_ARCH_FP_D_F30(a0)
+ fsd f31, KVM_ARCH_FP_D_F31(a0)
+ sw t0, KVM_ARCH_FP_D_FCSR(a0)
+ csrw CSR_SSTATUS, t2
+ ret
+
+ .align 3
+ .global __kvm_riscv_vcpu_fp_f_restore
+__kvm_riscv_vcpu_fp_f_restore:
+ csrr t2, CSR_SSTATUS
+ li t1, SR_FS
+ lw t0, KVM_ARCH_FP_F_FCSR(a0)
+ csrs CSR_SSTATUS, t1
+ flw f0, KVM_ARCH_FP_F_F0(a0)
+ flw f1, KVM_ARCH_FP_F_F1(a0)
+ flw f2, KVM_ARCH_FP_F_F2(a0)
+ flw f3, KVM_ARCH_FP_F_F3(a0)
+ flw f4, KVM_ARCH_FP_F_F4(a0)
+ flw f5, KVM_ARCH_FP_F_F5(a0)
+ flw f6, KVM_ARCH_FP_F_F6(a0)
+ flw f7, KVM_ARCH_FP_F_F7(a0)
+ flw f8, KVM_ARCH_FP_F_F8(a0)
+ flw f9, KVM_ARCH_FP_F_F9(a0)
+ flw f10, KVM_ARCH_FP_F_F10(a0)
+ flw f11, KVM_ARCH_FP_F_F11(a0)
+ flw f12, KVM_ARCH_FP_F_F12(a0)
+ flw f13, KVM_ARCH_FP_F_F13(a0)
+ flw f14, KVM_ARCH_FP_F_F14(a0)
+ flw f15, KVM_ARCH_FP_F_F15(a0)
+ flw f16, KVM_ARCH_FP_F_F16(a0)
+ flw f17, KVM_ARCH_FP_F_F17(a0)
+ flw f18, KVM_ARCH_FP_F_F18(a0)
+ flw f19, KVM_ARCH_FP_F_F19(a0)
+ flw f20, KVM_ARCH_FP_F_F20(a0)
+ flw f21, KVM_ARCH_FP_F_F21(a0)
+ flw f22, KVM_ARCH_FP_F_F22(a0)
+ flw f23, KVM_ARCH_FP_F_F23(a0)
+ flw f24, KVM_ARCH_FP_F_F24(a0)
+ flw f25, KVM_ARCH_FP_F_F25(a0)
+ flw f26, KVM_ARCH_FP_F_F26(a0)
+ flw f27, KVM_ARCH_FP_F_F27(a0)
+ flw f28, KVM_ARCH_FP_F_F28(a0)
+ flw f29, KVM_ARCH_FP_F_F29(a0)
+ flw f30, KVM_ARCH_FP_F_F30(a0)
+ flw f31, KVM_ARCH_FP_F_F31(a0)
+ fscsr t0
+ csrw CSR_SSTATUS, t2
+ ret
+
+ .align 3
+ .global __kvm_riscv_vcpu_fp_d_restore
+__kvm_riscv_vcpu_fp_d_restore:
+ csrr t2, CSR_SSTATUS
+ li t1, SR_FS
+ lw t0, KVM_ARCH_FP_D_FCSR(a0)
+ csrs CSR_SSTATUS, t1
+ fld f0, KVM_ARCH_FP_D_F0(a0)
+ fld f1, KVM_ARCH_FP_D_F1(a0)
+ fld f2, KVM_ARCH_FP_D_F2(a0)
+ fld f3, KVM_ARCH_FP_D_F3(a0)
+ fld f4, KVM_ARCH_FP_D_F4(a0)
+ fld f5, KVM_ARCH_FP_D_F5(a0)
+ fld f6, KVM_ARCH_FP_D_F6(a0)
+ fld f7, KVM_ARCH_FP_D_F7(a0)
+ fld f8, KVM_ARCH_FP_D_F8(a0)
+ fld f9, KVM_ARCH_FP_D_F9(a0)
+ fld f10, KVM_ARCH_FP_D_F10(a0)
+ fld f11, KVM_ARCH_FP_D_F11(a0)
+ fld f12, KVM_ARCH_FP_D_F12(a0)
+ fld f13, KVM_ARCH_FP_D_F13(a0)
+ fld f14, KVM_ARCH_FP_D_F14(a0)
+ fld f15, KVM_ARCH_FP_D_F15(a0)
+ fld f16, KVM_ARCH_FP_D_F16(a0)
+ fld f17, KVM_ARCH_FP_D_F17(a0)
+ fld f18, KVM_ARCH_FP_D_F18(a0)
+ fld f19, KVM_ARCH_FP_D_F19(a0)
+ fld f20, KVM_ARCH_FP_D_F20(a0)
+ fld f21, KVM_ARCH_FP_D_F21(a0)
+ fld f22, KVM_ARCH_FP_D_F22(a0)
+ fld f23, KVM_ARCH_FP_D_F23(a0)
+ fld f24, KVM_ARCH_FP_D_F24(a0)
+ fld f25, KVM_ARCH_FP_D_F25(a0)
+ fld f26, KVM_ARCH_FP_D_F26(a0)
+ fld f27, KVM_ARCH_FP_D_F27(a0)
+ fld f28, KVM_ARCH_FP_D_F28(a0)
+ fld f29, KVM_ARCH_FP_D_F29(a0)
+ fld f30, KVM_ARCH_FP_D_F30(a0)
+ fld f31, KVM_ARCH_FP_D_F31(a0)
+ fscsr t0
+ csrw CSR_SSTATUS, t2
+ ret
+#endif
--
2.17.1

2019-07-29 12:00:50

by Anup Patel

Subject: [RFC PATCH 16/16] RISC-V: Enable VIRTIO drivers in RV64 and RV32 defconfig

This patch enables more VIRTIO drivers (such as console, rpmsg, 9p,
rng, etc.) which are usable on KVM RISC-V Guest and Xvisor RISC-V
Guest.

Signed-off-by: Anup Patel <[email protected]>
---
arch/riscv/configs/defconfig | 23 ++++++++++++++++++-----
arch/riscv/configs/rv32_defconfig | 13 +++++++++++++
2 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/arch/riscv/configs/defconfig b/arch/riscv/configs/defconfig
index b7b749b18853..420a0dbef386 100644
--- a/arch/riscv/configs/defconfig
+++ b/arch/riscv/configs/defconfig
@@ -29,15 +29,19 @@ CONFIG_IP_PNP_DHCP=y
CONFIG_IP_PNP_BOOTP=y
CONFIG_IP_PNP_RARP=y
CONFIG_NETLINK_DIAG=y
+CONFIG_NET_9P=y
+CONFIG_NET_9P_VIRTIO=y
CONFIG_PCI=y
CONFIG_PCIEPORTBUS=y
CONFIG_PCI_HOST_GENERIC=y
CONFIG_PCIE_XILINX=y
CONFIG_DEVTMPFS=y
+CONFIG_DEVTMPFS_MOUNT=y
CONFIG_BLK_DEV_LOOP=y
CONFIG_VIRTIO_BLK=y
CONFIG_BLK_DEV_SD=y
CONFIG_BLK_DEV_SR=y
+CONFIG_SCSI_VIRTIO=y
CONFIG_ATA=y
CONFIG_SATA_AHCI=y
CONFIG_SATA_AHCI_PLATFORM=y
@@ -53,9 +57,15 @@ CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_OF_PLATFORM=y
CONFIG_SERIAL_EARLYCON_RISCV_SBI=y
CONFIG_HVC_RISCV_SBI=y
+CONFIG_VIRTIO_CONSOLE=y
+CONFIG_HW_RANDOM=y
+CONFIG_HW_RANDOM_VIRTIO=y
+CONFIG_SPI=y
+CONFIG_SPI_SIFIVE=y
# CONFIG_PTP_1588_CLOCK is not set
CONFIG_DRM=y
CONFIG_DRM_RADEON=y
+CONFIG_DRM_VIRTIO_GPU=y
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_USB=y
CONFIG_USB_XHCI_HCD=y
@@ -66,8 +76,14 @@ CONFIG_USB_OHCI_HCD=y
CONFIG_USB_OHCI_HCD_PLATFORM=y
CONFIG_USB_STORAGE=y
CONFIG_USB_UAS=y
+CONFIG_MMC=y
+CONFIG_MMC_SPI=y
+CONFIG_VIRTIO_PCI=y
+CONFIG_VIRTIO_BALLOON=y
+CONFIG_VIRTIO_INPUT=y
CONFIG_VIRTIO_MMIO=y
-CONFIG_SPI_SIFIVE=y
+CONFIG_RPMSG_CHAR=y
+CONFIG_RPMSG_VIRTIO=y
CONFIG_EXT4_FS=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_AUTOFS4_FS=y
@@ -80,11 +96,8 @@ CONFIG_NFS_V4=y
CONFIG_NFS_V4_1=y
CONFIG_NFS_V4_2=y
CONFIG_ROOT_NFS=y
+CONFIG_9P_FS=y
CONFIG_CRYPTO_USER_API_HASH=y
CONFIG_CRYPTO_DEV_VIRTIO=y
CONFIG_PRINTK_TIME=y
-CONFIG_SPI=y
-CONFIG_MMC_SPI=y
-CONFIG_MMC=y
-CONFIG_DEVTMPFS_MOUNT=y
# CONFIG_RCU_TRACE is not set
diff --git a/arch/riscv/configs/rv32_defconfig b/arch/riscv/configs/rv32_defconfig
index d5449ef805a3..b28267404d55 100644
--- a/arch/riscv/configs/rv32_defconfig
+++ b/arch/riscv/configs/rv32_defconfig
@@ -29,6 +29,8 @@ CONFIG_IP_PNP_DHCP=y
CONFIG_IP_PNP_BOOTP=y
CONFIG_IP_PNP_RARP=y
CONFIG_NETLINK_DIAG=y
+CONFIG_NET_9P=y
+CONFIG_NET_9P_VIRTIO=y
CONFIG_PCI=y
CONFIG_PCIEPORTBUS=y
CONFIG_PCI_HOST_GENERIC=y
@@ -38,6 +40,7 @@ CONFIG_BLK_DEV_LOOP=y
CONFIG_VIRTIO_BLK=y
CONFIG_BLK_DEV_SD=y
CONFIG_BLK_DEV_SR=y
+CONFIG_SCSI_VIRTIO=y
CONFIG_ATA=y
CONFIG_SATA_AHCI=y
CONFIG_SATA_AHCI_PLATFORM=y
@@ -53,9 +56,13 @@ CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_OF_PLATFORM=y
CONFIG_SERIAL_EARLYCON_RISCV_SBI=y
CONFIG_HVC_RISCV_SBI=y
+CONFIG_VIRTIO_CONSOLE=y
+CONFIG_HW_RANDOM=y
+CONFIG_HW_RANDOM_VIRTIO=y
# CONFIG_PTP_1588_CLOCK is not set
CONFIG_DRM=y
CONFIG_DRM_RADEON=y
+CONFIG_DRM_VIRTIO_GPU=y
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_USB=y
CONFIG_USB_XHCI_HCD=y
@@ -66,7 +73,12 @@ CONFIG_USB_OHCI_HCD=y
CONFIG_USB_OHCI_HCD_PLATFORM=y
CONFIG_USB_STORAGE=y
CONFIG_USB_UAS=y
+CONFIG_VIRTIO_PCI=y
+CONFIG_VIRTIO_BALLOON=y
+CONFIG_VIRTIO_INPUT=y
CONFIG_VIRTIO_MMIO=y
+CONFIG_RPMSG_CHAR=y
+CONFIG_RPMSG_VIRTIO=y
CONFIG_SIFIVE_PLIC=y
CONFIG_EXT4_FS=y
CONFIG_EXT4_FS_POSIX_ACL=y
@@ -80,6 +92,7 @@ CONFIG_NFS_V4=y
CONFIG_NFS_V4_1=y
CONFIG_NFS_V4_2=y
CONFIG_ROOT_NFS=y
+CONFIG_9P_FS=y
CONFIG_CRYPTO_USER_API_HASH=y
CONFIG_CRYPTO_DEV_VIRTIO=y
CONFIG_PRINTK_TIME=y
--
2.17.1

2019-07-29 13:29:23

by Anup Patel

Subject: [RFC PATCH 01/16] KVM: RISC-V: Add KVM_REG_RISCV for ONE_REG interface

We will be using the ONE_REG interface to access VCPU registers from
user-space, hence we add KVM_REG_RISCV for RISC-V VCPU registers.
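
As an illustration of how the new architecture prefix combines with the
existing size field, here is a small stand-alone sketch. The constants
mirror include/uapi/linux/kvm.h; riscv_reg_id(), reg_size_bytes() and the
index value used with them are hypothetical, since the actual RISC-V
register numbering is defined by later patches in the series.

```c
#include <assert.h>
#include <stdint.h>

/* Values as found in include/uapi/linux/kvm.h. */
#define KVM_REG_ARCH_MASK	0xff00000000000000ULL
#define KVM_REG_RISCV		0x8000000000000000ULL
#define KVM_REG_SIZE_SHIFT	52
#define KVM_REG_SIZE_MASK	0x00f0000000000000ULL
#define KVM_REG_SIZE_U64	0x0030000000000000ULL

/* Hypothetical helper: build a ONE_REG id from a size flag and an index. */
static uint64_t riscv_reg_id(uint64_t size_flag, uint64_t index)
{
	return KVM_REG_RISCV | size_flag | index;
}

/* Decode the register size in bytes from the size field of an id. */
static uint64_t reg_size_bytes(uint64_t id)
{
	return 1ULL << ((id & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT);
}
```

User-space would place such an id in struct kvm_one_reg and pass it to
the KVM_GET_ONE_REG or KVM_SET_ONE_REG ioctl on the VCPU file descriptor.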

Signed-off-by: Anup Patel <[email protected]>
---
include/uapi/linux/kvm.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index a7c19540ce21..1b918ed94399 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1142,6 +1142,7 @@ struct kvm_dirty_tlb {
#define KVM_REG_S390 0x5000000000000000ULL
#define KVM_REG_ARM64 0x6000000000000000ULL
#define KVM_REG_MIPS 0x7000000000000000ULL
+#define KVM_REG_RISCV 0x8000000000000000ULL

#define KVM_REG_SIZE_SHIFT 52
#define KVM_REG_SIZE_MASK 0x00f0000000000000ULL
--
2.17.1

2019-07-29 13:29:41

by Anup Patel

Subject: [RFC PATCH 04/16] RISC-V: KVM: Implement VCPU create, init and destroy functions

This patch implements the VCPU create, init and destroy functions
required by the generic KVM module. We don't have many dynamic
resources in struct kvm_vcpu_arch, so these functions are quite
simple for KVM RISC-V.
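
The reset handling in this patch follows a template pattern: init fills
a reset copy of the state once, and every reset copies that template
over the live state. A reduced sketch, assuming a cut-down context with
only three fields (the struct and function names with "model" in them
are hypothetical; the SR_* values are the sstatus SPIE and SPP bits):

```c
#include <assert.h>
#include <string.h>

#define SR_SPIE	0x00000020UL	/* previous S-mode interrupt enable */
#define SR_SPP	0x00000100UL	/* previous privilege was S-mode */

/* Hypothetical, heavily reduced stand-in for struct kvm_cpu_context. */
struct cpu_context_model {
	unsigned long a0;
	unsigned long sepc;
	unsigned long sstatus;
};

struct vcpu_arch_model {
	struct cpu_context_model guest_context;		/* live state */
	struct cpu_context_model guest_reset_context;	/* reset template */
};

/* Reset: overwrite the live state with the template. */
static void model_reset_vcpu(struct vcpu_arch_model *arch)
{
	memcpy(&arch->guest_context, &arch->guest_reset_context,
	       sizeof(arch->guest_context));
}

/* Init: build the template once, then apply it. */
static void model_vcpu_init(struct vcpu_arch_model *arch)
{
	memset(arch, 0, sizeof(*arch));
	arch->guest_reset_context.sstatus = SR_SPP | SR_SPIE;
	model_reset_vcpu(arch);
}
```

Keeping the template separate means a later reset does not need to know
how the initial values were derived.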

Signed-off-by: Anup Patel <[email protected]>
---
arch/riscv/include/asm/kvm_host.h | 70 ++++++++++++++++++++++++++
arch/riscv/kvm/vcpu.c | 83 +++++++++++++++++++++++++++++--
2 files changed, 149 insertions(+), 4 deletions(-)

diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index 81acfb307d5c..244eabe62710 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -54,7 +54,77 @@ struct kvm_arch {
phys_addr_t pgd_phys;
};

+struct kvm_cpu_context {
+ unsigned long zero;
+ unsigned long ra;
+ unsigned long sp;
+ unsigned long gp;
+ unsigned long tp;
+ unsigned long t0;
+ unsigned long t1;
+ unsigned long t2;
+ unsigned long s0;
+ unsigned long s1;
+ unsigned long a0;
+ unsigned long a1;
+ unsigned long a2;
+ unsigned long a3;
+ unsigned long a4;
+ unsigned long a5;
+ unsigned long a6;
+ unsigned long a7;
+ unsigned long s2;
+ unsigned long s3;
+ unsigned long s4;
+ unsigned long s5;
+ unsigned long s6;
+ unsigned long s7;
+ unsigned long s8;
+ unsigned long s9;
+ unsigned long s10;
+ unsigned long s11;
+ unsigned long t3;
+ unsigned long t4;
+ unsigned long t5;
+ unsigned long t6;
+ unsigned long sepc;
+ unsigned long sstatus;
+ unsigned long hstatus;
+};
+
+struct kvm_vcpu_csr {
+ unsigned long hedeleg;
+ unsigned long hideleg;
+ unsigned long vsstatus;
+ unsigned long vsie;
+ unsigned long vstvec;
+ unsigned long vsscratch;
+ unsigned long vsepc;
+ unsigned long vscause;
+ unsigned long vstval;
+ unsigned long vsip;
+ unsigned long vsatp;
+};
+
struct kvm_vcpu_arch {
+ /* VCPU ran atleast once */
+ bool ran_atleast_once;
+
+ /* ISA feature bits (similar to MISA) */
+ unsigned long isa;
+
+ /* CPU context of Guest VCPU */
+ struct kvm_cpu_context guest_context;
+
+ /* CPU CSR context of Guest VCPU */
+ struct kvm_vcpu_csr guest_csr;
+
+ /* CPU context upon Guest VCPU reset */
+ struct kvm_cpu_context guest_reset_context;
+
+ /* CPU CSR context upon Guest VCPU reset */
+ struct kvm_vcpu_csr guest_reset_csr;
+
/* Don't run the VCPU (blocked) */
bool pause;
};
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index 9fea9128d964..1ae806f28c0e 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -31,10 +31,48 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
{ NULL }
};

+#define KVM_RISCV_ISA_ALLOWED (RISCV_ISA_EXT_A | \
+ RISCV_ISA_EXT_C | \
+ RISCV_ISA_EXT_D | \
+ RISCV_ISA_EXT_F | \
+ RISCV_ISA_EXT_I | \
+ RISCV_ISA_EXT_M | \
+ RISCV_ISA_EXT_S | \
+ RISCV_ISA_EXT_U)
+
+static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
+{
+ struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
+ struct kvm_vcpu_csr *reset_csr = &vcpu->arch.guest_reset_csr;
+ struct kvm_cpu_context *cntx = &vcpu->arch.guest_context;
+ struct kvm_cpu_context *reset_cntx = &vcpu->arch.guest_reset_context;
+
+ memcpy(csr, reset_csr, sizeof(*csr));
+
+ memcpy(cntx, reset_cntx, sizeof(*cntx));
+}
+
struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
{
- /* TODO: */
- return NULL;
+ int err;
+ struct kvm_vcpu *vcpu;
+
+ vcpu = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
+ if (!vcpu) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ err = kvm_vcpu_init(vcpu, kvm, id);
+ if (err)
+ goto free_vcpu;
+
+ return vcpu;
+
+free_vcpu:
+ kmem_cache_free(kvm_vcpu_cache, vcpu);
+out:
+ return ERR_PTR(err);
}

int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
@@ -48,13 +86,47 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)

int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
{
- /* TODO: */
+ struct kvm_cpu_context *cntx;
+ struct kvm_vcpu_csr *csr;
+
+ /* Mark this VCPU never ran */
+ vcpu->arch.ran_atleast_once = false;
+
+ /* Setup ISA features available to VCPU */
+ vcpu->arch.isa = riscv_isa & KVM_RISCV_ISA_ALLOWED;
+
+ /* Setup reset state of shadow SSTATUS and HSTATUS CSRs */
+ cntx = &vcpu->arch.guest_reset_context;
+ cntx->sstatus = SR_SPP | SR_SPIE;
+ cntx->hstatus = 0;
+ cntx->hstatus |= HSTATUS_SP2V;
+ cntx->hstatus |= HSTATUS_SP2P;
+ cntx->hstatus |= HSTATUS_SPV;
+
+ /* Setup reset state of HEDELEG and HIDELEG CSRs */
+ csr = &vcpu->arch.guest_reset_csr;
+ csr->hedeleg = 0;
+ csr->hedeleg |= (1UL << EXC_INST_MISALIGNED);
+ csr->hedeleg |= (1UL << EXC_BREAKPOINT);
+ csr->hedeleg |= (1UL << EXC_SYSCALL);
+ csr->hedeleg |= (1UL << EXC_INST_PAGE_FAULT);
+ csr->hedeleg |= (1UL << EXC_LOAD_PAGE_FAULT);
+ csr->hedeleg |= (1UL << EXC_STORE_PAGE_FAULT);
+ csr->hideleg = 0;
+ csr->hideleg |= SIE_SSIE;
+ csr->hideleg |= SIE_STIE;
+ csr->hideleg |= SIE_SEIE;
+
+ /* Reset VCPU */
+ kvm_riscv_reset_vcpu(vcpu);
+
return 0;
}

void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
{
- /* TODO: */
+ kvm_riscv_stage2_flush_cache(vcpu);
+ kmem_cache_free(kvm_vcpu_cache, vcpu);
}

int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
@@ -207,6 +279,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
int ret;
unsigned long scause, stval;

+ /* Mark this VCPU ran atleast once */
+ vcpu->arch.ran_atleast_once = true;
+
/* Process MMIO value returned from user-space */
if (run->exit_reason == KVM_EXIT_MMIO) {
ret = kvm_riscv_vcpu_mmio_return(vcpu, vcpu->run);
--
2.17.1

2019-07-29 14:44:10

by Andreas Schwab

Subject: Re: [RFC PATCH 13/16] RISC-V: KVM: Add timer functionality

On Jul 29 2019, Anup Patel <[email protected]> wrote:

> From: Atish Patra <[email protected]>
>
> The RISC-V hypervisor specification doesn't have any virtual timer
> feature.
>
> Due to this, the guest VCPU timer will be programmed via SBI calls.
> The host will use a separate hrtimer event for each guest VCPU to
> provide timer functionality. We inject a virtual timer interrupt to
> the guest VCPU whenever the guest VCPU hrtimer event expires.
>
> The following features are not supported yet and will be added in
> future:
> 1. A time offset to adjust guest time from host time
> 2. A saved next event in guest vcpu for vm migration

I'm getting this error:

In file included from <command-line>:
./include/clocksource/timer-riscv.h:12:30: error: unknown type name ‘u32’
12 | void riscv_cs_get_mult_shift(u32 *mult, u32 *shift);
| ^~~
./include/clocksource/timer-riscv.h:12:41: error: unknown type name ‘u32’
12 | void riscv_cs_get_mult_shift(u32 *mult, u32 *shift);
| ^~~
make[1]: *** [scripts/Makefile.build:301: include/clocksource/timer-riscv.h.s] Error 1

Andreas.

--
Andreas Schwab, SUSE Labs, [email protected]
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

2019-07-29 15:42:04

by Anup Patel

Subject: [RFC PATCH 07/16] RISC-V: KVM: Implement VCPU world-switch

This patch implements the VCPU world-switch for KVM RISC-V.

The KVM RISC-V world-switch (i.e. __kvm_riscv_switch_to()) mostly
switches the general purpose registers and the SSTATUS, STVEC, SSCRATCH
and HSTATUS CSRs. Other CSRs are switched via the vcpu_load() and
vcpu_put() interface, in the kvm_arch_vcpu_load() and kvm_arch_vcpu_put()
functions respectively.
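
The split between the two paths can be pictured with a small C model.
This is only a sketch: struct hw_csrs, struct vcpu_model and the model_*
functions are hypothetical, GPRs are left out, and only three VS-level
CSRs are shown for the load/put side.

```c
#include <assert.h>

/* Hypothetical model of the hardware CSR file. */
struct hw_csrs {
	/* swapped on every world-switch */
	unsigned long sstatus, stvec, sscratch, hstatus;
	/* swapped only on vcpu_load()/vcpu_put() */
	unsigned long vsstatus, vstvec, vsatp;
};

struct vcpu_model {
	struct { unsigned long sstatus, stvec, sscratch, hstatus; } host, guest;
	struct { unsigned long vsstatus, vstvec, vsatp; } guest_csr;
};

static struct hw_csrs hw;

/* Cold path: install guest VS-level CSRs once per scheduling-in. */
static void model_vcpu_load(struct vcpu_model *v)
{
	hw.vsstatus = v->guest_csr.vsstatus;
	hw.vstvec = v->guest_csr.vstvec;
	hw.vsatp = v->guest_csr.vsatp;
}

/* Cold path: read guest VS-level CSRs back on scheduling-out. */
static void model_vcpu_put(struct vcpu_model *v)
{
	v->guest_csr.vsstatus = hw.vsstatus;
	v->guest_csr.vstvec = hw.vstvec;
	v->guest_csr.vsatp = hw.vsatp;
}

/* Hot path: save host values, install guest values. */
static void model_enter_guest(struct vcpu_model *v)
{
	v->host.sstatus = hw.sstatus;   hw.sstatus = v->guest.sstatus;
	v->host.stvec = hw.stvec;       hw.stvec = v->guest.stvec;
	v->host.sscratch = hw.sscratch; hw.sscratch = v->guest.sscratch;
	v->host.hstatus = hw.hstatus;   hw.hstatus = v->guest.hstatus;
}

/* Hot path: save guest values, put host values back. */
static void model_exit_guest(struct vcpu_model *v)
{
	v->guest.sstatus = hw.sstatus;   hw.sstatus = v->host.sstatus;
	v->guest.stvec = hw.stvec;       hw.stvec = v->host.stvec;
	v->guest.sscratch = hw.sscratch; hw.sscratch = v->host.sscratch;
	v->guest.hstatus = hw.hstatus;   hw.hstatus = v->host.hstatus;
}
```

In the model, repeated enter/exit cycles between one load and one put
never touch the VS-level fields, which is the point of keeping the
world-switch small.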

Signed-off-by: Anup Patel <[email protected]>
---
arch/riscv/include/asm/kvm_host.h | 9 +-
arch/riscv/kernel/asm-offsets.c | 76 ++++++++++++
arch/riscv/kvm/Makefile | 2 +-
arch/riscv/kvm/vcpu.c | 33 ++++-
arch/riscv/kvm/vcpu_switch.S | 193 ++++++++++++++++++++++++++++++
5 files changed, 309 insertions(+), 4 deletions(-)
create mode 100644 arch/riscv/kvm/vcpu_switch.S

diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index aa89f1922da1..006785bd6474 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -113,6 +113,13 @@ struct kvm_vcpu_arch {
/* ISA feature bits (similar to MISA) */
unsigned long isa;

+ /* SSCRATCH and STVEC of Host */
+ unsigned long host_sscratch;
+ unsigned long host_stvec;
+
+ /* CPU context of Host */
+ struct kvm_cpu_context host_context;
+
/* CPU context of Guest VCPU */
struct kvm_cpu_context guest_context;

@@ -151,7 +158,7 @@ int kvm_riscv_vcpu_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
unsigned long scause, unsigned long stval);

-static inline void __kvm_riscv_switch_to(struct kvm_vcpu_arch *vcpu_arch) {}
+void __kvm_riscv_switch_to(struct kvm_vcpu_arch *vcpu_arch);

int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq);
int kvm_riscv_vcpu_unset_interrupt(struct kvm_vcpu *vcpu, unsigned int irq);
diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
index 9f5628c38ac9..711656710190 100644
--- a/arch/riscv/kernel/asm-offsets.c
+++ b/arch/riscv/kernel/asm-offsets.c
@@ -7,7 +7,9 @@
#define GENERATING_ASM_OFFSETS

#include <linux/kbuild.h>
+#include <linux/mm.h>
#include <linux/sched.h>
+#include <asm/kvm_host.h>
#include <asm/thread_info.h>
#include <asm/ptrace.h>

@@ -109,6 +111,80 @@ void asm_offsets(void)
OFFSET(PT_SBADADDR, pt_regs, sbadaddr);
OFFSET(PT_SCAUSE, pt_regs, scause);

+ OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero);
+ OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra);
+ OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp);
+ OFFSET(KVM_ARCH_GUEST_GP, kvm_vcpu_arch, guest_context.gp);
+ OFFSET(KVM_ARCH_GUEST_TP, kvm_vcpu_arch, guest_context.tp);
+ OFFSET(KVM_ARCH_GUEST_T0, kvm_vcpu_arch, guest_context.t0);
+ OFFSET(KVM_ARCH_GUEST_T1, kvm_vcpu_arch, guest_context.t1);
+ OFFSET(KVM_ARCH_GUEST_T2, kvm_vcpu_arch, guest_context.t2);
+ OFFSET(KVM_ARCH_GUEST_S0, kvm_vcpu_arch, guest_context.s0);
+ OFFSET(KVM_ARCH_GUEST_S1, kvm_vcpu_arch, guest_context.s1);
+ OFFSET(KVM_ARCH_GUEST_A0, kvm_vcpu_arch, guest_context.a0);
+ OFFSET(KVM_ARCH_GUEST_A1, kvm_vcpu_arch, guest_context.a1);
+ OFFSET(KVM_ARCH_GUEST_A2, kvm_vcpu_arch, guest_context.a2);
+ OFFSET(KVM_ARCH_GUEST_A3, kvm_vcpu_arch, guest_context.a3);
+ OFFSET(KVM_ARCH_GUEST_A4, kvm_vcpu_arch, guest_context.a4);
+ OFFSET(KVM_ARCH_GUEST_A5, kvm_vcpu_arch, guest_context.a5);
+ OFFSET(KVM_ARCH_GUEST_A6, kvm_vcpu_arch, guest_context.a6);
+ OFFSET(KVM_ARCH_GUEST_A7, kvm_vcpu_arch, guest_context.a7);
+ OFFSET(KVM_ARCH_GUEST_S2, kvm_vcpu_arch, guest_context.s2);
+ OFFSET(KVM_ARCH_GUEST_S3, kvm_vcpu_arch, guest_context.s3);
+ OFFSET(KVM_ARCH_GUEST_S4, kvm_vcpu_arch, guest_context.s4);
+ OFFSET(KVM_ARCH_GUEST_S5, kvm_vcpu_arch, guest_context.s5);
+ OFFSET(KVM_ARCH_GUEST_S6, kvm_vcpu_arch, guest_context.s6);
+ OFFSET(KVM_ARCH_GUEST_S7, kvm_vcpu_arch, guest_context.s7);
+ OFFSET(KVM_ARCH_GUEST_S8, kvm_vcpu_arch, guest_context.s8);
+ OFFSET(KVM_ARCH_GUEST_S9, kvm_vcpu_arch, guest_context.s9);
+ OFFSET(KVM_ARCH_GUEST_S10, kvm_vcpu_arch, guest_context.s10);
+ OFFSET(KVM_ARCH_GUEST_S11, kvm_vcpu_arch, guest_context.s11);
+ OFFSET(KVM_ARCH_GUEST_T3, kvm_vcpu_arch, guest_context.t3);
+ OFFSET(KVM_ARCH_GUEST_T4, kvm_vcpu_arch, guest_context.t4);
+ OFFSET(KVM_ARCH_GUEST_T5, kvm_vcpu_arch, guest_context.t5);
+ OFFSET(KVM_ARCH_GUEST_T6, kvm_vcpu_arch, guest_context.t6);
+ OFFSET(KVM_ARCH_GUEST_SEPC, kvm_vcpu_arch, guest_context.sepc);
+ OFFSET(KVM_ARCH_GUEST_SSTATUS, kvm_vcpu_arch, guest_context.sstatus);
+ OFFSET(KVM_ARCH_GUEST_HSTATUS, kvm_vcpu_arch, guest_context.hstatus);
+
+ OFFSET(KVM_ARCH_HOST_ZERO, kvm_vcpu_arch, host_context.zero);
+ OFFSET(KVM_ARCH_HOST_RA, kvm_vcpu_arch, host_context.ra);
+ OFFSET(KVM_ARCH_HOST_SP, kvm_vcpu_arch, host_context.sp);
+ OFFSET(KVM_ARCH_HOST_GP, kvm_vcpu_arch, host_context.gp);
+ OFFSET(KVM_ARCH_HOST_TP, kvm_vcpu_arch, host_context.tp);
+ OFFSET(KVM_ARCH_HOST_T0, kvm_vcpu_arch, host_context.t0);
+ OFFSET(KVM_ARCH_HOST_T1, kvm_vcpu_arch, host_context.t1);
+ OFFSET(KVM_ARCH_HOST_T2, kvm_vcpu_arch, host_context.t2);
+ OFFSET(KVM_ARCH_HOST_S0, kvm_vcpu_arch, host_context.s0);
+ OFFSET(KVM_ARCH_HOST_S1, kvm_vcpu_arch, host_context.s1);
+ OFFSET(KVM_ARCH_HOST_A0, kvm_vcpu_arch, host_context.a0);
+ OFFSET(KVM_ARCH_HOST_A1, kvm_vcpu_arch, host_context.a1);
+ OFFSET(KVM_ARCH_HOST_A2, kvm_vcpu_arch, host_context.a2);
+ OFFSET(KVM_ARCH_HOST_A3, kvm_vcpu_arch, host_context.a3);
+ OFFSET(KVM_ARCH_HOST_A4, kvm_vcpu_arch, host_context.a4);
+ OFFSET(KVM_ARCH_HOST_A5, kvm_vcpu_arch, host_context.a5);
+ OFFSET(KVM_ARCH_HOST_A6, kvm_vcpu_arch, host_context.a6);
+ OFFSET(KVM_ARCH_HOST_A7, kvm_vcpu_arch, host_context.a7);
+ OFFSET(KVM_ARCH_HOST_S2, kvm_vcpu_arch, host_context.s2);
+ OFFSET(KVM_ARCH_HOST_S3, kvm_vcpu_arch, host_context.s3);
+ OFFSET(KVM_ARCH_HOST_S4, kvm_vcpu_arch, host_context.s4);
+ OFFSET(KVM_ARCH_HOST_S5, kvm_vcpu_arch, host_context.s5);
+ OFFSET(KVM_ARCH_HOST_S6, kvm_vcpu_arch, host_context.s6);
+ OFFSET(KVM_ARCH_HOST_S7, kvm_vcpu_arch, host_context.s7);
+ OFFSET(KVM_ARCH_HOST_S8, kvm_vcpu_arch, host_context.s8);
+ OFFSET(KVM_ARCH_HOST_S9, kvm_vcpu_arch, host_context.s9);
+ OFFSET(KVM_ARCH_HOST_S10, kvm_vcpu_arch, host_context.s10);
+ OFFSET(KVM_ARCH_HOST_S11, kvm_vcpu_arch, host_context.s11);
+ OFFSET(KVM_ARCH_HOST_T3, kvm_vcpu_arch, host_context.t3);
+ OFFSET(KVM_ARCH_HOST_T4, kvm_vcpu_arch, host_context.t4);
+ OFFSET(KVM_ARCH_HOST_T5, kvm_vcpu_arch, host_context.t5);
+ OFFSET(KVM_ARCH_HOST_T6, kvm_vcpu_arch, host_context.t6);
+ OFFSET(KVM_ARCH_HOST_SEPC, kvm_vcpu_arch, host_context.sepc);
+ OFFSET(KVM_ARCH_HOST_SSTATUS, kvm_vcpu_arch, host_context.sstatus);
+ OFFSET(KVM_ARCH_HOST_HSTATUS, kvm_vcpu_arch, host_context.hstatus);
+ OFFSET(KVM_ARCH_HOST_SSCRATCH, kvm_vcpu_arch, host_sscratch);
+ OFFSET(KVM_ARCH_HOST_STVEC, kvm_vcpu_arch, host_stvec);
+
/*
* THREAD_{F,X}* might be larger than a S-type offset can handle, but
* these are used in performance-sensitive assembly so we can't resort
diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
index 37b5a59d4f4f..845579273727 100644
--- a/arch/riscv/kvm/Makefile
+++ b/arch/riscv/kvm/Makefile
@@ -8,6 +8,6 @@ ccflags-y := -Ivirt/kvm -Iarch/riscv/kvm

kvm-objs := $(common-objs-y)

-kvm-objs += main.o vm.o mmu.o vcpu.o vcpu_exit.o
+kvm-objs += main.o vm.o mmu.o vcpu.o vcpu_exit.o vcpu_switch.o

obj-$(CONFIG_KVM) += kvm.o
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index 37368eeb6c41..4ab9f803536e 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -546,14 +546,43 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,

void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
{
- /* TODO: */
+ struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
+
+ csr_write(CSR_HIDELEG, csr->hideleg);
+ csr_write(CSR_HEDELEG, csr->hedeleg);
+ csr_write(CSR_VSSTATUS, csr->vsstatus);
+ csr_write(CSR_VSIE, csr->vsie);
+ csr_write(CSR_VSTVEC, csr->vstvec);
+ csr_write(CSR_VSSCRATCH, csr->vsscratch);
+ csr_write(CSR_VSEPC, csr->vsepc);
+ csr_write(CSR_VSCAUSE, csr->vscause);
+ csr_write(CSR_VSTVAL, csr->vstval);
+ csr_write(CSR_VSIP, csr->vsip);
+ csr_write(CSR_VSATP, csr->vsatp);

kvm_riscv_stage2_update_pgtbl(vcpu);
+
+ vcpu->cpu = cpu;
}

void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
{
- /* TODO: */
+ struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
+
+ vcpu->cpu = -1;
+
+ csr_write(CSR_HGATP, 0);
+ csr_write(CSR_HIDELEG, 0);
+ csr_write(CSR_HEDELEG, 0);
+ csr->vsstatus = csr_read(CSR_VSSTATUS);
+ csr->vsie = csr_read(CSR_VSIE);
+ csr->vstvec = csr_read(CSR_VSTVEC);
+ csr->vsscratch = csr_read(CSR_VSSCRATCH);
+ csr->vsepc = csr_read(CSR_VSEPC);
+ csr->vscause = csr_read(CSR_VSCAUSE);
+ csr->vstval = csr_read(CSR_VSTVAL);
+ csr->vsip = csr_read(CSR_VSIP);
+ csr->vsatp = csr_read(CSR_VSATP);
}

static void kvm_riscv_check_vcpu_requests(struct kvm_vcpu *vcpu)
diff --git a/arch/riscv/kvm/vcpu_switch.S b/arch/riscv/kvm/vcpu_switch.S
new file mode 100644
index 000000000000..c5b85605bf73
--- /dev/null
+++ b/arch/riscv/kvm/vcpu_switch.S
@@ -0,0 +1,193 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ * Anup Patel <[email protected]>
+ */
+
+#include <linux/linkage.h>
+#include <asm/asm.h>
+#include <asm/asm-offsets.h>
+#include <asm/csr.h>
+
+ .text
+ .altmacro
+
+ENTRY(__kvm_riscv_switch_to)
+ /* Save Host GPRs (except A0 and T0-T6) */
+ REG_S ra, (KVM_ARCH_HOST_RA)(a0)
+ REG_S sp, (KVM_ARCH_HOST_SP)(a0)
+ REG_S gp, (KVM_ARCH_HOST_GP)(a0)
+ REG_S tp, (KVM_ARCH_HOST_TP)(a0)
+ REG_S s0, (KVM_ARCH_HOST_S0)(a0)
+ REG_S s1, (KVM_ARCH_HOST_S1)(a0)
+ REG_S a1, (KVM_ARCH_HOST_A1)(a0)
+ REG_S a2, (KVM_ARCH_HOST_A2)(a0)
+ REG_S a3, (KVM_ARCH_HOST_A3)(a0)
+ REG_S a4, (KVM_ARCH_HOST_A4)(a0)
+ REG_S a5, (KVM_ARCH_HOST_A5)(a0)
+ REG_S a6, (KVM_ARCH_HOST_A6)(a0)
+ REG_S a7, (KVM_ARCH_HOST_A7)(a0)
+ REG_S s2, (KVM_ARCH_HOST_S2)(a0)
+ REG_S s3, (KVM_ARCH_HOST_S3)(a0)
+ REG_S s4, (KVM_ARCH_HOST_S4)(a0)
+ REG_S s5, (KVM_ARCH_HOST_S5)(a0)
+ REG_S s6, (KVM_ARCH_HOST_S6)(a0)
+ REG_S s7, (KVM_ARCH_HOST_S7)(a0)
+ REG_S s8, (KVM_ARCH_HOST_S8)(a0)
+ REG_S s9, (KVM_ARCH_HOST_S9)(a0)
+ REG_S s10, (KVM_ARCH_HOST_S10)(a0)
+ REG_S s11, (KVM_ARCH_HOST_S11)(a0)
+
+ /* Save Host SSTATUS, HSTATUS, SCRATCH and STVEC */
+ csrr t0, CSR_SSTATUS
+ REG_S t0, (KVM_ARCH_HOST_SSTATUS)(a0)
+ csrr t1, CSR_HSTATUS
+ REG_S t1, (KVM_ARCH_HOST_HSTATUS)(a0)
+ csrr t2, CSR_SSCRATCH
+ REG_S t2, (KVM_ARCH_HOST_SSCRATCH)(a0)
+ csrr t3, CSR_STVEC
+ REG_S t3, (KVM_ARCH_HOST_STVEC)(a0)
+
+ /* Change Host exception vector to return path */
+ la t4, __kvm_switch_return
+ csrw CSR_STVEC, t4
+
+ /* Restore Guest HSTATUS, SSTATUS and SEPC */
+ REG_L t4, (KVM_ARCH_GUEST_SEPC)(a0)
+ csrw CSR_SEPC, t4
+ REG_L t5, (KVM_ARCH_GUEST_SSTATUS)(a0)
+ csrw CSR_SSTATUS, t5
+ REG_L t6, (KVM_ARCH_GUEST_HSTATUS)(a0)
+ csrw CSR_HSTATUS, t6
+
+ /* Restore Guest GPRs (except A0) */
+ REG_L ra, (KVM_ARCH_GUEST_RA)(a0)
+ REG_L sp, (KVM_ARCH_GUEST_SP)(a0)
+ REG_L gp, (KVM_ARCH_GUEST_GP)(a0)
+ REG_L tp, (KVM_ARCH_GUEST_TP)(a0)
+ REG_L t0, (KVM_ARCH_GUEST_T0)(a0)
+ REG_L t1, (KVM_ARCH_GUEST_T1)(a0)
+ REG_L t2, (KVM_ARCH_GUEST_T2)(a0)
+ REG_L s0, (KVM_ARCH_GUEST_S0)(a0)
+ REG_L s1, (KVM_ARCH_GUEST_S1)(a0)
+ REG_L a1, (KVM_ARCH_GUEST_A1)(a0)
+ REG_L a2, (KVM_ARCH_GUEST_A2)(a0)
+ REG_L a3, (KVM_ARCH_GUEST_A3)(a0)
+ REG_L a4, (KVM_ARCH_GUEST_A4)(a0)
+ REG_L a5, (KVM_ARCH_GUEST_A5)(a0)
+ REG_L a6, (KVM_ARCH_GUEST_A6)(a0)
+ REG_L a7, (KVM_ARCH_GUEST_A7)(a0)
+ REG_L s2, (KVM_ARCH_GUEST_S2)(a0)
+ REG_L s3, (KVM_ARCH_GUEST_S3)(a0)
+ REG_L s4, (KVM_ARCH_GUEST_S4)(a0)
+ REG_L s5, (KVM_ARCH_GUEST_S5)(a0)
+ REG_L s6, (KVM_ARCH_GUEST_S6)(a0)
+ REG_L s7, (KVM_ARCH_GUEST_S7)(a0)
+ REG_L s8, (KVM_ARCH_GUEST_S8)(a0)
+ REG_L s9, (KVM_ARCH_GUEST_S9)(a0)
+ REG_L s10, (KVM_ARCH_GUEST_S10)(a0)
+ REG_L s11, (KVM_ARCH_GUEST_S11)(a0)
+ REG_L t3, (KVM_ARCH_GUEST_T3)(a0)
+ REG_L t4, (KVM_ARCH_GUEST_T4)(a0)
+ REG_L t5, (KVM_ARCH_GUEST_T5)(a0)
+ REG_L t6, (KVM_ARCH_GUEST_T6)(a0)
+
+ /* Save Host A0 in SSCRATCH */
+ csrw CSR_SSCRATCH, a0
+
+ /* Restore Guest A0 */
+ REG_L a0, (KVM_ARCH_GUEST_A0)(a0)
+
+ /* Resume Guest */
+ sret
+
+ /* Back to Host */
+ .align 2
+__kvm_switch_return:
+ /* Swap Guest A0 with SSCRATCH */
+ csrrw a0, CSR_SSCRATCH, a0
+
+ /* Save Guest GPRs (except A0) */
+ REG_S ra, (KVM_ARCH_GUEST_RA)(a0)
+ REG_S sp, (KVM_ARCH_GUEST_SP)(a0)
+ REG_S gp, (KVM_ARCH_GUEST_GP)(a0)
+ REG_S tp, (KVM_ARCH_GUEST_TP)(a0)
+ REG_S t0, (KVM_ARCH_GUEST_T0)(a0)
+ REG_S t1, (KVM_ARCH_GUEST_T1)(a0)
+ REG_S t2, (KVM_ARCH_GUEST_T2)(a0)
+ REG_S s0, (KVM_ARCH_GUEST_S0)(a0)
+ REG_S s1, (KVM_ARCH_GUEST_S1)(a0)
+ REG_S a1, (KVM_ARCH_GUEST_A1)(a0)
+ REG_S a2, (KVM_ARCH_GUEST_A2)(a0)
+ REG_S a3, (KVM_ARCH_GUEST_A3)(a0)
+ REG_S a4, (KVM_ARCH_GUEST_A4)(a0)
+ REG_S a5, (KVM_ARCH_GUEST_A5)(a0)
+ REG_S a6, (KVM_ARCH_GUEST_A6)(a0)
+ REG_S a7, (KVM_ARCH_GUEST_A7)(a0)
+ REG_S s2, (KVM_ARCH_GUEST_S2)(a0)
+ REG_S s3, (KVM_ARCH_GUEST_S3)(a0)
+ REG_S s4, (KVM_ARCH_GUEST_S4)(a0)
+ REG_S s5, (KVM_ARCH_GUEST_S5)(a0)
+ REG_S s6, (KVM_ARCH_GUEST_S6)(a0)
+ REG_S s7, (KVM_ARCH_GUEST_S7)(a0)
+ REG_S s8, (KVM_ARCH_GUEST_S8)(a0)
+ REG_S s9, (KVM_ARCH_GUEST_S9)(a0)
+ REG_S s10, (KVM_ARCH_GUEST_S10)(a0)
+ REG_S s11, (KVM_ARCH_GUEST_S11)(a0)
+ REG_S t3, (KVM_ARCH_GUEST_T3)(a0)
+ REG_S t4, (KVM_ARCH_GUEST_T4)(a0)
+ REG_S t5, (KVM_ARCH_GUEST_T5)(a0)
+ REG_S t6, (KVM_ARCH_GUEST_T6)(a0)
+
+ /* Save Guest A0 */
+ csrr t0, CSR_SSCRATCH
+ REG_S t0, (KVM_ARCH_GUEST_A0)(a0)
+
+ /* Save Guest HSTATUS, SSTATUS, and SEPC */
+ csrr t0, CSR_SEPC
+ REG_S t0, (KVM_ARCH_GUEST_SEPC)(a0)
+ csrr t1, CSR_SSTATUS
+ REG_S t1, (KVM_ARCH_GUEST_SSTATUS)(a0)
+ csrr t2, CSR_HSTATUS
+ REG_S t2, (KVM_ARCH_GUEST_HSTATUS)(a0)
+
+ /* Restore Host SSTATUS, HSTATUS, SCRATCH and STVEC */
+ REG_L t3, (KVM_ARCH_HOST_SSTATUS)(a0)
+ csrw CSR_SSTATUS, t3
+ REG_L t4, (KVM_ARCH_HOST_HSTATUS)(a0)
+ csrw CSR_HSTATUS, t4
+ REG_L t5, (KVM_ARCH_HOST_SSCRATCH)(a0)
+ csrw CSR_SSCRATCH, t5
+ REG_L t6, (KVM_ARCH_HOST_STVEC)(a0)
+ csrw CSR_STVEC, t6
+
+ /* Restore Host GPRs (except A0 and T0-T6) */
+ REG_L ra, (KVM_ARCH_HOST_RA)(a0)
+ REG_L sp, (KVM_ARCH_HOST_SP)(a0)
+ REG_L gp, (KVM_ARCH_HOST_GP)(a0)
+ REG_L tp, (KVM_ARCH_HOST_TP)(a0)
+ REG_L s0, (KVM_ARCH_HOST_S0)(a0)
+ REG_L s1, (KVM_ARCH_HOST_S1)(a0)
+ REG_L a1, (KVM_ARCH_HOST_A1)(a0)
+ REG_L a2, (KVM_ARCH_HOST_A2)(a0)
+ REG_L a3, (KVM_ARCH_HOST_A3)(a0)
+ REG_L a4, (KVM_ARCH_HOST_A4)(a0)
+ REG_L a5, (KVM_ARCH_HOST_A5)(a0)
+ REG_L a6, (KVM_ARCH_HOST_A6)(a0)
+ REG_L a7, (KVM_ARCH_HOST_A7)(a0)
+ REG_L s2, (KVM_ARCH_HOST_S2)(a0)
+ REG_L s3, (KVM_ARCH_HOST_S3)(a0)
+ REG_L s4, (KVM_ARCH_HOST_S4)(a0)
+ REG_L s5, (KVM_ARCH_HOST_S5)(a0)
+ REG_L s6, (KVM_ARCH_HOST_S6)(a0)
+ REG_L s7, (KVM_ARCH_HOST_S7)(a0)
+ REG_L s8, (KVM_ARCH_HOST_S8)(a0)
+ REG_L s9, (KVM_ARCH_HOST_S9)(a0)
+ REG_L s10, (KVM_ARCH_HOST_S10)(a0)
+ REG_L s11, (KVM_ARCH_HOST_S11)(a0)
+
+ /* Return to C code */
+ ret
+ENDPROC(__kvm_riscv_switch_to)
--
2.17.1
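
The KVM_ARCH_* constants used by the switch code above are byte offsets into struct kvm_vcpu_arch, produced by the OFFSET() lines in asm-offsets.c. A minimal user-space sketch of the same idea follows. The demo_* structs and DEMO_* names are hypothetical and much smaller than the real kernel structures; only the offsetof() mechanism is the point.

```c
#include <stddef.h>

/* Mock layout for illustration only; the real structs have many more fields. */
struct demo_cpu_context {
	unsigned long zero;
	unsigned long ra;
	unsigned long sp;
};

struct demo_vcpu_arch {
	struct demo_cpu_context guest_context;
	struct demo_cpu_context host_context;
};

/* Same idea as the kernel's OFFSET(): a compile-time byte offset of a member. */
#define DEMO_OFFSET(type, member) offsetof(struct type, member)

/* Constants an assembly file could use as load/store displacements. */
enum {
	DEMO_ARCH_GUEST_RA = DEMO_OFFSET(demo_vcpu_arch, guest_context.ra),
	DEMO_ARCH_HOST_RA  = DEMO_OFFSET(demo_vcpu_arch, host_context.ra),
	DEMO_ARCH_HOST_SP  = DEMO_OFFSET(demo_vcpu_arch, host_context.sp),
};
```

With a0 holding a pointer to the (mock) vcpu arch structure, "REG_S ra, (DEMO_ARCH_HOST_RA)(a0)" would then store ra into host_context.ra without the assembler needing to know the C layout.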

2019-07-29 15:42:06

by Anup Patel

Subject: [RFC PATCH 08/16] RISC-V: KVM: Handle MMIO exits for VCPU

We will get stage2 page faults whenever the Guest/VM accesses a SW-emulated
MMIO device or unmapped Guest RAM.

This patch implements MMIO read/write emulation by extracting MMIO
details from the trapped load/store instruction and forwarding the
MMIO read/write to user-space. The actual MMIO emulation will happen
in user-space and the KVM kernel module will only take care of register
updates before resuming the trapped VCPU.

The handling of stage2 page faults for unmapped Guest RAM will be
implemented by a separate patch later.

Signed-off-by: Anup Patel <[email protected]>
---
arch/riscv/include/asm/kvm_host.h | 11 +
arch/riscv/kvm/mmu.c | 7 +
arch/riscv/kvm/vcpu_exit.c | 435 +++++++++++++++++++++++++++++-
3 files changed, 450 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index 006785bd6474..82e568ae0260 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -54,6 +54,12 @@ struct kvm_arch {
phys_addr_t pgd_phys;
};

+struct kvm_mmio_decode {
+ unsigned long insn;
+ int len;
+ int shift;
+};
+
struct kvm_cpu_context {
unsigned long zero;
unsigned long ra;
@@ -136,6 +142,9 @@ struct kvm_vcpu_arch {
raw_spinlock_t irqs_lock;
unsigned long irqs_pending;

+ /* MMIO instruction details */
+ struct kvm_mmio_decode mmio_decode;
+
/* VCPU power-off state */
bool power_off;

@@ -149,6 +158,8 @@ static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}

+int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned long hva,
+ bool is_write);
void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu);
int kvm_riscv_stage2_alloc_pgd(struct kvm *kvm);
void kvm_riscv_stage2_free_pgd(struct kvm *kvm);
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index cead012a8399..963f3c373781 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -61,6 +61,13 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
return 0;
}

+int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned long hva,
+ bool is_write)
+{
+ /* TODO: */
+ return 0;
+}
+
void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu)
{
/* TODO: */
diff --git a/arch/riscv/kvm/vcpu_exit.c b/arch/riscv/kvm/vcpu_exit.c
index e4d7c8f0807a..4dafefa59338 100644
--- a/arch/riscv/kvm/vcpu_exit.c
+++ b/arch/riscv/kvm/vcpu_exit.c
@@ -6,9 +6,370 @@
* Anup Patel <[email protected]>
*/

+#include <linux/bitops.h>
#include <linux/errno.h>
#include <linux/err.h>
#include <linux/kvm_host.h>
+#include <asm/csr.h>
+
+#define INSN_MATCH_LB 0x3
+#define INSN_MASK_LB 0x707f
+#define INSN_MATCH_LH 0x1003
+#define INSN_MASK_LH 0x707f
+#define INSN_MATCH_LW 0x2003
+#define INSN_MASK_LW 0x707f
+#define INSN_MATCH_LD 0x3003
+#define INSN_MASK_LD 0x707f
+#define INSN_MATCH_LBU 0x4003
+#define INSN_MASK_LBU 0x707f
+#define INSN_MATCH_LHU 0x5003
+#define INSN_MASK_LHU 0x707f
+#define INSN_MATCH_LWU 0x6003
+#define INSN_MASK_LWU 0x707f
+#define INSN_MATCH_SB 0x23
+#define INSN_MASK_SB 0x707f
+#define INSN_MATCH_SH 0x1023
+#define INSN_MASK_SH 0x707f
+#define INSN_MATCH_SW 0x2023
+#define INSN_MASK_SW 0x707f
+#define INSN_MATCH_SD 0x3023
+#define INSN_MASK_SD 0x707f
+
+#define INSN_MATCH_C_LD 0x6000
+#define INSN_MASK_C_LD 0xe003
+#define INSN_MATCH_C_SD 0xe000
+#define INSN_MASK_C_SD 0xe003
+#define INSN_MATCH_C_LW 0x4000
+#define INSN_MASK_C_LW 0xe003
+#define INSN_MATCH_C_SW 0xc000
+#define INSN_MASK_C_SW 0xe003
+#define INSN_MATCH_C_LDSP 0x6002
+#define INSN_MASK_C_LDSP 0xe003
+#define INSN_MATCH_C_SDSP 0xe002
+#define INSN_MASK_C_SDSP 0xe003
+#define INSN_MATCH_C_LWSP 0x4002
+#define INSN_MASK_C_LWSP 0xe003
+#define INSN_MATCH_C_SWSP 0xc002
+#define INSN_MASK_C_SWSP 0xe003
+
+#define INSN_LEN(insn) ((((insn) & 0x3) < 0x3) ? 2 : 4)
+
+#ifdef CONFIG_64BIT
+#define LOG_REGBYTES 3
+#else
+#define LOG_REGBYTES 2
+#endif
+#define REGBYTES (1 << LOG_REGBYTES)
+
+#define SH_RD 7
+#define SH_RS1 15
+#define SH_RS2 20
+#define SH_RS2C 2
+
+#define RV_X(x, s, n) (((x) >> (s)) & ((1 << (n)) - 1))
+#define RVC_LW_IMM(x) ((RV_X(x, 6, 1) << 2) | \
+ (RV_X(x, 10, 3) << 3) | \
+ (RV_X(x, 5, 1) << 6))
+#define RVC_LD_IMM(x) ((RV_X(x, 10, 3) << 3) | \
+ (RV_X(x, 5, 2) << 6))
+#define RVC_LWSP_IMM(x) ((RV_X(x, 4, 3) << 2) | \
+ (RV_X(x, 12, 1) << 5) | \
+ (RV_X(x, 2, 2) << 6))
+#define RVC_LDSP_IMM(x) ((RV_X(x, 5, 2) << 3) | \
+ (RV_X(x, 12, 1) << 5) | \
+ (RV_X(x, 2, 3) << 6))
+#define RVC_SWSP_IMM(x) ((RV_X(x, 9, 4) << 2) | \
+ (RV_X(x, 7, 2) << 6))
+#define RVC_SDSP_IMM(x) ((RV_X(x, 10, 3) << 3) | \
+ (RV_X(x, 7, 3) << 6))
+#define RVC_RS1S(insn) (8 + RV_X(insn, SH_RD, 3))
+#define RVC_RS2S(insn) (8 + RV_X(insn, SH_RS2C, 3))
+#define RVC_RS2(insn) RV_X(insn, SH_RS2C, 5)
+
+#define SHIFT_RIGHT(x, y) \
+ ((y) < 0 ? ((x) << -(y)) : ((x) >> (y)))
+
+#define REG_MASK \
+ ((1 << (5 + LOG_REGBYTES)) - (1 << LOG_REGBYTES))
+
+#define REG_OFFSET(insn, pos) \
+ (SHIFT_RIGHT((insn), (pos) - LOG_REGBYTES) & REG_MASK)
+
+#define REG_PTR(insn, pos, regs) \
+ (ulong *)((ulong)(regs) + REG_OFFSET(insn, pos))
+
+#define GET_RM(insn) (((insn) >> 12) & 7)
+
+#define GET_RS1(insn, regs) (*REG_PTR(insn, SH_RS1, regs))
+#define GET_RS2(insn, regs) (*REG_PTR(insn, SH_RS2, regs))
+#define GET_RS1S(insn, regs) (*REG_PTR(RVC_RS1S(insn), 0, regs))
+#define GET_RS2S(insn, regs) (*REG_PTR(RVC_RS2S(insn), 0, regs))
+#define GET_RS2C(insn, regs) (*REG_PTR(insn, SH_RS2C, regs))
+#define GET_SP(regs) (*REG_PTR(2, 0, regs))
+#define SET_RD(insn, regs, val) (*REG_PTR(insn, SH_RD, regs) = (val))
+#define IMM_I(insn) ((s32)(insn) >> 20)
+#define IMM_S(insn) (((s32)(insn) >> 25 << 5) | \
+ (s32)(((insn) >> 7) & 0x1f))
+#define MASK_FUNCT3 0x7000
+
+#define STR(x) XSTR(x)
+#define XSTR(x) #x
+
+static ulong get_insn(struct kvm_vcpu *vcpu)
+{
+ ulong __sepc = vcpu->arch.guest_context.sepc;
+ ulong __hstatus, __sstatus, __vsstatus;
+#ifdef CONFIG_RISCV_ISA_C
+ ulong rvc_mask = 3, tmp;
+#endif
+ ulong flags, val;
+
+ local_irq_save(flags);
+
+ __vsstatus = csr_read(CSR_VSSTATUS);
+ __sstatus = csr_read(CSR_SSTATUS);
+ __hstatus = csr_read(CSR_HSTATUS);
+
+ csr_write(CSR_VSSTATUS, __vsstatus | SR_MXR);
+ csr_write(CSR_SSTATUS, vcpu->arch.guest_context.sstatus | SR_MXR);
+ csr_write(CSR_HSTATUS, vcpu->arch.guest_context.hstatus | HSTATUS_SPRV);
+
+#ifndef CONFIG_RISCV_ISA_C
+ asm ("\n"
+#ifdef CONFIG_64BIT
+ STR(LWU) " %[insn], (%[addr])\n"
+#else
+ STR(LW) " %[insn], (%[addr])\n"
+#endif
+ : [insn] "=&r" (val) : [addr] "r" (__sepc));
+#else
+ asm ("and %[tmp], %[addr], 2\n"
+ "bnez %[tmp], 1f\n"
+#ifdef CONFIG_64BIT
+ STR(LWU) " %[insn], (%[addr])\n"
+#else
+ STR(LW) " %[insn], (%[addr])\n"
+#endif
+ "and %[tmp], %[insn], %[rvc_mask]\n"
+ "beq %[tmp], %[rvc_mask], 2f\n"
+ "sll %[insn], %[insn], %[xlen_minus_16]\n"
+ "srl %[insn], %[insn], %[xlen_minus_16]\n"
+ "j 2f\n"
+ "1:\n"
+ "lhu %[insn], (%[addr])\n"
+ "and %[tmp], %[insn], %[rvc_mask]\n"
+ "bne %[tmp], %[rvc_mask], 2f\n"
+ "lhu %[tmp], 2(%[addr])\n"
+ "sll %[tmp], %[tmp], 16\n"
+ "add %[insn], %[insn], %[tmp]\n"
+ "2:"
+ : [vsstatus] "+&r" (__vsstatus), [insn] "=&r" (val),
+ [tmp] "=&r" (tmp)
+ : [addr] "r" (__sepc), [rvc_mask] "r" (rvc_mask),
+ [xlen_minus_16] "i" (__riscv_xlen - 16));
+#endif
+
+ csr_write(CSR_HSTATUS, __hstatus);
+ csr_write(CSR_SSTATUS, __sstatus);
+ csr_write(CSR_VSSTATUS, __vsstatus);
+
+ local_irq_restore(flags);
+
+ return val;
+}
+
+static int emulate_load(struct kvm_vcpu *vcpu, struct kvm_run *run,
+ unsigned long fault_addr)
+{
+ int shift = 0, len = 0;
+ ulong insn = get_insn(vcpu);
+
+ /* Decode length of MMIO and shift */
+ if ((insn & INSN_MASK_LW) == INSN_MATCH_LW) {
+ len = 4;
+ shift = 8 * (sizeof(ulong) - len);
+ } else if ((insn & INSN_MASK_LB) == INSN_MATCH_LB) {
+ len = 1;
+ shift = 8 * (sizeof(ulong) - len);
+ } else if ((insn & INSN_MASK_LBU) == INSN_MATCH_LBU) {
+ len = 1;
+ shift = 8 * (sizeof(ulong) - len);
+#ifdef CONFIG_64BIT
+ } else if ((insn & INSN_MASK_LD) == INSN_MATCH_LD) {
+ len = 8;
+ shift = 8 * (sizeof(ulong) - len);
+ } else if ((insn & INSN_MASK_LWU) == INSN_MATCH_LWU) {
+ len = 4;
+#endif
+ } else if ((insn & INSN_MASK_LH) == INSN_MATCH_LH) {
+ len = 2;
+ shift = 8 * (sizeof(ulong) - len);
+ } else if ((insn & INSN_MASK_LHU) == INSN_MATCH_LHU) {
+ len = 2;
+#ifdef CONFIG_RISCV_ISA_C
+#ifdef CONFIG_64BIT
+ } else if ((insn & INSN_MASK_C_LD) == INSN_MATCH_C_LD) {
+ len = 8;
+ shift = 8 * (sizeof(ulong) - len);
+ insn = RVC_RS2S(insn) << SH_RD;
+ } else if ((insn & INSN_MASK_C_LDSP) == INSN_MATCH_C_LDSP &&
+ ((insn >> SH_RD) & 0x1f)) {
+ len = 8;
+ shift = 8 * (sizeof(ulong) - len);
+#endif
+ } else if ((insn & INSN_MASK_C_LW) == INSN_MATCH_C_LW) {
+ len = 4;
+ shift = 8 * (sizeof(ulong) - len);
+ insn = RVC_RS2S(insn) << SH_RD;
+ } else if ((insn & INSN_MASK_C_LWSP) == INSN_MATCH_C_LWSP &&
+ ((insn >> SH_RD) & 0x1f)) {
+ len = 4;
+ shift = 8 * (sizeof(ulong) - len);
+#endif
+ } else {
+ return -ENOTSUPP;
+ }
+
+ /* Fault address should be aligned to length of MMIO */
+ if (fault_addr & (len - 1))
+ return -EIO;
+
+ /* Save instruction decode info */
+ vcpu->arch.mmio_decode.insn = insn;
+ vcpu->arch.mmio_decode.shift = shift;
+ vcpu->arch.mmio_decode.len = len;
+
+ /* Exit to userspace for MMIO emulation */
+ vcpu->stat.mmio_exit_user++;
+ run->exit_reason = KVM_EXIT_MMIO;
+ run->mmio.is_write = false;
+ run->mmio.phys_addr = fault_addr;
+ run->mmio.len = len;
+
+ /* Move to next instruction */
+ vcpu->arch.guest_context.sepc += INSN_LEN(insn);
+
+ return 0;
+}
+
+static int emulate_store(struct kvm_vcpu *vcpu, struct kvm_run *run,
+ unsigned long fault_addr)
+{
+ u8 data8;
+ u16 data16;
+ u32 data32;
+ u64 data64;
+ ulong data;
+ int len = 0;
+ ulong insn = get_insn(vcpu);
+
+ data = GET_RS2(insn, &vcpu->arch.guest_context);
+ data8 = data16 = data32 = data64 = data;
+
+ if ((insn & INSN_MASK_SW) == INSN_MATCH_SW) {
+ len = 4;
+ } else if ((insn & INSN_MASK_SB) == INSN_MATCH_SB) {
+ len = 1;
+#ifdef CONFIG_64BIT
+ } else if ((insn & INSN_MASK_SD) == INSN_MATCH_SD) {
+ len = 8;
+#endif
+ } else if ((insn & INSN_MASK_SH) == INSN_MATCH_SH) {
+ len = 2;
+#ifdef CONFIG_RISCV_ISA_C
+#ifdef CONFIG_64BIT
+ } else if ((insn & INSN_MASK_C_SD) == INSN_MATCH_C_SD) {
+ len = 8;
+ data64 = GET_RS2S(insn, &vcpu->arch.guest_context);
+ } else if ((insn & INSN_MASK_C_SDSP) == INSN_MATCH_C_SDSP &&
+ ((insn >> SH_RD) & 0x1f)) {
+ len = 8;
+ data64 = GET_RS2C(insn, &vcpu->arch.guest_context);
+#endif
+ } else if ((insn & INSN_MASK_C_SW) == INSN_MATCH_C_SW) {
+ len = 4;
+ data32 = GET_RS2S(insn, &vcpu->arch.guest_context);
+ } else if ((insn & INSN_MASK_C_SWSP) == INSN_MATCH_C_SWSP &&
+ ((insn >> SH_RD) & 0x1f)) {
+ len = 4;
+ data32 = GET_RS2C(insn, &vcpu->arch.guest_context);
+#endif
+ } else {
+ return -ENOTSUPP;
+ }
+
+ /* Fault address should be aligned to length of MMIO */
+ if (fault_addr & (len - 1))
+ return -EIO;
+
+ /* Clear instruction decode info */
+ vcpu->arch.mmio_decode.insn = 0;
+ vcpu->arch.mmio_decode.shift = 0;
+ vcpu->arch.mmio_decode.len = 0;
+
+ /* Copy data to kvm_run instance */
+ switch (len) {
+ case 1:
+ *((u8 *)run->mmio.data) = data8;
+ break;
+ case 2:
+ *((u16 *)run->mmio.data) = data16;
+ break;
+ case 4:
+ *((u32 *)run->mmio.data) = data32;
+ break;
+ case 8:
+ *((u64 *)run->mmio.data) = data64;
+ break;
+ default:
+ return -ENOTSUPP;
+ };
+
+ /* Exit to userspace for MMIO emulation */
+ vcpu->stat.mmio_exit_user++;
+ run->exit_reason = KVM_EXIT_MMIO;
+ run->mmio.is_write = true;
+ run->mmio.phys_addr = fault_addr;
+ run->mmio.len = len;
+
+ /* Move to next instruction */
+ vcpu->arch.guest_context.sepc += INSN_LEN(insn);
+
+ return 0;
+}
+
+static int stage2_page_fault(struct kvm_vcpu *vcpu, struct kvm_run *run,
+ unsigned long scause, unsigned long stval)
+{
+ struct kvm_memory_slot *memslot;
+ unsigned long hva;
+ bool writable;
+ gfn_t gfn;
+ int ret;
+
+ gfn = stval >> PAGE_SHIFT;
+ memslot = gfn_to_memslot(vcpu->kvm, gfn);
+ hva = gfn_to_hva_memslot_prot(memslot, gfn, &writable);
+
+ if (kvm_is_error_hva(hva) ||
+ (scause == EXC_STORE_PAGE_FAULT && !writable)) {
+ switch (scause) {
+ case EXC_LOAD_PAGE_FAULT:
+ return emulate_load(vcpu, run, stval);
+ case EXC_STORE_PAGE_FAULT:
+ return emulate_store(vcpu, run, stval);
+ default:
+ return -ENOTSUPP;
+ };
+ }
+
+ ret = kvm_riscv_stage2_map(vcpu, stval, hva,
+ (scause == EXC_STORE_PAGE_FAULT) ? true : false);
+ if (ret < 0)
+ return ret;
+
+ return 1;
+}

/**
* kvm_riscv_vcpu_mmio_return -- Handle MMIO loads after user space emulation
@@ -19,7 +380,44 @@
*/
int kvm_riscv_vcpu_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
- /* TODO: */
+ u8 data8;
+ u16 data16;
+ u32 data32;
+ u64 data64;
+ ulong insn;
+ int len, shift;
+
+ if (run->mmio.is_write)
+ return 0;
+
+ insn = vcpu->arch.mmio_decode.insn;
+ len = vcpu->arch.mmio_decode.len;
+ shift = vcpu->arch.mmio_decode.shift;
+ switch (len) {
+ case 1:
+ data8 = *((u8 *)run->mmio.data);
+ SET_RD(insn, &vcpu->arch.guest_context,
+ (ulong)data8 << shift >> shift);
+ break;
+ case 2:
+ data16 = *((u16 *)run->mmio.data);
+ SET_RD(insn, &vcpu->arch.guest_context,
+ (ulong)data16 << shift >> shift);
+ break;
+ case 4:
+ data32 = *((u32 *)run->mmio.data);
+ SET_RD(insn, &vcpu->arch.guest_context,
+ (ulong)data32 << shift >> shift);
+ break;
+ case 8:
+ data64 = *((u64 *)run->mmio.data);
+ SET_RD(insn, &vcpu->arch.guest_context,
+ (ulong)data64 << shift >> shift);
+ break;
+ default:
+ return -ENOTSUPP;
+ };
+
return 0;
}

@@ -30,6 +428,37 @@ int kvm_riscv_vcpu_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run)
int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
unsigned long scause, unsigned long stval)
{
- /* TODO: */
- return 0;
+ int ret;
+
+ /* If we got host interrupt then do nothing */
+ if (scause & SCAUSE_IRQ_FLAG)
+ return 1;
+
+ /* Handle guest traps */
+ ret = -EFAULT;
+ run->exit_reason = KVM_EXIT_UNKNOWN;
+ switch (scause) {
+ case EXC_INST_PAGE_FAULT:
+ case EXC_LOAD_PAGE_FAULT:
+ case EXC_STORE_PAGE_FAULT:
+ if ((vcpu->arch.guest_context.hstatus & HSTATUS_SPV) &&
+ (vcpu->arch.guest_context.hstatus & HSTATUS_STL))
+ ret = stage2_page_fault(vcpu, run, scause, stval);
+ break;
+ default:
+ break;
+ };
+
+ /* Print details in-case of error */
+ if (ret < 0) {
+ kvm_err("VCPU exit error %d\n", ret);
+ kvm_err("SEPC=0x%lx SSTATUS=0x%lx HSTATUS=0x%lx\n",
+ vcpu->arch.guest_context.sepc,
+ vcpu->arch.guest_context.sstatus,
+ vcpu->arch.guest_context.hstatus);
+ kvm_err("SCAUSE=0x%lx STVAL=0x%lx\n",
+ scause, stval);
+ }
+
+ return ret;
}
--
2.17.1
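
The load path in the patch above comes down to two steps: classify the trapped instruction by its opcode and funct3 bits to learn the access width and destination register, then widen the bytes returned by user space before writing them to that register. A minimal user-space sketch of those two steps for the uncompressed loads follows. The demo_* names are hypothetical, the match values are the INSN_MATCH_L* constants from the patch, and the sketch sign-extends explicitly instead of reproducing the patch's shift pair.

```c
#include <stdint.h>

/* Opcode + funct3 mask shared by all base loads (INSN_MASK_L* in the patch). */
#define DEMO_MASK_LOAD 0x707fu

/* Hypothetical result type for this sketch; not a kernel structure. */
struct demo_load {
	int len;         /* access width in bytes, 0 if not a base load */
	int sign_extend; /* 1 for LB/LH/LW, 0 for LBU/LHU/LWU/LD */
	unsigned int rd; /* destination register index, bits 11:7 */
};

static struct demo_load demo_decode_load(uint32_t insn)
{
	struct demo_load d = { 0, 0, (insn >> 7) & 0x1f };

	switch (insn & DEMO_MASK_LOAD) {
	case 0x0003: d.len = 1; d.sign_extend = 1; break; /* LB */
	case 0x1003: d.len = 2; d.sign_extend = 1; break; /* LH */
	case 0x2003: d.len = 4; d.sign_extend = 1; break; /* LW */
	case 0x3003: d.len = 8; break;                    /* LD */
	case 0x4003: d.len = 1; break;                    /* LBU */
	case 0x5003: d.len = 2; break;                    /* LHU */
	case 0x6003: d.len = 4; break;                    /* LWU */
	default: break;                                   /* not a base load */
	}
	return d;
}

/* Widen the bytes returned by user space to a 64-bit register value. */
static uint64_t demo_extend(uint64_t raw, struct demo_load d)
{
	uint64_t sign;

	if (d.len <= 0 || d.len >= 8)
		return raw;
	raw &= (UINT64_C(1) << (8 * d.len)) - 1; /* keep only the loaded bytes */
	if (!d.sign_extend)
		return raw;
	sign = UINT64_C(1) << (8 * d.len - 1);   /* top bit of the loaded value */
	return (raw ^ sign) - sign;              /* portable sign extension */
}
```

For example, 0x00058503 encodes "lb a0, 0(a1)": the decoder reports a 1-byte signed load into register 10, and a returned byte of 0x80 widens to 0xffffffffffffff80, whereas the same byte through "lbu" stays 0x80.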

2019-07-29 19:52:37

by Atish Patra

Subject: Re: [RFC PATCH 15/16] RISC-V: KVM: Add SBI v0.1 support

On Mon, 2019-07-29 at 21:40 +0200, Paolo Bonzini wrote:
> On 29/07/19 13:57, Anup Patel wrote:
> > + csr_write(CSR_HSTATUS, vcpu->arch.guest_context.hstatus |
> > HSTATUS_SPRV);
> > + csr_write(CSR_SSTATUS, vcpu->arch.guest_context.sstatus);
> > + val = *addr;
>
> What happens if this load faults?
>

It should redirect the trap back to VS mode. Currently, this is not
implemented. It is on the TO-DO list for a future iteration of the
series.

Regards,
Atish
> Paolo

2019-07-29 20:08:05

by Paolo Bonzini

Subject: Re: [RFC PATCH 15/16] RISC-V: KVM: Add SBI v0.1 support

On 29/07/19 13:57, Anup Patel wrote:
> + csr_write(CSR_HSTATUS, vcpu->arch.guest_context.hstatus | HSTATUS_SPRV);
> + csr_write(CSR_SSTATUS, vcpu->arch.guest_context.sstatus);
> + val = *addr;

What happens if this load faults?

Paolo

2019-07-29 20:10:06

by Paolo Bonzini

Subject: Re: [RFC PATCH 15/16] RISC-V: KVM: Add SBI v0.1 support

On 29/07/19 21:51, Atish Patra wrote:
> On Mon, 2019-07-29 at 21:40 +0200, Paolo Bonzini wrote:
>> On 29/07/19 13:57, Anup Patel wrote:
>>> + csr_write(CSR_HSTATUS, vcpu->arch.guest_context.hstatus |
>>> HSTATUS_SPRV);
>>> + csr_write(CSR_SSTATUS, vcpu->arch.guest_context.sstatus);
>>> + val = *addr;
>>
>> What happens if this load faults?
>>
>
> It should redirect the trap back to VS mode. Currently, it is not
> implemented. It is on the TO-DO list for future iteration of the
> series.

Ok, please add TODO comments for the more important tasks like this one
(and/or post a somewhat complete list in reply to 00/16).

Thanks!

Paolo

2019-07-29 21:09:52

by Atish Patra

Subject: Re: [RFC PATCH 15/16] RISC-V: KVM: Add SBI v0.1 support

On Mon, 2019-07-29 at 22:08 +0200, Paolo Bonzini wrote:
> On 29/07/19 21:51, Atish Patra wrote:
> > On Mon, 2019-07-29 at 21:40 +0200, Paolo Bonzini wrote:
> > > On 29/07/19 13:57, Anup Patel wrote:
> > > > + csr_write(CSR_HSTATUS, vcpu->arch.guest_context.hstatus
> > > > |
> > > > HSTATUS_SPRV);
> > > > + csr_write(CSR_SSTATUS, vcpu-
> > > > >arch.guest_context.sstatus);
> > > > + val = *addr;
> > >
> > > What happens if this load faults?
> > >
> >
> > It should redirect the trap back to VS mode. Currently, it is not
> > implemented. It is on the TO-DO list for future iteration of the
> > series.
>
> Ok, please add TODO comments for the more important tasks like this
> one
> (and/or post a somewhat complete list in reply to 00/16).
>

Sure. Will add a TODO comment here and put the complete TODO list in
the cover letter as well.

Regards,
Atish
> Thanks!
>
> Paolo
>

2019-07-29 22:41:29

by Atish Patra

Subject: Re: [RFC PATCH 13/16] RISC-V: KVM: Add timer functionality

On Mon, 2019-07-29 at 16:40 +0200, Andreas Schwab wrote:
> On Jul 29 2019, Anup Patel <[email protected]> wrote:
>
> > From: Atish Patra <[email protected]>
> >
> > The RISC-V hypervisor specification doesn't have any virtual timer
> > feature.
> >
> > Due to this, the guest VCPU timer will be programmed via SBI calls.
> > The host will use a separate hrtimer event for each guest VCPU to
> > provide timer functionality. We inject a virtual timer interrupt to
> > the guest VCPU whenever the guest VCPU hrtimer event expires.
> >
> > The following features are not supported yet and will be added in
> > future:
> > 1. A time offset to adjust guest time from host time
> > 2. A saved next event in guest vcpu for vm migration
>
> I'm getting this error:
>
> In file included from <command-line>:
> ./include/clocksource/timer-riscv.h:12:30: error: unknown type name
> ‘u32’
> 12 | void riscv_cs_get_mult_shift(u32 *mult, u32 *shift);
> | ^~~
> ./include/clocksource/timer-riscv.h:12:41: error: unknown type name
> ‘u32’
> 12 | void riscv_cs_get_mult_shift(u32 *mult, u32 *shift);
> | ^~~
> make[1]: *** [scripts/Makefile.build:301: include/clocksource/timer-
> riscv.h.s] Error 1
>
> Andreas.
>

Strange. We never saw this error. But I think we should add this include
to the header file (include/clocksource/timer-riscv.h):

#include <linux/types.h>

Can you try it at your end and confirm, please?

Regards,
Atish

2019-07-30 05:54:42

by Anup Patel

Subject: Re: [RFC PATCH 00/16] KVM RISC-V Support

On Tue, Jul 30, 2019 at 3:17 AM Paolo Bonzini <[email protected]> wrote:
>
> On 29/07/19 13:56, Anup Patel wrote:
> > This series adds initial KVM RISC-V support. Currently, we are able to boot
> > RISC-V 64bit Linux Guests with multiple VCPUs.
> >
> > Few key aspects of KVM RISC-V added by this series are:
> > 1. Minimal possible KVM world-switch which touches only GPRs and few CSRs.
> > 2. Full Guest/VM switch is done via vcpu_get/vcpu_put infrastructure.
> > 3. KVM ONE_REG interface for VCPU register access from user-space.
> > 4. PLIC emulation is done in user-space. In-kernel PLIC emulation, will
> > be added in future.
> > 5. Timer and IPI emuation is done in-kernel.
> > 6. MMU notifiers supported.
> > 7. FP lazy save/restore supported.
> > 8. SBI v0.1 emulation for KVM Guest available.
> >
> > More feature additions and enhancments will follow after this series and
> > eventually KVM RISC-V will be at-par with other architectures.
>
> This looks clean and it shouldn't take long to have it merged. Please
> sort out the MAINTAINERS additions. It would also be nice if
> tools/testing/selftests/kvm/ worked with RISC-V from the beginning;
> there have been recent ARM and s390 ports that you can take some
> inspiration from.

Thanks Paolo.

We will certainly include a patch for the MAINTAINERS entry in the v2 series.

We referred to the existing implementations of KVM ARM/ARM64, KVM powerpc
and KVM mips when we started the KVM RISC-V port.

Here's a brief TODO list that we want to work on immediately after this
series:
1. Handle traps from unprivileged accesses in SBI v0.1 emulation
2. In-kernel PLIC emulation
3. SBI v0.2 emulation in-kernel
4. SBI v0.2 hart hotplug emulation in-kernel
5. ..... and so on .....

We will include the above TODO list in the v2 series cover letter as well.

Apart from the above, we also have a more exhaustive TODO list, based on a
study of other KVM ports, which we want to discuss at the upcoming LPC 2019.

We were thinking of keeping KVM RISC-V disabled by default (i.e. keeping it
experimental) until we have validated it on some FPGA or real HW. For now,
users can explicitly enable it and play around with it on QEMU emulation. I
hope this is fine with most people?

Regards,
Anup

2019-07-30 08:17:51

by Paolo Bonzini

Subject: Re: [RFC PATCH 00/16] KVM RISC-V Support

On 29/07/19 13:56, Anup Patel wrote:
> This series adds initial KVM RISC-V support. Currently, we are able to boot
> RISC-V 64bit Linux Guests with multiple VCPUs.
>
> Few key aspects of KVM RISC-V added by this series are:
> 1. Minimal possible KVM world-switch which touches only GPRs and few CSRs.
> 2. Full Guest/VM switch is done via vcpu_get/vcpu_put infrastructure.
> 3. KVM ONE_REG interface for VCPU register access from user-space.
> 4. PLIC emulation is done in user-space. In-kernel PLIC emulation, will
> be added in future.
> 5. Timer and IPI emuation is done in-kernel.
> 6. MMU notifiers supported.
> 7. FP lazy save/restore supported.
> 8. SBI v0.1 emulation for KVM Guest available.
>
> More feature additions and enhancments will follow after this series and
> eventually KVM RISC-V will be at-par with other architectures.

This looks clean and it shouldn't take long to have it merged. Please
sort out the MAINTAINERS additions. It would also be nice if
tools/testing/selftests/kvm/ worked with RISC-V from the beginning;
there have been recent ARM and s390 ports that you can take some
inspiration from.

Paolo

> This series is based upon KVM pre-patches sent by Atish earlier
> (https://lkml.org/lkml/2019/7/26/1271) and it can be found in
> riscv_kvm_v1 branch at:
> https//github.com/avpatel/linux.git
>
> Our work-in-progress KVMTOOL RISC-V port can be found in riscv_v1 branch at:
> https//github.com/avpatel/kvmtool.git
>
> We need OpenSBI with RISC-V hypervisor extension support which can be
> found in hyp_ext_changes_v1 branch at:
> https://github.com/riscv/opensbi.git
>
> The QEMU RISC-V hypervisor emulation is done by Alistair and is available
> in riscv-hyp-work.next branch at:
> https://github.com/alistair23/qemu.git
>
> To play around with KVM RISC-V, here are few reference commands:
> 1) To cross-compile KVMTOOL:
> $ make lkvm-static
> 2) To launch RISC-V Host Linux:
> $ qemu-system-riscv64 -monitor null -cpu rv64,h=true -M virt \
> -m 512M -display none -serial mon:stdio \
> -kernel opensbi/build/platform/qemu/virt/firmware/fw_jump.elf \
> -device loader,file=build-riscv64/arch/riscv/boot/Image,addr=0x80200000 \
> -initrd ./rootfs_kvm_riscv64.img \
> -append "root=/dev/ram rw console=ttyS0 earlycon=sbi"
> 3) To launch RISC-V Guest Linux with 9P rootfs:
> $ ./apps/lkvm-static run -m 128 -c2 --console serial \
> -p "console=ttyS0 earlycon=uart8250,mmio,0x3f8" -k ./apps/Image --debug
> 4) To launch RISC-V Guest Linux with initrd:
> $ ./apps/lkvm-static run -m 128 -c2 --console serial \
> -p "console=ttyS0 earlycon=uart8250,mmio,0x3f8" -k ./apps/Image \
> -i ./apps/rootfs.img --debug
>
> Anup Patel (13):
> KVM: RISC-V: Add KVM_REG_RISCV for ONE_REG interface
> RISC-V: Add hypervisor extension related CSR defines
> RISC-V: Add initial skeletal KVM support
> RISC-V: KVM: Implement VCPU create, init and destroy functions
> RISC-V: KVM: Implement VCPU interrupts and requests handling
> RISC-V: KVM: Implement KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls
> RISC-V: KVM: Implement VCPU world-switch
> RISC-V: KVM: Handle MMIO exits for VCPU
> RISC-V: KVM: Handle WFI exits for VCPU
> RISC-V: KVM: Implement VMID allocator
> RISC-V: KVM: Implement stage2 page table programming
> RISC-V: KVM: Implement MMU notifiers
> RISC-V: Enable VIRTIO drivers in RV64 and RV32 defconfig
>
> Atish Patra (3):
> RISC-V: KVM: Add timer functionality
> RISC-V: KVM: FP lazy save/restore
> RISC-V: KVM: Add SBI v0.1 support
>
> arch/riscv/Kconfig | 2 +
> arch/riscv/Makefile | 2 +
> arch/riscv/configs/defconfig | 23 +-
> arch/riscv/configs/rv32_defconfig | 13 +
> arch/riscv/include/asm/csr.h | 58 ++
> arch/riscv/include/asm/kvm_host.h | 232 ++++++
> arch/riscv/include/asm/kvm_vcpu_timer.h | 32 +
> arch/riscv/include/asm/pgtable-bits.h | 1 +
> arch/riscv/include/uapi/asm/kvm.h | 74 ++
> arch/riscv/kernel/asm-offsets.c | 148 ++++
> arch/riscv/kvm/Kconfig | 34 +
> arch/riscv/kvm/Makefile | 14 +
> arch/riscv/kvm/main.c | 64 ++
> arch/riscv/kvm/mmu.c | 904 ++++++++++++++++++++++++
> arch/riscv/kvm/tlb.S | 42 ++
> arch/riscv/kvm/vcpu.c | 817 +++++++++++++++++++++
> arch/riscv/kvm/vcpu_exit.c | 553 +++++++++++++++
> arch/riscv/kvm/vcpu_sbi.c | 118 ++++
> arch/riscv/kvm/vcpu_switch.S | 367 ++++++++++
> arch/riscv/kvm/vcpu_timer.c | 106 +++
> arch/riscv/kvm/vm.c | 107 +++
> arch/riscv/kvm/vmid.c | 130 ++++
> drivers/clocksource/timer-riscv.c | 6 +
> include/clocksource/timer-riscv.h | 14 +
> include/uapi/linux/kvm.h | 1 +
> 25 files changed, 3857 insertions(+), 5 deletions(-)
> create mode 100644 arch/riscv/include/asm/kvm_host.h
> create mode 100644 arch/riscv/include/asm/kvm_vcpu_timer.h
> create mode 100644 arch/riscv/include/uapi/asm/kvm.h
> create mode 100644 arch/riscv/kvm/Kconfig
> create mode 100644 arch/riscv/kvm/Makefile
> create mode 100644 arch/riscv/kvm/main.c
> create mode 100644 arch/riscv/kvm/mmu.c
> create mode 100644 arch/riscv/kvm/tlb.S
> create mode 100644 arch/riscv/kvm/vcpu.c
> create mode 100644 arch/riscv/kvm/vcpu_exit.c
> create mode 100644 arch/riscv/kvm/vcpu_sbi.c
> create mode 100644 arch/riscv/kvm/vcpu_switch.S
> create mode 100644 arch/riscv/kvm/vcpu_timer.c
> create mode 100644 arch/riscv/kvm/vm.c
> create mode 100644 arch/riscv/kvm/vmid.c
> create mode 100644 include/clocksource/timer-riscv.h
>
> --
> 2.17.1
>

2019-07-30 10:23:16

by Andreas Schwab

Subject: Re: [RFC PATCH 00/16] KVM RISC-V Support

ERROR: "riscv_cs_get_mult_shift" [arch/riscv/kvm/kvm.ko] undefined!
ERROR: "riscv_isa" [arch/riscv/kvm/kvm.ko] undefined!
ERROR: "smp_send_reschedule" [arch/riscv/kvm/kvm.ko] undefined!
ERROR: "riscv_timebase" [arch/riscv/kvm/kvm.ko] undefined!

Andreas.

--
Andreas Schwab, SUSE Labs, [email protected]
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

2019-07-30 10:27:04

by Atish Patra

Subject: Re: [RFC PATCH 13/16] RISC-V: KVM: Add timer functionality

On 7/29/19 11:51 PM, Andreas Schwab wrote:
> On Jul 29 2019, Atish Patra <[email protected]> wrote:
>
>> Strange. We never saw this error.
>
> It is part of CONFIG_KERNEL_HEADER_TEST. Everyone developing a driver
> should enable it.
>
>> #include <linux/types.h>
>>
>> Can you try it at your end and confirm please ?
>
> Confirmed.
>

Thanks. I will update the patch in v2.

> Andreas.
>


--
Regards,
Atish

2019-07-30 10:51:03

by Anup Patel

Subject: Re: [RFC PATCH 00/16] KVM RISC-V Support

On Tue, Jul 30, 2019 at 12:23 PM Andreas Schwab <[email protected]> wrote:
>
> ERROR: "riscv_cs_get_mult_shift" [arch/riscv/kvm/kvm.ko] undefined!
> ERROR: "riscv_isa" [arch/riscv/kvm/kvm.ko] undefined!
> ERROR: "smp_send_reschedule" [arch/riscv/kvm/kvm.ko] undefined!
> ERROR: "riscv_timebase" [arch/riscv/kvm/kvm.ko] undefined!

Strange, we are not seeing these compile errors.

Anyway, please ensure that you apply Atish's KVM prep patches
(https://lkml.org/lkml/2019/7/26/1271) on Linux-5.3-rcX before applying
this series.

Regards,
Anup

2019-07-30 10:58:18

by Anup Patel

Subject: Re: [RFC PATCH 00/16] KVM RISC-V Support

On Tue, Jul 30, 2019 at 12:23 PM Andreas Schwab <[email protected]> wrote:
>
> ERROR: "riscv_cs_get_mult_shift" [arch/riscv/kvm/kvm.ko] undefined!
> ERROR: "riscv_isa" [arch/riscv/kvm/kvm.ko] undefined!
> ERROR: "smp_send_reschedule" [arch/riscv/kvm/kvm.ko] undefined!
> ERROR: "riscv_timebase" [arch/riscv/kvm/kvm.ko] undefined!

Found the issue.

These symbols are not exported and you are building KVM RISC-V as a module.

Thanks for reporting. We will fix it.

Regards,
Anup

2019-07-30 10:59:28

by Andreas Schwab

Subject: Re: [RFC PATCH 00/16] KVM RISC-V Support

On Jul 30 2019, Anup Patel <[email protected]> wrote:

> On Tue, Jul 30, 2019 at 12:23 PM Andreas Schwab <[email protected]> wrote:
>>
>> ERROR: "riscv_cs_get_mult_shift" [arch/riscv/kvm/kvm.ko] undefined!
>> ERROR: "riscv_isa" [arch/riscv/kvm/kvm.ko] undefined!
>> ERROR: "smp_send_reschedule" [arch/riscv/kvm/kvm.ko] undefined!
>> ERROR: "riscv_timebase" [arch/riscv/kvm/kvm.ko] undefined!
>
> Strange, we are not seeing these compile errors.

None of these symbols are exported.

> Anyway, please ensure that you apply Atish's KVM prep patches
> (https://lkml.org/lkml/2019/7/26/1271) on Linux-5.3-rcX before applying
> this series.

None of these patches contain EXPORT_SYMBOL declarations.

Andreas.

--
Andreas Schwab, SUSE Labs, [email protected]
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

2019-07-30 11:09:12

by Paolo Bonzini

Subject: Re: [RFC PATCH 04/16] RISC-V: KVM: Implement VCPU create, init and destroy functions

On 29/07/19 13:56, Anup Patel wrote:
> + cntx->hstatus |= HSTATUS_SP2V;
> + cntx->hstatus |= HSTATUS_SP2P;

IIUC, cntx->hstatus's SP2P bit contains the guest's sstatus.SPP bit? I
suggest adding a comment here, and again providing a ONE_REG interface
to sstatus so that the ABI is final before RISC-V KVM is merged.

What happens if the guest executes SRET? Is that EXC_SYSCALL in hedeleg?

(BTW the name of SP2V and SP2P is horrible, I think HPV/HPP or HSPV/HSPP
would have been clearer, but that's not your fault).

Paolo

> + cntx->hstatus |= HSTATUS_SPV;

2019-07-30 11:09:28

by Paolo Bonzini

Subject: Re: [RFC PATCH 06/16] RISC-V: KVM: Implement KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls

On 29/07/19 13:56, Anup Patel wrote:
> The PC register represents program counter whereas the MODE
> register represent VCPU privilege mode (i.e. S/U-mode).
>

Is there any reason to include this pseudo-register instead of allowing
SSTATUS access directly in this patch (and perhaps also SEPC)?

Paolo

2019-07-30 11:09:59

by Paolo Bonzini

Subject: Re: [RFC PATCH 10/16] RISC-V: KVM: Implement VMID allocator

On 29/07/19 13:57, Anup Patel wrote:
> + /* First user of a new VMID version? */
> + if (unlikely(vmid_next == 0)) {
> + atomic_long_inc(&vmid_version);
> + vmid_next = 1;
> +

vmid_version is only written under vmid_lock, so it doesn't need to be
atomic. You only need WRITE_ONCE/READ_ONCE.
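
To illustrate the point about the version counter, here is a minimal user-space sketch of a versioned VMID allocator. It is not the code from the patch: a pthread mutex stands in for vmid_lock, relaxed C11 atomics stand in for READ_ONCE/WRITE_ONCE, and VMID_BITS, struct vm_vmid, vmid_ver_changed() and vmid_update() are names assumed for illustration only. The version is bumped with a plain load and store under the lock; no atomic read-modify-write is involved.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define VMID_BITS 2 /* hypothetical: tiny VMID space so wrap-around shows quickly */

static pthread_mutex_t vmid_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_ulong vmid_version = 1; /* written only while holding vmid_lock */
static unsigned long vmid_next;       /* protected by vmid_lock; 0 means "exhausted" */

struct vm_vmid {
	atomic_ulong version; /* allocator version this VMID belongs to */
	unsigned long vmid;
};

/* Lock-free check: relaxed loads play the role of READ_ONCE(). */
static bool vmid_ver_changed(struct vm_vmid *v)
{
	return atomic_load_explicit(&v->version, memory_order_relaxed) !=
	       atomic_load_explicit(&vmid_version, memory_order_relaxed);
}

static void vmid_update(struct vm_vmid *v)
{
	unsigned long ver;

	if (!vmid_ver_changed(v))
		return;

	pthread_mutex_lock(&vmid_lock);

	/* Re-check under the lock: another thread may have updated us already. */
	if (!vmid_ver_changed(v)) {
		pthread_mutex_unlock(&vmid_lock);
		return;
	}

	ver = atomic_load_explicit(&vmid_version, memory_order_relaxed);

	/* First user of a new VMID version? */
	if (vmid_next == 0) {
		ver++;
		/* Plain store under the lock, i.e. the WRITE_ONCE() analogue. */
		atomic_store_explicit(&vmid_version, ver, memory_order_relaxed);
		vmid_next = 1;
		/* A real hypervisor would flush guest TLB entries here. */
	}

	v->vmid = vmid_next;
	vmid_next = (vmid_next + 1) & ((1UL << VMID_BITS) - 1);
	assert(v->vmid != 0); /* VMID 0 is never handed out */

	atomic_store_explicit(&v->version, ver, memory_order_relaxed);

	pthread_mutex_unlock(&vmid_lock);
}
```

With VMID_BITS set to 2, the fourth VM to call vmid_update() triggers a new version, after which every VM that still holds an older version sees vmid_ver_changed() return true and must fetch a fresh VMID.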

> +
> + /* Request stage2 page table update for all VCPUs */
> + kvm_for_each_vcpu(i, v, vcpu->kvm)
> + kvm_make_request(KVM_REQ_UPDATE_PGTBL, v);

Perhaps rename kvm_riscv_stage2_update_pgtbl and KVM_REQ_UPDATE_PGTBL to
kvm_riscv_update_hgatp and KVM_REQ_UPDATE_HGATP?

Paolo

2019-07-30 11:10:17

by Paolo Bonzini

Subject: Re: [RFC PATCH 11/16] RISC-V: KVM: Implement stage2 page table programming

On 29/07/19 13:57, Anup Patel wrote:
> This patch implements all required functions for programming
> the stage2 page table for each Guest/VM.
>
> At high-level, the flow of stage2 related functions is similar
> from KVM ARM/ARM64 implementation but the stage2 page table
> format is quite different for KVM RISC-V.

FWIW I very much prefer KVM x86's recursive implementation of the MMU to
the hardcoding of pgd/pmd/pte. I am not asking you to rewrite it, but
I'll mention it because I noticed that you do not support 48-bit guest
physical addresses.

Paolo

2019-07-30 11:12:17

by Paolo Bonzini

Subject: Re: [RFC PATCH 03/16] RISC-V: Add initial skeletal KVM support

On 29/07/19 13:56, Anup Patel wrote:
> + case KVM_CAP_DEVICE_CTRL:
> + case KVM_CAP_USER_MEMORY:
> + case KVM_CAP_SYNC_MMU:

Technically KVM_CAP_SYNC_MMU should only be added after you add MMU
notifiers.

Paolo

> + case KVM_CAP_DESTROY_MEMORY_REGION_WORKS:
> + case KVM_CAP_ONE_REG:
> + case KVM_CAP_READONLY_MEM:
> + case KVM_CAP_MP_STATE:
> + case KVM_CAP_IMMEDIATE_EXIT:
> + r = 1;
> + break;

2019-07-30 11:13:46

by Paolo Bonzini

Subject: Re: [RFC PATCH 03/16] RISC-V: Add initial skeletal KVM support

On 29/07/19 13:56, Anup Patel wrote:
> +void kvm_riscv_halt_guest(struct kvm *kvm)
> +{
> + int i;
> + struct kvm_vcpu *vcpu;
> +
> + kvm_for_each_vcpu(i, vcpu, kvm)
> + vcpu->arch.pause = true;
> + kvm_make_all_cpus_request(kvm, KVM_REQ_SLEEP);
> +}
> +
> +void kvm_riscv_resume_guest(struct kvm *kvm)
> +{
> + int i;
> + struct kvm_vcpu *vcpu;
> +
> + kvm_for_each_vcpu(i, vcpu, kvm) {
> + vcpu->arch.pause = false;
> + swake_up_one(kvm_arch_vcpu_wq(vcpu));
> + }

Are these unused? (Perhaps I'm just blind :))

Paolo

2019-07-30 11:15:15

by Paolo Bonzini

Subject: Re: [RFC PATCH 07/16] RISC-V: KVM: Implement VCPU world-switch

On 29/07/19 13:57, Anup Patel wrote:
> void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> {
> - /* TODO: */
> + struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
> +
> + csr_write(CSR_HIDELEG, csr->hideleg);
> + csr_write(CSR_HEDELEG, csr->hedeleg);

Writing HIDELEG and HEDELEG here seems either wrong or inefficient to me.

I don't remember the spec well enough, but there are two cases:

1) either they only matter while the guest runs and then you can set
them in kvm_arch_hardware_enable. KVM common code takes care of doing
this on all CPUs for you.

2) or they also matter while the host runs and then you need to set them
in vcpu_switch.S.

Paolo

2019-07-30 11:17:54

by Paolo Bonzini

Subject: Re: [RFC PATCH 06/16] RISC-V: KVM: Implement KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls

On 30/07/19 10:43, Paolo Bonzini wrote:
> On 29/07/19 13:56, Anup Patel wrote:
>> The PC register represents program counter whereas the MODE
>> register represent VCPU privilege mode (i.e. S/U-mode).
>>
> Is there any reason to include this pseudo-register instead of allowing
> SSTATUS access directly in this patch (and perhaps also SEPC)?

Nevermind, I was confused - the current MODE is indeed not accessible as
a "real" CSR in RISC-V.

Still, I would prefer all the VS CSRs to be accessible via the get/set
reg ioctls.

Paolo

2019-07-30 12:09:46

by Anup Patel

Subject: Re: [RFC PATCH 03/16] RISC-V: Add initial skeletal KVM support

On Tue, Jul 30, 2019 at 2:53 PM Paolo Bonzini <[email protected]> wrote:
>
> On 29/07/19 13:56, Anup Patel wrote:
> > + case KVM_CAP_DEVICE_CTRL:
> > + case KVM_CAP_USER_MEMORY:
> > + case KVM_CAP_SYNC_MMU:
>
> Technically KVM_CAP_SYNC_MMU should only be added after you add MMU
> notifiers.

Sure, I will move this case to MMU notifier patch.

Regards,
Anup

2019-07-30 12:10:00

by Anup Patel

Subject: Re: [RFC PATCH 03/16] RISC-V: Add initial skeletal KVM support

On Tue, Jul 30, 2019 at 2:55 PM Paolo Bonzini <[email protected]> wrote:
>
> On 29/07/19 13:56, Anup Patel wrote:
> > +void kvm_riscv_halt_guest(struct kvm *kvm)
> > +{
> > + int i;
> > + struct kvm_vcpu *vcpu;
> > +
> > + kvm_for_each_vcpu(i, vcpu, kvm)
> > + vcpu->arch.pause = true;
> > + kvm_make_all_cpus_request(kvm, KVM_REQ_SLEEP);
> > +}
> > +
> > +void kvm_riscv_resume_guest(struct kvm *kvm)
> > +{
> > + int i;
> > + struct kvm_vcpu *vcpu;
> > +
> > + kvm_for_each_vcpu(i, vcpu, kvm) {
> > + vcpu->arch.pause = false;
> > + swake_up_one(kvm_arch_vcpu_wq(vcpu));
> > + }
>
> Are these unused? (Perhaps I'm just blind :))

Not used as of now.

The intention was to have APIs for freezing/unfreezing the Guest,
which can be used to do some work that is atomic from the
Guest's perspective.

I will remove it and add it back when required.

Regards,
Anup

2019-07-30 12:11:49

by Paolo Bonzini

Subject: Re: [RFC PATCH 05/16] RISC-V: KVM: Implement VCPU interrupts and requests handling

First, something that is not clear to me: how do you deal with a guest
writing 1 to VSIP.SSIP? I think that could lead to lost interrupts if
you have the following sequence

1) guest writes 1 to VSIP.SSIP

2) guest leaves VS-mode

3) host syncs VSIP

4) user mode triggers interrupt

5) host reenters guest

6) host moves irqs_pending to VSIP and clears VSIP.SSIP in the process

Perhaps irqs_pending needs to be split in two fields, irqs_pending and
irqs_pending_mask, and then you can do this:

/*
* irqs_pending and irqs_pending_mask have multiple-producer/single-
* consumer semantics; therefore bits can be set in the mask without
* a lock, but clearing the bits requires vcpu_lock. Furthermore,
* consumers should never write to irqs_pending, and should not
* use bits of irqs_pending that weren't 1 in the mask.
*/

int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq)
{
...
set_bit(irq, &vcpu->arch.irqs_pending);
smp_mb__before_atomic();
set_bit(irq, &vcpu->arch.irqs_pending_mask);
kvm_vcpu_kick(vcpu);
}

int kvm_riscv_vcpu_unset_interrupt(struct kvm_vcpu *vcpu, unsigned int irq)
{
...
clear_bit(irq, &vcpu->arch.irqs_pending);
smp_mb__before_atomic();
set_bit(irq, &vcpu->arch.irqs_pending_mask);
}

static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
{
...
WRITE_ONCE(vcpu->arch.irqs_pending_mask, 0);
}

and kvm_riscv_vcpu_flush_interrupts can leave aside VSIP bits that
aren't in vcpu->arch.irqs_pending_mask:

if (READ_ONCE(vcpu->arch.irqs_pending_mask)) {
unsigned long mask, val;

mask = xchg_acquire(&vcpu->arch.irqs_pending_mask, 0);
val = READ_ONCE(vcpu->arch.irqs_pending) & mask;

vcpu->arch.guest_csr.vsip &= ~mask;
vcpu->arch.guest_csr.vsip |= val;
csr_write(CSR_VSIP, vcpu->arch.guest_csr.vsip);
}

Also, the getter of CSR_VSIP should call
kvm_riscv_vcpu_flush_interrupts, while the setter should clear
irqs_pending_mask.
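
The pending/mask split proposed above can be tried out in user space. The following is only a sketch under assumptions: struct vcpu_model and the three helper names are made up for illustration, sequentially consistent C11 atomics replace the kernel's set_bit/xchg_acquire and barriers (so it is stronger than what the kernel code would need), and writing the CSR is reduced to updating a software copy of VSIP. The bit positions are those of SSIP (1) and SEIP (9) in sip.

```c
#include <assert.h>
#include <stdatomic.h>

#define IRQ_S_SOFT 1 /* SSIP bit position */
#define IRQ_S_EXT  9 /* SEIP bit position */

struct vcpu_model {
	atomic_ulong irqs_pending;      /* producers: desired state of each line */
	atomic_ulong irqs_pending_mask; /* producers: which lines changed */
	unsigned long vsip;             /* consumer-owned software copy of VSIP */
};

/* Producer: raise an interrupt line, then mark it as changed. */
static void set_interrupt(struct vcpu_model *v, unsigned int irq)
{
	atomic_fetch_or(&v->irqs_pending, 1UL << irq);
	atomic_fetch_or(&v->irqs_pending_mask, 1UL << irq);
}

/* Producer: lower an interrupt line, then mark it as changed. */
static void unset_interrupt(struct vcpu_model *v, unsigned int irq)
{
	atomic_fetch_and(&v->irqs_pending, ~(1UL << irq));
	atomic_fetch_or(&v->irqs_pending_mask, 1UL << irq);
}

/*
 * Consumer: fold changed lines into VSIP. Bits outside the mask, such as
 * an SSIP bit the guest set by itself, are left untouched.
 */
static void flush_interrupts(struct vcpu_model *v)
{
	unsigned long mask, val;

	if (!atomic_load(&v->irqs_pending_mask))
		return;

	mask = atomic_exchange(&v->irqs_pending_mask, 0);
	val = atomic_load(&v->irqs_pending) & mask;

	v->vsip &= ~mask;
	v->vsip |= val;
	assert((v->vsip & mask) == val); /* changed lines now mirror irqs_pending */
}
```

In this model, a guest-set SSIP bit survives an external interrupt being injected and later withdrawn by the host, which is the lost-interrupt case from the six-step sequence.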

On 29/07/19 13:56, Anup Patel wrote:
> + kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> + kvm_vcpu_kick(vcpu);

The request is not needed as long as kvm_riscv_vcpu_flush_interrupts is
called *after* smp_store_mb(vcpu->mode, IN_GUEST_MODE) in
kvm_arch_vcpu_ioctl_run. This is the "request-less vCPU kick" pattern
in Documentation/virtual/kvm/vcpu-requests.rst. The smp_store_mb then
orders the write of IN_GUEST_MODE before the read of irqs_pending (or
irqs_pending_mask in my proposal above); in the producers, there is a
dual memory barrier in kvm_vcpu_exiting_guest_mode(), ordering the write
of irqs_pending(_mask) before the read of vcpu->mode.

Similar to other VS* CSRs, I'd rather have a ONE_REG interface for VSIE
and VSIP from the beginning as well. Note that the VSIP setter would
clear irqs_pending_mask, while the getter would call
kvm_riscv_vcpu_flush_interrupts before reading. It's up to userspace to
ensure that no interrupt injections happen between the calls to the
getter and the setter.

Paolo

> + csr_write(CSR_VSIP, vcpu->arch.irqs_pending);
> + vcpu->arch.guest_csr.vsip = vcpu->arch.irqs_pending;
> + }

2019-07-30 12:12:42

by Paolo Bonzini

Subject: Re: [RFC PATCH 08/16] RISC-V: KVM: Handle MMIO exits for VCPU

On 29/07/19 13:57, Anup Patel wrote:
> +static ulong get_insn(struct kvm_vcpu *vcpu)
> +{
> + ulong __sepc = vcpu->arch.guest_context.sepc;
> + ulong __hstatus, __sstatus, __vsstatus;
> +#ifdef CONFIG_RISCV_ISA_C
> + ulong rvc_mask = 3, tmp;
> +#endif
> + ulong flags, val;
> +
> + local_irq_save(flags);
> +
> + __vsstatus = csr_read(CSR_VSSTATUS);
> + __sstatus = csr_read(CSR_SSTATUS);
> + __hstatus = csr_read(CSR_HSTATUS);
> +
> + csr_write(CSR_VSSTATUS, __vsstatus | SR_MXR);
> + csr_write(CSR_SSTATUS, vcpu->arch.guest_context.sstatus | SR_MXR);
> + csr_write(CSR_HSTATUS, vcpu->arch.guest_context.hstatus | HSTATUS_SPRV);
> +
> +#ifndef CONFIG_RISCV_ISA_C
> + asm ("\n"
> +#ifdef CONFIG_64BIT
> + STR(LWU) " %[insn], (%[addr])\n"
> +#else
> + STR(LW) " %[insn], (%[addr])\n"
> +#endif
> + : [insn] "=&r" (val) : [addr] "r" (__sepc));
> +#else
> + asm ("and %[tmp], %[addr], 2\n"
> + "bnez %[tmp], 1f\n"
> +#ifdef CONFIG_64BIT
> + STR(LWU) " %[insn], (%[addr])\n"
> +#else
> + STR(LW) " %[insn], (%[addr])\n"
> +#endif
> + "and %[tmp], %[insn], %[rvc_mask]\n"
> + "beq %[tmp], %[rvc_mask], 2f\n"
> + "sll %[insn], %[insn], %[xlen_minus_16]\n"
> + "srl %[insn], %[insn], %[xlen_minus_16]\n"
> + "j 2f\n"
> + "1:\n"
> + "lhu %[insn], (%[addr])\n"
> + "and %[tmp], %[insn], %[rvc_mask]\n"
> + "bne %[tmp], %[rvc_mask], 2f\n"
> + "lhu %[tmp], 2(%[addr])\n"
> + "sll %[tmp], %[tmp], 16\n"
> + "add %[insn], %[insn], %[tmp]\n"
> + "2:"
> + : [vsstatus] "+&r" (__vsstatus), [insn] "=&r" (val),
> + [tmp] "=&r" (tmp)
> + : [addr] "r" (__sepc), [rvc_mask] "r" (rvc_mask),
> + [xlen_minus_16] "i" (__riscv_xlen - 16));
> +#endif
> +
> + csr_write(CSR_HSTATUS, __hstatus);
> + csr_write(CSR_SSTATUS, __sstatus);
> + csr_write(CSR_VSSTATUS, __vsstatus);
> +
> + local_irq_restore(flags);
> +
> + return val;
> +}
> +

This also needs fixups for exceptions, because the guest can race
against the host and modify its page tables concurrently with the
vmexit. (How effective this is, of course, depends on how the TLB is
implemented in hardware, but you need to do the safe thing anyway).
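
For readers following the quoted get_insn(), the length decoding done by its halfword path (label 1 in the assembly) can be sketched in plain C. This is only an illustration with an assumed helper name, fetch_insn(), reading from an ordinary buffer. It deliberately ignores faults on either load, which is exactly the part that needs exception fixups when the loads go through guest page tables.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Read one RISC-V instruction starting at mem[0]. If the two lowest bits
 * of the first halfword are both set, the instruction is 32 bits wide and
 * the second halfword supplies the upper half; otherwise it is a 16-bit
 * compressed instruction.
 */
static uint32_t fetch_insn(const uint16_t *mem)
{
	uint32_t insn = mem[0];

	if ((insn & 0x3) == 0x3)
		insn |= (uint32_t)mem[1] << 16;

	assert((insn & 0x3) == 0x3 || insn <= 0xffff); /* compressed fits in 16 bits */
	return insn;
}
```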

Paolo

2019-07-30 12:16:07

by Paolo Bonzini

Subject: Re: [RFC PATCH 00/16] KVM RISC-V Support

On 30/07/19 07:26, Anup Patel wrote:
> Here's a brief TODO list which we want to immediately work upon after this
> series:
> 1. Handle trap from unpriv access in SBI v0.1 emulation
> 2. In-kernel PLIC emulation
> 3. SBI v0.2 emulation in-kernel
> 4. SBI v0.2 hart hotplug emulation in-kernel
> 5. ..... and so on .....
>
> We will include above TODO list in v2 series cover letter as well.

I guess I gave you a bunch of extra items in today's more thorough
review. :)

BTW, since IPIs are handled in the SBI I wouldn't bother with in-kernel
PLIC emulation unless you can demonstrate performance improvements (for
example due to irqfd). In fact, it may be more interesting to add
plumbing for userspace handling of selected SBI calls (in addition to
get/putchar, sbi_system_reset and sbi_hart_down look like good
candidates in SBI v0.2).

> We were thinking to keep KVM RISC-V disabled by default (i.e. keep it
> experimental) until we have validated it on some FPGA or real HW. For now,
> users can explicitly enable it and play-around on QEMU emulation. I hope
> this is fine with most people ?

That's certainly okay with me.

Paolo

2019-07-30 12:16:58

by Anup Patel

Subject: Re: [RFC PATCH 04/16] RISC-V: KVM: Implement VCPU create, init and destroy functions

On Tue, Jul 30, 2019 at 3:46 PM Paolo Bonzini <[email protected]> wrote:
>
> On 30/07/19 10:48, Paolo Bonzini wrote:
> > On 29/07/19 13:56, Anup Patel wrote:
> >> + cntx->hstatus |= HSTATUS_SP2V;
> >> + cntx->hstatus |= HSTATUS_SP2P;
> > IIUC, cntx->hstatus's SP2P bit contains the guest's sstatus.SPP bit?
>
> Nevermind, that was also a bit confused. The guest's sstatus.SPP is in
> vsstatus. The pseudocode for V-mode switch is
>
> SRET:
> V = hstatus.SPV (1)
> MODE = sstatus.SPP
> hstatus.SPV = hstatus.SP2V
> sstatus.SPP = hstatus.SP2P
> hstatus.SP2V = 0
> hstatus.SP2P = 0
> ...
>
> trap:
> hstatus.SP2V = hstatus.SPV
> hstatus.SP2P = sstatus.SPP
> hstatus.SPV = V (1)
> sstatus.SPP = MODE
> V = 0
> MODE = 1
>

Yes, this kind of pseudo-code is not explicitly specified in the
RISC-V spec. The RISC-V formal model is supposed to cover
this kind of detailed HW state transition.
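
As a rough aid, the quoted SRET/trap pseudo-code can be transcribed into a small C state model. This is a sketch of the pseudo-code as written in this thread only, not of any ratified specification; struct hart_state and the function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

struct hart_state {
	bool v;    /* current virtualization mode (V) */
	bool mode; /* current privilege: true = S, false = U */
	bool spv;  /* hstatus.SPV */
	bool sp2v; /* hstatus.SP2V */
	bool spp;  /* sstatus.SPP */
	bool sp2p; /* hstatus.SP2P */
};

/* Trap into HS-mode: push the previous (V, MODE) pair one level down. */
static void do_trap(struct hart_state *h)
{
	h->sp2v = h->spv;
	h->sp2p = h->spp;
	h->spv = h->v;
	h->spp = h->mode;
	h->v = false;
	h->mode = true;
	assert(!h->v && h->mode); /* handler always runs in HS-mode */
}

/* SRET: pop (V, MODE) and shift the second-level copies up. */
static void do_sret(struct hart_state *h)
{
	h->v = h->spv;
	h->mode = h->spp;
	h->spv = h->sp2v;
	h->spp = h->sp2p;
	h->sp2v = false;
	h->sp2p = false;
}
```

Starting in HS-mode with SPV=SP2V=1 and SPP=SP2P=1, do_sret() enters VS-mode with SPV still set, and a later do_trap() restores the starting state, which matches the reasoning about SP2V=SPV=1 on guest entry.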

> so:
>
> 1) indeed we need SP2V=SPV=1 when entering guest mode
>
> 2) sstatus.SPP contains the guest mode
>
> 3) SP2P doesn't really matter for KVM since it never goes to VS-mode
> from an interrupt handler, so if my reasoning is correct I'd leave it
> clear, but I guess it's up to you whether to set it or not.

Yes, SP2P does not matter but we set it to 1 here so that from Guest
perspective it seems we were in S-mode previously.

Regards,
Anup

2019-07-30 12:17:27

by Paolo Bonzini

Subject: Re: [RFC PATCH 04/16] RISC-V: KVM: Implement VCPU create, init and destroy functions

On 30/07/19 13:45, Anup Patel wrote:
>> so:
>>
>> 1) indeed we need SP2V=SPV=1 when entering guest mode
>>
>> 2) sstatus.SPP contains the guest mode
>>
>> 3) SP2P doesn't really matter for KVM since it never goes to VS-mode
>> from an interrupt handler, so if my reasoning is correct I'd leave it
>> clear, but I guess it's up to you whether to set it or not.
> Yes, SP2P does not matter but we set it to 1 here so that from Guest
> perspective it seems we were in S-mode previously.

But the guest never reads sstatus.SPP; it always reads vsstatus.SPP,
doesn't it? In any case it doesn't matter.

Paolo

2019-07-30 12:22:24

by Anup Patel

Subject: Re: [RFC PATCH 06/16] RISC-V: KVM: Implement KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls

On Tue, Jul 30, 2019 at 3:05 PM Paolo Bonzini <[email protected]> wrote:
>
> On 30/07/19 10:43, Paolo Bonzini wrote:
> > On 29/07/19 13:56, Anup Patel wrote:
> >> The PC register represents program counter whereas the MODE
> >> register represent VCPU privilege mode (i.e. S/U-mode).
> >>
> > Is there any reason to include this pseudo-register instead of allowing
> > SSTATUS access directly in this patch (and perhaps also SEPC)?
>
> Nevermind, I was confused - the current MODE is indeed not accessible as
> a "real" CSR in RISC-V.

Yes, you got it right.

>
> Still, I would prefer all the VS CSRs to be accessible via the get/set
> reg ioctls.

We had implemented VS CSRs access to user-space but then we
removed it to keep this series simple and easy to review. We thought
of adding it later when we deal with Guest/VM migration.

Do you want it to be added as part of this series ?

Regards,
Anup

2019-07-30 12:24:13

by Paolo Bonzini

Subject: Re: [RFC PATCH 06/16] RISC-V: KVM: Implement KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls

On 30/07/19 14:08, Anup Patel wrote:
>> Still, I would prefer all the VS CSRs to be accessible via the get/set
>> reg ioctls.
> We had implemented VS CSRs access to user-space but then we
> removed it to keep this series simple and easy to review. We thought
> of adding it later when we deal with Guest/VM migration.
>
> Do you want it to be added as part of this series ?

Yes, please. It's not enough code to deserve a separate patch, and it
is useful for debugging.

Paolo

2019-07-30 12:47:43

by Anup Patel

Subject: Re: [RFC PATCH 05/16] RISC-V: KVM: Implement VCPU interrupts and requests handling

On Tue, Jul 30, 2019 at 5:42 PM Paolo Bonzini <[email protected]> wrote:
>
> On 30/07/19 14:00, Anup Patel wrote:
> > On Tue, Jul 30, 2019 at 4:47 PM Paolo Bonzini <[email protected]> wrote:
> >>
> >> First, something that is not clear to me: how do you deal with a guest
> >> writing 1 to VSIP.SSIP? I think that could lead to lost interrupts if
> >> you have the following sequence
> >>
> >> 1) guest writes 1 to VSIP.SSIP
> >>
> >> 2) guest leaves VS-mode
> >>
> >> 3) host syncs VSIP
> >>
> >> 4) user mode triggers interrupt
> >>
> >> 5) host reenters guest
> >>
> >> 6) host moves irqs_pending to VSIP and clears VSIP.SSIP in the process
> >
> > This reasoning also apply to M-mode firmware (OpenSBI) providing timer
> > and IPI services to HS-mode software. We had some discussion around
> > it in a different context.
> > (Refer, https://github.com/riscv/opensbi/issues/128)
> >
> > The thing is SIP CSR is supposed to be read-only for any S-mode SW. This
> > means HS-mode/VS-mode SW modifications to SIP CSR should have no
> > effect.
>
> Is it? The privileged specification says
>
> Interprocessor interrupts are sent to other harts by implementation-
> specific means, which will ultimately cause the SSIP bit to be set in
> the recipient hart’s sip register.

To further explain my rationale ...

Here's some text from RISC-V spec regarding MIP CSR:
"The mip register is an MXLEN-bit read/write register containing information
on pending interrupts, while mie is the corresponding MXLEN-bit read/write
register containing interrupt enable bits. Only the bits corresponding to
lower-privilege software interrupts (USIP, SSIP), timer interrupts (UTIP, STIP),
and external interrupts (UEIP, SEIP) in mip are writable through this CSR
address; the remaining bits are read-only."

Here's some text from RISC-V spec regarding SIP CSR:
"software interrupt-pending (SSIP) bit in the sip register. A pending
supervisor-level software interrupt can be cleared by writing 0 to the SSIP bit
in sip. Supervisor-level software interrupts are disabled when the SSIE bit in
the sie register is clear."

Without the RISC-V hypervisor extension, SIP is essentially a restricted
view of the MIP CSR. Also, as per the above, S-mode SW can only write 0 to the
SSIP bit in the SIP CSR, whereas it can only be set by M-mode SW or some HW
mechanism (such as an S-mode CLINT).

There was quite a bit of discussion at the last RISC-V Zurich Workshop about
avoiding SBI calls for injecting IPIs. The best suggestion so far is to
eventually have RISC-V systems with separate CLINT HW for M-mode
and S-mode. The S-mode SW can use S-mode CLINT to trigger IPIs to
other CPUs and it will use SBI calls for IPIs only when S-mode CLINT is
not available.

>
> All bits besides SSIP in the sip register are read-only.
>
> Meaning that sending an IPI to self by writing 1 to sip.SSIP is
> well-defined. The same should be true of vsip.SSIP while in VS mode.
>
> > Do you still an issue here?
>
> Do you see any issues in the pseudocode I sent? It gets away with the
> spinlock and request so it may be a good idea anyway. :)

Yes, I am evaluating your pseudocode right now. I definitely want to
remove the irq_pending_lock if possible. I will try to incorporate your
suggestion in the v2 series.

Regards,
Anup

2019-07-30 12:54:21

by Anup Patel

Subject: Re: [RFC PATCH 07/16] RISC-V: KVM: Implement VCPU world-switch

On Tue, Jul 30, 2019 at 3:04 PM Paolo Bonzini <[email protected]> wrote:
>
> On 29/07/19 13:57, Anup Patel wrote:
> > void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> > {
> > - /* TODO: */
> > + struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
> > +
> > + csr_write(CSR_HIDELEG, csr->hideleg);
> > + csr_write(CSR_HEDELEG, csr->hedeleg);
>
> Writing HIDELEG and HEDELEG here seems either wrong or inefficient to me.
>
> I don't remember the spec well enough, but there are two cases:
>
> 1) either they only matter while the guest runs and then you can set
> them in kvm_arch_hardware_enable. KVM common code takes care of doing
> this on all CPUs for you.

This is a good suggestion. I will use kvm_arch_hardware_enable() for
programming HIDELEG and HEDELEG CSRs.

>
> 2) or they also matter while the host runs and then you need to set them
> in vcpu_switch.S.

They don't matter in HS-mode so we don't need to access them in
vcpu_switch.S

Regards,
Anup

2019-07-30 13:49:57

by Andreas Schwab

Subject: Re: [RFC PATCH 13/16] RISC-V: KVM: Add timer functionality

On Jul 29 2019, Atish Patra <[email protected]> wrote:

> Strange. We never saw this error.

It is part of CONFIG_KERNEL_HEADER_TEST. Everyone developing a driver
should enable it.

> #include <linux/types.h>
>
> Can you try it at your end and confirm please ?

Confirmed.

Andreas.

--
Andreas Schwab, SUSE Labs, [email protected]
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

2019-07-30 14:58:11

by Paolo Bonzini

Subject: Re: [RFC PATCH 15/16] RISC-V: KVM: Add SBI v0.1 support

On 29/07/19 13:57, Anup Patel wrote:
> From: Atish Patra <[email protected]>
>
> The KVM host kernel running in HS-mode needs to handle SBI calls coming
> from guest kernel running in VS-mode.
>
> This patch adds SBI v0.1 support in KVM RISC-V. All the SBI calls are
> implemented correctly except remote tlb flushes. For remote TLB flushes,
> we are doing full TLB flush and this will be optimized in future.
>
> Signed-off-by: Atish Patra <[email protected]>
> Signed-off-by: Anup Patel <[email protected]>
> ---
> arch/riscv/include/asm/kvm_host.h | 2 +
> arch/riscv/kvm/Makefile | 2 +-
> arch/riscv/kvm/vcpu_exit.c | 3 +
> arch/riscv/kvm/vcpu_sbi.c | 118 ++++++++++++++++++++++++++++++
> 4 files changed, 124 insertions(+), 1 deletion(-)
> create mode 100644 arch/riscv/kvm/vcpu_sbi.c
>
> diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
> index 1bb4befa89da..22a62ffc09f5 100644
> --- a/arch/riscv/include/asm/kvm_host.h
> +++ b/arch/riscv/include/asm/kvm_host.h
> @@ -227,4 +227,6 @@ void kvm_riscv_vcpu_power_on(struct kvm_vcpu *vcpu);
> void kvm_riscv_halt_guest(struct kvm *kvm);
> void kvm_riscv_resume_guest(struct kvm *kvm);
>
> +int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu);
> +
> #endif /* __RISCV_KVM_HOST_H__ */
> diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
> index 3e0c7558320d..b56dc1650d2c 100644
> --- a/arch/riscv/kvm/Makefile
> +++ b/arch/riscv/kvm/Makefile
> @@ -9,6 +9,6 @@ ccflags-y := -Ivirt/kvm -Iarch/riscv/kvm
> kvm-objs := $(common-objs-y)
>
> kvm-objs += main.o vm.o vmid.o tlb.o mmu.o
> -kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o vcpu_timer.o
> +kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o vcpu_timer.o vcpu_sbi.o
>
> obj-$(CONFIG_KVM) += kvm.o
> diff --git a/arch/riscv/kvm/vcpu_exit.c b/arch/riscv/kvm/vcpu_exit.c
> index 2d09640c98b2..003e43facdfc 100644
> --- a/arch/riscv/kvm/vcpu_exit.c
> +++ b/arch/riscv/kvm/vcpu_exit.c
> @@ -531,6 +531,9 @@ int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
> (vcpu->arch.guest_context.hstatus & HSTATUS_STL))
> ret = stage2_page_fault(vcpu, run, scause, stval);
> break;
> + case EXC_SUPERVISOR_SYSCALL:
> + if (vcpu->arch.guest_context.hstatus & HSTATUS_SPV)
> + ret = kvm_riscv_vcpu_sbi_ecall(vcpu);
> default:
> break;
> };
> diff --git a/arch/riscv/kvm/vcpu_sbi.c b/arch/riscv/kvm/vcpu_sbi.c
> new file mode 100644
> index 000000000000..8dfdbf744378
> --- /dev/null
> +++ b/arch/riscv/kvm/vcpu_sbi.c
> @@ -0,0 +1,118 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/**
> + * Copyright (c) 2019 Western Digital Corporation or its affiliates.
> + *
> + * Authors:
> + * Atish Patra <[email protected]>
> + */
> +
> +#include <linux/errno.h>
> +#include <linux/err.h>
> +#include <linux/kvm_host.h>
> +#include <asm/csr.h>
> +#include <asm/kvm_vcpu_timer.h>
> +
> +#define SBI_VERSION_MAJOR 0
> +#define SBI_VERSION_MINOR 1
> +
> +static unsigned long kvm_sbi_unpriv_load(const unsigned long *addr,
> + struct kvm_vcpu *vcpu)
> +{
> + unsigned long flags, val;
> + unsigned long __hstatus, __sstatus;
> +
> + local_irq_save(flags);
> + __hstatus = csr_read(CSR_HSTATUS);
> + __sstatus = csr_read(CSR_SSTATUS);
> + csr_write(CSR_HSTATUS, vcpu->arch.guest_context.hstatus | HSTATUS_SPRV);
> + csr_write(CSR_SSTATUS, vcpu->arch.guest_context.sstatus);
> + val = *addr;
> + csr_write(CSR_HSTATUS, __hstatus);
> + csr_write(CSR_SSTATUS, __sstatus);
> + local_irq_restore(flags);
> +
> + return val;
> +}
> +
> +static void kvm_sbi_system_shutdown(struct kvm_vcpu *vcpu, u32 type)
> +{
> + int i;
> + struct kvm_vcpu *tmp;
> +
> + kvm_for_each_vcpu(i, tmp, vcpu->kvm)
> + tmp->arch.power_off = true;
> + kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_SLEEP);
> +
> + memset(&vcpu->run->system_event, 0, sizeof(vcpu->run->system_event));
> + vcpu->run->system_event.type = type;
> + vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
> +}
> +
> +int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu)
> +{
> + int ret = 1;
> + u64 next_cycle;
> + int vcpuid;
> + struct kvm_vcpu *remote_vcpu;
> + ulong dhart_mask;
> + struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
> +
> + if (!cp)
> + return -EINVAL;
> + switch (cp->a7) {
> + case SBI_SET_TIMER:
> +#if __riscv_xlen == 32
> + next_cycle = ((u64)cp->a1 << 32) | (u64)cp->a0;
> +#else
> + next_cycle = (u64)cp->a0;
> +#endif
> + kvm_riscv_vcpu_timer_next_event(vcpu, next_cycle);
> + break;
> + case SBI_CONSOLE_PUTCHAR:
> + /* Not implemented */
> + cp->a0 = -ENOTSUPP;
> + break;
> + case SBI_CONSOLE_GETCHAR:
> + /* Not implemented */
> + cp->a0 = -ENOTSUPP;
> + break;

Would it make sense to send these two down to userspace?

Paolo

> + case SBI_CLEAR_IPI:
> + kvm_riscv_vcpu_unset_interrupt(vcpu, IRQ_S_SOFT);
> + break;
> + case SBI_SEND_IPI:
> + dhart_mask = kvm_sbi_unpriv_load((unsigned long *)cp->a0, vcpu);
> + for_each_set_bit(vcpuid, &dhart_mask, BITS_PER_LONG) {
> + remote_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, vcpuid);
> + kvm_riscv_vcpu_set_interrupt(remote_vcpu, IRQ_S_SOFT);
> + }
> + break;
> + case SBI_SHUTDOWN:
> + kvm_sbi_system_shutdown(vcpu, KVM_SYSTEM_EVENT_SHUTDOWN);
> + ret = 0;
> + break;
> + case SBI_REMOTE_FENCE_I:
> + sbi_remote_fence_i(NULL);
> + break;
> +
> + /*TODO:There should be a way to call remote hfence.bvma.
> + * Preferred method is now a SBI call. Until then, just flush
> + * all tlbs.
> + */
> + case SBI_REMOTE_SFENCE_VMA:
> + /*TODO: Parse vma range.*/
> + sbi_remote_sfence_vma(NULL, 0, 0);
> + break;
> + case SBI_REMOTE_SFENCE_VMA_ASID:
> + /*TODO: Parse vma range for given ASID */
> + sbi_remote_sfence_vma(NULL, 0, 0);
> + break;
> + default:
> + cp->a0 = ENOTSUPP;
> + break;
> + };
> +
> + if (ret >= 0)
> + cp->sepc += 4;
> +
> + return ret;
> +}
>

2019-07-30 15:43:58

by Paolo Bonzini

Subject: Re: [RFC PATCH 04/16] RISC-V: KVM: Implement VCPU create, init and destroy functions

On 30/07/19 10:48, Paolo Bonzini wrote:
> On 29/07/19 13:56, Anup Patel wrote:
>> + cntx->hstatus |= HSTATUS_SP2V;
>> + cntx->hstatus |= HSTATUS_SP2P;
> IIUC, cntx->hstatus's SP2P bit contains the guest's sstatus.SPP bit?

Never mind, that was also a bit confused. The guest's sstatus.SPP is in
vsstatus. The pseudocode for V-mode switch is

SRET:
V = hstatus.SPV (1)
MODE = sstatus.SPP
hstatus.SPV = hstatus.SP2V
sstatus.SPP = hstatus.SP2P
hstatus.SP2V = 0
hstatus.SP2P = 0
...

trap:
hstatus.SP2V = hstatus.SPV
hstatus.SP2P = sstatus.SPP
hstatus.SPV = V (1)
sstatus.SPP = MODE
V = 0
MODE = 1

so:

1) indeed we need SP2V=SPV=1 when entering guest mode

2) sstatus.SPP contains the guest mode

3) SP2P doesn't really matter for KVM since it never goes to VS-mode
from an interrupt handler, so if my reasoning is correct I'd leave it
clear, but I guess it's up to you whether to set it or not.
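
The pseudocode above can be traced with a small toy model. This is only a sketch: the struct and function names are hypothetical, and it models nothing beyond the six bits named in the pseudocode. It shows that starting from SPV=SP2V=1 and sstatus.SPP=1, an SRET into the guest followed by a trap back to HS-mode restores the same state, which is the reasoning behind points 1) and 2).

```c
#include <stdbool.h>

/*
 * Toy model of the state touched by the SRET/trap pseudocode.
 * Field names mirror the bits quoted in this mail; the struct itself
 * is hypothetical and exists only for illustration.
 */
struct hart_state {
	bool v;     /* current virtualization mode (V) */
	bool mode;  /* current privilege: true = S, false = U */
	bool spv;   /* hstatus.SPV */
	bool sp2v;  /* hstatus.SP2V */
	bool spp;   /* sstatus.SPP */
	bool sp2p;  /* hstatus.SP2P */
};

/* SRET: pop one level of the two-deep previous-mode stack. */
void model_sret(struct hart_state *s)
{
	s->v = s->spv;
	s->mode = s->spp;
	s->spv = s->sp2v;
	s->spp = s->sp2p;
	s->sp2v = false;
	s->sp2p = false;
}

/* Trap into HS-mode: push the current mode onto the stack. */
void model_trap_to_hs(struct hart_state *s)
{
	s->sp2v = s->spv;
	s->sp2p = s->spp;
	s->spv = s->v;
	s->spp = s->mode;
	s->v = false;
	s->mode = true;
}
```

In this model SP2P only ever receives the value that SRET previously moved into sstatus.SPP, which is consistent with the remark in 3) that its initial value has no visible effect for KVM.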

Paolo

> I suggest adding a comment here, and again providing a ONE_REG interface
> to sstatus so that the ABI is final before RISC-V KVM is merged.
>
> What happens if the guest executes SRET? Is that EXC_SYSCALL in hedeleg?
>
> (BTW the name of SP2V and SP2P is horrible, I think HPV/HPP or HSPV/HSPP
> would have been clearer, but that's not your fault).

2019-07-30 15:50:16

by Paolo Bonzini

Subject: Re: [RFC PATCH 13/16] RISC-V: KVM: Add timer functionality

On 29/07/19 13:57, Anup Patel wrote:
> + if (delta_ns > VCPU_TIMER_PROGRAM_THRESHOLD_NS) {
> + hrtimer_start(&t->hrt, ktime_add_ns(ktime_get(), delta_ns),

I think the guest would prefer if you saved the time before enabling
interrupts on the host, and use that here instead of ktime_get().
Otherwise the timer could be delayed arbitrarily by host interrupts.

(Because the RISC-V SBI timer is relative only---which is
unfortunate---guests will already pay a latency price due to the extra
cost of the SBI call compared to a bare metal implementation. Sooner or
later you may want to implement something like x86's heuristic to
advance the timer deadline by a few hundred nanoseconds; perhaps add a
TODO now).

Paolo

> + HRTIMER_MODE_ABS);
> + t->is_set = true;
> + } else
> + kvm_riscv_vcpu_set_interrupt(vcpu, IRQ_S_TIMER);
> +

2019-07-30 15:54:04

by Anup Patel

Subject: Re: [RFC PATCH 05/16] RISC-V: KVM: Implement VCPU interrupts and requests handling

On Tue, Jul 30, 2019 at 4:47 PM Paolo Bonzini <[email protected]> wrote:
>
> First, something that is not clear to me: how do you deal with a guest
> writing 1 to VSIP.SSIP? I think that could lead to lost interrupts if
> you have the following sequence
>
> 1) guest writes 1 to VSIP.SSIP
>
> 2) guest leaves VS-mode
>
> 3) host syncs VSIP
>
> 4) user mode triggers interrupt
>
> 5) host reenters guest
>
> 6) host moves irqs_pending to VSIP and clears VSIP.SSIP in the process

This reasoning also applies to M-mode firmware (OpenSBI) providing timer
and IPI services to HS-mode software. We had some discussion around
it in a different context.
(Refer to https://github.com/riscv/opensbi/issues/128)

The thing is, the SIP CSR is supposed to be read-only for any S-mode SW. This
means HS-mode/VS-mode SW modifications to the SIP CSR should have no
effect.

For HS-mode, only certain bits, such as SSIP, are writable from M-mode,
and in future even this will go away when we have specialized HW to
trigger S-mode IPIs without going through M-mode firmware.

For VS-mode, only HS-mode controls writes to the pending bits of the VSIP CSR.

If the above is honored correctly by HW, then the use-case you mentioned
is not possible because a Guest writing 1 to SIP.SSIP will be ignored.

It is possible that we have buggy HW which does allow a Guest to write to
SIP CSR bits; in that case, our current approach is to just overwrite VSIP
whenever it differs from the irqs_pending bits before entering the Guest.
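
To make the quoted six-step sequence concrete, here is a minimal toy model of the overwrite approach. It is a sketch only: the struct, the MODEL_* names and the plain `vsip` field standing in for the CSR are hypothetical, and it deliberately assumes hardware that does accept a Guest write of 1 to SSIP.

```c
#include <stdint.h>

#define MODEL_IRQ_S_SOFT 1u  /* SSIP bit position in sip/vsip */
#define MODEL_IRQ_S_EXT  9u  /* SEIP bit position in sip/vsip */

/* Hypothetical stand-in for the per-VCPU interrupt state. */
struct model_vcpu {
	uint32_t irqs_pending;  /* bits injected by the host or user space */
	uint32_t vsip;          /* stand-in for the hardware VSIP CSR */
};

/*
 * Overwrite VSIP with irqs_pending whenever the two differ, as done
 * before entering the Guest. Returns 1 if VSIP was rewritten.
 */
int model_overwrite_vsip(struct model_vcpu *vcpu)
{
	if (vcpu->vsip == vcpu->irqs_pending)
		return 0;
	vcpu->vsip = vcpu->irqs_pending;
	return 1;
}
```

In this model, if the Guest has set bit MODEL_IRQ_S_SOFT in `vsip` and user space then sets bit MODEL_IRQ_S_EXT in `irqs_pending`, the overwrite leaves only the external interrupt pending and the Guest-written bit is gone. Whether that matters depends on whether a Guest write to SSIP is architecturally allowed.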

Do you still see an issue here?

Regards,
Anup

>
> Perhaps irqs_pending needs to be split in two fields, irqs_pending and
> irqs_pending_mask, and then you can do this:
>
> /*
> * irqs_pending and irqs_pending_mask have multiple-producer/single-
> * consumer semantics; therefore bits can be set in the mask without
> * a lock, but clearing the bits requires vcpu_lock. Furthermore,
> * consumers should never write to irqs_pending, and should not
> * use bits of irqs_pending that weren't 1 in the mask.
> */
>
> int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq)
> {
> ...
> set_bit(irq, &vcpu->arch.irqs_pending);
> smp_mb__before_atomic();
> set_bit(irq, &vcpu->arch.irqs_pending_mask);
> kvm_vcpu_kick(vcpu);
> }
>
> int kvm_riscv_vcpu_unset_interrupt(struct kvm_vcpu *vcpu, unsigned int irq)
> {
> ...
> clear_bit(irq, &vcpu->arch.irqs_pending);
> smp_mb__before_atomic();
> set_bit(irq, &vcpu->arch.irqs_pending_mask);
> }
>
> static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
> {
> ...
> WRITE_ONCE(vcpu->arch.irqs_pending_mask, 0);
> }
>
> and kvm_riscv_vcpu_flush_interrupts can leave aside VSIP bits that
> aren't in vcpu->arch.irqs_pending_mask:
>
> if (atomic_read(&vcpu->arch.irqs_pending_mask)) {
> u32 mask, val;
>
> mask = xchg_acquire(&vcpu->arch.irqs_pending_mask, 0);
> val = READ_ONCE(vcpu->arch.irqs_pending) & mask;
>
> vcpu->arch.guest_csr.vsip &= ~mask;
> vcpu->arch.guest_csr.vsip |= val;
> csr_write(CSR_VSIP, vsip);
> }
>
> Also, the getter of CSR_VSIP should call
> kvm_riscv_vcpu_flush_interrupts, while the setter should clear
> irqs_pending_mask.
>
> On 29/07/19 13:56, Anup Patel wrote:
> > + kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> > + kvm_vcpu_kick(vcpu);
>
> The request is not needed as long as kvm_riscv_vcpu_flush_interrupts is
> called *after* smp_store_mb(vcpu->mode, IN_GUEST_MODE) in
> kvm_arch_vcpu_ioctl_run. This is the "request-less vCPU kick" pattern
> in Documentation/virtual/kvm/vcpu-requests.rst. The smp_store_mb then
> orders the write of IN_GUEST_MODE before the read of irqs_pending (or
> irqs_pending_mask in my proposal above); in the producers, there is a
> dual memory barrier in kvm_vcpu_exiting_guest_mode(), ordering the write
> of irqs_pending(_mask) before the read of vcpu->mode.
>
> Similar to other VS* CSRs, I'd rather have a ONE_REG interface for VSIE
> and VSIP from the beginning as well. Note that the VSIP setter would
> clear irqs_pending_mask, while the getter would call
> kvm_riscv_vcpu_flush_interrupts before reading. It's up to userspace to
> ensure that no interrupt injections happen between the calls to the
> getter and the setter.
>
> Paolo
>
> > + csr_write(CSR_VSIP, vcpu->arch.irqs_pending);
> > + vcpu->arch.guest_csr.vsip = vcpu->arch.irqs_pending;
> > + }
>

2019-07-30 15:54:31

by Anup Patel

Subject: Re: [RFC PATCH 11/16] RISC-V: KVM: Implement stage2 page table programming

On Tue, Jul 30, 2019 at 2:30 PM Paolo Bonzini <[email protected]> wrote:
>
> On 29/07/19 13:57, Anup Patel wrote:
> > This patch implements all required functions for programming
> > the stage2 page table for each Guest/VM.
> >
> > At high-level, the flow of stage2 related functions is similar
> > from KVM ARM/ARM64 implementation but the stage2 page table
> > format is quite different for KVM RISC-V.
>
> FWIW I very much prefer KVM x86's recursive implementation of the MMU to
> the hardcoding of pgd/pmd/pte. I am not asking you to rewrite it, but
> I'll mention it because I noticed that you do not support 48-bit guest
> physical addresses.

Yes, I also prefer recursive page table programming. In fact, the first
hypervisor we ported to RISC-V was Xvisor, and there we have
recursive page table programming for both stage1 and stage2.

BTW, 48-bit VA and guest physical addresses are already defined in the
latest RISC-V spec. It's just that there is no HW (or QEMU) implementation
of the 4-level page table as of now.
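
As a rough illustration of why a level-generic walk extends naturally from 3 to 4 levels, the table index can be computed from the level number instead of hardcoding pgd/pmd/pte. This is a sketch under assumptions: the function and macro names are hypothetical, it assumes 4 KiB pages with 9 index bits per level on RV64, and `root_extra_bits` models the wider root table of the guest-physical (x4) formats.

```c
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12u  /* 4 KiB base pages */
#define MODEL_INDEX_BITS  9u  /* 512 entries per table on RV64 */

/*
 * Return the table index used at `level` (0 = leaf level) when
 * translating `addr` with a `levels`-deep table. `root_extra_bits`
 * widens the root index: 2 for a stage2 table in the x4 formats,
 * 0 for a stage1 table.
 */
uint64_t model_pt_index(uint64_t addr, unsigned int level,
			unsigned int levels, unsigned int root_extra_bits)
{
	unsigned int bits = MODEL_INDEX_BITS;

	if (level == levels - 1)
		bits += root_extra_bits;

	return (addr >> (MODEL_PAGE_SHIFT + MODEL_INDEX_BITS * level)) &
	       ((1ULL << bits) - 1);
}
```

A walker written as a loop over `level` from `levels - 1` down to 0 would then cover 3-level and 4-level tables with the same code, with only `levels` changing.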

I will certainly add this to our TODO list.

Regards,
Anup

2019-07-30 16:02:01

by Paolo Bonzini

Subject: Re: [RFC PATCH 05/16] RISC-V: KVM: Implement VCPU interrupts and requests handling

On 30/07/19 14:45, Anup Patel wrote:
> Here's some text from RISC-V spec regarding SIP CSR:
> "software interrupt-pending (SSIP) bit in the sip register. A pending
> supervisor-level software interrupt can be cleared by writing 0 to the SSIP bit
> in sip. Supervisor-level software interrupts are disabled when the SSIE bit in
> the sie register is clear."
>
> Without RISC-V hypervisor extension, the SIP is essentially a restricted
> view of MIP CSR. Also as-per above, S-mode SW can only write 0 to SSIP
> bit in SIP CSR whereas it can only be set by M-mode SW or some HW
> mechanism (such as S-mode CLINT).

But that's not what the spec says. It just says (just before the
sentence you quoted):

A supervisor-level software interrupt is triggered on the current
hart by writing 1 to its supervisor software interrupt-pending (SSIP)
bit in the sip register.

and it's not written anywhere that S-mode SW cannot write 1. In fact
that text is even under sip, not under mip, so IMO there's no doubt that
S-mode SW _can_ write 1, and the hypervisor must operate accordingly.

In fact I'm sure that if Windows were ever ported to RISC-V, it would be
very happy to use that feature. On x86, Intel even accelerated it
specifically for Microsoft. :)

Paolo

2019-07-30 16:11:58

by Anup Patel

Subject: Re: [RFC PATCH 05/16] RISC-V: KVM: Implement VCPU interrupts and requests handling

On Tue, Jul 30, 2019 at 6:48 PM Paolo Bonzini <[email protected]> wrote:
>
> On 30/07/19 14:45, Anup Patel wrote:
> > Here's some text from RISC-V spec regarding SIP CSR:
> > "software interrupt-pending (SSIP) bit in the sip register. A pending
> > supervisor-level software interrupt can be cleared by writing 0 to the SSIP bit
> > in sip. Supervisor-level software interrupts are disabled when the SSIE bit in
> > the sie register is clear."
> >
> > Without RISC-V hypervisor extension, the SIP is essentially a restricted
> > view of MIP CSR. Also as-per above, S-mode SW can only write 0 to SSIP
> > bit in SIP CSR whereas it can only be set by M-mode SW or some HW
> > mechanism (such as S-mode CLINT).
>
> But that's not what the spec says. It just says (just before the
> sentence you quoted):
>
> A supervisor-level software interrupt is triggered on the current
> hart by writing 1 to its supervisor software interrupt-pending (SSIP)
> bit in the sip register.

Unfortunately, this statement does not say who is allowed to write 1
to the SIP.SSIP bit.

I quoted the MIP CSR documentation to highlight the fact that only M-mode
SW can set the SSIP bit.

In fact, I had the same understanding as you regarding the SSIP bit
until we had the MSIP issue in OpenSBI.
(https://github.com/riscv/opensbi/issues/128)

>
> and it's not written anywhere that S-mode SW cannot write 1. In fact
> that text is even under sip, not under mip, so IMO there's no doubt that
> S-mode SW _can_ write 1, and the hypervisor must operate accordingly.

Without hypervisor support, the SIP CSR is nothing but a restricted view of
the MIP CSR; that's why the MIP CSR documentation applies here.

I think this discussion deserves a GitHub issue on the RISC-V ISA manual.

If my interpretation is incorrect, then it would be really strange that
a HART running S-mode SW can inject an IPI to itself by writing 1 to the
SIP.SSIP bit.

>
> In fact I'm sure that if Windows were ever ported to RISC-V, it would be
> very happy to use that feature. On x86, Intel even accelerated it
> specifically for Microsoft. :)

That would indeed be a very strange usage. :)

Regards,
Anup

2019-07-30 16:33:56

by Paolo Bonzini

Subject: Re: [RFC PATCH 00/16] KVM RISC-V Support

On 30/07/19 15:50, Anup Patel wrote:
>> BTW, since IPIs are handled in the SBI I wouldn't bother with in-kernel
>> PLIC emulation unless you can demonstrate performance improvements (for
>> example due to irqfd). In fact, it may be more interesting to add
>
> I thought VHOST requires irqfd and we would certainly end up providing
> in-kernel PLIC emulation to support VHOST.

vhost only needs an eventfd; userspace can poll the eventfd and inject
the irq as usual with KVM_INTERRUPT. Of course that can be slower, but
you can benchmark it and see if it's indeed a good reason for in-kernel
PLIC.
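
A minimal sketch of the userspace side, assuming Linux and using only eventfd(2), poll(2) and read(2). The `inject_irq_fn` callback is hypothetical and stands in for the injection step (the KVM_INTERRUPT ioctl mentioned above); the function names are likewise made up for illustration.

```c
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical callback standing in for the interrupt injection step. */
typedef void (*inject_irq_fn)(void *opaque);

/*
 * Wait up to timeout_ms for the eventfd to be signalled. On a signal,
 * drain the counter, call inject() once, and return the number of
 * signals that had accumulated. Returns 0 on timeout, -1 on error.
 */
int64_t model_wait_and_inject(int efd, int timeout_ms,
			      inject_irq_fn inject, void *opaque)
{
	struct pollfd pfd = { .fd = efd, .events = POLLIN };
	uint64_t count;
	int ret;

	ret = poll(&pfd, 1, timeout_ms);
	if (ret < 0)
		return -1;
	if (ret == 0 || !(pfd.revents & POLLIN))
		return 0;

	/* Reading an eventfd returns the 8-byte counter and resets it. */
	if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
		return -1;

	inject(opaque);
	return (int64_t)count;
}

/* Trivial injector for demonstration: counts how often it was called. */
void model_count_inject(void *opaque)
{
	(*(int *)opaque)++;
}
```

The extra cost relative to irqfd is the wakeup of the polling thread plus one ioctl per batch of signals, which is the part worth benchmarking.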

>> plumbing for userspace handling of selected SBI calls (in addition to
>> get/putchar, sbi_system_reset and sbi_hart_down look like good
>> candidates in SBI v0.2).
>
> The get/putchar SBI v0.1 calls won't be encouraged going forward because
> we already have an earlycon implementation in place and the Guest kernel can
> directly write to UART registers for early prints.

> If we still wanted to implement get/putchar calls then we would need a RISC-V
> specific exit reason in KVM. We have tried to avoid RISC-V specific IOCTLs
> or exit reason in this series.

Sounds good.

Paolo

>>
>>> We were thinking to keep KVM RISC-V disabled by default (i.e. keep it
>>> experimental) until we have validated it on some FPGA or real HW. For now,
>>> users can explicitly enable it and play-around on QEMU emulation. I hope
>>> this is fine with most people ?
>>
>> That's certainly okay with me.
>>
>
> Thanks,
> Anup
>

2019-07-30 16:34:16

by Paolo Bonzini

Subject: Re: [RFC PATCH 05/16] RISC-V: KVM: Implement VCPU interrupts and requests handling

On 30/07/19 15:35, Anup Patel wrote:
> On Tue, Jul 30, 2019 at 6:48 PM Paolo Bonzini <[email protected]> wrote:
>>
>> On 30/07/19 14:45, Anup Patel wrote:
>>> Here's some text from RISC-V spec regarding SIP CSR:
>>> "software interrupt-pending (SSIP) bit in the sip register. A pending
>>> supervisor-level software interrupt can be cleared by writing 0 to the SSIP bit
>>> in sip. Supervisor-level software interrupts are disabled when the SSIE bit in
>>> the sie register is clear."
>>>
>>> Without RISC-V hypervisor extension, the SIP is essentially a restricted
>>> view of MIP CSR. Also as-per above, S-mode SW can only write 0 to SSIP
>>> bit in SIP CSR whereas it can only be set by M-mode SW or some HW
>>> mechanism (such as S-mode CLINT).
>>
>> But that's not what the spec says. It just says (just before the
>> sentence you quoted):
>>
>> A supervisor-level software interrupt is triggered on the current
>> hart by writing 1 to its supervisor software interrupt-pending (SSIP)
>> bit in the sip register.
>
> Unfortunately, this statement does not state who is allowed to write 1
> in SIP.SSIP bit.

If it doesn't state who is allowed to write 1, whoever has access to sip
can.

> I quoted MIP CSR documentation to highlight the fact that only M-mode
> SW can set SSIP bit.
>
> In fact, I had same understanding as you have regarding SSIP bit
> until we had MSIP issue in OpenSBI.
> (https://github.com/riscv/opensbi/issues/128)
>
>> and it's not written anywhere that S-mode SW cannot write 1. In fact
>> that text is even under sip, not under mip, so IMO there's no doubt that
>> S-mode SW _can_ write 1, and the hypervisor must operate accordingly.
>
> Without hypervisor support, SIP CSR is nothing but a restricted view of
> MIP CSR; that's why MIP CSR documentation applies here.

But the privileged spec says mip.MSIP is read-only; it cannot be cleared
(as in the above OpenSBI issue). So mip.MSIP and sip.SSIP are already
different in that respect, and I don't see where the spec says that S-mode
SW cannot set sip.SSIP.

(As an aside, why would M-mode even bother using sip and not mip to
write 1 to SSIP?).

> I think this discussion deserves a Github issue on RISC-V ISA manual.

Perhaps, but I think it makes more sense this way. The question remains
of why M-mode is not allowed to write to MSIP/MEIP/MTIP. My guess is
that then MSIP/MEIP/MTIP are simply a read-only view of an external pin,
so it simplifies hardware a tiny bit by forcing acks to go through the
MMIO registers.

> If my interpretation is incorrect then it would be really strange that
> HART in S-mode SW can inject IPI to itself by writing 1 to SIP.SSIP bit.

Well, it can be useful; for example, Windows does it when interrupt
handlers want to schedule some work to happen out of interrupt context.
Going through SBI would be unpleasant if it causes an HS-mode trap.

Paolo

>>
>> In fact I'm sure that if Windows were ever ported to RISC-V, it would be
>> very happy to use that feature. On x86, Intel even accelerated it
>> specifically for Microsoft. :)
>
> That would be indeed very strange usage. :)
>
> Regards,
> Anup
>

2019-07-30 17:11:31

by Anup Patel

Subject: Re: [RFC PATCH 06/16] RISC-V: KVM: Implement KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls

On Tue, Jul 30, 2019 at 5:40 PM Paolo Bonzini <[email protected]> wrote:
>
> On 30/07/19 14:08, Anup Patel wrote:
> >> Still, I would prefer all the VS CSRs to be accessible via the get/set
> >> reg ioctls.
> > We had implemented VS CSRs access to user-space but then we
> > removed it to keep this series simple and easy to review. We thought
> > of adding it later when we deal with Guest/VM migration.
> >
> > Do you want it to be added as part of this series ?
>
> Yes, please. It's not enough code to deserve a separate patch, and it
> is useful for debugging.

Sure, I will add it in the v2 series.

We skipped the Guest FP ONE_REG interface with the same rationale.
We should add that as well. Agreed?

Regards,
Anup

2019-07-30 17:11:39

by Paolo Bonzini

Subject: Re: [RFC PATCH 05/16] RISC-V: KVM: Implement VCPU interrupts and requests handling

On 30/07/19 14:00, Anup Patel wrote:
> On Tue, Jul 30, 2019 at 4:47 PM Paolo Bonzini <[email protected]> wrote:
>>
>> First, something that is not clear to me: how do you deal with a guest
>> writing 1 to VSIP.SSIP? I think that could lead to lost interrupts if
>> you have the following sequence
>>
>> 1) guest writes 1 to VSIP.SSIP
>>
>> 2) guest leaves VS-mode
>>
>> 3) host syncs VSIP
>>
>> 4) user mode triggers interrupt
>>
>> 5) host reenters guest
>>
>> 6) host moves irqs_pending to VSIP and clears VSIP.SSIP in the process
>
> This reasoning also applies to M-mode firmware (OpenSBI) providing timer
> and IPI services to HS-mode software. We had some discussion around
> it in a different context.
> (Refer, https://github.com/riscv/opensbi/issues/128)
>
> The thing is SIP CSR is supposed to be read-only for any S-mode SW. This
> means HS-mode/VS-mode SW modifications to SIP CSR should have no
> effect.

Is it? The privileged specification says

Interprocessor interrupts are sent to other harts by implementation-
specific means, which will ultimately cause the SSIP bit to be set in
the recipient hart’s sip register.

All bits besides SSIP in the sip register are read-only.

Meaning that sending an IPI to self by writing 1 to sip.SSIP is
well-defined. The same should be true of vsip.SSIP while in VS mode.
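
Under this reading, a write to sip behaves like a masked update in which SSIP is the only bit software can change. A minimal sketch follows, with hypothetical MODEL_* names and a plain function in place of the CSR; the bit positions (SSIP = 1, STIP = 5, SEIP = 9) are the standard ones, and the writable mask encodes the interpretation argued for here, not a settled conclusion of this thread.

```c
#include <stdint.h>

#define MODEL_SIP_SSIP (1UL << 1)  /* supervisor software interrupt pending */
#define MODEL_SIP_STIP (1UL << 5)  /* supervisor timer interrupt pending */
#define MODEL_SIP_SEIP (1UL << 9)  /* supervisor external interrupt pending */

/* Only SSIP is treated as writable, following the quoted sentence. */
#define MODEL_SIP_WRITABLE MODEL_SIP_SSIP

/* Value of sip after software writes new_val over old_sip. */
unsigned long model_sip_write(unsigned long old_sip, unsigned long new_val)
{
	return (old_sip & ~MODEL_SIP_WRITABLE) |
	       (new_val & MODEL_SIP_WRITABLE);
}
```

With this model a write of 1 to SSIP sets the bit and a write of 0 clears it, while STIP and SEIP keep whatever value the platform gave them.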

> Do you still see an issue here?

Do you see any issues in the pseudocode I sent? It does away with the
spinlock and request, so it may be a good idea anyway. :)
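
For reference, the pending/mask split can be exercised outside the kernel with C11 atomics. This is a sketch, not the kernel code: the struct and function names are hypothetical, the default sequentially consistent atomics are stronger than the barriers in the pseudocode, and a plain `vsip` field stands in for the CSR. The point it illustrates is that the flush only touches bits the host actually changed, so a bit set by the guest in VSIP survives.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-VCPU interrupt state. */
struct model_vcpu_irq {
	_Atomic uint32_t irqs_pending;       /* desired value of each bit */
	_Atomic uint32_t irqs_pending_mask;  /* bits changed by producers */
	uint32_t vsip;                       /* stand-in for the VSIP CSR */
};

/* Producer: request that interrupt `irq` becomes pending. */
void model_set_interrupt(struct model_vcpu_irq *v, unsigned int irq)
{
	atomic_fetch_or(&v->irqs_pending, 1u << irq);
	atomic_fetch_or(&v->irqs_pending_mask, 1u << irq);
}

/* Producer: request that interrupt `irq` is no longer pending. */
void model_unset_interrupt(struct model_vcpu_irq *v, unsigned int irq)
{
	atomic_fetch_and(&v->irqs_pending, ~(1u << irq));
	atomic_fetch_or(&v->irqs_pending_mask, 1u << irq);
}

/* Consumer: fold the changed bits into VSIP, leaving others alone. */
void model_flush_interrupts(struct model_vcpu_irq *v)
{
	uint32_t mask = atomic_exchange(&v->irqs_pending_mask, 0);
	uint32_t val = atomic_load(&v->irqs_pending) & mask;

	v->vsip &= ~mask;
	v->vsip |= val;
}
```

Producers never clear the mask and the single consumer never writes irqs_pending, which is what lets the producers run without a lock.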

Paolo

> Regards,
> Anup
>
>>
>> Perhaps irqs_pending needs to be split in two fields, irqs_pending and
>> irqs_pending_mask, and then you can do this:
>>
>> /*
>> * irqs_pending and irqs_pending_mask have multiple-producer/single-
>> * consumer semantics; therefore bits can be set in the mask without
>> * a lock, but clearing the bits requires vcpu_lock. Furthermore,
>> * consumers should never write to irqs_pending, and should not
>> * use bits of irqs_pending that weren't 1 in the mask.
>> */
>>
>> int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq)
>> {
>> ...
>> set_bit(irq, &vcpu->arch.irqs_pending);
>> smp_mb__before_atomic();
>> set_bit(irq, &vcpu->arch.irqs_pending_mask);
>> kvm_vcpu_kick(vcpu);
>> }
>>
>> int kvm_riscv_vcpu_unset_interrupt(struct kvm_vcpu *vcpu, unsigned int irq)
>> {
>> ...
>> clear_bit(irq, &vcpu->arch.irqs_pending);
>> smp_mb__before_atomic();
>> set_bit(irq, &vcpu->arch.irqs_pending_mask);
>> }
>>
>> static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
>> {
>> ...
>> WRITE_ONCE(vcpu->arch.irqs_pending_mask, 0);
>> }
>>
>> and kvm_riscv_vcpu_flush_interrupts can leave aside VSIP bits that
>> aren't in vcpu->arch.irqs_pending_mask:
>>
>> if (atomic_read(&vcpu->arch.irqs_pending_mask)) {
>> u32 mask, val;
>>
>> mask = xchg_acquire(&vcpu->arch.irqs_pending_mask, 0);
>> val = READ_ONCE(vcpu->arch.irqs_pending) & mask;
>>
>> vcpu->arch.guest_csr.vsip &= ~mask;
>> vcpu->arch.guest_csr.vsip |= val;
>> csr_write(CSR_VSIP, vsip);
>> }
>>
>> Also, the getter of CSR_VSIP should call
>> kvm_riscv_vcpu_flush_interrupts, while the setter should clear
>> irqs_pending_mask.
>>
>> On 29/07/19 13:56, Anup Patel wrote:
>>> + kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>>> + kvm_vcpu_kick(vcpu);
>>
>> The request is not needed as long as kvm_riscv_vcpu_flush_interrupts is
>> called *after* smp_store_mb(vcpu->mode, IN_GUEST_MODE) in
>> kvm_arch_vcpu_ioctl_run. This is the "request-less vCPU kick" pattern
>> in Documentation/virtual/kvm/vcpu-requests.rst. The smp_store_mb then
>> orders the write of IN_GUEST_MODE before the read of irqs_pending (or
>> irqs_pending_mask in my proposal above); in the producers, there is a
>> dual memory barrier in kvm_vcpu_exiting_guest_mode(), ordering the write
>> of irqs_pending(_mask) before the read of vcpu->mode.
>>
>> Similar to other VS* CSRs, I'd rather have a ONE_REG interface for VSIE
>> and VSIP from the beginning as well. Note that the VSIP setter would
>> clear irqs_pending_mask, while the getter would call
>> kvm_riscv_vcpu_flush_interrupts before reading. It's up to userspace to
>> ensure that no interrupt injections happen between the calls to the
>> getter and the setter.
>>
>> Paolo
>>
>>> + csr_write(CSR_VSIP, vcpu->arch.irqs_pending);
>>> + vcpu->arch.guest_csr.vsip = vcpu->arch.irqs_pending;
>>> + }
>>

2019-07-30 17:50:51

by Anup Patel

Subject: Re: [RFC PATCH 00/16] KVM RISC-V Support

On Tue, Jul 30, 2019 at 5:03 PM Paolo Bonzini <[email protected]> wrote:
>
> On 30/07/19 07:26, Anup Patel wrote:
> > Here's a brief TODO list which we want to immediately work upon after this
> > series:
> > 1. Handle trap from unpriv access in SBI v0.1 emulation
> > 2. In-kernel PLIC emulation
> > 3. SBI v0.2 emulation in-kernel
> > 4. SBI v0.2 hart hotplug emulation in-kernel
> > 5. ..... and so on .....
> >
> > We will include above TODO list in v2 series cover letter as well.
>
> I guess I gave you a bunch of extra items in today's more thorough
> review. :)

Thanks, your review comments are very useful. We will address all
of them.

>
> BTW, since IPIs are handled in the SBI I wouldn't bother with in-kernel
> PLIC emulation unless you can demonstrate performance improvements (for
> example due to irqfd). In fact, it may be more interesting to add

I thought VHOST requires irqfd and we would certainly end up providing
in-kernel PLIC emulation to support VHOST.

> plumbing for userspace handling of selected SBI calls (in addition to
> get/putchar, sbi_system_reset and sbi_hart_down look like good
> candidates in SBI v0.2).

The get/putchar SBI v0.1 calls won't be encouraged going forward because
we already have an earlycon implementation in place and the Guest kernel can
directly write to UART registers for early prints.

If we still wanted to implement get/putchar calls then we would need a RISC-V
specific exit reason in KVM. We have tried to avoid RISC-V specific IOCTLs
or exit reasons in this series.

>
> > We were thinking to keep KVM RISC-V disabled by default (i.e. keep it
> > experimental) until we have validated it on some FPGA or real HW. For now,
> > users can explicitly enable it and play-around on QEMU emulation. I hope
> > this is fine with most people ?
>
> That's certainly okay with me.
>

Thanks,
Anup

2019-07-31 05:36:05

by Atish Patra

Subject: Re: [RFC PATCH 13/16] RISC-V: KVM: Add timer functionality

On Tue, 2019-07-30 at 13:26 +0200, Paolo Bonzini wrote:
> On 29/07/19 13:57, Anup Patel wrote:
> > + if (delta_ns > VCPU_TIMER_PROGRAM_THRESHOLD_NS) {
> > + hrtimer_start(&t->hrt, ktime_add_ns(ktime_get(),
> > delta_ns),
>
> I think the guest would prefer if you saved the time before enabling
> interrupts on the host, and use that here instead of ktime_get().
> Otherwise the timer could be delayed arbitrarily by host interrupts.
>
> (Because the RISC-V SBI timer is relative only---which is
> unfortunate---

Just to clarify: RISC-V SBI timer call passes absolute time.

https://elixir.bootlin.com/linux/v5.3-rc2/source/drivers/clocksource/timer-riscv.c#L32

That's why we compute a delta between absolute time passed via SBI and
current time. hrtimer is programmed to trigger only after the delta
time from now.
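
The conversion described above can be sketched as follows. This is an illustration only: the function name is hypothetical, `freq_hz` stands for the timebase frequency, and the sketch assumes that frequency is well below about 18 GHz so the remainder term cannot overflow 64 bits.

```c
#include <stdint.h>

#define MODEL_NSEC_PER_SEC 1000000000ULL

/*
 * Convert an absolute deadline in timer cycles into a relative delay
 * in nanoseconds. A deadline that has already passed yields 0, in
 * which case the caller would raise the timer interrupt right away.
 */
uint64_t model_delta_ns(uint64_t next_cycle, uint64_t now_cycle,
			uint64_t freq_hz)
{
	uint64_t delta;

	if (next_cycle <= now_cycle)
		return 0;

	delta = next_cycle - now_cycle;

	/* Split the division to avoid overflowing delta * NSEC_PER_SEC. */
	return (delta / freq_hz) * MODEL_NSEC_PER_SEC +
	       (delta % freq_hz) * MODEL_NSEC_PER_SEC / freq_hz;
}
```

With a 10 MHz timebase, a deadline 1000 cycles ahead corresponds to a delay of 100000 ns.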


> guests will already pay a latency price due to the extra
> cost of the SBI call compared to a bare metal implementation.

Yes. There are ongoing discussions to remove this SBI call completely.
Hopefully, that will happen before any real hardware with
virtualization support shows up :).

> Sooner or
> later you may want to implement something like x86's heuristic to
> advance the timer deadline by a few hundred nanoseconds; perhaps add
> a
> TODO now).
>

I am not aware of this approach. I will take a look. Thanks.

Regards,
Atish
> Paolo
>
> > + HRTIMER_MODE_ABS);
> > + t->is_set = true;
> > + } else
> > + kvm_riscv_vcpu_set_interrupt(vcpu, IRQ_S_TIMER);
> > +
>

2019-07-31 07:30:56

by Paolo Bonzini

Subject: Re: [RFC PATCH 13/16] RISC-V: KVM: Add timer functionality

On 31/07/19 03:55, Atish Patra wrote:
> On Tue, 2019-07-30 at 13:26 +0200, Paolo Bonzini wrote:
>> On 29/07/19 13:57, Anup Patel wrote:
>>> + if (delta_ns > VCPU_TIMER_PROGRAM_THRESHOLD_NS) {
>>> + hrtimer_start(&t->hrt, ktime_add_ns(ktime_get(),
>>> delta_ns),
>>
>> I think the guest would prefer if you saved the time before enabling
>> interrupts on the host, and use that here instead of ktime_get().
>> Otherwise the timer could be delayed arbitrarily by host interrupts.
>>
>> (Because the RISC-V SBI timer is relative only---which is
>> unfortunate---
>
> Just to clarify: RISC-V SBI timer call passes absolute time.
>
> https://elixir.bootlin.com/linux/v5.3-rc2/source/drivers/clocksource/timer-riscv.c#L32
>
> That's why we compute a delta between absolute time passed via SBI and
> current time. hrtimer is programmed to trigger only after the delta
> time from now.

Never mind, I got lost in all the conversions.

One important issue is the lack of ability to program a delta between
HS/HU-mode cycles and VS/VU-mode cycles. Without this, it's impossible
to do virtual machine migration (except with hcounteren
trap-and-emulate, which I think we agree is not acceptable). I found
the open issue at https://github.com/riscv/riscv-isa-manual/issues/298
and commented on it.
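
To spell out why such a delta is needed for migration, here is a small arithmetic sketch. The function names are hypothetical and the delta is a plain value here, not an existing CSR; the assumption is simply that the guest-visible counter equals the host counter plus a per-VM offset, with unsigned wrap-around.

```c
#include <stdint.h>

/* Guest-visible cycle count given the host counter and a per-VM delta. */
uint64_t model_guest_cycles(uint64_t host_cycles, uint64_t delta)
{
	return host_cycles + delta;  /* wraps modulo 2^64 by design */
}

/*
 * Delta to program on the destination host so that the guest counter
 * continues from the value it had when the VM was saved.
 */
uint64_t model_delta_after_migration(uint64_t guest_cycles_at_save,
				     uint64_t dst_host_cycles)
{
	return guest_cycles_at_save - dst_host_cycles;
}
```

Because the arithmetic is modulo 2^64, the same formula works whether the destination host counter is behind or ahead of the guest's saved value.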

Paolo

2019-07-31 07:45:05

by Anup Patel

Subject: Re: [RFC PATCH 08/16] RISC-V: KVM: Handle MMIO exits for VCPU

On Tue, Jul 30, 2019 at 4:50 PM Paolo Bonzini <[email protected]> wrote:
>
> On 29/07/19 13:57, Anup Patel wrote:
> > +static ulong get_insn(struct kvm_vcpu *vcpu)
> > +{
> > + ulong __sepc = vcpu->arch.guest_context.sepc;
> > + ulong __hstatus, __sstatus, __vsstatus;
> > +#ifdef CONFIG_RISCV_ISA_C
> > + ulong rvc_mask = 3, tmp;
> > +#endif
> > + ulong flags, val;
> > +
> > + local_irq_save(flags);
> > +
> > + __vsstatus = csr_read(CSR_VSSTATUS);
> > + __sstatus = csr_read(CSR_SSTATUS);
> > + __hstatus = csr_read(CSR_HSTATUS);
> > +
> > + csr_write(CSR_VSSTATUS, __vsstatus | SR_MXR);
> > + csr_write(CSR_SSTATUS, vcpu->arch.guest_context.sstatus | SR_MXR);
> > + csr_write(CSR_HSTATUS, vcpu->arch.guest_context.hstatus | HSTATUS_SPRV);
> > +
> > +#ifndef CONFIG_RISCV_ISA_C
> > + asm ("\n"
> > +#ifdef CONFIG_64BIT
> > + STR(LWU) " %[insn], (%[addr])\n"
> > +#else
> > + STR(LW) " %[insn], (%[addr])\n"
> > +#endif
> > + : [insn] "=&r" (val) : [addr] "r" (__sepc));
> > +#else
> > + asm ("and %[tmp], %[addr], 2\n"
> > + "bnez %[tmp], 1f\n"
> > +#ifdef CONFIG_64BIT
> > + STR(LWU) " %[insn], (%[addr])\n"
> > +#else
> > + STR(LW) " %[insn], (%[addr])\n"
> > +#endif
> > + "and %[tmp], %[insn], %[rvc_mask]\n"
> > + "beq %[tmp], %[rvc_mask], 2f\n"
> > + "sll %[insn], %[insn], %[xlen_minus_16]\n"
> > + "srl %[insn], %[insn], %[xlen_minus_16]\n"
> > + "j 2f\n"
> > + "1:\n"
> > + "lhu %[insn], (%[addr])\n"
> > + "and %[tmp], %[insn], %[rvc_mask]\n"
> > + "bne %[tmp], %[rvc_mask], 2f\n"
> > + "lhu %[tmp], 2(%[addr])\n"
> > + "sll %[tmp], %[tmp], 16\n"
> > + "add %[insn], %[insn], %[tmp]\n"
> > + "2:"
> > + : [vsstatus] "+&r" (__vsstatus), [insn] "=&r" (val),
> > + [tmp] "=&r" (tmp)
> > + : [addr] "r" (__sepc), [rvc_mask] "r" (rvc_mask),
> > + [xlen_minus_16] "i" (__riscv_xlen - 16));
> > +#endif
> > +
> > + csr_write(CSR_HSTATUS, __hstatus);
> > + csr_write(CSR_SSTATUS, __sstatus);
> > + csr_write(CSR_VSSTATUS, __vsstatus);
> > +
> > + local_irq_restore(flags);
> > +
> > + return val;
> > +}
> > +
>
> This also needs fixups for exceptions, because the guest can race
> against the host and modify its page tables concurrently with the
> vmexit. (How effective this is, of course, depends on how the TLB is
> implemented in hardware, but you need to do the safe thing anyway).

For a Guest with a single VCPU we won't see any issue, but we might
get an exception for a Guest with multiple VCPUs. We have added this
to our TODO list.
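
For readers following the quoted assembly, the length decoding it performs can be mirrored in plain C as below. This is a sketch over an ordinary byte buffer: it does not model the MXR/SPRV setup or the faulting behaviour discussed above, it reads parcel by parcel rather than using a word load on the aligned path, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read one little-endian 16-bit parcel. */
static uint16_t read_parcel(const uint8_t *mem, size_t off)
{
	return (uint16_t)(mem[off] | (mem[off + 1] << 8));
}

/*
 * Return the instruction at byte offset pc (must be 2-byte aligned).
 * Low two bits equal to 3 mean a 32-bit encoding; anything else is a
 * 16-bit compressed (RVC) encoding, returned zero-extended.
 */
static uint32_t fetch_insn(const uint8_t *mem, size_t pc)
{
	uint32_t insn;

	assert(mem != NULL && (pc & 1) == 0);

	insn = read_parcel(mem, pc);
	if ((insn & 3) == 3)
		insn |= (uint32_t)read_parcel(mem, pc + 2) << 16;

	return insn;
}
```

In the real get_insn() each of these loads goes through guest page tables, so each one is a point where the exception fixup mentioned above would be needed.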

In this context, I have proposed a separate CSR holding the trapped
instruction value so that we don't need unprivileged load/store to
figure out the trapped instruction.

Refer to https://github.com/riscv/riscv-isa-manual/issues/394

The above GitHub issue and the missing time delta CSR will be the last
two unaddressed GitHub issues from the RISC-V spec perspective.

Regards,
Anup

2019-07-31 09:37:32

by Anup Patel

Subject: Re: [RFC PATCH 13/16] RISC-V: KVM: Add timer functionality

On Wed, Jul 31, 2019 at 12:28 PM Paolo Bonzini <[email protected]> wrote:
>
> On 31/07/19 03:55, Atish Patra wrote:
> > On Tue, 2019-07-30 at 13:26 +0200, Paolo Bonzini wrote:
> >> On 29/07/19 13:57, Anup Patel wrote:
> >>> + if (delta_ns > VCPU_TIMER_PROGRAM_THRESHOLD_NS) {
> >>> + hrtimer_start(&t->hrt, ktime_add_ns(ktime_get(),
> >>> delta_ns),
> >>
> >> I think the guest would prefer if you saved the time before enabling
> >> interrupts on the host, and use that here instead of ktime_get().
> >> Otherwise the timer could be delayed arbitrarily by host interrupts.
> >>
> >> (Because the RISC-V SBI timer is relative only---which is
> >> unfortunate---
> >
> > Just to clarify: RISC-V SBI timer call passes absolute time.
> >
> > https://elixir.bootlin.com/linux/v5.3-rc2/source/drivers/clocksource/timer-riscv.c#L32
> >
> > That's why we compute a delta between absolute time passed via SBI and
> > current time. hrtimer is programmed to trigger only after the delta
> > time from now.
>
> Nevermind, I got lost in all the conversions.
>
> One important issue is the lack of ability to program a delta between
> HS/HU-mode cycles and VS/VU-mode cycles. Without this, it's impossible
> to do virtual machine migration (except with hcounteren
> trap-and-emulate, which I think we agree is not acceptable). I found
> the open issue at https://github.com/riscv/riscv-isa-manual/issues/298
> and commented on it.

This GitHub issue has been open for quite some time now.

Thanks for commenting. I have pinged RISC-V spec maintainers as well.

Regards,
Anup

2019-08-02 10:37:50

by Anup Patel

Subject: Re: [RFC PATCH 05/16] RISC-V: KVM: Implement VCPU interrupts and requests handling

On Tue, Jul 30, 2019 at 7:38 PM Paolo Bonzini <[email protected]> wrote:
>
> On 30/07/19 15:35, Anup Patel wrote:
> > On Tue, Jul 30, 2019 at 6:48 PM Paolo Bonzini <[email protected]> wrote:
> >>
> >> On 30/07/19 14:45, Anup Patel wrote:
> >>> Here's some text from RISC-V spec regarding SIP CSR:
> >>> "software interrupt-pending (SSIP) bit in the sip register. A pending
> >>> supervisor-level software interrupt can be cleared by writing 0 to the SSIP bit
> >>> in sip. Supervisor-level software interrupts are disabled when the SSIE bit in
> >>> the sie register is clear."
> >>>
> >>> Without RISC-V hypervisor extension, the SIP is essentially a restricted
> >>> view of MIP CSR. Also as-per above, S-mode SW can only write 0 to SSIP
> >>> bit in SIP CSR whereas it can only be set by M-mode SW or some HW
> >>> mechanism (such as S-mode CLINT).
> >>
> >> But that's not what the spec says. It just says (just before the
> >> sentence you quoted):
> >>
> >> A supervisor-level software interrupt is triggered on the current
> >> hart by writing 1 to its supervisor software interrupt-pending (SSIP)
> >> bit in the sip register.
> >
> > Unfortunately, this statement does not state who is allowed to write 1
> > in SIP.SSIP bit.
>
> If it doesn't state who is allowed to write 1, whoever has access to sip
> can.
>
> > I quoted MIP CSR documentation to highlight the fact that only M-mode
> > SW can set SSIP bit.
> >
> > In fact, I had same understanding as you have regarding SSIP bit
> > until we had MSIP issue in OpenSBI.
> > (https://github.com/riscv/opensbi/issues/128)
> >
> >> and it's not written anywhere that S-mode SW cannot write 1. In fact
> >> that text is even under sip, not under mip, so IMO there's no doubt that
> >> S-mode SW _can_ write 1, and the hypervisor must operate accordingly.
> >
> > Without hypervisor support, SIP CSR is nothing but a restricted view of
> > MIP CSR thats why MIP CSR documentation applies here.
>
> But the privileged spec says mip.MSIP is read-only, it cannot be cleared
> (as in the above OpenSBI issue). So mip.MSIP and sip.SSIP are already
> different in that respect, and I don't see how the spec says that S-mode
> SW cannot set sip.SSIP.
>
> (As an aside, why would M-mode even bother using sip and not mip to
> write 1 to SSIP?).
>
> > I think this discussion deserves a Github issue on RISC-V ISA manual.
>
> Perhaps, but I think it makes more sense this way. The question remains
> of why M-mode is not allowed to write to MSIP/MEIP/MTIP. My guess is
> that then MSIP/MEIP/MTIP are simply a read-only view of an external pin,
> so it simplifies hardware a tiny bit by forcing acks to go through the
> MMIO registers.
>
> > If my interpretation is incorrect then it would be really strange that
> > HART in S-mode SW can inject IPI to itself by writing 1 to SIP.SSIP bit.
>
> Well, it can be useful, for example Windows does it when interrupt
> handlers want to schedule some work to happen out of interrupt context.
> Going through SBI would be unpleasant if it causes an HS-mode trap.

Another way of artificially injecting an interrupt would be to use the
interrupt controller, where Windows can just write to a pending register
of the interrupt controller.

I have raised a new GitHub issue for clarity on this. You can
add your comments to this issue as well.
https://github.com/riscv/riscv-isa-manual/issues/425
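
To make the disagreement concrete, the two readings of the spec differ only in what happens when S-mode writes 1 to SSIP. The sketch below models both with a flag; it is a toy model for discussion, not a statement of what the spec requires, and the function names are hypothetical. The bit positions are the standard mip/sip ones.

```c
#include <assert.h>
#include <stdbool.h>

#define MIP_SSIP	(1ul << 1)
#define MIP_STIP	(1ul << 5)
#define MIP_SEIP	(1ul << 9)
#define SIP_VIEW_MASK	(MIP_SSIP | MIP_STIP | MIP_SEIP)

/* sip as a restricted view of mip (no hypervisor extension). */
static unsigned long sip_read(unsigned long mip)
{
	return mip & SIP_VIEW_MASK;
}

/*
 * Model an S-mode write to the SSIP bit of sip and return the new mip.
 * Writing 0 clears SSIP under both readings. Writing 1 sets SSIP only
 * if smode_may_set is true; otherwise the write is ignored.
 */
static unsigned long sip_write_ssip(unsigned long mip, unsigned long val,
				    bool smode_may_set)
{
	if (!(val & MIP_SSIP))
		return mip & ~MIP_SSIP;
	if (smode_may_set)
		return mip | MIP_SSIP;
	return mip;
}
```

A hypervisor emulating sip for a guest has to pick one of these behaviours, which is why the outcome of the issue matters for this patch.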

Also, I have raised a proposal to support a mechanism for an external entity
(such as PLICv2 with virtualization support) to inject virtual interrupts.
https://github.com/riscv/riscv-isa-manual/issues/429

Regards,
Anup