2021-04-01 18:21:50

by Anup Patel

[permalink] [raw]
Subject: [PATCH v17 00/17] KVM RISC-V Support

This series adds initial KVM RISC-V support. Currently, we are able to boot
Linux on RV64/RV32 Guest with multiple VCPUs.

Key aspects of KVM RISC-V added by this series are:
1. No RISC-V specific KVM IOCTL
2. Minimal possible KVM world-switch which touches only GPRs and few CSRs
3. Both RV64 and RV32 host supported
4. Full Guest/VM switch is done via vcpu_get/vcpu_put infrastructure
5. KVM ONE_REG interface for VCPU register access from user-space
6. PLIC emulation is done in user-space
7. Timer and IPI emuation is done in-kernel
8. Both Sv39x4 and Sv48x4 supported for RV64 host
9. MMU notifiers supported
10. Generic dirtylog supported
11. FP lazy save/restore supported
12. SBI v0.1 emulation for KVM Guest available
13. Forward unhandled SBI calls to KVM userspace
14. Hugepage support for Guest/VM
15. IOEVENTFD support for Vhost

Here's a brief TODO list which we will work upon after this series:
1. SBI v0.2 emulation in-kernel
2. SBI v0.2 hart state management emulation in-kernel
3. In-kernel PLIC emulation
4. ..... and more .....

This series can be found in riscv_kvm_v17 branch at:
https//github.com/avpatel/linux.git

Our work-in-progress KVMTOOL RISC-V port can be found in riscv_v7 branch
at: https//github.com/avpatel/kvmtool.git

The QEMU RISC-V hypervisor emulation is done by Alistair and is available
in master branch at: https://git.qemu.org/git/qemu.git

To play around with KVM RISC-V, refer KVM RISC-V wiki at:
https://github.com/kvm-riscv/howto/wiki
https://github.com/kvm-riscv/howto/wiki/KVM-RISCV64-on-QEMU
https://github.com/kvm-riscv/howto/wiki/KVM-RISCV64-on-Spike

Changes since v16:
- Rebased on Linux-5.12-rc5
- Remove redundant kvm_arch_create_memslot(), kvm_arch_vcpu_setup(),
kvm_arch_vcpu_init(), kvm_arch_has_vcpu_debugfs(), and
kvm_arch_create_vcpu_debugfs() from PATCH5
- Make stage2_wp_memory_region() and stage2_ioremap() as static
in PATCH13

Changes since v15:
- Rebased on Linux-5.11-rc3
- Fixed kvm_stage2_map() to use gfn_to_pfn_prot() for determing
writeability of a host pfn.
- Use "__u64" in-place of "u64" and "__u32" in-place of "u32" for
uapi/asm/kvm.h

Changes since v14:
- Rebased on Linux-5.10-rc3
- Fixed Stage2 (G-stage) PDG allocation to ensure it is 16KB aligned

Changes since v13:
- Rebased on Linux-5.9-rc3
- Fixed kvm_riscv_vcpu_set_reg_csr() for SIP updation in PATCH5
- Fixed instruction length computation in PATCH7
- Added ioeventfd support in PATCH7
- Ensure HSTATUS.SPVP is set to correct value before using HLV/HSV
intructions in PATCH7
- Fixed stage2_map_page() to set PTE 'A' and 'D' bits correctly
in PATCH10
- Added stage2 dirty page logging in PATCH10
- Allow KVM user-space to SET/GET SCOUNTER CSR in PATCH5
- Save/restore SCOUNTEREN in PATCH6
- Reduced quite a few instructions for __kvm_riscv_switch_to() by
using CSR swap instruction in PATCH6
- Detect and use Sv48x4 when available in PATCH10

Changes since v12:
- Rebased patches on Linux-5.8-rc4
- By default enable all counters in HCOUNTEREN
- RISC-V H-Extension v0.6.1 spec support

Changes since v11:
- Rebased patches on Linux-5.7-rc3
- Fixed typo in typecast of stage2_map_size define
- Introduced struct kvm_cpu_trap to represent trap details and
use it as function parameter wherever applicable
- Pass memslot to kvm_riscv_stage2_map() for supporing dirty page
logging in future
- RISC-V H-Extension v0.6 spec support
- Send-out first three patches as separate series so that it can
be taken by Palmer for Linux RISC-V

Changes since v10:
- Rebased patches on Linux-5.6-rc5
- Reduce RISCV_ISA_EXT_MAX from 256 to 64
- Separate PATCH for removing N-extension related defines
- Added comments as requested by Palmer
- Fixed HIDELEG CSR programming

Changes since v9:
- Rebased patches on Linux-5.5-rc3
- Squash PATCH19 and PATCH20 into PATCH5
- Squash PATCH18 into PATCH11
- Squash PATCH17 into PATCH16
- Added ONE_REG interface for VCPU timer in PATCH13
- Use HTIMEDELTA for VCPU timer in PATCH13
- Updated KVM RISC-V mailing list in MAINTAINERS entry
- Update KVM kconfig option to depend on RISCV_SBI and MMU
- Check for SBI v0.2 and SBI v0.2 RFENCE extension at boot-time
- Use SBI v0.2 RFENCE extension in VMID implementation
- Use SBI v0.2 RFENCE extension in Stage2 MMU implementation
- Use SBI v0.2 RFENCE extension in SBI implementation
- Moved to RISC-V Hypervisor v0.5 draft spec
- Updated Documentation/virt/kvm/api.txt for timer ONE_REG interface

Changes since v8:
- Rebased series on Linux-5.4-rc3 and Atish's SBI v0.2 patches
- Use HRTIMER_MODE_REL instead of HRTIMER_MODE_ABS in timer emulation
- Fixed kvm_riscv_stage2_map() to handle hugepages
- Added patch to forward unhandled SBI calls to user-space
- Added patch for iterative/recursive stage2 page table programming
- Added patch to remove per-CPU vsip_shadow variable
- Added patch to fix race-condition in kvm_riscv_vcpu_sync_interrupts()

Changes since v7:
- Rebased series on Linux-5.4-rc1 and Atish's SBI v0.2 patches
- Removed PATCH1, PATCH3, and PATCH20 because these already merged
- Use kernel doc style comments for ISA bitmap functions
- Don't parse X, Y, and Z extension in riscv_fill_hwcap() because it will
be added in-future
- Mark KVM RISC-V kconfig option as EXPERIMENTAL
- Typo fix in commit description of PATCH6 of v7 series
- Use separate structs for CORE and CSR registers of ONE_REG interface
- Explicitly include asm/sbi.h in kvm/vcpu_sbi.c
- Removed implicit switch-case fall-through in kvm_riscv_vcpu_exit()
- No need to set VSSTATUS.MXR bit in kvm_riscv_vcpu_unpriv_read()
- Removed register for instruction length in kvm_riscv_vcpu_unpriv_read()
- Added defines for checking/decoding instruction length
- Added separate patch to forward unhandled SBI calls to userspace tool

Changes since v6:
- Rebased patches on Linux-5.3-rc7
- Added "return_handled" in struct kvm_mmio_decode to ensure that
kvm_riscv_vcpu_mmio_return() updates SEPC only once
- Removed trap_stval parameter from kvm_riscv_vcpu_unpriv_read()
- Updated git repo URL in MAINTAINERS entry

Changes since v5:
- Renamed KVM_REG_RISCV_CONFIG_TIMEBASE register to
KVM_REG_RISCV_CONFIG_TBFREQ register in ONE_REG interface
- Update SPEC in kvm_riscv_vcpu_mmio_return() for MMIO exits
- Use switch case instead of illegal instruction opcode table for simplicity
- Improve comments in stage2_remote_tlb_flush() for a potential remote TLB
flush optimization
- Handle all unsupported SBI calls in default case of
kvm_riscv_vcpu_sbi_ecall() function
- Fixed kvm_riscv_vcpu_sync_interrupts() for software interrupts
- Improved unprivilege reads to handle traps due to Guest stage1 page table
- Added separate patch to document RISC-V specific things in
Documentation/virt/kvm/api.txt

Changes since v4:
- Rebased patches on Linux-5.3-rc5
- Added Paolo's Acked-by and Reviewed-by
- Updated mailing list in MAINTAINERS entry

Changes since v3:
- Moved patch for ISA bitmap from KVM prep series to this series
- Make vsip_shadow as run-time percpu variable instead of compile-time
- Flush Guest TLBs on all Host CPUs whenever we run-out of VMIDs

Changes since v2:
- Removed references of KVM_REQ_IRQ_PENDING from all patches
- Use kvm->srcu within in-kernel KVM run loop
- Added percpu vsip_shadow to track last value programmed in VSIP CSR
- Added comments about irqs_pending and irqs_pending_mask
- Used kvm_arch_vcpu_runnable() in-place-of kvm_riscv_vcpu_has_interrupt()
in system_opcode_insn()
- Removed unwanted smp_wmb() in kvm_riscv_stage2_vmid_update()
- Use kvm_flush_remote_tlbs() in kvm_riscv_stage2_vmid_update()
- Use READ_ONCE() in kvm_riscv_stage2_update_hgatp() for vmid

Changes since v1:
- Fixed compile errors in building KVM RISC-V as module
- Removed unused kvm_riscv_halt_guest() and kvm_riscv_resume_guest()
- Set KVM_CAP_SYNC_MMU capability only after MMU notifiers are implemented
- Made vmid_version as unsigned long instead of atomic
- Renamed KVM_REQ_UPDATE_PGTBL to KVM_REQ_UPDATE_HGATP
- Renamed kvm_riscv_stage2_update_pgtbl() to kvm_riscv_stage2_update_hgatp()
- Configure HIDELEG and HEDELEG in kvm_arch_hardware_enable()
- Updated ONE_REG interface for CSR access to user-space
- Removed irqs_pending_lock and use atomic bitops instead
- Added separate patch for FP ONE_REG interface
- Added separate patch for updating MAINTAINERS file

Anup Patel (13):
RISC-V: Add hypervisor extension related CSR defines
RISC-V: Add initial skeletal KVM support
RISC-V: KVM: Implement VCPU create, init and destroy functions
RISC-V: KVM: Implement VCPU interrupts and requests handling
RISC-V: KVM: Implement KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls
RISC-V: KVM: Implement VCPU world-switch
RISC-V: KVM: Handle MMIO exits for VCPU
RISC-V: KVM: Handle WFI exits for VCPU
RISC-V: KVM: Implement VMID allocator
RISC-V: KVM: Implement stage2 page table programming
RISC-V: KVM: Implement MMU notifiers
RISC-V: KVM: Document RISC-V specific parts of KVM API
RISC-V: KVM: Add MAINTAINERS entry

Atish Patra (4):
RISC-V: KVM: Add timer functionality
RISC-V: KVM: FP lazy save/restore
RISC-V: KVM: Implement ONE REG interface for FP registers
RISC-V: KVM: Add SBI v0.1 support

Documentation/virt/kvm/api.rst | 193 ++++-
MAINTAINERS | 11 +
arch/riscv/Kconfig | 1 +
arch/riscv/Makefile | 2 +
arch/riscv/include/asm/csr.h | 89 +++
arch/riscv/include/asm/kvm_host.h | 277 +++++++
arch/riscv/include/asm/kvm_types.h | 7 +
arch/riscv/include/asm/kvm_vcpu_timer.h | 44 ++
arch/riscv/include/asm/pgtable-bits.h | 1 +
arch/riscv/include/uapi/asm/kvm.h | 128 +++
arch/riscv/kernel/asm-offsets.c | 156 ++++
arch/riscv/kvm/Kconfig | 36 +
arch/riscv/kvm/Makefile | 15 +
arch/riscv/kvm/main.c | 118 +++
arch/riscv/kvm/mmu.c | 854 ++++++++++++++++++++
arch/riscv/kvm/tlb.S | 74 ++
arch/riscv/kvm/vcpu.c | 997 ++++++++++++++++++++++++
arch/riscv/kvm/vcpu_exit.c | 701 +++++++++++++++++
arch/riscv/kvm/vcpu_sbi.c | 173 ++++
arch/riscv/kvm/vcpu_switch.S | 400 ++++++++++
arch/riscv/kvm/vcpu_timer.c | 225 ++++++
arch/riscv/kvm/vm.c | 81 ++
arch/riscv/kvm/vmid.c | 120 +++
drivers/clocksource/timer-riscv.c | 8 +
include/clocksource/timer-riscv.h | 16 +
include/uapi/linux/kvm.h | 8 +
26 files changed, 4726 insertions(+), 9 deletions(-)
create mode 100644 arch/riscv/include/asm/kvm_host.h
create mode 100644 arch/riscv/include/asm/kvm_types.h
create mode 100644 arch/riscv/include/asm/kvm_vcpu_timer.h
create mode 100644 arch/riscv/include/uapi/asm/kvm.h
create mode 100644 arch/riscv/kvm/Kconfig
create mode 100644 arch/riscv/kvm/Makefile
create mode 100644 arch/riscv/kvm/main.c
create mode 100644 arch/riscv/kvm/mmu.c
create mode 100644 arch/riscv/kvm/tlb.S
create mode 100644 arch/riscv/kvm/vcpu.c
create mode 100644 arch/riscv/kvm/vcpu_exit.c
create mode 100644 arch/riscv/kvm/vcpu_sbi.c
create mode 100644 arch/riscv/kvm/vcpu_switch.S
create mode 100644 arch/riscv/kvm/vcpu_timer.c
create mode 100644 arch/riscv/kvm/vm.c
create mode 100644 arch/riscv/kvm/vmid.c
create mode 100644 include/clocksource/timer-riscv.h

--
2.25.1


2021-04-01 18:21:57

by Anup Patel

[permalink] [raw]
Subject: [PATCH v17 11/17] RISC-V: KVM: Implement MMU notifiers

This patch implements MMU notifiers for KVM RISC-V so that Guest
physical address space is in-sync with Host physical address space.

This will allow swapping, page migration, etc to work transparently
with KVM RISC-V.

Signed-off-by: Anup Patel <[email protected]>
Acked-by: Paolo Bonzini <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Reviewed-by: Alexander Graf <[email protected]>
---
arch/riscv/include/asm/kvm_host.h | 7 ++
arch/riscv/kvm/Kconfig | 1 +
arch/riscv/kvm/mmu.c | 144 +++++++++++++++++++++++++++++-
arch/riscv/kvm/vm.c | 1 +
4 files changed, 149 insertions(+), 4 deletions(-)

diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index f80c394312b8..4e318137d82a 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -202,6 +202,13 @@ static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}

+#define KVM_ARCH_WANT_MMU_NOTIFIER
+int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start,
+ unsigned long end, unsigned int flags);
+int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
+int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
+int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
+
void __kvm_riscv_hfence_gvma_vmid_gpa(unsigned long gpa, unsigned long vmid);
void __kvm_riscv_hfence_gvma_vmid(unsigned long vmid);
void __kvm_riscv_hfence_gvma_gpa(unsigned long gpa);
diff --git a/arch/riscv/kvm/Kconfig b/arch/riscv/kvm/Kconfig
index 633063edaee8..a712bb910cda 100644
--- a/arch/riscv/kvm/Kconfig
+++ b/arch/riscv/kvm/Kconfig
@@ -20,6 +20,7 @@ if VIRTUALIZATION
config KVM
tristate "Kernel-based Virtual Machine (KVM) support (EXPERIMENTAL)"
depends on RISCV_SBI && MMU
+ select MMU_NOTIFIER
select PREEMPT_NOTIFIERS
select ANON_INODES
select KVM_MMIO
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index 4c533a41b887..b64704aaed7d 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -296,7 +296,8 @@ static void stage2_op_pte(struct kvm *kvm, gpa_t addr,
}
}

-static void stage2_unmap_range(struct kvm *kvm, gpa_t start, gpa_t size)
+static void stage2_unmap_range(struct kvm *kvm, gpa_t start,
+ gpa_t size, bool may_block)
{
int ret;
pte_t *ptep;
@@ -321,6 +322,13 @@ static void stage2_unmap_range(struct kvm *kvm, gpa_t start, gpa_t size)

next:
addr += page_size;
+
+ /*
+ * If the range is too large, release the kvm->mmu_lock
+ * to prevent starvation and lockup detector warnings.
+ */
+ if (may_block && addr < end)
+ cond_resched_lock(&kvm->mmu_lock);
}
}

@@ -404,6 +412,38 @@ static int stage2_ioremap(struct kvm *kvm, gpa_t gpa, phys_addr_t hpa,

}

+static int handle_hva_to_gpa(struct kvm *kvm,
+ unsigned long start,
+ unsigned long end,
+ int (*handler)(struct kvm *kvm,
+ gpa_t gpa, u64 size,
+ void *data),
+ void *data)
+{
+ struct kvm_memslots *slots;
+ struct kvm_memory_slot *memslot;
+ int ret = 0;
+
+ slots = kvm_memslots(kvm);
+
+ /* we only care about the pages that the guest sees */
+ kvm_for_each_memslot(memslot, slots) {
+ unsigned long hva_start, hva_end;
+ gfn_t gpa;
+
+ hva_start = max(start, memslot->userspace_addr);
+ hva_end = min(end, memslot->userspace_addr +
+ (memslot->npages << PAGE_SHIFT));
+ if (hva_start >= hva_end)
+ continue;
+
+ gpa = hva_to_gfn_memslot(hva_start, memslot) << PAGE_SHIFT;
+ ret |= handler(kvm, gpa, (u64)(hva_end - hva_start), data);
+ }
+
+ return ret;
+}
+
void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
struct kvm_memory_slot *slot,
gfn_t gfn_offset,
@@ -543,7 +583,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
spin_lock(&kvm->mmu_lock);
if (ret)
stage2_unmap_range(kvm, mem->guest_phys_addr,
- mem->memory_size);
+ mem->memory_size, false);
spin_unlock(&kvm->mmu_lock);

out:
@@ -551,6 +591,96 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
return ret;
}

+static int kvm_unmap_hva_handler(struct kvm *kvm,
+ gpa_t gpa, u64 size, void *data)
+{
+ unsigned int flags = *(unsigned int *)data;
+ bool may_block = flags & MMU_NOTIFIER_RANGE_BLOCKABLE;
+
+ stage2_unmap_range(kvm, gpa, size, may_block);
+ return 0;
+}
+
+int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start,
+ unsigned long end, unsigned int flags)
+{
+ if (!kvm->arch.pgd)
+ return 0;
+
+ handle_hva_to_gpa(kvm, start, end, &kvm_unmap_hva_handler, &flags);
+ return 0;
+}
+
+static int kvm_set_spte_handler(struct kvm *kvm,
+ gpa_t gpa, u64 size, void *data)
+{
+ pte_t *pte = (pte_t *)data;
+
+ WARN_ON(size != PAGE_SIZE);
+ stage2_set_pte(kvm, 0, NULL, gpa, pte);
+
+ return 0;
+}
+
+int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
+{
+ unsigned long end = hva + PAGE_SIZE;
+ kvm_pfn_t pfn = pte_pfn(pte);
+ pte_t stage2_pte;
+
+ if (!kvm->arch.pgd)
+ return 0;
+
+ stage2_pte = pfn_pte(pfn, PAGE_WRITE_EXEC);
+ handle_hva_to_gpa(kvm, hva, end, &kvm_set_spte_handler, &stage2_pte);
+
+ return 0;
+}
+
+static int kvm_age_hva_handler(struct kvm *kvm,
+ gpa_t gpa, u64 size, void *data)
+{
+ pte_t *ptep;
+ u32 ptep_level = 0;
+
+ WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PGDIR_SIZE);
+
+ if (!stage2_get_leaf_entry(kvm, gpa, &ptep, &ptep_level))
+ return 0;
+
+ return ptep_test_and_clear_young(NULL, 0, ptep);
+}
+
+int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end)
+{
+ if (!kvm->arch.pgd)
+ return 0;
+
+ return handle_hva_to_gpa(kvm, start, end, kvm_age_hva_handler, NULL);
+}
+
+static int kvm_test_age_hva_handler(struct kvm *kvm,
+ gpa_t gpa, u64 size, void *data)
+{
+ pte_t *ptep;
+ u32 ptep_level = 0;
+
+ WARN_ON(size != PAGE_SIZE && size != PMD_SIZE);
+ if (!stage2_get_leaf_entry(kvm, gpa, &ptep, &ptep_level))
+ return 0;
+
+ return pte_young(*ptep);
+}
+
+int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
+{
+ if (!kvm->arch.pgd)
+ return 0;
+
+ return handle_hva_to_gpa(kvm, hva, hva,
+ kvm_test_age_hva_handler, NULL);
+}
+
int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu,
struct kvm_memory_slot *memslot,
gpa_t gpa, unsigned long hva, bool is_write)
@@ -565,7 +695,7 @@ int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu,
struct kvm_mmu_page_cache *pcache = &vcpu->arch.mmu_page_cache;
bool logging = (memslot->dirty_bitmap &&
!(memslot->flags & KVM_MEM_READONLY)) ? true : false;
- unsigned long vma_pagesize;
+ unsigned long vma_pagesize, mmu_seq;

mmap_read_lock(current->mm);

@@ -604,6 +734,8 @@ int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu,
return ret;
}

+ mmu_seq = kvm->mmu_notifier_seq;
+
hfn = gfn_to_pfn_prot(kvm, gfn, is_write, &writeable);
if (hfn == KVM_PFN_ERR_HWPOISON) {
send_sig_mceerr(BUS_MCEERR_AR, (void __user *)hva,
@@ -622,6 +754,9 @@ int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu,

spin_lock(&kvm->mmu_lock);

+ if (mmu_notifier_retry(kvm, mmu_seq))
+ goto out_unlock;
+
if (writeable) {
kvm_set_pfn_dirty(hfn);
mark_page_dirty(kvm, gfn);
@@ -635,6 +770,7 @@ int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu,
if (ret)
kvm_err("Failed to map in stage2\n");

+out_unlock:
spin_unlock(&kvm->mmu_lock);
kvm_set_pfn_accessed(hfn);
kvm_release_pfn_clean(hfn);
@@ -671,7 +807,7 @@ void kvm_riscv_stage2_free_pgd(struct kvm *kvm)

spin_lock(&kvm->mmu_lock);
if (kvm->arch.pgd) {
- stage2_unmap_range(kvm, 0UL, stage2_gpa_size);
+ stage2_unmap_range(kvm, 0UL, stage2_gpa_size, false);
pgd = READ_ONCE(kvm->arch.pgd);
kvm->arch.pgd = NULL;
kvm->arch.pgd_phys = 0;
diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
index 6cde69a82252..00a1a88008be 100644
--- a/arch/riscv/kvm/vm.c
+++ b/arch/riscv/kvm/vm.c
@@ -49,6 +49,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_IOEVENTFD:
case KVM_CAP_DEVICE_CTRL:
case KVM_CAP_USER_MEMORY:
+ case KVM_CAP_SYNC_MMU:
case KVM_CAP_DESTROY_MEMORY_REGION_WORKS:
case KVM_CAP_ONE_REG:
case KVM_CAP_READONLY_MEM:
--
2.25.1

2021-04-01 18:22:13

by Anup Patel

[permalink] [raw]
Subject: [PATCH v17 14/17] RISC-V: KVM: Implement ONE REG interface for FP registers

From: Atish Patra <[email protected]>

Add a KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctl interface for floating
point registers such as F0-F31 and FCSR. This support is added for
both 'F' and 'D' extensions.

Signed-off-by: Atish Patra <[email protected]>
Signed-off-by: Anup Patel <[email protected]>
Acked-by: Paolo Bonzini <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Reviewed-by: Alexander Graf <[email protected]>
---
arch/riscv/include/uapi/asm/kvm.h | 10 +++
arch/riscv/kvm/vcpu.c | 104 ++++++++++++++++++++++++++++++
2 files changed, 114 insertions(+)

diff --git a/arch/riscv/include/uapi/asm/kvm.h b/arch/riscv/include/uapi/asm/kvm.h
index 08691dd27bcf..f808ad1ce500 100644
--- a/arch/riscv/include/uapi/asm/kvm.h
+++ b/arch/riscv/include/uapi/asm/kvm.h
@@ -113,6 +113,16 @@ struct kvm_riscv_timer {
#define KVM_REG_RISCV_TIMER_REG(name) \
(offsetof(struct kvm_riscv_timer, name) / sizeof(__u64))

+/* F extension registers are mapped as type 5 */
+#define KVM_REG_RISCV_FP_F (0x05 << KVM_REG_RISCV_TYPE_SHIFT)
+#define KVM_REG_RISCV_FP_F_REG(name) \
+ (offsetof(struct __riscv_f_ext_state, name) / sizeof(__u32))
+
+/* D extension registers are mapped as type 6 */
+#define KVM_REG_RISCV_FP_D (0x06 << KVM_REG_RISCV_TYPE_SHIFT)
+#define KVM_REG_RISCV_FP_D_REG(name) \
+ (offsetof(struct __riscv_d_ext_state, name) / sizeof(__u64))
+
#endif

#endif /* __LINUX_KVM_RISCV_H */
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index 581fa55f7232..a797f247db64 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -416,6 +416,98 @@ static int kvm_riscv_vcpu_set_reg_csr(struct kvm_vcpu *vcpu,
return 0;
}

+static int kvm_riscv_vcpu_get_reg_fp(struct kvm_vcpu *vcpu,
+ const struct kvm_one_reg *reg,
+ unsigned long rtype)
+{
+ struct kvm_cpu_context *cntx = &vcpu->arch.guest_context;
+ unsigned long isa = vcpu->arch.isa;
+ unsigned long __user *uaddr =
+ (unsigned long __user *)(unsigned long)reg->addr;
+ unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
+ KVM_REG_SIZE_MASK |
+ rtype);
+ void *reg_val;
+
+ if ((rtype == KVM_REG_RISCV_FP_F) &&
+ riscv_isa_extension_available(&isa, f)) {
+ if (KVM_REG_SIZE(reg->id) != sizeof(u32))
+ return -EINVAL;
+ if (reg_num == KVM_REG_RISCV_FP_F_REG(fcsr))
+ reg_val = &cntx->fp.f.fcsr;
+ else if ((KVM_REG_RISCV_FP_F_REG(f[0]) <= reg_num) &&
+ reg_num <= KVM_REG_RISCV_FP_F_REG(f[31]))
+ reg_val = &cntx->fp.f.f[reg_num];
+ else
+ return -EINVAL;
+ } else if ((rtype == KVM_REG_RISCV_FP_D) &&
+ riscv_isa_extension_available(&isa, d)) {
+ if (reg_num == KVM_REG_RISCV_FP_D_REG(fcsr)) {
+ if (KVM_REG_SIZE(reg->id) != sizeof(u32))
+ return -EINVAL;
+ reg_val = &cntx->fp.d.fcsr;
+ } else if ((KVM_REG_RISCV_FP_D_REG(f[0]) <= reg_num) &&
+ reg_num <= KVM_REG_RISCV_FP_D_REG(f[31])) {
+ if (KVM_REG_SIZE(reg->id) != sizeof(u64))
+ return -EINVAL;
+ reg_val = &cntx->fp.d.f[reg_num];
+ } else
+ return -EINVAL;
+ } else
+ return -EINVAL;
+
+ if (copy_to_user(uaddr, reg_val, KVM_REG_SIZE(reg->id)))
+ return -EFAULT;
+
+ return 0;
+}
+
+static int kvm_riscv_vcpu_set_reg_fp(struct kvm_vcpu *vcpu,
+ const struct kvm_one_reg *reg,
+ unsigned long rtype)
+{
+ struct kvm_cpu_context *cntx = &vcpu->arch.guest_context;
+ unsigned long isa = vcpu->arch.isa;
+ unsigned long __user *uaddr =
+ (unsigned long __user *)(unsigned long)reg->addr;
+ unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
+ KVM_REG_SIZE_MASK |
+ rtype);
+ void *reg_val;
+
+ if ((rtype == KVM_REG_RISCV_FP_F) &&
+ riscv_isa_extension_available(&isa, f)) {
+ if (KVM_REG_SIZE(reg->id) != sizeof(u32))
+ return -EINVAL;
+ if (reg_num == KVM_REG_RISCV_FP_F_REG(fcsr))
+ reg_val = &cntx->fp.f.fcsr;
+ else if ((KVM_REG_RISCV_FP_F_REG(f[0]) <= reg_num) &&
+ reg_num <= KVM_REG_RISCV_FP_F_REG(f[31]))
+ reg_val = &cntx->fp.f.f[reg_num];
+ else
+ return -EINVAL;
+ } else if ((rtype == KVM_REG_RISCV_FP_D) &&
+ riscv_isa_extension_available(&isa, d)) {
+ if (reg_num == KVM_REG_RISCV_FP_D_REG(fcsr)) {
+ if (KVM_REG_SIZE(reg->id) != sizeof(u32))
+ return -EINVAL;
+ reg_val = &cntx->fp.d.fcsr;
+ } else if ((KVM_REG_RISCV_FP_D_REG(f[0]) <= reg_num) &&
+ reg_num <= KVM_REG_RISCV_FP_D_REG(f[31])) {
+ if (KVM_REG_SIZE(reg->id) != sizeof(u64))
+ return -EINVAL;
+ reg_val = &cntx->fp.d.f[reg_num];
+ } else
+ return -EINVAL;
+ } else
+ return -EINVAL;
+
+ if (copy_from_user(reg_val, uaddr, KVM_REG_SIZE(reg->id)))
+ return -EFAULT;
+
+ return 0;
+}
+
static int kvm_riscv_vcpu_set_reg(struct kvm_vcpu *vcpu,
const struct kvm_one_reg *reg)
{
@@ -427,6 +519,12 @@ static int kvm_riscv_vcpu_set_reg(struct kvm_vcpu *vcpu,
return kvm_riscv_vcpu_set_reg_csr(vcpu, reg);
else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_TIMER)
return kvm_riscv_vcpu_set_reg_timer(vcpu, reg);
+ else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_FP_F)
+ return kvm_riscv_vcpu_set_reg_fp(vcpu, reg,
+ KVM_REG_RISCV_FP_F);
+ else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_FP_D)
+ return kvm_riscv_vcpu_set_reg_fp(vcpu, reg,
+ KVM_REG_RISCV_FP_D);

return -EINVAL;
}
@@ -442,6 +540,12 @@ static int kvm_riscv_vcpu_get_reg(struct kvm_vcpu *vcpu,
return kvm_riscv_vcpu_get_reg_csr(vcpu, reg);
else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_TIMER)
return kvm_riscv_vcpu_get_reg_timer(vcpu, reg);
+ else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_FP_F)
+ return kvm_riscv_vcpu_get_reg_fp(vcpu, reg,
+ KVM_REG_RISCV_FP_F);
+ else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_FP_D)
+ return kvm_riscv_vcpu_get_reg_fp(vcpu, reg,
+ KVM_REG_RISCV_FP_D);

return -EINVAL;
}
--
2.25.1

2021-04-01 18:23:13

by Anup Patel

[permalink] [raw]
Subject: [PATCH v17 17/17] RISC-V: KVM: Add MAINTAINERS entry

Add myself as maintainer for KVM RISC-V and Atish as designated reviewer.

Signed-off-by: Atish Patra <[email protected]>
Signed-off-by: Anup Patel <[email protected]>
Acked-by: Paolo Bonzini <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Reviewed-by: Alexander Graf <[email protected]>
---
MAINTAINERS | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index fb2a3633b719..3d1aa899fc4c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9796,6 +9796,17 @@ F: arch/powerpc/include/uapi/asm/kvm*
F: arch/powerpc/kernel/kvm*
F: arch/powerpc/kvm/

+KERNEL VIRTUAL MACHINE FOR RISC-V (KVM/riscv)
+M: Anup Patel <[email protected]>
+R: Atish Patra <[email protected]>
+L: [email protected]
+L: [email protected]
+S: Maintained
+T: git git://github.com/kvm-riscv/linux.git
+F: arch/riscv/include/asm/kvm*
+F: arch/riscv/include/uapi/asm/kvm*
+F: arch/riscv/kvm/
+
KERNEL VIRTUAL MACHINE for s390 (KVM/s390)
M: Christian Borntraeger <[email protected]>
M: Janosch Frank <[email protected]>
--
2.25.1

2021-04-01 18:23:55

by Anup Patel

[permalink] [raw]
Subject: [PATCH v17 04/17] RISC-V: KVM: Implement VCPU interrupts and requests handling

This patch implements VCPU interrupts and requests which are both
asynchronous events.

The VCPU interrupts can be set/unset using KVM_INTERRUPT ioctl from
user-space. In future, the in-kernel IRQCHIP emulation will use
kvm_riscv_vcpu_set_interrupt() and kvm_riscv_vcpu_unset_interrupt()
functions to set/unset VCPU interrupts.

Important VCPU requests implemented by this patch are:
KVM_REQ_SLEEP - set whenever VCPU itself goes to sleep state
KVM_REQ_VCPU_RESET - set whenever VCPU reset is requested

The WFI trap-n-emulate (added later) will use KVM_REQ_SLEEP request
and kvm_riscv_vcpu_has_interrupt() function.

The KVM_REQ_VCPU_RESET request will be used by SBI emulation (added
later) to power-up a VCPU in power-off state. The user-space can use
the GET_MPSTATE/SET_MPSTATE ioctls to get/set power state of a VCPU.

Signed-off-by: Anup Patel <[email protected]>
Acked-by: Paolo Bonzini <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Reviewed-by: Alexander Graf <[email protected]>
---
arch/riscv/include/asm/kvm_host.h | 23 ++++
arch/riscv/include/uapi/asm/kvm.h | 3 +
arch/riscv/kvm/vcpu.c | 182 +++++++++++++++++++++++++++---
3 files changed, 195 insertions(+), 13 deletions(-)

diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index 2796a4211508..1bf660b1a9d8 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -132,6 +132,21 @@ struct kvm_vcpu_arch {
/* CPU CSR context upon Guest VCPU reset */
struct kvm_vcpu_csr guest_reset_csr;

+ /*
+ * VCPU interrupts
+ *
+ * We have a lockless approach for tracking pending VCPU interrupts
+ * implemented using atomic bitops. The irqs_pending bitmap represent
+ * pending interrupts whereas irqs_pending_mask represent bits changed
+ * in irqs_pending. Our approach is modeled around multiple producer
+ * and single consumer problem where the consumer is the VCPU itself.
+ */
+ unsigned long irqs_pending;
+ unsigned long irqs_pending_mask;
+
+ /* VCPU power-off state */
+ bool power_off;
+
/* Don't run the VCPU (blocked) */
bool pause;

@@ -156,4 +171,12 @@ int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,

static inline void __kvm_riscv_switch_to(struct kvm_vcpu_arch *vcpu_arch) {}

+int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq);
+int kvm_riscv_vcpu_unset_interrupt(struct kvm_vcpu *vcpu, unsigned int irq);
+void kvm_riscv_vcpu_flush_interrupts(struct kvm_vcpu *vcpu);
+void kvm_riscv_vcpu_sync_interrupts(struct kvm_vcpu *vcpu);
+bool kvm_riscv_vcpu_has_interrupts(struct kvm_vcpu *vcpu, unsigned long mask);
+void kvm_riscv_vcpu_power_off(struct kvm_vcpu *vcpu);
+void kvm_riscv_vcpu_power_on(struct kvm_vcpu *vcpu);
+
#endif /* __RISCV_KVM_HOST_H__ */
diff --git a/arch/riscv/include/uapi/asm/kvm.h b/arch/riscv/include/uapi/asm/kvm.h
index 984d041a3e3b..3d3d703713c6 100644
--- a/arch/riscv/include/uapi/asm/kvm.h
+++ b/arch/riscv/include/uapi/asm/kvm.h
@@ -18,6 +18,9 @@

#define KVM_COALESCED_MMIO_PAGE_OFFSET 1

+#define KVM_INTERRUPT_SET -1U
+#define KVM_INTERRUPT_UNSET -2U
+
/* for KVM_GET_REGS and KVM_SET_REGS */
struct kvm_regs {
};
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index d87f56126df6..ae85a5d9b979 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -11,6 +11,7 @@
#include <linux/err.h>
#include <linux/kdebug.h>
#include <linux/module.h>
+#include <linux/percpu.h>
#include <linux/uaccess.h>
#include <linux/vmalloc.h>
#include <linux/sched/signal.h>
@@ -54,6 +55,9 @@ static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
memcpy(csr, reset_csr, sizeof(*csr));

memcpy(cntx, reset_cntx, sizeof(*cntx));
+
+ WRITE_ONCE(vcpu->arch.irqs_pending, 0);
+ WRITE_ONCE(vcpu->arch.irqs_pending_mask, 0);
}

int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id)
@@ -97,8 +101,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)

int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
{
- /* TODO: */
- return 0;
+ return kvm_riscv_vcpu_has_interrupts(vcpu, 1UL << IRQ_VS_TIMER);
}

void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
@@ -111,20 +114,18 @@ void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)

int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
{
- /* TODO: */
- return 0;
+ return (kvm_riscv_vcpu_has_interrupts(vcpu, -1UL) &&
+ !vcpu->arch.power_off && !vcpu->arch.pause);
}

int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
{
- /* TODO: */
- return 0;
+ return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
}

bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
{
- /* TODO: */
- return false;
+ return (vcpu->arch.guest_context.sstatus & SR_SPP) ? true : false;
}

vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
@@ -135,7 +136,21 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
long kvm_arch_vcpu_async_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
{
- /* TODO; */
+ struct kvm_vcpu *vcpu = filp->private_data;
+ void __user *argp = (void __user *)arg;
+
+ if (ioctl == KVM_INTERRUPT) {
+ struct kvm_interrupt irq;
+
+ if (copy_from_user(&irq, argp, sizeof(irq)))
+ return -EFAULT;
+
+ if (irq.irq == KVM_INTERRUPT_SET)
+ return kvm_riscv_vcpu_set_interrupt(vcpu, IRQ_VS_EXT);
+ else
+ return kvm_riscv_vcpu_unset_interrupt(vcpu, IRQ_VS_EXT);
+ }
+
return -ENOIOCTLCMD;
}

@@ -184,18 +199,121 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
return -EINVAL;
}

+void kvm_riscv_vcpu_flush_interrupts(struct kvm_vcpu *vcpu)
+{
+ struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
+ unsigned long mask, val;
+
+ if (READ_ONCE(vcpu->arch.irqs_pending_mask)) {
+ mask = xchg_acquire(&vcpu->arch.irqs_pending_mask, 0);
+ val = READ_ONCE(vcpu->arch.irqs_pending) & mask;
+
+ csr->hvip &= ~mask;
+ csr->hvip |= val;
+ }
+}
+
+void kvm_riscv_vcpu_sync_interrupts(struct kvm_vcpu *vcpu)
+{
+ unsigned long hvip;
+ struct kvm_vcpu_arch *v = &vcpu->arch;
+ struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
+
+ /* Read current HVIP and HIE CSRs */
+ hvip = csr_read(CSR_HVIP);
+ csr->hie = csr_read(CSR_HIE);
+
+ /* Sync-up HVIP.VSSIP bit changes does by Guest */
+ if ((csr->hvip ^ hvip) & (1UL << IRQ_VS_SOFT)) {
+ if (hvip & (1UL << IRQ_VS_SOFT)) {
+ if (!test_and_set_bit(IRQ_VS_SOFT,
+ &v->irqs_pending_mask))
+ set_bit(IRQ_VS_SOFT, &v->irqs_pending);
+ } else {
+ if (!test_and_set_bit(IRQ_VS_SOFT,
+ &v->irqs_pending_mask))
+ clear_bit(IRQ_VS_SOFT, &v->irqs_pending);
+ }
+ }
+}
+
+int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq)
+{
+ if (irq != IRQ_VS_SOFT &&
+ irq != IRQ_VS_TIMER &&
+ irq != IRQ_VS_EXT)
+ return -EINVAL;
+
+ set_bit(irq, &vcpu->arch.irqs_pending);
+ smp_mb__before_atomic();
+ set_bit(irq, &vcpu->arch.irqs_pending_mask);
+
+ kvm_vcpu_kick(vcpu);
+
+ return 0;
+}
+
+int kvm_riscv_vcpu_unset_interrupt(struct kvm_vcpu *vcpu, unsigned int irq)
+{
+ if (irq != IRQ_VS_SOFT &&
+ irq != IRQ_VS_TIMER &&
+ irq != IRQ_VS_EXT)
+ return -EINVAL;
+
+ clear_bit(irq, &vcpu->arch.irqs_pending);
+ smp_mb__before_atomic();
+ set_bit(irq, &vcpu->arch.irqs_pending_mask);
+
+ return 0;
+}
+
+bool kvm_riscv_vcpu_has_interrupts(struct kvm_vcpu *vcpu, unsigned long mask)
+{
+ return (READ_ONCE(vcpu->arch.irqs_pending) &
+ vcpu->arch.guest_csr.hie & mask) ? true : false;
+}
+
+void kvm_riscv_vcpu_power_off(struct kvm_vcpu *vcpu)
+{
+ vcpu->arch.power_off = true;
+ kvm_make_request(KVM_REQ_SLEEP, vcpu);
+ kvm_vcpu_kick(vcpu);
+}
+
+void kvm_riscv_vcpu_power_on(struct kvm_vcpu *vcpu)
+{
+ vcpu->arch.power_off = false;
+ kvm_vcpu_wake_up(vcpu);
+}
+
int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
struct kvm_mp_state *mp_state)
{
- /* TODO: */
+ if (vcpu->arch.power_off)
+ mp_state->mp_state = KVM_MP_STATE_STOPPED;
+ else
+ mp_state->mp_state = KVM_MP_STATE_RUNNABLE;
+
return 0;
}

int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
struct kvm_mp_state *mp_state)
{
- /* TODO: */
- return 0;
+ int ret = 0;
+
+ switch (mp_state->mp_state) {
+ case KVM_MP_STATE_RUNNABLE:
+ vcpu->arch.power_off = false;
+ break;
+ case KVM_MP_STATE_STOPPED:
+ kvm_riscv_vcpu_power_off(vcpu);
+ break;
+ default:
+ ret = -EINVAL;
+ }
+
+ return ret;
}

int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
@@ -219,7 +337,33 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)

static void kvm_riscv_check_vcpu_requests(struct kvm_vcpu *vcpu)
{
- /* TODO: */
+ struct rcuwait *wait = kvm_arch_vcpu_get_wait(vcpu);
+
+ if (kvm_request_pending(vcpu)) {
+ if (kvm_check_request(KVM_REQ_SLEEP, vcpu)) {
+ rcuwait_wait_event(wait,
+ (!vcpu->arch.power_off) && (!vcpu->arch.pause),
+ TASK_INTERRUPTIBLE);
+
+ if (vcpu->arch.power_off || vcpu->arch.pause) {
+ /*
+ * Awaken to handle a signal, request to
+ * sleep again later.
+ */
+ kvm_make_request(KVM_REQ_SLEEP, vcpu);
+ }
+ }
+
+ if (kvm_check_request(KVM_REQ_VCPU_RESET, vcpu))
+ kvm_riscv_reset_vcpu(vcpu);
+ }
+}
+
+static void kvm_riscv_update_hvip(struct kvm_vcpu *vcpu)
+{
+ struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
+
+ csr_write(CSR_HVIP, csr->hvip);
}

int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
@@ -283,6 +427,15 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
srcu_read_unlock(&vcpu->kvm->srcu, vcpu->arch.srcu_idx);
smp_mb__after_srcu_read_unlock();

+ /*
+ * We might have got VCPU interrupts updated asynchronously
+ * so update it in HW.
+ */
+ kvm_riscv_vcpu_flush_interrupts(vcpu);
+
+ /* Update HVIP CSR for current CPU */
+ kvm_riscv_update_hvip(vcpu);
+
if (ret <= 0 ||
kvm_request_pending(vcpu)) {
vcpu->mode = OUTSIDE_GUEST_MODE;
@@ -310,6 +463,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
trap.htval = csr_read(CSR_HTVAL);
trap.htinst = csr_read(CSR_HTINST);

+ /* Syncup interrupts state with HW */
+ kvm_riscv_vcpu_sync_interrupts(vcpu);
+
/*
* We may have taken a host interrupt in VS/VU-mode (i.e.
* while executing the guest). This interrupt is still
--
2.25.1

2021-04-01 18:24:03

by Anup Patel

[permalink] [raw]
Subject: [PATCH v17 15/17] RISC-V: KVM: Add SBI v0.1 support

From: Atish Patra <[email protected]>

The KVM host kernel is running in HS-mode needs so we need to handle
the SBI calls coming from guest kernel running in VS-mode.

This patch adds SBI v0.1 support in KVM RISC-V. Almost all SBI v0.1
calls are implemented in KVM kernel module except GETCHAR and PUTCHART
calls which are forwarded to user space because these calls cannot be
implemented in kernel space. In future, when we implement SBI v0.2 for
Guest, we will forward SBI v0.2 experimental and vendor extension calls
to user space.

Signed-off-by: Atish Patra <[email protected]>
Signed-off-by: Anup Patel <[email protected]>
Acked-by: Paolo Bonzini <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
---
arch/riscv/include/asm/kvm_host.h | 10 ++
arch/riscv/kvm/Makefile | 2 +-
arch/riscv/kvm/vcpu.c | 9 ++
arch/riscv/kvm/vcpu_exit.c | 4 +
arch/riscv/kvm/vcpu_sbi.c | 173 ++++++++++++++++++++++++++++++
include/uapi/linux/kvm.h | 8 ++
6 files changed, 205 insertions(+), 1 deletion(-)
create mode 100644 arch/riscv/kvm/vcpu_sbi.c

diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index c6e717087a25..0818705e3c2b 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -79,6 +79,10 @@ struct kvm_mmio_decode {
int return_handled;
};

+struct kvm_sbi_context {
+ int return_handled;
+};
+
#define KVM_MMU_PAGE_CACHE_NR_OBJS 32

struct kvm_mmu_page_cache {
@@ -191,6 +195,9 @@ struct kvm_vcpu_arch {
/* MMIO instruction details */
struct kvm_mmio_decode mmio_decode;

+ /* SBI context */
+ struct kvm_sbi_context sbi_context;
+
/* Cache pages needed to program page tables with spinlock held */
struct kvm_mmu_page_cache mmu_page_cache;

@@ -264,4 +271,7 @@ bool kvm_riscv_vcpu_has_interrupts(struct kvm_vcpu *vcpu, unsigned long mask);
void kvm_riscv_vcpu_power_off(struct kvm_vcpu *vcpu);
void kvm_riscv_vcpu_power_on(struct kvm_vcpu *vcpu);

+int kvm_riscv_vcpu_sbi_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu, struct kvm_run *run);
+
#endif /* __RISCV_KVM_HOST_H__ */
diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
index a034826f9a3f..7cf0015d9142 100644
--- a/arch/riscv/kvm/Makefile
+++ b/arch/riscv/kvm/Makefile
@@ -10,6 +10,6 @@ ccflags-y := -Ivirt/kvm -Iarch/riscv/kvm
kvm-objs := $(common-objs-y)

kvm-objs += main.o vm.o vmid.o tlb.o mmu.o
-kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o vcpu_timer.o
+kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o vcpu_timer.o vcpu_sbi.o

obj-$(CONFIG_KVM) += kvm.o
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index a797f247db64..2c2c5e078c30 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -867,6 +867,15 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
}
}

+ /* Process SBI value returned from user-space */
+ if (run->exit_reason == KVM_EXIT_RISCV_SBI) {
+ ret = kvm_riscv_vcpu_sbi_return(vcpu, vcpu->run);
+ if (ret) {
+ srcu_read_unlock(&vcpu->kvm->srcu, vcpu->arch.srcu_idx);
+ return ret;
+ }
+ }
+
if (run->immediate_exit) {
srcu_read_unlock(&vcpu->kvm->srcu, vcpu->arch.srcu_idx);
return -EINTR;
diff --git a/arch/riscv/kvm/vcpu_exit.c b/arch/riscv/kvm/vcpu_exit.c
index 1873b8c35101..6d4e98e2ad6f 100644
--- a/arch/riscv/kvm/vcpu_exit.c
+++ b/arch/riscv/kvm/vcpu_exit.c
@@ -678,6 +678,10 @@ int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
if (vcpu->arch.guest_context.hstatus & HSTATUS_SPV)
ret = stage2_page_fault(vcpu, run, trap);
break;
+ case EXC_SUPERVISOR_SYSCALL:
+ if (vcpu->arch.guest_context.hstatus & HSTATUS_SPV)
+ ret = kvm_riscv_vcpu_sbi_ecall(vcpu, run);
+ break;
default:
break;
};
diff --git a/arch/riscv/kvm/vcpu_sbi.c b/arch/riscv/kvm/vcpu_sbi.c
new file mode 100644
index 000000000000..9d1d25cf217f
--- /dev/null
+++ b/arch/riscv/kvm/vcpu_sbi.c
@@ -0,0 +1,173 @@
+// SPDX-License-Identifier: GPL-2.0
+/**
+ * Copyright (c) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ * Atish Patra <[email protected]>
+ */
+
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/kvm_host.h>
+#include <asm/csr.h>
+#include <asm/sbi.h>
+#include <asm/kvm_vcpu_timer.h>
+
+#define SBI_VERSION_MAJOR 0
+#define SBI_VERSION_MINOR 1
+
+static void kvm_sbi_system_shutdown(struct kvm_vcpu *vcpu,
+ struct kvm_run *run, u32 type)
+{
+ int i;
+ struct kvm_vcpu *tmp;
+
+ kvm_for_each_vcpu(i, tmp, vcpu->kvm)
+ tmp->arch.power_off = true;
+ kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_SLEEP);
+
+ memset(&run->system_event, 0, sizeof(run->system_event));
+ run->system_event.type = type;
+ run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
+}
+
+static void kvm_riscv_vcpu_sbi_forward(struct kvm_vcpu *vcpu,
+ struct kvm_run *run)
+{
+ struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
+
+ vcpu->arch.sbi_context.return_handled = 0;
+ vcpu->stat.ecall_exit_stat++;
+ run->exit_reason = KVM_EXIT_RISCV_SBI;
+ run->riscv_sbi.extension_id = cp->a7;
+ run->riscv_sbi.function_id = cp->a6;
+ run->riscv_sbi.args[0] = cp->a0;
+ run->riscv_sbi.args[1] = cp->a1;
+ run->riscv_sbi.args[2] = cp->a2;
+ run->riscv_sbi.args[3] = cp->a3;
+ run->riscv_sbi.args[4] = cp->a4;
+ run->riscv_sbi.args[5] = cp->a5;
+ run->riscv_sbi.ret[0] = cp->a0;
+ run->riscv_sbi.ret[1] = cp->a1;
+}
+
+int kvm_riscv_vcpu_sbi_return(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+ struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
+
+ /* Handle SBI return only once */
+ if (vcpu->arch.sbi_context.return_handled)
+ return 0;
+ vcpu->arch.sbi_context.return_handled = 1;
+
+ /* Update return values */
+ cp->a0 = run->riscv_sbi.ret[0];
+ cp->a1 = run->riscv_sbi.ret[1];
+
+ /* Move to next instruction */
+ vcpu->arch.guest_context.sepc += 4;
+
+ return 0;
+}
+
+int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+ ulong hmask;
+ int i, ret = 1;
+ u64 next_cycle;
+ struct kvm_vcpu *rvcpu;
+ bool next_sepc = true;
+ struct cpumask cm, hm;
+ struct kvm *kvm = vcpu->kvm;
+ struct kvm_cpu_trap utrap = { 0 };
+ struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
+
+ if (!cp)
+ return -EINVAL;
+
+ switch (cp->a7) {
+ case SBI_EXT_0_1_CONSOLE_GETCHAR:
+ case SBI_EXT_0_1_CONSOLE_PUTCHAR:
+ /*
+ * The CONSOLE_GETCHAR/CONSOLE_PUTCHAR SBI calls cannot be
+ * handled in kernel so we forward these to user-space
+ */
+ kvm_riscv_vcpu_sbi_forward(vcpu, run);
+ next_sepc = false;
+ ret = 0;
+ break;
+ case SBI_EXT_0_1_SET_TIMER:
+#if __riscv_xlen == 32
+ next_cycle = ((u64)cp->a1 << 32) | (u64)cp->a0;
+#else
+ next_cycle = (u64)cp->a0;
+#endif
+ kvm_riscv_vcpu_timer_next_event(vcpu, next_cycle);
+ break;
+ case SBI_EXT_0_1_CLEAR_IPI:
+ kvm_riscv_vcpu_unset_interrupt(vcpu, IRQ_VS_SOFT);
+ break;
+ case SBI_EXT_0_1_SEND_IPI:
+ if (cp->a0)
+ hmask = kvm_riscv_vcpu_unpriv_read(vcpu, false, cp->a0,
+ &utrap);
+ else
+ hmask = (1UL << atomic_read(&kvm->online_vcpus)) - 1;
+ if (utrap.scause) {
+ utrap.sepc = cp->sepc;
+ kvm_riscv_vcpu_trap_redirect(vcpu, &utrap);
+ next_sepc = false;
+ break;
+ }
+ for_each_set_bit(i, &hmask, BITS_PER_LONG) {
+ rvcpu = kvm_get_vcpu_by_id(vcpu->kvm, i);
+ kvm_riscv_vcpu_set_interrupt(rvcpu, IRQ_VS_SOFT);
+ }
+ break;
+ case SBI_EXT_0_1_SHUTDOWN:
+ kvm_sbi_system_shutdown(vcpu, run, KVM_SYSTEM_EVENT_SHUTDOWN);
+ next_sepc = false;
+ ret = 0;
+ break;
+ case SBI_EXT_0_1_REMOTE_FENCE_I:
+ case SBI_EXT_0_1_REMOTE_SFENCE_VMA:
+ case SBI_EXT_0_1_REMOTE_SFENCE_VMA_ASID:
+ if (cp->a0)
+ hmask = kvm_riscv_vcpu_unpriv_read(vcpu, false, cp->a0,
+ &utrap);
+ else
+ hmask = (1UL << atomic_read(&kvm->online_vcpus)) - 1;
+ if (utrap.scause) {
+ utrap.sepc = cp->sepc;
+ kvm_riscv_vcpu_trap_redirect(vcpu, &utrap);
+ next_sepc = false;
+ break;
+ }
+ cpumask_clear(&cm);
+ for_each_set_bit(i, &hmask, BITS_PER_LONG) {
+ rvcpu = kvm_get_vcpu_by_id(vcpu->kvm, i);
+ if (rvcpu->cpu < 0)
+ continue;
+ cpumask_set_cpu(rvcpu->cpu, &cm);
+ }
+ riscv_cpuid_to_hartid_mask(&cm, &hm);
+ if (cp->a7 == SBI_EXT_0_1_REMOTE_FENCE_I)
+ sbi_remote_fence_i(cpumask_bits(&hm));
+ else if (cp->a7 == SBI_EXT_0_1_REMOTE_SFENCE_VMA)
+ sbi_remote_hfence_vvma(cpumask_bits(&hm),
+ cp->a1, cp->a2);
+ else
+ sbi_remote_hfence_vvma_asid(cpumask_bits(&hm),
+ cp->a1, cp->a2, cp->a3);
+ break;
+ default:
+ /* Return error for unsupported SBI calls */
+ cp->a0 = SBI_ERR_NOT_SUPPORTED;
+ break;
+ };
+
+ if (next_sepc)
+ cp->sepc += 4;
+
+ return ret;
+}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index f6afee209620..da0c11c13f5a 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -268,6 +268,7 @@ struct kvm_xen_exit {
#define KVM_EXIT_AP_RESET_HOLD 32
#define KVM_EXIT_X86_BUS_LOCK 33
#define KVM_EXIT_XEN 34
+#define KVM_EXIT_RISCV_SBI 35

/* For KVM_EXIT_INTERNAL_ERROR */
/* Emulate instruction failed. */
@@ -446,6 +447,13 @@ struct kvm_run {
} msr;
/* KVM_EXIT_XEN */
struct kvm_xen_exit xen;
+ /* KVM_EXIT_RISCV_SBI */
+ struct {
+ unsigned long extension_id;
+ unsigned long function_id;
+ unsigned long args[6];
+ unsigned long ret[2];
+ } riscv_sbi;
/* Fix the size of the union. */
char padding[256];
};
--
2.25.1

2021-04-01 18:24:22

by Anup Patel

[permalink] [raw]
Subject: [PATCH v17 16/17] RISC-V: KVM: Document RISC-V specific parts of KVM API

Document RISC-V specific parts of the KVM API, such as:
- The interrupt numbers passed to the KVM_INTERRUPT ioctl.
- The states supported by the KVM_{GET,SET}_MP_STATE ioctls.
- The registers supported by the KVM_{GET,SET}_ONE_REG interface
and the encoding of those register ids.
- The exit reason KVM_EXIT_RISCV_SBI for SBI calls forwarded to
userspace tool.

CC: Jonathan Corbet <[email protected]>
CC: [email protected]
Signed-off-by: Anup Patel <[email protected]>
---
Documentation/virt/kvm/api.rst | 193 +++++++++++++++++++++++++++++++--
1 file changed, 184 insertions(+), 9 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 307f2fcf1b02..c8fe09a62690 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -532,7 +532,7 @@ translation mode.
------------------

:Capability: basic
-:Architectures: x86, ppc, mips
+:Architectures: x86, ppc, mips, riscv
:Type: vcpu ioctl
:Parameters: struct kvm_interrupt (in)
:Returns: 0 on success, negative on failure.
@@ -601,6 +601,23 @@ interrupt number dequeues the interrupt.

This is an asynchronous vcpu ioctl and can be invoked from any thread.

+RISC-V:
+^^^^^^^
+
+Queues an external interrupt to be injected into the virutal CPU. This ioctl
+is overloaded with 2 different irq values:
+
+a) KVM_INTERRUPT_SET
+
+ This sets external interrupt for a virtual CPU and it will receive
+ once it is ready.
+
+b) KVM_INTERRUPT_UNSET
+
+ This clears pending external interrupt for a virtual CPU.
+
+This is an asynchronous vcpu ioctl and can be invoked from any thread.
+

4.17 KVM_DEBUG_GUEST
--------------------
@@ -1394,7 +1411,7 @@ for vm-wide capabilities.
---------------------

:Capability: KVM_CAP_MP_STATE
-:Architectures: x86, s390, arm, arm64
+:Architectures: x86, s390, arm, arm64, riscv
:Type: vcpu ioctl
:Parameters: struct kvm_mp_state (out)
:Returns: 0 on success; -1 on error
@@ -1411,7 +1428,8 @@ uniprocessor guests).
Possible values are:

========================== ===============================================
- KVM_MP_STATE_RUNNABLE the vcpu is currently running [x86,arm/arm64]
+ KVM_MP_STATE_RUNNABLE the vcpu is currently running
+ [x86,arm/arm64,riscv]
KVM_MP_STATE_UNINITIALIZED the vcpu is an application processor (AP)
which has not yet received an INIT signal [x86]
KVM_MP_STATE_INIT_RECEIVED the vcpu has received an INIT signal, and is
@@ -1420,7 +1438,7 @@ Possible values are:
is waiting for an interrupt [x86]
KVM_MP_STATE_SIPI_RECEIVED the vcpu has just received a SIPI (vector
accessible via KVM_GET_VCPU_EVENTS) [x86]
- KVM_MP_STATE_STOPPED the vcpu is stopped [s390,arm/arm64]
+ KVM_MP_STATE_STOPPED the vcpu is stopped [s390,arm/arm64,riscv]
KVM_MP_STATE_CHECK_STOP the vcpu is in a special error state [s390]
KVM_MP_STATE_OPERATING the vcpu is operating (running or halted)
[s390]
@@ -1432,8 +1450,8 @@ On x86, this ioctl is only useful after KVM_CREATE_IRQCHIP. Without an
in-kernel irqchip, the multiprocessing state must be maintained by userspace on
these architectures.

-For arm/arm64:
-^^^^^^^^^^^^^^
+For arm/arm64/riscv:
+^^^^^^^^^^^^^^^^^^^^

The only states that are valid are KVM_MP_STATE_STOPPED and
KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not.
@@ -1442,7 +1460,7 @@ KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not.
---------------------

:Capability: KVM_CAP_MP_STATE
-:Architectures: x86, s390, arm, arm64
+:Architectures: x86, s390, arm, arm64, riscv
:Type: vcpu ioctl
:Parameters: struct kvm_mp_state (in)
:Returns: 0 on success; -1 on error
@@ -1454,8 +1472,8 @@ On x86, this ioctl is only useful after KVM_CREATE_IRQCHIP. Without an
in-kernel irqchip, the multiprocessing state must be maintained by userspace on
these architectures.

-For arm/arm64:
-^^^^^^^^^^^^^^
+For arm/arm64/riscv:
+^^^^^^^^^^^^^^^^^^^^

The only states that are valid are KVM_MP_STATE_STOPPED and
KVM_MP_STATE_RUNNABLE which reflect if the vcpu should be paused or not.
@@ -2572,6 +2590,144 @@ following id bit patterns::

0x7020 0000 0003 02 <0:3> <reg:5>

+RISC-V registers are mapped using the lower 32 bits. The upper 8 bits of
+that is the register group type.
+
+RISC-V config registers are meant for configuring a Guest VCPU and it has
+the following id bit patterns::
+
+ 0x8020 0000 01 <index into the kvm_riscv_config struct:24> (32bit Host)
+ 0x8030 0000 01 <index into the kvm_riscv_config struct:24> (64bit Host)
+
+Following are the RISC-V config registers:
+
+======================= ========= =============================================
+ Encoding Register Description
+======================= ========= =============================================
+ 0x80x0 0000 0100 0000 isa ISA feature bitmap of Guest VCPU
+======================= ========= =============================================
+
+The isa config register can be read anytime but can only be written before
+a Guest VCPU runs. It will have ISA feature bits matching underlying host
+set by default.
+
+RISC-V core registers represent the general excution state of a Guest VCPU
+and it has the following id bit patterns::
+
+ 0x8020 0000 02 <index into the kvm_riscv_core struct:24> (32bit Host)
+ 0x8030 0000 02 <index into the kvm_riscv_core struct:24> (64bit Host)
+
+Following are the RISC-V core registers:
+
+======================= ========= =============================================
+ Encoding Register Description
+======================= ========= =============================================
+ 0x80x0 0000 0200 0000 regs.pc Program counter
+ 0x80x0 0000 0200 0001 regs.ra Return address
+ 0x80x0 0000 0200 0002 regs.sp Stack pointer
+ 0x80x0 0000 0200 0003 regs.gp Global pointer
+ 0x80x0 0000 0200 0004 regs.tp Task pointer
+ 0x80x0 0000 0200 0005 regs.t0 Caller saved register 0
+ 0x80x0 0000 0200 0006 regs.t1 Caller saved register 1
+ 0x80x0 0000 0200 0007 regs.t2 Caller saved register 2
+ 0x80x0 0000 0200 0008 regs.s0 Callee saved register 0
+ 0x80x0 0000 0200 0009 regs.s1 Callee saved register 1
+ 0x80x0 0000 0200 000a regs.a0 Function argument (or return value) 0
+ 0x80x0 0000 0200 000b regs.a1 Function argument (or return value) 1
+ 0x80x0 0000 0200 000c regs.a2 Function argument 2
+ 0x80x0 0000 0200 000d regs.a3 Function argument 3
+ 0x80x0 0000 0200 000e regs.a4 Function argument 4
+ 0x80x0 0000 0200 000f regs.a5 Function argument 5
+ 0x80x0 0000 0200 0010 regs.a6 Function argument 6
+ 0x80x0 0000 0200 0011 regs.a7 Function argument 7
+ 0x80x0 0000 0200 0012 regs.s2 Callee saved register 2
+ 0x80x0 0000 0200 0013 regs.s3 Callee saved register 3
+ 0x80x0 0000 0200 0014 regs.s4 Callee saved register 4
+ 0x80x0 0000 0200 0015 regs.s5 Callee saved register 5
+ 0x80x0 0000 0200 0016 regs.s6 Callee saved register 6
+ 0x80x0 0000 0200 0017 regs.s7 Callee saved register 7
+ 0x80x0 0000 0200 0018 regs.s8 Callee saved register 8
+ 0x80x0 0000 0200 0019 regs.s9 Callee saved register 9
+ 0x80x0 0000 0200 001a regs.s10 Callee saved register 10
+ 0x80x0 0000 0200 001b regs.s11 Callee saved register 11
+ 0x80x0 0000 0200 001c regs.t3 Caller saved register 3
+ 0x80x0 0000 0200 001d regs.t4 Caller saved register 4
+ 0x80x0 0000 0200 001e regs.t5 Caller saved register 5
+ 0x80x0 0000 0200 001f regs.t6 Caller saved register 6
+ 0x80x0 0000 0200 0020 mode Privilege mode (1 = S-mode or 0 = U-mode)
+======================= ========= =============================================
+
+RISC-V csr registers represent the supervisor mode control/status registers
+of a Guest VCPU and it has the following id bit patterns::
+
+ 0x8020 0000 03 <index into the kvm_riscv_csr struct:24> (32bit Host)
+ 0x8030 0000 03 <index into the kvm_riscv_csr struct:24> (64bit Host)
+
+Following are the RISC-V csr registers:
+
+======================= ========= =============================================
+ Encoding Register Description
+======================= ========= =============================================
+ 0x80x0 0000 0300 0000 sstatus Supervisor status
+ 0x80x0 0000 0300 0001 sie Supervisor interrupt enable
+ 0x80x0 0000 0300 0002 stvec Supervisor trap vector base
+ 0x80x0 0000 0300 0003 sscratch Supervisor scratch register
+ 0x80x0 0000 0300 0004 sepc Supervisor exception program counter
+ 0x80x0 0000 0300 0005 scause Supervisor trap cause
+ 0x80x0 0000 0300 0006 stval Supervisor bad address or instruction
+ 0x80x0 0000 0300 0007 sip Supervisor interrupt pending
+ 0x80x0 0000 0300 0008 satp Supervisor address translation and protection
+======================= ========= =============================================
+
+RISC-V timer registers represent the timer state of a Guest VCPU and it has
+the following id bit patterns::
+
+ 0x8030 0000 04 <index into the kvm_riscv_timer struct:24>
+
+Following are the RISC-V timer registers:
+
+======================= ========= =============================================
+ Encoding Register Description
+======================= ========= =============================================
+ 0x8030 0000 0400 0000 frequency Time base frequency (read-only)
+ 0x8030 0000 0400 0001 time Time value visible to Guest
+ 0x8030 0000 0400 0002 compare Time compare programmed by Guest
+ 0x8030 0000 0400 0003 state Time compare state (1 = ON or 0 = OFF)
+======================= ========= =============================================
+
+RISC-V F-extension registers represent the single precision floating point
+state of a Guest VCPU and it has the following id bit patterns::
+
+ 0x8020 0000 05 <index into the __riscv_f_ext_state struct:24>
+
+Following are the RISC-V F-extension registers:
+
+======================= ========= =============================================
+ Encoding Register Description
+======================= ========= =============================================
+ 0x8020 0000 0500 0000 f[0] Floating point register 0
+ ...
+ 0x8020 0000 0500 001f f[31] Floating point register 31
+ 0x8020 0000 0500 0020 fcsr Floating point control and status register
+======================= ========= =============================================
+
+RISC-V D-extension registers represent the double precision floating point
+state of a Guest VCPU and it has the following id bit patterns::
+
+ 0x8020 0000 06 <index into the __riscv_d_ext_state struct:24> (fcsr)
+ 0x8030 0000 06 <index into the __riscv_d_ext_state struct:24> (non-fcsr)
+
+Following are the RISC-V D-extension registers:
+
+======================= ========= =============================================
+ Encoding Register Description
+======================= ========= =============================================
+ 0x8030 0000 0600 0000 f[0] Floating point register 0
+ ...
+ 0x8030 0000 0600 001f f[31] Floating point register 31
+ 0x8020 0000 0600 0020 fcsr Floating point control and status register
+======================= ========= =============================================
+

4.69 KVM_GET_ONE_REG
--------------------
@@ -5475,6 +5631,25 @@ Valid values for 'type' are:
Userspace is expected to place the hypercall result into the appropriate
field before invoking KVM_RUN again.

+::
+
+ /* KVM_EXIT_RISCV_SBI */
+ struct {
+ unsigned long extension_id;
+ unsigned long function_id;
+ unsigned long args[6];
+ unsigned long ret[2];
+ } riscv_sbi;
+If exit reason is KVM_EXIT_RISCV_SBI then it indicates that the VCPU has
+done a SBI call which is not handled by KVM RISC-V kernel module. The details
+of the SBI call are available in 'riscv_sbi' member of kvm_run structure. The
+'extension_id' field of 'riscv_sbi' represents SBI extension ID whereas the
+'function_id' field represents function ID of given SBI extension. The 'args'
+array field of 'riscv_sbi' represents parameters for the SBI call and 'ret'
+array field represents return values. The userspace should update the return
+values of SBI call before resuming the VCPU. For more details on RISC-V SBI
+spec refer, https://github.com/riscv/riscv-sbi-doc.
+
::

/* Fix the size of the union. */
--
2.25.1

2021-04-01 18:28:34

by Anup Patel

[permalink] [raw]
Subject: [PATCH v17 01/17] RISC-V: Add hypervisor extension related CSR defines

This patch extends asm/csr.h by adding RISC-V hypervisor extension
related defines.

Signed-off-by: Anup Patel <[email protected]>
Acked-by: Paolo Bonzini <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Reviewed-by: Alexander Graf <[email protected]>
---
arch/riscv/include/asm/csr.h | 89 ++++++++++++++++++++++++++++++++++++
1 file changed, 89 insertions(+)

diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
index caadfc1d7487..bdf00bd558e4 100644
--- a/arch/riscv/include/asm/csr.h
+++ b/arch/riscv/include/asm/csr.h
@@ -30,6 +30,8 @@
#define SR_XS_CLEAN _AC(0x00010000, UL)
#define SR_XS_DIRTY _AC(0x00018000, UL)

+#define SR_MXR _AC(0x00080000, UL)
+
#ifndef CONFIG_64BIT
#define SR_SD _AC(0x80000000, UL) /* FS/XS dirty */
#else
@@ -58,22 +60,32 @@

/* Interrupt causes (minus the high bit) */
#define IRQ_S_SOFT 1
+#define IRQ_VS_SOFT 2
#define IRQ_M_SOFT 3
#define IRQ_S_TIMER 5
+#define IRQ_VS_TIMER 6
#define IRQ_M_TIMER 7
#define IRQ_S_EXT 9
+#define IRQ_VS_EXT 10
#define IRQ_M_EXT 11

/* Exception causes */
#define EXC_INST_MISALIGNED 0
#define EXC_INST_ACCESS 1
+#define EXC_INST_ILLEGAL 2
#define EXC_BREAKPOINT 3
#define EXC_LOAD_ACCESS 5
#define EXC_STORE_ACCESS 7
#define EXC_SYSCALL 8
+#define EXC_HYPERVISOR_SYSCALL 9
+#define EXC_SUPERVISOR_SYSCALL 10
#define EXC_INST_PAGE_FAULT 12
#define EXC_LOAD_PAGE_FAULT 13
#define EXC_STORE_PAGE_FAULT 15
+#define EXC_INST_GUEST_PAGE_FAULT 20
+#define EXC_LOAD_GUEST_PAGE_FAULT 21
+#define EXC_VIRTUAL_INST_FAULT 22
+#define EXC_STORE_GUEST_PAGE_FAULT 23

/* PMP configuration */
#define PMP_R 0x01
@@ -85,6 +97,58 @@
#define PMP_A_NAPOT 0x18
#define PMP_L 0x80

+/* HSTATUS flags */
+#ifdef CONFIG_64BIT
+#define HSTATUS_VSXL _AC(0x300000000, UL)
+#define HSTATUS_VSXL_SHIFT 32
+#endif
+#define HSTATUS_VTSR _AC(0x00400000, UL)
+#define HSTATUS_VTW _AC(0x00200000, UL)
+#define HSTATUS_VTVM _AC(0x00100000, UL)
+#define HSTATUS_VGEIN _AC(0x0003f000, UL)
+#define HSTATUS_VGEIN_SHIFT 12
+#define HSTATUS_HU _AC(0x00000200, UL)
+#define HSTATUS_SPVP _AC(0x00000100, UL)
+#define HSTATUS_SPV _AC(0x00000080, UL)
+#define HSTATUS_GVA _AC(0x00000040, UL)
+#define HSTATUS_VSBE _AC(0x00000020, UL)
+
+/* HGATP flags */
+#define HGATP_MODE_OFF _AC(0, UL)
+#define HGATP_MODE_SV32X4 _AC(1, UL)
+#define HGATP_MODE_SV39X4 _AC(8, UL)
+#define HGATP_MODE_SV48X4 _AC(9, UL)
+
+#define HGATP32_MODE_SHIFT 31
+#define HGATP32_VMID_SHIFT 22
+#define HGATP32_VMID_MASK _AC(0x1FC00000, UL)
+#define HGATP32_PPN _AC(0x003FFFFF, UL)
+
+#define HGATP64_MODE_SHIFT 60
+#define HGATP64_VMID_SHIFT 44
+#define HGATP64_VMID_MASK _AC(0x03FFF00000000000, UL)
+#define HGATP64_PPN _AC(0x00000FFFFFFFFFFF, UL)
+
+#define HGATP_PAGE_SHIFT 12
+
+#ifdef CONFIG_64BIT
+#define HGATP_PPN HGATP64_PPN
+#define HGATP_VMID_SHIFT HGATP64_VMID_SHIFT
+#define HGATP_VMID_MASK HGATP64_VMID_MASK
+#define HGATP_MODE_SHIFT HGATP64_MODE_SHIFT
+#else
+#define HGATP_PPN HGATP32_PPN
+#define HGATP_VMID_SHIFT HGATP32_VMID_SHIFT
+#define HGATP_VMID_MASK HGATP32_VMID_MASK
+#define HGATP_MODE_SHIFT HGATP32_MODE_SHIFT
+#endif
+
+/* VSIP & HVIP relation */
+#define VSIP_TO_HVIP_SHIFT (IRQ_VS_SOFT - IRQ_S_SOFT)
+#define VSIP_VALID_MASK ((_AC(1, UL) << IRQ_S_SOFT) | \
+ (_AC(1, UL) << IRQ_S_TIMER) | \
+ (_AC(1, UL) << IRQ_S_EXT))
+
/* symbolic CSR names: */
#define CSR_CYCLE 0xc00
#define CSR_TIME 0xc01
@@ -104,6 +168,31 @@
#define CSR_SIP 0x144
#define CSR_SATP 0x180

+#define CSR_VSSTATUS 0x200
+#define CSR_VSIE 0x204
+#define CSR_VSTVEC 0x205
+#define CSR_VSSCRATCH 0x240
+#define CSR_VSEPC 0x241
+#define CSR_VSCAUSE 0x242
+#define CSR_VSTVAL 0x243
+#define CSR_VSIP 0x244
+#define CSR_VSATP 0x280
+
+#define CSR_HSTATUS 0x600
+#define CSR_HEDELEG 0x602
+#define CSR_HIDELEG 0x603
+#define CSR_HIE 0x604
+#define CSR_HTIMEDELTA 0x605
+#define CSR_HCOUNTEREN 0x606
+#define CSR_HGEIE 0x607
+#define CSR_HTIMEDELTAH 0x615
+#define CSR_HTVAL 0x643
+#define CSR_HIP 0x644
+#define CSR_HVIP 0x645
+#define CSR_HTINST 0x64a
+#define CSR_HGATP 0x680
+#define CSR_HGEIP 0xe12
+
#define CSR_MSTATUS 0x300
#define CSR_MISA 0x301
#define CSR_MIE 0x304
--
2.25.1

2021-04-01 18:33:32

by Anup Patel

[permalink] [raw]
Subject: [PATCH v17 12/17] RISC-V: KVM: Add timer functionality

From: Atish Patra <[email protected]>

The RISC-V hypervisor specification doesn't have any virtual timer
feature.

Due to this, the guest VCPU timer will be programmed via SBI calls.
The host will use a separate hrtimer event for each guest VCPU to
provide timer functionality. We inject a virtual timer interrupt to
the guest VCPU whenever the guest VCPU hrtimer event expires.

This patch adds guest VCPU timer implementation along with ONE_REG
interface to access VCPU timer state from user space.

Signed-off-by: Atish Patra <[email protected]>
Signed-off-by: Anup Patel <[email protected]>
Acked-by: Paolo Bonzini <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Acked-by: Daniel Lezcano <[email protected]>
---
arch/riscv/include/asm/kvm_host.h | 7 +
arch/riscv/include/asm/kvm_vcpu_timer.h | 44 +++++
arch/riscv/include/uapi/asm/kvm.h | 17 ++
arch/riscv/kvm/Makefile | 2 +-
arch/riscv/kvm/vcpu.c | 14 ++
arch/riscv/kvm/vcpu_timer.c | 225 ++++++++++++++++++++++++
arch/riscv/kvm/vm.c | 2 +-
drivers/clocksource/timer-riscv.c | 8 +
include/clocksource/timer-riscv.h | 16 ++
9 files changed, 333 insertions(+), 2 deletions(-)
create mode 100644 arch/riscv/include/asm/kvm_vcpu_timer.h
create mode 100644 arch/riscv/kvm/vcpu_timer.c
create mode 100644 include/clocksource/timer-riscv.h

diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index 4e318137d82a..20308046d5df 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -12,6 +12,7 @@
#include <linux/types.h>
#include <linux/kvm.h>
#include <linux/kvm_types.h>
+#include <asm/kvm_vcpu_timer.h>

#ifdef CONFIG_64BIT
#define KVM_MAX_VCPUS (1U << 16)
@@ -65,6 +66,9 @@ struct kvm_arch {
/* stage2 page table */
pgd_t *pgd;
phys_addr_t pgd_phys;
+
+ /* Guest Timer */
+ struct kvm_guest_timer timer;
};

struct kvm_mmio_decode {
@@ -180,6 +184,9 @@ struct kvm_vcpu_arch {
unsigned long irqs_pending;
unsigned long irqs_pending_mask;

+ /* VCPU Timer */
+ struct kvm_vcpu_timer timer;
+
/* MMIO instruction details */
struct kvm_mmio_decode mmio_decode;

diff --git a/arch/riscv/include/asm/kvm_vcpu_timer.h b/arch/riscv/include/asm/kvm_vcpu_timer.h
new file mode 100644
index 000000000000..375281eb49e0
--- /dev/null
+++ b/arch/riscv/include/asm/kvm_vcpu_timer.h
@@ -0,0 +1,44 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ * Atish Patra <[email protected]>
+ */
+
+#ifndef __KVM_VCPU_RISCV_TIMER_H
+#define __KVM_VCPU_RISCV_TIMER_H
+
+#include <linux/hrtimer.h>
+
+struct kvm_guest_timer {
+ /* Mult & Shift values to get nanoseconds from cycles */
+ u32 nsec_mult;
+ u32 nsec_shift;
+ /* Time delta value */
+ u64 time_delta;
+};
+
+struct kvm_vcpu_timer {
+ /* Flag for whether init is done */
+ bool init_done;
+ /* Flag for whether timer event is configured */
+ bool next_set;
+ /* Next timer event cycles */
+ u64 next_cycles;
+ /* Underlying hrtimer instance */
+ struct hrtimer hrt;
+};
+
+int kvm_riscv_vcpu_timer_next_event(struct kvm_vcpu *vcpu, u64 ncycles);
+int kvm_riscv_vcpu_get_reg_timer(struct kvm_vcpu *vcpu,
+ const struct kvm_one_reg *reg);
+int kvm_riscv_vcpu_set_reg_timer(struct kvm_vcpu *vcpu,
+ const struct kvm_one_reg *reg);
+int kvm_riscv_vcpu_timer_init(struct kvm_vcpu *vcpu);
+int kvm_riscv_vcpu_timer_deinit(struct kvm_vcpu *vcpu);
+int kvm_riscv_vcpu_timer_reset(struct kvm_vcpu *vcpu);
+void kvm_riscv_vcpu_timer_restore(struct kvm_vcpu *vcpu);
+int kvm_riscv_guest_timer_init(struct kvm *kvm);
+
+#endif
diff --git a/arch/riscv/include/uapi/asm/kvm.h b/arch/riscv/include/uapi/asm/kvm.h
index f7e9dc388d54..08691dd27bcf 100644
--- a/arch/riscv/include/uapi/asm/kvm.h
+++ b/arch/riscv/include/uapi/asm/kvm.h
@@ -74,6 +74,18 @@ struct kvm_riscv_csr {
unsigned long scounteren;
};

+/* TIMER registers for KVM_GET_ONE_REG and KVM_SET_ONE_REG */
+struct kvm_riscv_timer {
+ __u64 frequency;
+ __u64 time;
+ __u64 compare;
+ __u64 state;
+};
+
+/* Possible states for kvm_riscv_timer */
+#define KVM_RISCV_TIMER_STATE_OFF 0
+#define KVM_RISCV_TIMER_STATE_ON 1
+
#define KVM_REG_SIZE(id) \
(1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))

@@ -96,6 +108,11 @@ struct kvm_riscv_csr {
#define KVM_REG_RISCV_CSR_REG(name) \
(offsetof(struct kvm_riscv_csr, name) / sizeof(unsigned long))

+/* Timer registers are mapped as type 4 */
+#define KVM_REG_RISCV_TIMER (0x04 << KVM_REG_RISCV_TYPE_SHIFT)
+#define KVM_REG_RISCV_TIMER_REG(name) \
+ (offsetof(struct kvm_riscv_timer, name) / sizeof(__u64))
+
#endif

#endif /* __LINUX_KVM_RISCV_H */
diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
index b32f60edf48c..a034826f9a3f 100644
--- a/arch/riscv/kvm/Makefile
+++ b/arch/riscv/kvm/Makefile
@@ -10,6 +10,6 @@ ccflags-y := -Ivirt/kvm -Iarch/riscv/kvm
kvm-objs := $(common-objs-y)

kvm-objs += main.o vm.o vmid.o tlb.o mmu.o
-kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o
+kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o vcpu_timer.o

obj-$(CONFIG_KVM) += kvm.o
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index dcb2014cb2fc..e242431dca66 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -55,6 +55,8 @@ static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)

memcpy(cntx, reset_cntx, sizeof(*cntx));

+ kvm_riscv_vcpu_timer_reset(vcpu);
+
WRITE_ONCE(vcpu->arch.irqs_pending, 0);
WRITE_ONCE(vcpu->arch.irqs_pending_mask, 0);
}
@@ -82,6 +84,9 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
cntx->hstatus |= HSTATUS_SPVP;
cntx->hstatus |= HSTATUS_SPV;

+ /* Setup VCPU timer */
+ kvm_riscv_vcpu_timer_init(vcpu);
+
/* Reset VCPU */
kvm_riscv_reset_vcpu(vcpu);

@@ -94,6 +99,9 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)

void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
{
+ /* Cleanup VCPU timer */
+ kvm_riscv_vcpu_timer_deinit(vcpu);
+
/* Flush the pages pre-allocated for Stage2 page table mappings */
kvm_riscv_stage2_flush_cache(vcpu);
}
@@ -334,6 +342,8 @@ static int kvm_riscv_vcpu_set_reg(struct kvm_vcpu *vcpu,
return kvm_riscv_vcpu_set_reg_core(vcpu, reg);
else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_CSR)
return kvm_riscv_vcpu_set_reg_csr(vcpu, reg);
+ else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_TIMER)
+ return kvm_riscv_vcpu_set_reg_timer(vcpu, reg);

return -EINVAL;
}
@@ -347,6 +357,8 @@ static int kvm_riscv_vcpu_get_reg(struct kvm_vcpu *vcpu,
return kvm_riscv_vcpu_get_reg_core(vcpu, reg);
else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_CSR)
return kvm_riscv_vcpu_get_reg_csr(vcpu, reg);
+ else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_TIMER)
+ return kvm_riscv_vcpu_get_reg_timer(vcpu, reg);

return -EINVAL;
}
@@ -579,6 +591,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)

kvm_riscv_stage2_update_hgatp(vcpu);

+ kvm_riscv_vcpu_timer_restore(vcpu);
+
vcpu->cpu = cpu;
}

diff --git a/arch/riscv/kvm/vcpu_timer.c b/arch/riscv/kvm/vcpu_timer.c
new file mode 100644
index 000000000000..ddd0ce727b83
--- /dev/null
+++ b/arch/riscv/kvm/vcpu_timer.c
@@ -0,0 +1,225 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ * Atish Patra <[email protected]>
+ */
+
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/kvm_host.h>
+#include <linux/uaccess.h>
+#include <clocksource/timer-riscv.h>
+#include <asm/csr.h>
+#include <asm/delay.h>
+#include <asm/kvm_vcpu_timer.h>
+
+static u64 kvm_riscv_current_cycles(struct kvm_guest_timer *gt)
+{
+ return get_cycles64() + gt->time_delta;
+}
+
+static u64 kvm_riscv_delta_cycles2ns(u64 cycles,
+ struct kvm_guest_timer *gt,
+ struct kvm_vcpu_timer *t)
+{
+ unsigned long flags;
+ u64 cycles_now, cycles_delta, delta_ns;
+
+ local_irq_save(flags);
+ cycles_now = kvm_riscv_current_cycles(gt);
+ if (cycles_now < cycles)
+ cycles_delta = cycles - cycles_now;
+ else
+ cycles_delta = 0;
+ delta_ns = (cycles_delta * gt->nsec_mult) >> gt->nsec_shift;
+ local_irq_restore(flags);
+
+ return delta_ns;
+}
+
+static enum hrtimer_restart kvm_riscv_vcpu_hrtimer_expired(struct hrtimer *h)
+{
+ u64 delta_ns;
+ struct kvm_vcpu_timer *t = container_of(h, struct kvm_vcpu_timer, hrt);
+ struct kvm_vcpu *vcpu = container_of(t, struct kvm_vcpu, arch.timer);
+ struct kvm_guest_timer *gt = &vcpu->kvm->arch.timer;
+
+ if (kvm_riscv_current_cycles(gt) < t->next_cycles) {
+ delta_ns = kvm_riscv_delta_cycles2ns(t->next_cycles, gt, t);
+ hrtimer_forward_now(&t->hrt, ktime_set(0, delta_ns));
+ return HRTIMER_RESTART;
+ }
+
+ t->next_set = false;
+ kvm_riscv_vcpu_set_interrupt(vcpu, IRQ_VS_TIMER);
+
+ return HRTIMER_NORESTART;
+}
+
+static int kvm_riscv_vcpu_timer_cancel(struct kvm_vcpu_timer *t)
+{
+ if (!t->init_done || !t->next_set)
+ return -EINVAL;
+
+ hrtimer_cancel(&t->hrt);
+ t->next_set = false;
+
+ return 0;
+}
+
+int kvm_riscv_vcpu_timer_next_event(struct kvm_vcpu *vcpu, u64 ncycles)
+{
+ struct kvm_vcpu_timer *t = &vcpu->arch.timer;
+ struct kvm_guest_timer *gt = &vcpu->kvm->arch.timer;
+ u64 delta_ns;
+
+ if (!t->init_done)
+ return -EINVAL;
+
+ kvm_riscv_vcpu_unset_interrupt(vcpu, IRQ_VS_TIMER);
+
+ delta_ns = kvm_riscv_delta_cycles2ns(ncycles, gt, t);
+ t->next_cycles = ncycles;
+ hrtimer_start(&t->hrt, ktime_set(0, delta_ns), HRTIMER_MODE_REL);
+ t->next_set = true;
+
+ return 0;
+}
+
+int kvm_riscv_vcpu_get_reg_timer(struct kvm_vcpu *vcpu,
+ const struct kvm_one_reg *reg)
+{
+ struct kvm_vcpu_timer *t = &vcpu->arch.timer;
+ struct kvm_guest_timer *gt = &vcpu->kvm->arch.timer;
+ u64 __user *uaddr = (u64 __user *)(unsigned long)reg->addr;
+ unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
+ KVM_REG_SIZE_MASK |
+ KVM_REG_RISCV_TIMER);
+ u64 reg_val;
+
+ if (KVM_REG_SIZE(reg->id) != sizeof(u64))
+ return -EINVAL;
+ if (reg_num >= sizeof(struct kvm_riscv_timer) / sizeof(u64))
+ return -EINVAL;
+
+ switch (reg_num) {
+ case KVM_REG_RISCV_TIMER_REG(frequency):
+ reg_val = riscv_timebase;
+ break;
+ case KVM_REG_RISCV_TIMER_REG(time):
+ reg_val = kvm_riscv_current_cycles(gt);
+ break;
+ case KVM_REG_RISCV_TIMER_REG(compare):
+ reg_val = t->next_cycles;
+ break;
+ case KVM_REG_RISCV_TIMER_REG(state):
+ reg_val = (t->next_set) ? KVM_RISCV_TIMER_STATE_ON :
+ KVM_RISCV_TIMER_STATE_OFF;
+ break;
+ default:
+ return -EINVAL;
+ };
+
+ if (copy_to_user(uaddr, &reg_val, KVM_REG_SIZE(reg->id)))
+ return -EFAULT;
+
+ return 0;
+}
+
+int kvm_riscv_vcpu_set_reg_timer(struct kvm_vcpu *vcpu,
+ const struct kvm_one_reg *reg)
+{
+ struct kvm_vcpu_timer *t = &vcpu->arch.timer;
+ struct kvm_guest_timer *gt = &vcpu->kvm->arch.timer;
+ u64 __user *uaddr = (u64 __user *)(unsigned long)reg->addr;
+ unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
+ KVM_REG_SIZE_MASK |
+ KVM_REG_RISCV_TIMER);
+ u64 reg_val;
+ int ret = 0;
+
+ if (KVM_REG_SIZE(reg->id) != sizeof(u64))
+ return -EINVAL;
+ if (reg_num >= sizeof(struct kvm_riscv_timer) / sizeof(u64))
+ return -EINVAL;
+
+ if (copy_from_user(&reg_val, uaddr, KVM_REG_SIZE(reg->id)))
+ return -EFAULT;
+
+ switch (reg_num) {
+ case KVM_REG_RISCV_TIMER_REG(frequency):
+ ret = -EOPNOTSUPP;
+ break;
+ case KVM_REG_RISCV_TIMER_REG(time):
+ gt->time_delta = reg_val - get_cycles64();
+ break;
+ case KVM_REG_RISCV_TIMER_REG(compare):
+ t->next_cycles = reg_val;
+ break;
+ case KVM_REG_RISCV_TIMER_REG(state):
+ if (reg_val == KVM_RISCV_TIMER_STATE_ON)
+ ret = kvm_riscv_vcpu_timer_next_event(vcpu, reg_val);
+ else
+ ret = kvm_riscv_vcpu_timer_cancel(t);
+ break;
+ default:
+ ret = -EINVAL;
+ break;
+ };
+
+ return ret;
+}
+
+int kvm_riscv_vcpu_timer_init(struct kvm_vcpu *vcpu)
+{
+ struct kvm_vcpu_timer *t = &vcpu->arch.timer;
+
+ if (t->init_done)
+ return -EINVAL;
+
+ hrtimer_init(&t->hrt, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+ t->hrt.function = kvm_riscv_vcpu_hrtimer_expired;
+ t->init_done = true;
+ t->next_set = false;
+
+ return 0;
+}
+
+int kvm_riscv_vcpu_timer_deinit(struct kvm_vcpu *vcpu)
+{
+ int ret;
+
+ ret = kvm_riscv_vcpu_timer_cancel(&vcpu->arch.timer);
+ vcpu->arch.timer.init_done = false;
+
+ return ret;
+}
+
+int kvm_riscv_vcpu_timer_reset(struct kvm_vcpu *vcpu)
+{
+ return kvm_riscv_vcpu_timer_cancel(&vcpu->arch.timer);
+}
+
+void kvm_riscv_vcpu_timer_restore(struct kvm_vcpu *vcpu)
+{
+ struct kvm_guest_timer *gt = &vcpu->kvm->arch.timer;
+
+#ifdef CONFIG_64BIT
+ csr_write(CSR_HTIMEDELTA, gt->time_delta);
+#else
+ csr_write(CSR_HTIMEDELTA, (u32)(gt->time_delta));
+ csr_write(CSR_HTIMEDELTAH, (u32)(gt->time_delta >> 32));
+#endif
+}
+
+int kvm_riscv_guest_timer_init(struct kvm *kvm)
+{
+ struct kvm_guest_timer *gt = &kvm->arch.timer;
+
+ riscv_cs_get_mult_shift(&gt->nsec_mult, &gt->nsec_shift);
+ gt->time_delta = -get_cycles64();
+
+ return 0;
+}
diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
index 00a1a88008be..253c45ee20f9 100644
--- a/arch/riscv/kvm/vm.c
+++ b/arch/riscv/kvm/vm.c
@@ -26,7 +26,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
return r;
}

- return 0;
+ return kvm_riscv_guest_timer_init(kvm);
}

void kvm_arch_destroy_vm(struct kvm *kvm)
diff --git a/drivers/clocksource/timer-riscv.c b/drivers/clocksource/timer-riscv.c
index c51c5ed15aa7..8e73c0a23910 100644
--- a/drivers/clocksource/timer-riscv.c
+++ b/drivers/clocksource/timer-riscv.c
@@ -13,6 +13,7 @@
#include <linux/delay.h>
#include <linux/irq.h>
#include <linux/irqdomain.h>
+#include <linux/module.h>
#include <linux/sched_clock.h>
#include <linux/io-64-nonatomic-lo-hi.h>
#include <linux/interrupt.h>
@@ -79,6 +80,13 @@ static int riscv_timer_dying_cpu(unsigned int cpu)
return 0;
}

+void riscv_cs_get_mult_shift(u32 *mult, u32 *shift)
+{
+ *mult = riscv_clocksource.mult;
+ *shift = riscv_clocksource.shift;
+}
+EXPORT_SYMBOL_GPL(riscv_cs_get_mult_shift);
+
/* called directly from the low-level interrupt handler */
static irqreturn_t riscv_timer_interrupt(int irq, void *dev_id)
{
diff --git a/include/clocksource/timer-riscv.h b/include/clocksource/timer-riscv.h
new file mode 100644
index 000000000000..d7f455754e60
--- /dev/null
+++ b/include/clocksource/timer-riscv.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ * Atish Patra <[email protected]>
+ */
+
+#ifndef __TIMER_RISCV_H
+#define __TIMER_RISCV_H
+
+#include <linux/types.h>
+
+extern void riscv_cs_get_mult_shift(u32 *mult, u32 *shift);
+
+#endif
--
2.25.1