Received: by 2002:a25:8b12:0:0:0:0:0 with SMTP id i18csp1159811ybl; Thu, 22 Aug 2019 10:11:05 -0700 (PDT) X-Google-Smtp-Source: APXvYqwVng5BMUfdVma4NyjD0wNcG/lRI+kyY+ndB31frhINDbOI+OiSy9zTN/isAxyz+JMgA9Rn X-Received: by 2002:a17:902:bb95:: with SMTP id m21mr1200603pls.26.1566493865033; Thu, 22 Aug 2019 10:11:05 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1566493865; cv=none; d=google.com; s=arc-20160816; b=CWs01qZpXaiojPS8GKhF1VMZNti+2Q/u9d7U3bfmK31wsO3SIN0EUepAXiSpj4L8bn Zk/f4hj1zyHGTunUgmhT3TOj7LPvZhg2bquYPbCAIInrXbCLuhXqmVqjopf/JC6wpyJQ 2yBM6eY412Tv/ANbPRmwO6/dXMeAmcB5wrlh1U49E0RxRAyhmtIOhHrHClrxj5BYCOAS PedvPLF1BO09kGFks/wa94+tTOOl4TWzBzTd3zPwds8HH8/59nSEn3y81quREp+zOI8f YCXJAwxfg21Mg44ZOSHQP1sXzNhpp0BQAO5EUHAS16kZBg1D7/foZU2v9C6NNI55h4qf kk3Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=uYYk+VSWs9B+t21aWspakKLUywF9Fl0vAMFkw3Q8uEw=; b=y+vXdybyzeEdkB1gLSHdjKCdGlZuYWtuFYtPPqtv/GDRzyq284Z0EiRY7QOGK2DAyc Zm4LcM+DQUmedJ/Lhw/JOE+8OOFUI7r4U6tuxu80YSjQ0duvuPai8M7bbrMr9SQTbS8l IPDDxOjR6FFSO9QdZTHC6mxz7A+x0Jz5/RzQzx42fH8nRwMxh8Zautvj/TjjPmVPQC4i 7SbvmNwXsgClT0Mc0AZUmYXgVPjKPxBWZN4ivUSsZn784n1rADxmucqyit2blB1izi/x FDfiZq/ccPA23StpMXyYaacO+e3fQfldThJAhlCrR/sQforVvHU91z8SxgnnQ0h08hj7 qyqA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@brainfault-org.20150623.gappssmtp.com header.s=20150623 header.b=MrTrr3Ig; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id x17si73503plr.112.2019.08.22.10.10.44; Thu, 22 Aug 2019 10:11:05 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@brainfault-org.20150623.gappssmtp.com header.s=20150623 header.b=MrTrr3Ig; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731886AbfHVN4J (ORCPT + 99 others); Thu, 22 Aug 2019 09:56:09 -0400 Received: from mail-wr1-f65.google.com ([209.85.221.65]:46621 "EHLO mail-wr1-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730312AbfHVN4J (ORCPT ); Thu, 22 Aug 2019 09:56:09 -0400 Received: by mail-wr1-f65.google.com with SMTP id z1so5503383wru.13 for ; Thu, 22 Aug 2019 06:56:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=brainfault-org.20150623.gappssmtp.com; s=20150623; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=uYYk+VSWs9B+t21aWspakKLUywF9Fl0vAMFkw3Q8uEw=; b=MrTrr3IgiLWKpURbAsdC5kpxZUNMXR8sFUaMTKo0GgP8WEx5bhqaCr/gJIEFAtn75f Q6D0+ODyrqGK5W55o+H44PXrBWmPj1+Izy+CN18yxghXnR/fSlNPGIGhSkb0hq4MvyRu bICm+pYr+p1DlqOtHsP8EDhD8DYlRaj6CqasJpBz67oGk6eCQJmN/kEkJ3Nnp3YBmmBz 8yUrvxAiOZCLO1QCD3ggdnPwNTgcboPOeg6UfihyrLqKTdELkgahRXoUqVGBAZ2ovesW mc4Z4EuODlyuwHMnFjsPkGkfkGR5iZxY2XOlPeF3hQWklaT91OOLxpdVdmNNk85cMCMw 4yLw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=uYYk+VSWs9B+t21aWspakKLUywF9Fl0vAMFkw3Q8uEw=; b=eBCdnZWvryu/MygLj1vUgyRJJ0uDnxJxwg8lzC9qgDQdhvNtS/UIwilbceTn7eVH/Y HawAm9NkdD41b2Pj7yyJ90OJ74vesugEsbrj4w6lOmIITsI1ETVzbZ0kiKasR7DUnPnc fP6E+DdnK3GAfBcZ4WC4urrn58NiTecx3kSKPiFE8vjrbe6uUkdgJTNtg0OexDgV7lvr fkqyfBZUIwNZ/erHX0DjJ//i4Ehx8zatTVHYNwf2qLjB795uxGDwWd8PRMwLRJdRbs9N MkN+ylanusI8MQPr+1mIDCLDDf15TW+oZFs36VXPD7EJbwlhb1DB8mU+hPiMY7HYwRpY w/qg== X-Gm-Message-State: APjAAAVQ7vu1UKWaTX/zPkZSNyDPXec/4JsLAEt3A97l3R/gCWZv44Uk 5PFir6WhtXvJAbQAS7F75q4+Hi3l7QJ6pRlECv5kOw== X-Received: by 2002:a05:6000:10cf:: with SMTP id b15mr4508377wrx.180.1566482166147; Thu, 22 Aug 2019 06:56:06 -0700 (PDT) MIME-Version: 1.0 References: <20190822084131.114764-1-anup.patel@wdc.com> <20190822084131.114764-11-anup.patel@wdc.com> <917cea87-42c0-e50a-6508-d5b577c8b702@amazon.com> <4fe83f28-3a55-e74c-0d40-1cd556015fea@amazon.com> In-Reply-To: <4fe83f28-3a55-e74c-0d40-1cd556015fea@amazon.com> From: Anup Patel Date: Thu, 22 Aug 2019 19:25:54 +0530 Message-ID: Subject: Re: [PATCH v5 10/20] RISC-V: KVM: Handle MMIO exits for VCPU To: Alexander Graf Cc: Anup Patel , Palmer Dabbelt , Paul Walmsley , Paolo Bonzini , Radim K , Daniel Lezcano , Thomas Gleixner , Atish Patra , Alistair Francis , Damien Le Moal , Christoph Hellwig , "kvm@vger.kernel.org" , "linux-riscv@lists.infradead.org" , "linux-kernel@vger.kernel.org" Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Aug 22, 2019 at 6:55 PM Alexander Graf wrote: > > > > On 22.08.19 14:33, Anup Patel wrote: > > On Thu, Aug 22, 2019 at 5:44 PM Alexander Graf wrote: > >> > >> On 22.08.19 10:44, Anup Patel wrote: > >>> We will get stage2 page faults whenever Guest/VM access SW emulated > >>> MMIO device or unmapped Guest RAM. > >>> > >>> This patch implements MMIO read/write emulation by extracting MMIO > >>> details from the trapped load/store instruction and forwarding the > >>> MMIO read/write to user-space. The actual MMIO emulation will happen > >>> in user-space and KVM kernel module will only take care of register > >>> updates before resuming the trapped VCPU. > >>> > >>> The handling for stage2 page faults for unmapped Guest RAM will be > >>> implemeted by a separate patch later. > >>> > >>> Signed-off-by: Anup Patel > >>> Acked-by: Paolo Bonzini > >>> Reviewed-by: Paolo Bonzini > >>> --- > >>> arch/riscv/include/asm/kvm_host.h | 11 + > >>> arch/riscv/kvm/mmu.c | 7 + > >>> arch/riscv/kvm/vcpu_exit.c | 436 +++++++++++++++++++++++++++++- > >>> 3 files changed, 451 insertions(+), 3 deletions(-) > >>> > >>> diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h > >>> index 18f1097f1d8d..4388bace6d70 100644 > >>> --- a/arch/riscv/include/asm/kvm_host.h > >>> +++ b/arch/riscv/include/asm/kvm_host.h > >>> @@ -53,6 +53,12 @@ struct kvm_arch { > >>> phys_addr_t pgd_phys; > >>> }; > >>> > >>> +struct kvm_mmio_decode { > >>> + unsigned long insn; > >>> + int len; > >>> + int shift; > >>> +}; > >>> + > >>> struct kvm_cpu_context { > >>> unsigned long zero; > >>> unsigned long ra; > >>> @@ -141,6 +147,9 @@ struct kvm_vcpu_arch { > >>> unsigned long irqs_pending; > >>> unsigned long irqs_pending_mask; > >>> > >>> + /* MMIO instruction details */ > >>> + struct kvm_mmio_decode mmio_decode; > >>> + > >>> /* VCPU power-off state */ > >>> bool power_off; > >>> > >>> @@ -160,6 +169,8 @@ static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {} > >>> int kvm_riscv_setup_vsip(void); > >>> void kvm_riscv_cleanup_vsip(void); > >>> > >>> +int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned long hva, > >>> + bool is_write); > >>> void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu); > >>> int kvm_riscv_stage2_alloc_pgd(struct kvm *kvm); > >>> void kvm_riscv_stage2_free_pgd(struct kvm *kvm); > >>> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c > >>> index 04dd089b86ff..2b965f9aac07 100644 > >>> --- a/arch/riscv/kvm/mmu.c > >>> +++ b/arch/riscv/kvm/mmu.c > >>> @@ -61,6 +61,13 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm, > >>> return 0; > >>> } > >>> > >>> +int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned long hva, > >>> + bool is_write) > >>> +{ > >>> + /* TODO: */ > >>> + return 0; > >>> +} > >>> + > >>> void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu) > >>> { > >>> /* TODO: */ > >>> diff --git a/arch/riscv/kvm/vcpu_exit.c b/arch/riscv/kvm/vcpu_exit.c > >>> index e4d7c8f0807a..efc06198c259 100644 > >>> --- a/arch/riscv/kvm/vcpu_exit.c > >>> +++ b/arch/riscv/kvm/vcpu_exit.c > >>> @@ -6,9 +6,371 @@ > >>> * Anup Patel > >>> */ > >>> > >>> +#include > >>> #include > >>> #include > >>> #include > >>> +#include > >>> + > >>> +#define INSN_MATCH_LB 0x3 > >>> +#define INSN_MASK_LB 0x707f > >>> +#define INSN_MATCH_LH 0x1003 > >>> +#define INSN_MASK_LH 0x707f > >>> +#define INSN_MATCH_LW 0x2003 > >>> +#define INSN_MASK_LW 0x707f > >>> +#define INSN_MATCH_LD 0x3003 > >>> +#define INSN_MASK_LD 0x707f > >>> +#define INSN_MATCH_LBU 0x4003 > >>> +#define INSN_MASK_LBU 0x707f > >>> +#define INSN_MATCH_LHU 0x5003 > >>> +#define INSN_MASK_LHU 0x707f > >>> +#define INSN_MATCH_LWU 0x6003 > >>> +#define INSN_MASK_LWU 0x707f > >>> +#define INSN_MATCH_SB 0x23 > >>> +#define INSN_MASK_SB 0x707f > >>> +#define INSN_MATCH_SH 0x1023 > >>> +#define INSN_MASK_SH 0x707f > >>> +#define INSN_MATCH_SW 0x2023 > >>> +#define INSN_MASK_SW 0x707f > >>> +#define INSN_MATCH_SD 0x3023 > >>> +#define INSN_MASK_SD 0x707f > >>> + > >>> +#define INSN_MATCH_C_LD 0x6000 > >>> +#define INSN_MASK_C_LD 0xe003 > >>> +#define INSN_MATCH_C_SD 0xe000 > >>> +#define INSN_MASK_C_SD 0xe003 > >>> +#define INSN_MATCH_C_LW 0x4000 > >>> +#define INSN_MASK_C_LW 0xe003 > >>> +#define INSN_MATCH_C_SW 0xc000 > >>> +#define INSN_MASK_C_SW 0xe003 > >>> +#define INSN_MATCH_C_LDSP 0x6002 > >>> +#define INSN_MASK_C_LDSP 0xe003 > >>> +#define INSN_MATCH_C_SDSP 0xe002 > >>> +#define INSN_MASK_C_SDSP 0xe003 > >>> +#define INSN_MATCH_C_LWSP 0x4002 > >>> +#define INSN_MASK_C_LWSP 0xe003 > >>> +#define INSN_MATCH_C_SWSP 0xc002 > >>> +#define INSN_MASK_C_SWSP 0xe003 > >>> + > >>> +#define INSN_LEN(insn) ((((insn) & 0x3) < 0x3) ? 2 : 4) > >>> + > >>> +#ifdef CONFIG_64BIT > >>> +#define LOG_REGBYTES 3 > >>> +#else > >>> +#define LOG_REGBYTES 2 > >>> +#endif > >>> +#define REGBYTES (1 << LOG_REGBYTES) > >>> + > >>> +#define SH_RD 7 > >>> +#define SH_RS1 15 > >>> +#define SH_RS2 20 > >>> +#define SH_RS2C 2 > >>> + > >>> +#define RV_X(x, s, n) (((x) >> (s)) & ((1 << (n)) - 1)) > >>> +#define RVC_LW_IMM(x) ((RV_X(x, 6, 1) << 2) | \ > >>> + (RV_X(x, 10, 3) << 3) | \ > >>> + (RV_X(x, 5, 1) << 6)) > >>> +#define RVC_LD_IMM(x) ((RV_X(x, 10, 3) << 3) | \ > >>> + (RV_X(x, 5, 2) << 6)) > >>> +#define RVC_LWSP_IMM(x) ((RV_X(x, 4, 3) << 2) | \ > >>> + (RV_X(x, 12, 1) << 5) | \ > >>> + (RV_X(x, 2, 2) << 6)) > >>> +#define RVC_LDSP_IMM(x) ((RV_X(x, 5, 2) << 3) | \ > >>> + (RV_X(x, 12, 1) << 5) | \ > >>> + (RV_X(x, 2, 3) << 6)) > >>> +#define RVC_SWSP_IMM(x) ((RV_X(x, 9, 4) << 2) | \ > >>> + (RV_X(x, 7, 2) << 6)) > >>> +#define RVC_SDSP_IMM(x) ((RV_X(x, 10, 3) << 3) | \ > >>> + (RV_X(x, 7, 3) << 6)) > >>> +#define RVC_RS1S(insn) (8 + RV_X(insn, SH_RD, 3)) > >>> +#define RVC_RS2S(insn) (8 + RV_X(insn, SH_RS2C, 3)) > >>> +#define RVC_RS2(insn) RV_X(insn, SH_RS2C, 5) > >>> + > >>> +#define SHIFT_RIGHT(x, y) \ > >>> + ((y) < 0 ? ((x) << -(y)) : ((x) >> (y))) > >>> + > >>> +#define REG_MASK \ > >>> + ((1 << (5 + LOG_REGBYTES)) - (1 << LOG_REGBYTES)) > >>> + > >>> +#define REG_OFFSET(insn, pos) \ > >>> + (SHIFT_RIGHT((insn), (pos) - LOG_REGBYTES) & REG_MASK) > >>> + > >>> +#define REG_PTR(insn, pos, regs) \ > >>> + (ulong *)((ulong)(regs) + REG_OFFSET(insn, pos)) > >>> + > >>> +#define GET_RM(insn) (((insn) >> 12) & 7) > >>> + > >>> +#define GET_RS1(insn, regs) (*REG_PTR(insn, SH_RS1, regs)) > >>> +#define GET_RS2(insn, regs) (*REG_PTR(insn, SH_RS2, regs)) > >>> +#define GET_RS1S(insn, regs) (*REG_PTR(RVC_RS1S(insn), 0, regs)) > >>> +#define GET_RS2S(insn, regs) (*REG_PTR(RVC_RS2S(insn), 0, regs)) > >>> +#define GET_RS2C(insn, regs) (*REG_PTR(insn, SH_RS2C, regs)) > >>> +#define GET_SP(regs) (*REG_PTR(2, 0, regs)) > >>> +#define SET_RD(insn, regs, val) (*REG_PTR(insn, SH_RD, regs) = (val)) > >>> +#define IMM_I(insn) ((s32)(insn) >> 20) > >>> +#define IMM_S(insn) (((s32)(insn) >> 25 << 5) | \ > >>> + (s32)(((insn) >> 7) & 0x1f)) > >>> +#define MASK_FUNCT3 0x7000 > >>> + > >>> +#define STR(x) XSTR(x) > >>> +#define XSTR(x) #x > >>> + > >>> +/* TODO: Handle traps due to unpriv load and redirect it back to VS-mode */ > >>> +static ulong get_insn(struct kvm_vcpu *vcpu) > >>> +{ > >>> + ulong __sepc = vcpu->arch.guest_context.sepc; > >>> + ulong __hstatus, __sstatus, __vsstatus; > >>> +#ifdef CONFIG_RISCV_ISA_C > >>> + ulong rvc_mask = 3, tmp; > >>> +#endif > >>> + ulong flags, val; > >>> + > >>> + local_irq_save(flags); > >>> + > >>> + __vsstatus = csr_read(CSR_VSSTATUS); > >>> + __sstatus = csr_read(CSR_SSTATUS); > >>> + __hstatus = csr_read(CSR_HSTATUS); > >>> + > >>> + csr_write(CSR_VSSTATUS, __vsstatus | SR_MXR); > >>> + csr_write(CSR_SSTATUS, vcpu->arch.guest_context.sstatus | SR_MXR); > >>> + csr_write(CSR_HSTATUS, vcpu->arch.guest_context.hstatus | HSTATUS_SPRV); > >> > >> What happens when the insn load triggers a page fault, maybe because the > >> guest was malicious and did > >> > >> 1) Run on page 0x1000 > >> 2) Remove map for 0x1000, do *not* flush TLB > >> 3) Trigger MMIO > >> > >> That would DOS the host here, as the host kernel would continue running > >> in guest address space, right? > > > > Yes, we can certainly fault while accessing Guest instruction. We will > > be fixing this issue in a followup series. We have mentioned this in cover > > letter as well. > > I don't think the cover letter is the right place for such a comment. > Please definitely put it into the code as well, pointing out that this > is a known bug. Or even better yet: Fix it up properly :). > > In fact, with a bug that dramatic, I'm not even sure we can safely > include the code. We're consciously allowing user space to DOS the kernel. There is already a TODO comment above get_insn() function. > > > > > BTW, RISC-V spec is going to further improve to provide easy > > access of faulting instruction to Hypervisor. > > (Refer, https://github.com/riscv/riscv-isa-manual/issues/431) > > Yes, we have similar extensions on other archs. Is this going to be an > optional addition or a mandatory bit of the hypervisor spec? If it's not > mandatory, we can not rely on it, so the current path has to be safe. Yes, it's going to be optional so we are certainly going to fix this issue here. This issue discussed in previous patch reviews. We have already agreed to fix this in next revision. Regards, Anup > > > Alex