From: Anup Patel
Date: Mon, 30 Nov 2020 15:51:31 +0530
Subject: Re: [PATCH v15 10/17] RISC-V: KVM: Implement stage2 page table programming
To: Jiangyifei
Cc: Anup Patel, Palmer Dabbelt, Palmer Dabbelt, Paul Walmsley, Albert Ou,
    Paolo Bonzini, Alexander Graf, Atish Patra, Alistair Francis,
    Damien Le Moal, "kvm@vger.kernel.org", "kvm-riscv@lists.infradead.org",
    "linux-riscv@lists.infradead.org", "linux-kernel@vger.kernel.org",
    "Zhangxiaofeng (F)", "Wubin (H)", "dengkai (A)", yinyipeng
References: <20201109113240.3733496-1-anup.patel@wdc.com>
    <20201109113240.3733496-11-anup.patel@wdc.com>
    <186ade3c372b44ef8ca1830da8c5002b@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Nov 24, 2020 at 2:56 PM Anup Patel wrote:
>
> On Mon, Nov 16, 2020 at 2:59 PM Jiangyifei wrote:
> >
> >
> > > -----Original Message-----
> > > From: Anup Patel [mailto:anup.patel@wdc.com]
> > > Sent: Monday, November 9, 2020 7:33 PM
> > > To: Palmer Dabbelt; Palmer Dabbelt; Paul Walmsley;
> > > Albert Ou; Paolo Bonzini
> > > Cc: Alexander Graf; Atish Patra;
> > > Alistair Francis; Damien Le Moal; Anup Patel;
> > > kvm@vger.kernel.org; kvm-riscv@lists.infradead.org;
> > > linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org; Anup Patel;
> > > Jiangyifei
> > > Subject: [PATCH v15 10/17] RISC-V: KVM: Implement stage2 page table
> > > programming
> > >
> > > This patch implements all required functions for programming the stage2 page
> > > table for each Guest/VM.
> > >
> > > At high-level, the flow of stage2 related functions is similar from KVM
> > > ARM/ARM64 implementation but the stage2 page table format is quite
> > > different for KVM RISC-V.
> > >
> > > [jiangyifei: stage2 dirty log support]
> > > Signed-off-by: Yifei Jiang
> > > Signed-off-by: Anup Patel
> > > Acked-by: Paolo Bonzini
> > > Reviewed-by: Paolo Bonzini
> > > ---
> > >  arch/riscv/include/asm/kvm_host.h     |  12 +
> > >  arch/riscv/include/asm/pgtable-bits.h |   1 +
> > >  arch/riscv/kvm/Kconfig                |   1 +
> > >  arch/riscv/kvm/main.c                 |  19 +
> > >  arch/riscv/kvm/mmu.c                  | 649 +++++++++++++++++++++++++-
> > >  arch/riscv/kvm/vm.c                   |   6 -
> > >  6 files changed, 672 insertions(+), 16 deletions(-)
> > >
> >
> > ......
> >
> > >
> > >  int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu, @@ -69,27 +562,163 @@
> > > int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu,
> > >  			 gpa_t gpa, unsigned long hva,
> > >  			 bool writeable, bool is_write)
> > >  {
> > > -	/* TODO: */
> > > -	return 0;
> > > +	int ret;
> > > +	kvm_pfn_t hfn;
> > > +	short vma_pageshift;
> > > +	gfn_t gfn = gpa >> PAGE_SHIFT;
> > > +	struct vm_area_struct *vma;
> > > +	struct kvm *kvm = vcpu->kvm;
> > > +	struct kvm_mmu_page_cache *pcache = &vcpu->arch.mmu_page_cache;
> > > +	bool logging = (memslot->dirty_bitmap &&
> > > +			!(memslot->flags & KVM_MEM_READONLY)) ? true : false;
> > > +	unsigned long vma_pagesize;
> > > +
> > > +	mmap_read_lock(current->mm);
> > > +
> > > +	vma = find_vma_intersection(current->mm, hva, hva + 1);
> > > +	if (unlikely(!vma)) {
> > > +		kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
> > > +		mmap_read_unlock(current->mm);
> > > +		return -EFAULT;
> > > +	}
> > > +
> > > +	if (is_vm_hugetlb_page(vma))
> > > +		vma_pageshift = huge_page_shift(hstate_vma(vma));
> > > +	else
> > > +		vma_pageshift = PAGE_SHIFT;
> > > +	vma_pagesize = 1ULL << vma_pageshift;
> > > +	if (logging || (vma->vm_flags & VM_PFNMAP))
> > > +		vma_pagesize = PAGE_SIZE;
> > > +
> > > +	if (vma_pagesize == PMD_SIZE || vma_pagesize == PGDIR_SIZE)
> > > +		gfn = (gpa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT;
> > > +
> > > +	mmap_read_unlock(current->mm);
> > > +
> > > +	if (vma_pagesize != PGDIR_SIZE &&
> > > +	    vma_pagesize != PMD_SIZE &&
> > > +	    vma_pagesize != PAGE_SIZE) {
> > > +		kvm_err("Invalid VMA page size 0x%lx\n", vma_pagesize);
> > > +		return -EFAULT;
> > > +	}
> > > +
> > > +	/* We need minimum second+third level pages */
> > > +	ret = stage2_cache_topup(pcache, stage2_pgd_levels,
> > > +				 KVM_MMU_PAGE_CACHE_NR_OBJS);
> > > +	if (ret) {
> > > +		kvm_err("Failed to topup stage2 cache\n");
> > > +		return ret;
> > > +	}
> > > +
> > > +	hfn = gfn_to_pfn_prot(kvm, gfn, is_write, NULL);
> > > +	if (hfn == KVM_PFN_ERR_HWPOISON) {
> > > +		send_sig_mceerr(BUS_MCEERR_AR, (void __user *)hva,
> > > +				vma_pageshift, current);
> > > +		return 0;
> > > +	}
> > > +	if (is_error_noslot_pfn(hfn))
> > > +		return -EFAULT;
> > > +
> > > +	/*
> > > +	 * If logging is active then we allow writable pages only
> > > +	 * for write faults.
> > > +	 */
> > > +	if (logging && !is_write)
> > > +		writeable = false;
> > > +
> > > +	spin_lock(&kvm->mmu_lock);
> > > +
> > > +	if (writeable) {
> >
> > Hi Anup,
> >
> > What is the purpose of "writable = !memslot_is_readonly(slot)" in this series?
>
> Where ? I don't see this line in any of the patches.
>
> >
> > When mapping the HVA to HPA above, it doesn't know that the PTE writeable of stage2 is "!memslot_is_readonly(slot)".
> > This may cause a difference between the writability of HVA->HPA and GPA->HPA.
> > For example, GPA->HPA is writeable, but HVA->HPA is not writeable.
>
> Yes, this is possible particularly when Host kernel is updating writability
> of HVA->HPA mappings for swapping in/out pages.
>
> >
> > Is it better that the writability of HVA->HPA is also determined by whether the memslot is readonly in this change?
> > Like this:
> > -	hfn = gfn_to_pfn_prot(kvm, gfn, is_write, NULL);
> > +	hfn = gfn_to_pfn_prot(kvm, gfn, writeable, NULL);
>
> The gfn_to_pfn_prot() needs to know what type of fault we
> got (i.e. a read or write fault). The rest of the information (such as
> whether the slot is writable or not) is already available to
> gfn_to_pfn_prot().
>
> The question here is whether we should pass "&writeable" or NULL as
> the last parameter to gfn_to_pfn_prot(). The recent JUMP label
> support in Linux RISC-V causes problems on HW where PTE
> 'A' and 'D' bits are not updated by HW, so I had to change the
> last parameter of gfn_to_pfn_prot() from "&writeable" to NULL.
>
> I am still investigating this.

This turned out to be a bug in Spike which is not fixed.
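For readers following the thread, here is a minimal userspace sketch of the
permission decision being discussed. It is not kernel code: the struct and
the sketch_* function names are hypothetical, and the way
sketch_host_writable() models the "writable" out-parameter of
gfn_to_pfn_prot() is a simplifying assumption (error paths such as a write
fault on a read-only slot are ignored). Only the "logging && !is_write" rule
is taken directly from the quoted patch.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of one guest page fault (not kernel types). */
struct sketch_fault {
	bool slot_readonly;   /* memslot carries KVM_MEM_READONLY */
	bool host_pte_write;  /* assumed: HVA->HPA mapping is already writable */
	bool dirty_logging;   /* dirty bitmap is active for the memslot */
	bool is_write;        /* the guest fault was a store */
};

/*
 * Stand-in for the "writable" out-parameter of a gfn_to_pfn_prot()-like
 * lookup. Assumption: a read-only slot never reports writable, and a
 * successful write fault always does.
 */
bool sketch_host_writable(const struct sketch_fault *f)
{
	if (f->slot_readonly)
		return false;
	return f->is_write || f->host_pte_write;
}

/* Decide whether the stage2 (GPA->HPA) PTE may be installed writable. */
bool sketch_stage2_writable(const struct sketch_fault *f)
{
	bool writeable = sketch_host_writable(f);

	/* With dirty logging active, only write faults get writable pages. */
	if (f->dirty_logging && !f->is_write)
		writeable = false;

	return writeable;
}
```

Under these assumptions the GPA->HPA permission can never exceed what the
lookup reports for HVA->HPA, which is the mismatch raised in the question
above.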
I will include the following change in the v16 patch series:

diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index 241030956d47..dc2666b4180b 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -232,8 +232,7 @@ void __kvm_riscv_hfence_gvma_all(void);
 
 int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu,
 			 struct kvm_memory_slot *memslot,
-			 gpa_t gpa, unsigned long hva,
-			 bool writeable, bool is_write);
+			 gpa_t gpa, unsigned long hva, bool is_write);
 void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu);
 int kvm_riscv_stage2_alloc_pgd(struct kvm *kvm);
 void kvm_riscv_stage2_free_pgd(struct kvm *kvm);
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index fcaeadc9b34d..56fda9ef70fd 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -689,11 +689,11 @@ int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
 
 int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu,
 			 struct kvm_memory_slot *memslot,
-			 gpa_t gpa, unsigned long hva,
-			 bool writeable, bool is_write)
+			 gpa_t gpa, unsigned long hva, bool is_write)
 {
 	int ret;
 	kvm_pfn_t hfn;
+	bool writeable;
 	short vma_pageshift;
 	gfn_t gfn = gpa >> PAGE_SHIFT;
 	struct vm_area_struct *vma;
@@ -742,7 +742,7 @@ int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu,
 
 	mmu_seq = kvm->mmu_notifier_seq;
 
-	hfn = gfn_to_pfn_prot(kvm, gfn, is_write, NULL);
+	hfn = gfn_to_pfn_prot(kvm, gfn, is_write, &writeable);
 	if (hfn == KVM_PFN_ERR_HWPOISON) {
 		send_sig_mceerr(BUS_MCEERR_AR, (void __user *)hva,
 				vma_pageshift, current);
diff --git a/arch/riscv/kvm/vcpu_exit.c b/arch/riscv/kvm/vcpu_exit.c
index f054406792a6..058cfa168abe 100644
--- a/arch/riscv/kvm/vcpu_exit.c
+++ b/arch/riscv/kvm/vcpu_exit.c
@@ -445,7 +445,7 @@ static int stage2_page_fault(struct kvm_vcpu *vcpu, struct kvm_run *run,
 		};
 	}
 
-	ret = kvm_riscv_stage2_map(vcpu, memslot, fault_addr, hva, writeable,
+	ret = kvm_riscv_stage2_map(vcpu, memslot, fault_addr, hva,
 		(trap->scause == EXC_STORE_GUEST_PAGE_FAULT) ? true : false);
 	if (ret < 0)
 		return ret;

Regards,
Anup
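As a side note on the quoted v15 code, the mapping-size selection and gfn
alignment in kvm_riscv_stage2_map() can be sketched in plain C as below.
This is only an illustration: SKETCH_PAGE_SHIFT (4 KiB base pages) and the
sketch_* helpers are hypothetical names, and the 2 MiB shift used in the
usage note is an assumed huge page size, not something stated in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12u /* assumption: 4 KiB base pages */

/*
 * Pick the mapping size as a shift. As in the quoted patch, dirty logging
 * or a PFNMAP VMA forces a fallback to base pages.
 */
unsigned int sketch_map_shift(unsigned int vma_shift, bool logging, bool pfnmap)
{
	if (logging || pfnmap)
		return SKETCH_PAGE_SHIFT;
	return vma_shift;
}

/* Align gpa down to the mapping size and convert it to a frame number. */
uint64_t sketch_map_gfn(uint64_t gpa, unsigned int map_shift)
{
	uint64_t mask;

	assert(map_shift >= SKETCH_PAGE_SHIFT && map_shift < 64);
	mask = ~((UINT64_C(1) << map_shift) - 1);
	return (gpa & mask) >> SKETCH_PAGE_SHIFT;
}
```

For example, with an assumed 2 MiB huge page (shift 21), a fault at gpa
0x80234567 maps from gfn 0x80200, while the base-page fallback used during
dirty logging maps from gfn 0x80234.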