Received: by 2002:a05:6a10:9848:0:0:0:0 with SMTP id x8csp394389pxf; Thu, 11 Mar 2021 06:31:53 -0800 (PST) X-Google-Smtp-Source: ABdhPJyt/y2CUVAb4MA7D47evDrjrkrRyCLtdl5MhYU6L82ToYHAmdi+nxk9TDobbjwM/tAWdj/i X-Received: by 2002:a17:906:3750:: with SMTP id e16mr3368688ejc.75.1615473113069; Thu, 11 Mar 2021 06:31:53 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1615473113; cv=none; d=google.com; s=arc-20160816; b=fnZIS1Ktbetv4WNvRLhDEV7LlqGlzCVWNxEsbmDVcz4oYKqMmxiSV9LJkIU5hYh2pk LqcvzxqvKGyR0TqLWeZHtgAWqra7Zdxg7DmaxirBXfGHOt2QKlJ1Zknz7UNEiEQUKWgM kwfJA7KTg2Q11BUzCCVFd4EuamXiCZILQ6YHU8HTAbsHVKYcIgUieR922ymTum5HId0D Kxw4YqatcLouxPDPbSwNBMudvN7ZQVD1oJWlT0kye2y/aFKhN84xtF3e3eAQetMlszIi A/THlD30cp2MUVsxm8Ya5gCWoSzxCDWlY3M1M63urf3obnv8U3qYSuQ98hCOxP0IqRlC s4qw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to :mime-version:user-agent:date:message-id:from:cc:references:to :subject; bh=De7+/M/pzMX+mRUhCVJLWp6LOaGWpGtUQN2E1LS/WSc=; b=DvQKdzUJbgzex1W/jaEAHnKYRGjMfed2IKRZKZjgyev7bvlNIFGH0VIGPXWY6RUgio Ok5MlrMu9e6uppABxTLnVb7CGuHfhL8KTt8Y8rhTGqc2p+cUIj/csZISULx9eGm6NKfL T6+3v6R8Nj3EHa7Cn2RBzpRFdouY9MXPpGSUIVMhSdHEgA46wB+ZZ/N1TJpbw+y/V7x7 xgW3UrSAWpXI3TmiWySOzU68LsQHd7hSuKiUIfccadtw/YgW4SZXHKHHSgzJ4MDgk8mx HKR4wm7fYGFgK1U9mQoqNDnf0DBOq+h0wB8FkCRXL6g6cnSQzdaaQPbXNIfKEZfd/wDa Sd7Q== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id m4si1924099edc.119.2021.03.11.06.31.28; Thu, 11 Mar 2021 06:31:53 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233760AbhCKO2n (ORCPT + 99 others); Thu, 11 Mar 2021 09:28:43 -0500 Received: from szxga07-in.huawei.com ([45.249.212.35]:13880 "EHLO szxga07-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233597AbhCKO2a (ORCPT ); Thu, 11 Mar 2021 09:28:30 -0500 Received: from DGGEMS403-HUB.china.huawei.com (unknown [172.30.72.60]) by szxga07-in.huawei.com (SkyGuard) with ESMTP id 4DxB8Y626Xz9rwr; Thu, 11 Mar 2021 22:26:37 +0800 (CST) Received: from [10.174.184.42] (10.174.184.42) by DGGEMS403-HUB.china.huawei.com (10.3.19.203) with Microsoft SMTP Server id 14.3.498.0; Thu, 11 Mar 2021 22:28:18 +0800 Subject: Re: [RFC PATCH] kvm: arm64: Try stage2 block mapping for host device MMIO To: Marc Zyngier References: <20210122083650.21812-1-zhukeqian1@huawei.com> <87y2euf5d2.wl-maz@kernel.org> CC: , , , , Will Deacon , Catalin Marinas , Mark Rutland , James Morse , Robin Murphy , Joerg Roedel , Daniel Lezcano , Thomas Gleixner , "Suzuki K Poulose" , Julien Thierry , Andrew Morton , Alexios Zavras , , From: Keqian Zhu Message-ID: Date: Thu, 11 Mar 2021 22:28:17 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.7.1 MIME-Version: 1.0 In-Reply-To: <87y2euf5d2.wl-maz@kernel.org> Content-Type: text/plain; charset="windows-1252" Content-Transfer-Encoding: 7bit X-Originating-IP: [10.174.184.42] X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Marc, On 2021/3/11 16:43, Marc Zyngier wrote: > Digging this patch back from my Inbox... Yeah, thanks ;-) > > On Fri, 22 Jan 2021 08:36:50 +0000, > Keqian Zhu wrote: >> >> The MMIO region of a device maybe huge (GB level), try to use block >> mapping in stage2 to speedup both map and unmap. >> >> Especially for unmap, it performs TLBI right after each invalidation >> of PTE. If all mapping is of PAGE_SIZE, it takes much time to handle >> GB level range. >> >> Signed-off-by: Keqian Zhu >> --- >> arch/arm64/include/asm/kvm_pgtable.h | 11 +++++++++++ >> arch/arm64/kvm/hyp/pgtable.c | 15 +++++++++++++++ >> arch/arm64/kvm/mmu.c | 12 ++++++++---- >> 3 files changed, 34 insertions(+), 4 deletions(-) >> >> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h >> index 52ab38db04c7..2266ac45f10c 100644 >> --- a/arch/arm64/include/asm/kvm_pgtable.h >> +++ b/arch/arm64/include/asm/kvm_pgtable.h >> @@ -82,6 +82,17 @@ struct kvm_pgtable_walker { >> const enum kvm_pgtable_walk_flags flags; >> }; >> >> +/** >> + * kvm_supported_pgsize() - Get the max supported page size of a mapping. >> + * @pgt: Initialised page-table structure. >> + * @addr: Virtual address at which to place the mapping. >> + * @end: End virtual address of the mapping. >> + * @phys: Physical address of the memory to map. >> + * >> + * The smallest return value is PAGE_SIZE. >> + */ >> +u64 kvm_supported_pgsize(struct kvm_pgtable *pgt, u64 addr, u64 end, u64 phys); >> + >> /** >> * kvm_pgtable_hyp_init() - Initialise a hypervisor stage-1 page-table. >> * @pgt: Uninitialised page-table structure to initialise. >> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c >> index bdf8e55ed308..ab11609b9b13 100644 >> --- a/arch/arm64/kvm/hyp/pgtable.c >> +++ b/arch/arm64/kvm/hyp/pgtable.c >> @@ -81,6 +81,21 @@ static bool kvm_block_mapping_supported(u64 addr, u64 end, u64 phys, u32 level) >> return IS_ALIGNED(addr, granule) && IS_ALIGNED(phys, granule); >> } >> >> +u64 kvm_supported_pgsize(struct kvm_pgtable *pgt, u64 addr, u64 end, u64 phys) >> +{ >> + u32 lvl; >> + u64 pgsize = PAGE_SIZE; >> + >> + for (lvl = pgt->start_level; lvl < KVM_PGTABLE_MAX_LEVELS; lvl++) { >> + if (kvm_block_mapping_supported(addr, end, phys, lvl)) { >> + pgsize = kvm_granule_size(lvl); >> + break; >> + } >> + } >> + >> + return pgsize; >> +} >> + >> static u32 kvm_pgtable_idx(struct kvm_pgtable_walk_data *data, u32 level) >> { >> u64 shift = kvm_granule_shift(level); >> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c >> index 7d2257cc5438..80b403fc8e64 100644 >> --- a/arch/arm64/kvm/mmu.c >> +++ b/arch/arm64/kvm/mmu.c >> @@ -499,7 +499,8 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu) >> int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa, >> phys_addr_t pa, unsigned long size, bool writable) >> { >> - phys_addr_t addr; >> + phys_addr_t addr, end; >> + unsigned long pgsize; >> int ret = 0; >> struct kvm_mmu_memory_cache cache = { 0, __GFP_ZERO, NULL, }; >> struct kvm_pgtable *pgt = kvm->arch.mmu.pgt; >> @@ -509,21 +510,24 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa, >> >> size += offset_in_page(guest_ipa); >> guest_ipa &= PAGE_MASK; >> + end = guest_ipa + size; >> >> - for (addr = guest_ipa; addr < guest_ipa + size; addr += PAGE_SIZE) { >> + for (addr = guest_ipa; addr < end; addr += pgsize) { >> ret = kvm_mmu_topup_memory_cache(&cache, >> kvm_mmu_cache_min_pages(kvm)); >> if (ret) >> break; >> >> + pgsize = kvm_supported_pgsize(pgt, addr, end, pa); >> + >> spin_lock(&kvm->mmu_lock); >> - ret = kvm_pgtable_stage2_map(pgt, addr, PAGE_SIZE, pa, prot, >> + ret = kvm_pgtable_stage2_map(pgt, addr, pgsize, pa, prot, >> &cache); >> spin_unlock(&kvm->mmu_lock); >> if (ret) >> break; >> >> - pa += PAGE_SIZE; >> + pa += pgsize; >> } >> >> kvm_mmu_free_memory_cache(&cache); > > There is one issue with this patch, which is that it only does half > the job. A VM_PFNMAP VMA can definitely be faulted in dynamically, and > in that case we force this to be a page mapping. This conflicts with > what you are doing here. Oh yes, these two paths should keep a same mapping logic. I try to search the "force_pte" and find out some discussion [1] between you and Christoffer. And I failed to get a reason about forcing pte mapping for device MMIO region (expect that we want to keep a same logic with the eager mapping path). So if you don't object to it, I will try to implement block mapping for device MMIO in user_mem_abort(). > > There is also the fact that if we can map things on demand, why are we > still mapping these MMIO regions ahead of time? Indeed. Though this provides good *startup* performance for guest accessing MMIO, it's hard to keep the two paths in sync. We can keep this minor optimization or delete it to avoid hard maintenance, which one do you prefer? BTW, could you please have a look at my another patch series[2] about HW/SW combined dirty log? ;) Thanks, Keqian [1] https://patchwork.kernel.org/project/linux-arm-kernel/patch/20191211165651.7889-2-maz@kernel.org/ [2] https://lore.kernel.org/linux-arm-kernel/20210126124444.27136-1-zhukeqian1@huawei.com/ > > Thanks, > > M. >