Subject: Re: [RFC PATCH] kvm: arm64: Try stage2 block mapping for host device MMIO
To: Marc Zyngier
References: <20210122083650.21812-1-zhukeqian1@huawei.com>
            <09d89355cdbbd19c456699774a9a980a@kernel.org>
CC: Will Deacon, Catalin Marinas, Mark Rutland, James Morse, Robin Murphy,
    Joerg Roedel, Daniel Lezcano, Thomas Gleixner, Suzuki K Poulose,
    Julien Thierry, Andrew Morton, Alexios Zavras
From: Keqian Zhu
Message-ID: <3526e416-aec3-9716-4c45-82aa962cd474@huawei.com>
Date: Mon, 25 Jan 2021 19:31:36 +0800
X-Mailing-List: linux-kernel@vger.kernel.org

I forgot to give the link to the bugfix I mentioned below :-).

[1] https://lkml.org/lkml/2020/5/1/1294

On 2021/1/25 19:25, Keqian Zhu wrote:
> Hi Marc,
>
> On 2021/1/22 17:45, Marc Zyngier wrote:
>> On 2021-01-22 08:36, Keqian Zhu wrote:
>>> The MMIO region of a device may be huge (GB level), so try to use block
>>> mapping in stage 2 to speed up both map and unmap.
>>>
>>> This especially helps unmap, which performs a TLBI right after each PTE
>>> invalidation: if all mappings are of PAGE_SIZE, it takes a long time to
>>> handle a GB-level range.
>>
>> This is only on VM teardown, right? Or do you unmap the device more often?
>> Can you please quantify the speedup and the conditions this occurs in?
>
> Yes, and there are some other paths (including what your patch series
> handles) that will do the unmap action:
>
> 1. guest reboot without S2FWB: stage2_unmap_vm(), which only unmaps guest
>    regular RAM.
> 2. userspace deletes a memslot: kvm_arch_flush_shadow_memslot().
> 3. rollback of device MMIO mapping: kvm_arch_prepare_memory_region().
> 4. rollback of dirty log tracking: if we enable hugepages for guest RAM,
>    then after dirty logging is stopped, the newly created block mappings
>    will unmap all page mappings.
> 5. mmu notifier: kvm_unmap_hva_range(). AFAICS, we use this path on VM
>    teardown or when the guest resets pass-through devices. The bugfix [1]
>    gives the reason for unmapping the MMIO region when the guest resets
>    pass-through devices.
>
> For unmap of the MMIO region, which this patch addresses:
> point 1 does not apply.
> point 2 occurs when userspace unplugs pass-through devices.
> point 3 can occur, but rarely.
> point 4 does not apply.
> point 5 occurs on VM teardown or when the guest resets pass-through devices.
>
> I also had a look at your patch series; it can handle:
> For VM teardown, elide CMOs and perform a VMALL invalidation instead of
> individual TLBIs (but the current kernel does not go through this path on
> VM teardown).
> For rollback of dirty log tracking, elide CMOs.
> For kvm_unmap_hva_range, if the event is MMU_NOTIFY_UNMAP, elide CMOs.
>
> (But I doubt the CMOs in unmap. As we perform CMOs in user_mem_abort when
> installing a new stage 2 mapping for the VM, maybe the CMOs in unmap are
> unnecessary under all conditions :-) ?)
>
> So we are solving different parts of unmap, and the two do not conflict.
> At least this patch can still speed up the map of the device MMIO region,
> and speed up its unmap even when we do not need to perform CMOs and TLBIs ;-).
>
> speedup: unmap 8GB MMIO on FPGA.
>
>             before          after opt
>   cost      30+ minutes     949 ms
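>
> As a rough illustration of where that gap comes from (assuming a 4KB
> translation granule and a 1GB-aligned IPA/PA): unmapping 8GB as page
> mappings walks 8GB / 4KB = 2,097,152 PTEs, each followed by a TLBI,
> whereas the same range covered by 1GB level-1 blocks needs only 8
> entries and 8 invalidations.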
>
> Thanks,
> Keqian
>
>>
>> I have the feeling that we are just circling around another problem,
>> which is that we could rely on a VM-wide TLBI when tearing down the
>> guest. I worked on something like that [1] a long while ago, and parked
>> it for some reason. Maybe it is worth reviving.
>>
>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/elide-cmo-tlbi
>>
>>>
>>> Signed-off-by: Keqian Zhu
>>> ---
>>>  arch/arm64/include/asm/kvm_pgtable.h | 11 +++++++++++
>>>  arch/arm64/kvm/hyp/pgtable.c         | 15 +++++++++++++++
>>>  arch/arm64/kvm/mmu.c                 | 12 ++++++++----
>>>  3 files changed, 34 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
>>> index 52ab38db04c7..2266ac45f10c 100644
>>> --- a/arch/arm64/include/asm/kvm_pgtable.h
>>> +++ b/arch/arm64/include/asm/kvm_pgtable.h
>>> @@ -82,6 +82,17 @@ struct kvm_pgtable_walker {
>>>  	const enum kvm_pgtable_walk_flags	flags;
>>>  };
>>>
>>> +/**
>>> + * kvm_supported_pgsize() - Get the max supported page size of a mapping.
>>> + * @pgt:	Initialised page-table structure.
>>> + * @addr:	Virtual address at which to place the mapping.
>>> + * @end:	End virtual address of the mapping.
>>> + * @phys:	Physical address of the memory to map.
>>> + *
>>> + * The smallest return value is PAGE_SIZE.
>>> + */
>>> +u64 kvm_supported_pgsize(struct kvm_pgtable *pgt, u64 addr, u64 end, u64 phys);
>>> +
>>>  /**
>>>   * kvm_pgtable_hyp_init() - Initialise a hypervisor stage-1 page-table.
>>>   * @pgt:	Uninitialised page-table structure to initialise.
>>> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
>>> index bdf8e55ed308..ab11609b9b13 100644
>>> --- a/arch/arm64/kvm/hyp/pgtable.c
>>> +++ b/arch/arm64/kvm/hyp/pgtable.c
>>> @@ -81,6 +81,21 @@ static bool kvm_block_mapping_supported(u64 addr, u64 end, u64 phys, u32 level)
>>>  	return IS_ALIGNED(addr, granule) && IS_ALIGNED(phys, granule);
>>>  }
>>>
>>> +u64 kvm_supported_pgsize(struct kvm_pgtable *pgt, u64 addr, u64 end, u64 phys)
>>> +{
>>> +	u32 lvl;
>>> +	u64 pgsize = PAGE_SIZE;
>>> +
>>> +	for (lvl = pgt->start_level; lvl < KVM_PGTABLE_MAX_LEVELS; lvl++) {
>>> +		if (kvm_block_mapping_supported(addr, end, phys, lvl)) {
>>> +			pgsize = kvm_granule_size(lvl);
>>> +			break;
>>> +		}
>>> +	}
>>> +
>>> +	return pgsize;
>>> +}
>>> +
>>>  static u32 kvm_pgtable_idx(struct kvm_pgtable_walk_data *data, u32 level)
>>>  {
>>>  	u64 shift = kvm_granule_shift(level);
>>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>>> index 7d2257cc5438..80b403fc8e64 100644
>>> --- a/arch/arm64/kvm/mmu.c
>>> +++ b/arch/arm64/kvm/mmu.c
>>> @@ -499,7 +499,8 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
>>>  int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
>>>  			  phys_addr_t pa, unsigned long size, bool writable)
>>>  {
>>> -	phys_addr_t addr;
>>> +	phys_addr_t addr, end;
>>> +	unsigned long pgsize;
>>>  	int ret = 0;
>>>  	struct kvm_mmu_memory_cache cache = { 0, __GFP_ZERO, NULL, };
>>>  	struct kvm_pgtable *pgt = kvm->arch.mmu.pgt;
>>> @@ -509,21 +510,24 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
>>>
>>>  	size += offset_in_page(guest_ipa);
>>>  	guest_ipa &= PAGE_MASK;
>>> +	end = guest_ipa + size;
>>>
>>> -	for (addr = guest_ipa; addr < guest_ipa + size; addr += PAGE_SIZE) {
>>> +	for (addr = guest_ipa; addr < end; addr += pgsize) {
>>>  		ret = kvm_mmu_topup_memory_cache(&cache,
>>>  						 kvm_mmu_cache_min_pages(kvm));
>>>  		if (ret)
>>>  			break;
>>>
>>> +		pgsize = kvm_supported_pgsize(pgt, addr, end, pa);
>>> +
>>>  		spin_lock(&kvm->mmu_lock);
>>> -		ret = kvm_pgtable_stage2_map(pgt, addr, PAGE_SIZE, pa, prot,
>>> +		ret = kvm_pgtable_stage2_map(pgt, addr, pgsize, pa, prot,
>>>  					     &cache);
>>>  		spin_unlock(&kvm->mmu_lock);
>>>  		if (ret)
>>>  			break;
>>>
>>> -		pa += PAGE_SIZE;
>>> +		pa += pgsize;
>>>  	}
>>>
>>>  	kvm_mmu_free_memory_cache(&cache);
>>
>> This otherwise looks neat enough.
>>
>> Thanks,
>>
>> M.
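
For reference, a minimal standalone sketch of what the granule selection in
kvm_supported_pgsize() boils down to for a 4KB translation granule. The block
sizes (1GB at level 1, 2MB at level 2) and the alignment/size checks below are
assumptions of this sketch, not the exact kernel helpers:

    #include <stdint.h>

    /*
     * Illustration only: pick the largest mapping size that fits in
     * [addr, end) and keeps both the IPA and the PA aligned to it;
     * fall back to a 4KB page.
     */
    uint64_t pick_mapping_size(uint64_t addr, uint64_t end, uint64_t phys)
    {
            static const uint64_t granules[] = {
                    1ULL << 30,     /* 1GB level-1 block */
                    1ULL << 21,     /* 2MB level-2 block */
                    1ULL << 12,     /* 4KB page          */
            };
            int i;

            for (i = 0; i < 3; i++) {
                    uint64_t g = granules[i];

                    if (g <= end - addr && !(addr & (g - 1)) && !(phys & (g - 1)))
                            return g;
            }
            return 1ULL << 12;      /* smallest return value is PAGE_SIZE */
    }

With a 1GB-aligned 8GB MMIO region, a walk like this picks 1GB on every
iteration of the kvm_phys_addr_ioremap() loop above, so the region is
installed with 8 block entries instead of roughly 2 million page entries.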