From: Keqian Zhu
Subject: [RFC] KVM: arm64: support enabling dirty log gradually in small chunks
Date: Mon, 9 Mar 2020 16:57:27 +0800
Message-ID: <20200309085727.1106-1-zhukeqian1@huawei.com>
Cc: Keqian Zhu, Jay Zhou, Paolo Bonzini, Peter Xu, Sean Christopherson

There is already support for enabling dirty log gradually in small chunks
on x86. This adds the same support for arm64.
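For reference, this is how userspace would request the new behaviour once the
capability is advertised. The sketch below is illustrative only (the vm_fd
handling, error checking and the helper name enable_initially_set are
assumptions, not part of this patch); it relies on the common
KVM_DIRTY_LOG_INITIALLY_SET support that is already in place for x86:

  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /*
   * Opt in to manual dirty-log clearing with the dirty bitmap initially
   * all set: small pages are then write-protected lazily, on the first
   * KVM_CLEAR_DIRTY_LOG that covers them, instead of all at once when
   * dirty logging is enabled on a memslot.
   */
  static int enable_initially_set(int vm_fd)
  {
  	struct kvm_enable_cap cap = {
  		.cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2,
  		.args[0] = KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE |
  			   KVM_DIRTY_LOG_INITIALLY_SET,
  	};

  	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
  }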
Under the Huawei Kunpeng 920 2.6GHz platform, I did some tests with a
128G Linux VM and measured the time taken by memory_global_dirty_log_start;
here are the numbers:

VM Size        Before    After optimization
128G           527ms     4ms

Signed-off-by: Keqian Zhu
---
Cc: Jay Zhou
Cc: Paolo Bonzini
Cc: Peter Xu
Cc: Sean Christopherson
---
 Documentation/virt/kvm/api.rst    |  2 +-
 arch/arm64/include/asm/kvm_host.h |  4 ++++
 virt/kvm/arm/mmu.c                | 30 ++++++++++++++++++++++--------
 3 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 0adef66585b1..89d4f2680af1 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -5735,7 +5735,7 @@ will be initialized to 1 when created.  This also improves performance because
 dirty logging can be enabled gradually in small chunks on the first call
 to KVM_CLEAR_DIRTY_LOG.  KVM_DIRTY_LOG_INITIALLY_SET depends on
 KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE (it is also only available on
-x86 for now).
+x86 and arm64 for now).
 
 KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 was previously available under the name
 KVM_CAP_MANUAL_DIRTY_LOG_PROTECT, but the implementation had bugs that make
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index d87aa609d2b6..0deb2ac7d091 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -16,6 +16,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -45,6 +46,9 @@
 #define KVM_REQ_VCPU_RESET	KVM_ARCH_REQ(2)
 #define KVM_REQ_RECORD_STEAL	KVM_ARCH_REQ(3)
 
+#define KVM_DIRTY_LOG_MANUAL_CAPS (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE | \
+				   KVM_DIRTY_LOG_INITIALLY_SET)
+
 DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
 
 extern unsigned int kvm_sve_max_vl;
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index e3b9ee268823..5c7ca84dec85 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1438,9 +1438,11 @@ static void stage2_wp_ptes(pmd_t *pmd, phys_addr_t addr, phys_addr_t end)
  * @pud:	pointer to pud entry
  * @addr:	range start address
  * @end:	range end address
+ * @wp_ptes:	write protect ptes or not
  */
 static void stage2_wp_pmds(struct kvm *kvm, pud_t *pud,
-			   phys_addr_t addr, phys_addr_t end)
+			   phys_addr_t addr, phys_addr_t end,
+			   bool wp_ptes)
 {
 	pmd_t *pmd;
 	phys_addr_t next;
@@ -1453,7 +1455,7 @@ static void stage2_wp_pmds(struct kvm *kvm, pud_t *pud,
 			if (pmd_thp_or_huge(*pmd)) {
 				if (!kvm_s2pmd_readonly(pmd))
 					kvm_set_s2pmd_readonly(pmd);
-			} else {
+			} else if (wp_ptes) {
 				stage2_wp_ptes(pmd, addr, next);
 			}
 		}
@@ -1465,9 +1467,11 @@ static void stage2_wp_pmds(struct kvm *kvm, pud_t *pud,
  * @pgd:	pointer to pgd entry
  * @addr:	range start address
  * @end:	range end address
+ * @wp_ptes:	write protect ptes or not
  */
 static void stage2_wp_puds(struct kvm *kvm, pgd_t *pgd,
-			   phys_addr_t addr, phys_addr_t end)
+			   phys_addr_t addr, phys_addr_t end,
+			   bool wp_ptes)
 {
 	pud_t *pud;
 	phys_addr_t next;
@@ -1480,7 +1484,7 @@ static void stage2_wp_puds(struct kvm *kvm, pgd_t *pgd,
 				if (!kvm_s2pud_readonly(pud))
 					kvm_set_s2pud_readonly(pud);
 			} else {
-				stage2_wp_pmds(kvm, pud, addr, next);
+				stage2_wp_pmds(kvm, pud, addr, next, wp_ptes);
 			}
 		}
 	} while (pud++, addr = next, addr != end);
@@ -1491,8 +1495,10 @@ static void stage2_wp_puds(struct kvm *kvm, pgd_t *pgd,
  * @kvm:	The KVM pointer
  * @addr:	Start address of range
  * @end:	End address of range
+ * @wp_ptes:	Write protect ptes or not
  */
-static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
+static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr,
+			    phys_addr_t end, bool wp_ptes)
 {
 	pgd_t *pgd;
 	phys_addr_t next;
@@ -1513,7 +1519,7 @@ static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
 			break;
 		next = stage2_pgd_addr_end(kvm, addr, end);
 		if (stage2_pgd_present(kvm, *pgd))
-			stage2_wp_puds(kvm, pgd, addr, next);
+			stage2_wp_puds(kvm, pgd, addr, next, wp_ptes);
 	} while (pgd++, addr = next, addr != end);
 }
 
@@ -1535,6 +1541,7 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
 	struct kvm_memslots *slots = kvm_memslots(kvm);
 	struct kvm_memory_slot *memslot = id_to_memslot(slots, slot);
 	phys_addr_t start, end;
+	bool wp_ptes;
 
 	if (WARN_ON_ONCE(!memslot))
 		return;
@@ -1543,7 +1550,14 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
 	end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
 
 	spin_lock(&kvm->mmu_lock);
-	stage2_wp_range(kvm, start, end);
+	/*
+	 * If we're with initial-all-set, we don't need to write protect
+	 * any small page because they're reported as dirty already.
+	 * However we still need to write-protect huge pages so that the
+	 * page split can happen lazily on the first write to the huge page.
+	 */
+	wp_ptes = !kvm_dirty_log_manual_protect_and_init_set(kvm);
+	stage2_wp_range(kvm, start, end, wp_ptes);
 	spin_unlock(&kvm->mmu_lock);
 	kvm_flush_remote_tlbs(kvm);
 }
@@ -1567,7 +1581,7 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
 	phys_addr_t start = (base_gfn + __ffs(mask)) << PAGE_SHIFT;
 	phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;
 
-	stage2_wp_range(kvm, start, end);
+	stage2_wp_range(kvm, start, end, true);
 }
 
 /*
-- 
2.19.1
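Usage note (not part of the patch): with KVM_DIRTY_LOG_INITIALLY_SET, the
small pages of a memslot only get write-protected when userspace clears
their bits with KVM_CLEAR_DIRTY_LOG, which is also when any covering huge
page is split. A rough sketch of that call, with the helper name and the
slot_id/first_page/num_pages values being illustrative assumptions
(first_page must be a multiple of 64, and num_pages must be too unless the
range ends at the memslot boundary):

  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Clear (and thereby write-protect) one chunk of the dirty bitmap. */
  static int clear_chunk(int vm_fd, __u32 slot_id, __u64 first_page,
  			 __u32 num_pages, void *bits_to_clear)
  {
  	struct kvm_clear_dirty_log clear = {
  		.slot = slot_id,
  		.first_page = first_page,
  		.num_pages = num_pages,
  		.dirty_bitmap = bits_to_clear,	/* 1 = clear + write-protect */
  	};

  	return ioctl(vm_fd, KVM_CLEAR_DIRTY_LOG, &clear);
  }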