Subject: Re: [RFC] KVM: arm64: support enabling dirty log graually in small chunks
To: Marc Zyngier
References: <20200309085727.1106-1-zhukeqian1@huawei.com>
 <4b85699ec1d354cc73f5302560231f86@misterjones.org>
CC: Jay Zhou, Sean Christopherson, Paolo Bonzini, James Morse,
 Julien Thierry, Suzuki K Poulose
From: zhukeqian
Message-ID: <64925c8b-af3d-beb5-bc9b-66ef1e47f92d@huawei.com>
Date: Tue, 10 Mar 2020 16:26:07 +0800
In-Reply-To: <4b85699ec1d354cc73f5302560231f86@misterjones.org>
Hi Marc,

On 2020/3/9 19:45, Marc Zyngier wrote:
> Kegian,
>
> In the future, please Cc me on your KVM/arm64 patches, as well as
> all the reviewers mentioned in the MAINTAINERS file.
>
> On 2020-03-09 08:57, Keqian Zhu wrote:
>> There is already support of enabling dirty log graually
>
> gradually?
>
Yeah, gradually. :)

>> in small chunks for x86. This adds support for arm64.
>>
>> Under the Huawei Kunpeng 920 2.6GHz platform, I did some
>> tests with a 128G linux VM and counted the time taken of
>
> Linux

Thanks.

>
>> memory_global_dirty_log_start, here is the numbers:
>>
>> VM Size        Before    After optimization
>> 128G           527ms     4ms
>
> What does this benchmark do? Can you please provide a pointer to it?
>
I will explain this in the following text.

>>
>> Signed-off-by: Keqian Zhu
>> ---
>> Cc: Jay Zhou
>> Cc: Paolo Bonzini
>> Cc: Peter Xu
>> Cc: Sean Christopherson
>> ---
>>  Documentation/virt/kvm/api.rst    |  2 +-
>>  arch/arm64/include/asm/kvm_host.h |  4 ++++
>>  virt/kvm/arm/mmu.c                | 30 ++++++++++++++++++++++--------
>>  3 files changed, 27 insertions(+), 9 deletions(-)
>>
>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>> index 0adef66585b1..89d4f2680af1 100644
>> --- a/Documentation/virt/kvm/api.rst
>> +++ b/Documentation/virt/kvm/api.rst
>> @@ -5735,7 +5735,7 @@ will be initialized to 1 when created. This also improves performance because
>>  dirty logging can be enabled gradually in small chunks on the first call
>>  to KVM_CLEAR_DIRTY_LOG. KVM_DIRTY_LOG_INITIALLY_SET depends on
>>  KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE (it is also only available on
>> -x86 for now).
>> +x86 and arm64 for now).
>
> What is this based on? I can't find this in -next, and you provide no
> context whatsoever.

This is based on the "queue" branch of
git://git.kernel.org/pub/scm/virt/kvm/kvm.git

>
> I assume this is related to this:
> https://lore.kernel.org/kvm/20200227013227.1401-1-jianjay.zhou@huawei.com/
>
Yes, you are right.

The background is that in [https://patchwork.kernel.org/cover/10702447/],
Paolo made an optimization for the dirty log sync used by VM migration.
The previous dirty log sync logic got and cleared the dirty log at the
same time for each KVM memslot, which causes obvious problems for large
guests. As described by Paolo, "First, and less important, it can take
kvm->mmu_lock for an extended period of time. Second, its user can
actually see many false positives in some cases." Guests have plenty of
time to mark pages dirty again between the moment QEMU synchronizes the
dirty log and the moment it actually sends those pages, so both the
guest and QEMU suffer unnecessary overhead.

Paolo therefore introduced a new KVM ioctl: "The new KVM_CLEAR_DIRTY_LOG
ioctl can operate on a 64-page granularity rather than requiring to sync
a full memslot. This way the mmu_lock is taken for small amounts of
time, and only a small amount of time will pass between write protection
of pages and the sending of their content." Paolo's changes have been
merged into the mainline kernel, and the userspace counterpart (QEMU)
has also been updated.

After that, Jay Zhou proposed an optimization for enabling the dirty log
(the link you pasted above), based on Paolo's work. When enabling the
dirty log, we no longer need to write-protect all PTEs up front: they
will be write-protected gradually during the first round of RAM sending.
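To make the expected userspace sequence concrete, here is a rough sketch
of the two ioctls involved (illustrative only, not actual QEMU code:
vm_fd, slot and the bitmap handling are placeholders, and the
KVM_DIRTY_LOG_* bits come from the uapi headers of the kvm/queue branch
mentioned above):

#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int enable_deferred_write_protect(int vm_fd)
{
	struct kvm_enable_cap cap;

	memset(&cap, 0, sizeof(cap));
	cap.cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2;
	/*
	 * With KVM_DIRTY_LOG_INITIALLY_SET, KVM skips the bulk write
	 * protection when dirty logging is enabled; pages are instead
	 * write-protected piecemeal by KVM_CLEAR_DIRTY_LOG below.
	 */
	cap.args[0] = KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE |
		      KVM_DIRTY_LOG_INITIALLY_SET;
	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}

static int clear_dirty_chunk(int vm_fd, __u32 slot, __u64 first_page,
			     __u32 num_pages, void *bitmap)
{
	struct kvm_clear_dirty_log log = {
		.slot = slot,
		.first_page = first_page, /* must be 64-page aligned */
		.num_pages = num_pages,   /* multiple of 64, or up to slot end */
		.dirty_bitmap = bitmap,   /* 1 bit per page to clear */
	};

	return ioctl(vm_fd, KVM_CLEAR_DIRTY_LOG, &log);
}

This way the write protection (and the associated mmu_lock work) is
spread across the KVM_CLEAR_DIRTY_LOG calls of the first migration
round, instead of being paid all at once when dirty logging is enabled.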
> Is there a userspace counterpart to it?
>
As these KVM/x86-related changes have not been merged into the mainline
kernel yet, some small modifications to mainline QEMU are needed. When I
tested this patch on a Linux VM with 128GB of RAM and no huge pages, the
time taken to enable dirty logging decreased dramatically.

>>  KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 was previously available under the name
>>  KVM_CAP_MANUAL_DIRTY_LOG_PROTECT, but the implementation had bugs that make
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index d87aa609d2b6..0deb2ac7d091 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -16,6 +16,7 @@
>>  #include
>>  #include
>>  #include
>> +#include
>>  #include
>>  #include
>>  #include
>> @@ -45,6 +46,9 @@
>>  #define KVM_REQ_VCPU_RESET	KVM_ARCH_REQ(2)
>>  #define KVM_REQ_RECORD_STEAL	KVM_ARCH_REQ(3)
>>
>> +#define KVM_DIRTY_LOG_MANUAL_CAPS (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE | \
>> +				   KVM_DIRTY_LOG_INITIALLY_SET)
>> +
>>  DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
>>
>>  extern unsigned int kvm_sve_max_vl;
>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>> index e3b9ee268823..5c7ca84dec85 100644
>> --- a/virt/kvm/arm/mmu.c
>> +++ b/virt/kvm/arm/mmu.c
>> @@ -1438,9 +1438,11 @@ static void stage2_wp_ptes(pmd_t *pmd, phys_addr_t addr, phys_addr_t end)
>>   * @pud:	pointer to pud entry
>>   * @addr:	range start address
>>   * @end:	range end address
>> + * @wp_ptes:	write protect ptes or not
>>   */
>>  static void stage2_wp_pmds(struct kvm *kvm, pud_t *pud,
>> -			   phys_addr_t addr, phys_addr_t end)
>> +			   phys_addr_t addr, phys_addr_t end,
>> +			   bool wp_ptes)
>
> If you are going to pass extra parameters like this, make it at least
> extensible (unsigned long flags, for example).
>
OK, I will use flags in the formal patch.

>>  {
>>  	pmd_t *pmd;
>>  	phys_addr_t next;
>> @@ -1453,7 +1455,7 @@ static void stage2_wp_pmds(struct kvm *kvm, pud_t *pud,
>>  		if (pmd_thp_or_huge(*pmd)) {
>>  			if (!kvm_s2pmd_readonly(pmd))
>>  				kvm_set_s2pmd_readonly(pmd);
>> -		} else {
>> +		} else if (wp_ptes) {
>>  			stage2_wp_ptes(pmd, addr, next);
>>  		}
>>  	}

[...]

Thanks,
keqian
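P.S. For the flags version, I am considering something along these lines
(only a rough sketch of the direction, not the formal patch; the flag
name is made up here and may change):

/* hypothetical flag name, replacing the bool wp_ptes parameter */
#define KVM_S2_FLAG_WP_PTES	(1UL << 0)

static void stage2_wp_pmds(struct kvm *kvm, pud_t *pud,
			   phys_addr_t addr, phys_addr_t end,
			   unsigned long flags)
{
	...
		if (pmd_thp_or_huge(*pmd)) {
			if (!kvm_s2pmd_readonly(pmd))
				kvm_set_s2pmd_readonly(pmd);
		} else if (flags & KVM_S2_FLAG_WP_PTES) {
			stage2_wp_ptes(pmd, addr, next);
		}
	...
}

This keeps the callers extensible if we later need more behavior bits,
e.g. a separate flag for huge mappings.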