Subject: Re: [RFC PATCH 0/7] kvm: arm64: Implement SW/HW combined dirty log
From: Keqian Zhu
To: Marc Zyngier
Cc: Will Deacon, Catalin Marinas, Alex Williamson, Kirti Wankhede,
    Cornelia Huck, Mark Rutland, James Morse, Robin Murphy, Suzuki K Poulose
Date: Mon, 1 Feb 2021 21:12:49 +0800
In-Reply-To: <20210126124444.27136-1-zhukeqian1@huawei.com>
References: <20210126124444.27136-1-zhukeqian1@huawei.com>

Hi Marc,

Do you have time to have a look at this? Thanks ;-)

Keqian.

On 2021/1/26 20:44, Keqian Zhu wrote:
> The intention:
>
> On the arm64 platform, we track the dirty log of vCPUs through guest memory aborts.
> KVM takes up some vCPU time of the guest to change the stage2 mapping and mark pages dirty.
> This has a heavy side effect on the VM, especially when multiple vCPUs race and
> some of them block on the kvm mmu_lock.
>
> DBM is a hardware-assisted approach to dirty logging. The MMU changes a PTE to be
> writable if its DBM bit is set, so KVM does not need to spend vCPU time on dirty logging.
>
> About this patch series:
>
> The biggest problem of applying DBM to stage2 is that software must scan the PTs to
> collect the dirty state, which may take a long time and affect the downtime of migration.
>
> This series implements a SW/HW combined dirty log that can effectively solve this
> problem (the SMMU side can also use this approach for DMA dirty log tracking).
>
> The core idea is that we do not enable hardware dirty logging at start (we do not add
> the DBM bit). When an arbitrary PT takes a fault, we use software tracking for this PT
> and enable hardware tracking for its *nearby* PTs (e.g. add the DBM bit for the nearby
> 16PTs). Then, when we sync the dirty log, we already know all PTs that have hardware
> dirty logging enabled, so we do not need to scan all PTs.
>
>            mem abort point                       mem abort point
>                  ↓                                      ↓
>  ---------------------------------------------------------------
>  |********|          |          |********|          |           |
>  ---------------------------------------------------------------
>       ↑                              ↑
>  set DBM bit of                 set DBM bit of
>  this PT section (64PTEs)       this PT section (64PTEs)
>
> We may worry that when the dirty rate is very high we still need to scan too many PTs.
> Our main concern is the VM stop time. With QEMU dirty rate throttling, the dirty memory
> is kept close to the VM stop threshold, so there are only a few PTs to scan after the VM
> stops.
>
> It has the advantage of hardware tracking, which minimizes the side effect on vCPUs,
> and also the advantage of software tracking, which lets us control the vCPU dirty rate.
> Moreover, software tracking allows us to scan the PTs at a few fixed points, which
> greatly reduces the scanning time. And the biggest benefit is that we can apply this
> solution to DMA dirty tracking.
>
> Test:
>
> Host: Kunpeng 920 with 128 CPUs and 512G RAM. Transparent Hugepage is disabled (to
> ensure the test result is not affected by the dissolving of block page tables at the
> early stage of migration).
> VM: 16 CPUs and 16GB RAM, running 4 pairs of (redis_benchmark + redis_server).
>
> Each configuration is run 5 times, for both software dirty logging and SW/HW combined
> dirty logging.
>
> Test result:
>
> We gain a 5%~7% improvement in redis QPS during VM migration.
> VM downtime is not fundamentally affected.
> About 56.7% of the DBM bits are effectively used.
>
> Keqian Zhu (7):
>   arm64: cpufeature: Add API to report system support of HWDBM
>   kvm: arm64: Use atomic operation when update PTE
>   kvm: arm64: Add level_apply parameter for stage2_attr_walker
>   kvm: arm64: Add some HW_DBM related pgtable interfaces
>   kvm: arm64: Add some HW_DBM related mmu interfaces
>   kvm: arm64: Only write protect selected PTE
>   kvm: arm64: Start up SW/HW combined dirty log
>
>  arch/arm64/include/asm/cpufeature.h  |  12 +++
>  arch/arm64/include/asm/kvm_host.h    |   6 ++
>  arch/arm64/include/asm/kvm_mmu.h     |   7 ++
>  arch/arm64/include/asm/kvm_pgtable.h |  45 ++++++++++
>  arch/arm64/kvm/arm.c                 | 125 ++++++++++++++++++++++++++
>  arch/arm64/kvm/hyp/pgtable.c         | 130 ++++++++++++++++++++++-----
>  arch/arm64/kvm/mmu.c                 |  47 +++++++++-
>  arch/arm64/kvm/reset.c               |   8 +-
>  8 files changed, 351 insertions(+), 29 deletions(-)
>
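To restate the core idea of the quoted cover letter in code, below is a minimal,
standalone userspace sketch. It is not the actual arm64/KVM implementation from this
series: all names (handle_write_fault, PTE_DBM, SECTION_PTES) and the 64-PTE section
size are made up for illustration only.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NR_PTES        1024
#define SECTION_PTES   64          /* illustrative "nearby" granule per fault      */
#define PTE_DBM        (1u << 0)   /* hardware dirty-bit management enabled        */
#define PTE_DIRTY      (1u << 1)   /* set by "hardware" on write once DBM is armed */

static uint32_t ptes[NR_PTES];                       /* toy stage-2 page table     */
static bool sw_dirty[NR_PTES];                       /* software dirty log         */
static bool section_has_dbm[NR_PTES / SECTION_PTES]; /* sections worth scanning    */

/* On a write fault: log this PTE in software, then arm hardware tracking for the
 * whole surrounding section so later writes in it skip the fault path entirely. */
static void handle_write_fault(unsigned long idx)
{
        unsigned long sec = idx / SECTION_PTES;
        unsigned long i;

        sw_dirty[idx] = true;
        for (i = sec * SECTION_PTES; i < (sec + 1) * SECTION_PTES; i++)
                ptes[i] |= PTE_DBM;
        section_has_dbm[sec] = true;
}

/* A later write to a DBM-armed PTE is logged by "hardware", no fault taken. */
static void hw_write(unsigned long idx)
{
        if (ptes[idx] & PTE_DBM)
                ptes[idx] |= PTE_DIRTY;
}

/* Sync: only sections that ever took a fault are scanned, instead of walking
 * every PTE in the table. */
static unsigned long sync_dirty_log(void)
{
        unsigned long sec, i, dirty = 0;

        for (sec = 0; sec < NR_PTES / SECTION_PTES; sec++) {
                if (!section_has_dbm[sec])
                        continue;
                for (i = sec * SECTION_PTES; i < (sec + 1) * SECTION_PTES; i++) {
                        if (sw_dirty[i] || (ptes[i] & PTE_DIRTY))
                                dirty++;
                        sw_dirty[i] = false;
                        ptes[i] &= ~PTE_DIRTY;
                }
        }
        return dirty;
}

int main(void)
{
        handle_write_fault(5);   /* first write: fault, arms section 0           */
        hw_write(7);             /* second write in section 0: no fault needed   */
        printf("dirty pages: %lu\n", sync_dirty_log());  /* prints 2             */
        return 0;
}

The point the sketch tries to capture is that the sync path only walks sections that
ever took a fault, which is what keeps the post-stop scan short once dirty rate
throttling has already bounded the dirty set.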