Subject: Re: [RFC PATCH 0/7] kvm: arm64: Implement SW/HW combined dirty log
To: Marc Zyngier, Will Deacon, Catalin Marinas
References: <20210126124444.27136-1-zhukeqian1@huawei.com>
CC: Alex Williamson, Kirti Wankhede, Cornelia Huck, Mark Rutland, James Morse, Robin Murphy, Suzuki K Poulose
From: Keqian Zhu
Message-ID: <4716a83b-5dad-4dbc-6661-e1f05abbd29c@huawei.com>
Date: Tue, 2 Mar 2021 19:23:05 +0800
In-Reply-To: <20210126124444.27136-1-zhukeqian1@huawei.com>
List-ID: linux-kernel@vger.kernel.org

Hi everyone,

Any comments are welcome :).

Thanks,
Keqian

On 2021/1/26 20:44, Keqian Zhu wrote:
> The intention:
>
> On the arm64 platform, we track the dirty log of vCPUs through guest memory
> aborts. KVM consumes some guest vCPU time to change the stage-2 mapping and
> mark pages dirty. This has a heavy side effect on the VM, especially when
> multiple vCPUs race and some of them block on the KVM mmu_lock.
>
> DBM is a hardware-assisted approach to dirty logging: the MMU makes a PTE
> writable if its DBM bit is set, so KVM does not need to consume vCPU time
> to log dirty pages.
>
> About this patch series:
>
> The biggest problem with applying DBM to stage 2 is that software must scan
> the PTs to collect the dirty state, which can take a long time and affect
> the downtime of migration.
>
> This series implements a SW/HW combined dirty log that effectively solves
> this problem (the SMMU side can also use this approach for DMA dirty log
> tracking).
>
> The core idea is that we do not enable hardware dirty logging at the start
> (we do not set the DBM bit). When an arbitrary PT takes a fault, we perform
> software tracking for this PT and enable hardware tracking for its *nearby*
> PTs (e.g. set the DBM bit for the nearby 16 PTs). Then, when syncing the
> dirty log, we already know all PTs with hardware dirty logging enabled, so
> we do not need to scan all PTs.
>
>            mem abort point                 mem abort point
>                  ↓                               ↓
>  ---------------------------------------------------------------
>  |********|      |       |********|       |            |
>  ---------------------------------------------------------------
>       ↑                               ↑
>  set DBM bit of                  set DBM bit of
>  this PT section (64 PTEs)       this PT section (64 PTEs)
>
> One may worry that when the dirty rate is very high, we still need to scan
> too many PTs. Our main concern is the VM stop time. With QEMU dirty rate
> throttling, the dirty memory approaches the VM stop threshold, so there are
> only a few PTs left to scan after the VM stops.
>
> This approach has the advantage of hardware tracking, which minimizes the
> side effect on vCPUs, and also the advantage of software tracking, which
> controls the vCPU dirty rate. Moreover, software tracking lets us scan PTs
> at a few fixed points, which greatly reduces scanning time. And the biggest
> benefit is that we can apply this solution to DMA dirty tracking.
>
> Test:
>
> Host: Kunpeng 920 with 128 CPUs and 512 GB RAM. Transparent Hugepage is
> disabled (to ensure the test result is not affected by the dissolution of
> block page tables at the early stage of migration).
> VM: 16 CPUs, 16 GB RAM. Runs 4 pairs of (redis_benchmark + redis_server).
>
> Each case was run 5 times for the software dirty log and for the SW/HW
> combined dirty log.
>
> Test result:
>
> We gained a 5%~7% improvement in Redis QPS during VM migration.
> VM downtime is not affected fundamentally.
> About 56.7% of the DBM bits set were effectively used.
>
> Keqian Zhu (7):
>   arm64: cpufeature: Add API to report system support of HWDBM
>   kvm: arm64: Use atomic operation when update PTE
>   kvm: arm64: Add level_apply parameter for stage2_attr_walker
>   kvm: arm64: Add some HW_DBM related pgtable interfaces
>   kvm: arm64: Add some HW_DBM related mmu interfaces
>   kvm: arm64: Only write protect selected PTE
>   kvm: arm64: Start up SW/HW combined dirty log
>
>  arch/arm64/include/asm/cpufeature.h  |  12 +++
>  arch/arm64/include/asm/kvm_host.h    |   6 ++
>  arch/arm64/include/asm/kvm_mmu.h     |   7 ++
>  arch/arm64/include/asm/kvm_pgtable.h |  45 ++++++++++
>  arch/arm64/kvm/arm.c                 | 125 ++++++++++++++++++++++++++
>  arch/arm64/kvm/hyp/pgtable.c         | 130 ++++++++++++++++++++++-----
>  arch/arm64/kvm/mmu.c                 |  47 +++++++++-
>  arch/arm64/kvm/reset.c               |   8 +-
>  8 files changed, 351 insertions(+), 29 deletions(-)
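
To make the fault-path half of the scheme concrete, here is a minimal
standalone sketch (not the actual patch code; handle_write_abort, the
mark_dirty() callback, and the flat PTE array are hypothetical
simplifications). On arm64 the descriptor's DBM field is bit 51, and the
stage-2 write permission is S2AP[1] (bit 7); when DBM is set, the MMU sets
S2AP[1] on the first write instead of raising a permission fault.

#include <stdint.h>

#define PTE_S2_WRITE  (1ULL << 7)   /* S2AP[1]: stage-2 write permission   */
#define PTE_DBM       (1ULL << 51)  /* DBM: HW sets S2AP[1] on first write */
#define SECTION_PTES  64            /* section granularity, as in the diagram */

/* Fault path: software-log the faulting PTE, then arm hardware tracking
 * for its whole section so that later writes nearby mark pages dirty
 * without taking further faults. */
void handle_write_abort(uint64_t *ptes, unsigned long idx,
                        void (*mark_dirty)(unsigned long))
{
        unsigned long start = idx & ~(SECTION_PTES - 1UL);
        unsigned long i;

        mark_dirty(idx);                /* software dirty log           */
        ptes[idx] |= PTE_S2_WRITE;      /* let the faulting write retry */

        for (i = start; i < start + SECTION_PTES; i++)
                ptes[i] |= PTE_DBM;     /* enable hardware tracking     */
}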
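
The sync half then only needs to visit the sections the fault path armed,
rather than walking every PT. A matching sketch, reusing the hypothetical
definitions above: a PTE is hardware-dirty when both DBM and S2AP[1] are
set, and clearing S2AP[1] write-protects it again for the next round
(required TLB maintenance is elided to a comment).

/* Sync path: collect hardware dirty state from one armed section and
 * re-arm it by write-protecting the PTEs again. */
void sync_dirty_section(uint64_t *ptes, unsigned long start,
                        void (*mark_dirty)(unsigned long))
{
        unsigned long i;

        for (i = start; i < start + SECTION_PTES; i++) {
                if ((ptes[i] & PTE_DBM) && (ptes[i] & PTE_S2_WRITE)) {
                        mark_dirty(i);
                        ptes[i] &= ~PTE_S2_WRITE;  /* write-protect to re-arm */
                }
        }
        /* A real implementation must invalidate the stage-2 TLB entries
         * for this range before trusting the restored write protection. */
}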