Received: by 2002:a05:6a10:9848:0:0:0:0 with SMTP id x8csp977756pxf; Thu, 25 Mar 2021 20:22:27 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyywuuKni4EPJmaB+wgq1Y7B94+L6hDyUnFmy7OsVOtAvgOIDjUz09h3TziTPUtU2mhJw9a X-Received: by 2002:a17:906:4407:: with SMTP id x7mr12913189ejo.546.1616728947132; Thu, 25 Mar 2021 20:22:27 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1616728947; cv=none; d=google.com; s=arc-20160816; b=lpb8TwlNNYyvaK9sQTHrrdK8Vkk9ilELz5wkXogWHzhE/yAlOGcruBBV4xMOGbygAy pUwyby4mjgnPOzc2xCUUWcHvI0Xdu29zf2z+65CGWQ0R8yJaG0+8YHNBcvciE5TWtsai gFq/XVDEBUif+i4NAO4KP3QZpd1hfhE4hsNzvyblofbkmaBcvuco82oLGlRsp1ZOHLvq 0Tr6d7ZPqCoAdkifeoEX+lycev+RtQZ9iYwQFsThksTQu912pQMPiJIfm+8htBx//LEa f55nN24LOn8nOeJ+w4Vpz/kHEBcawqn5rXqFL55NZn/kR0I4qTpFk2YtQTabqTuLFfZV mPHA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:message-id:date:subject:cc:to:from; bh=BkyQ78cjfwIPGpSCOIhxIoZceP3rWvSiZQleH6onjns=; b=nyu+TK2BWuGrAMoLrfxbyYRc6Ylq8SQFPqzJan/gWYzGdWhg+x0teOhn3vRD+YDAc8 DiMwGlgnB0JhILhwbMvZPw71/Q6DlrFHx0M4+JOunIfubi3ILbKJQOzQV2PupHtHNtCc JfyPKTA/+w+K1wkhrhsJnjKWpZQscJAH5qVF4p9McgZistwP1QD5JAxY8zOiE0odJUcf IyBgdNOR1ZizPTbvT2e8ilZqRt7HfyRJA1znNlpx8wScJcysIUz0KPASsnO3JeTCN1FT 5EHiuS7lHDoBmLJsdr+p+qCiGZp8qLqTDo8zIXzotimu/U4BCX0+sJPqSfcaxpxu0Ets sZ9w== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=huawei.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id bn9si5950448ejb.23.2021.03.25.20.22.04; Thu, 25 Mar 2021 20:22:27 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=huawei.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231146AbhCZDSN (ORCPT + 99 others); Thu, 25 Mar 2021 23:18:13 -0400 Received: from szxga05-in.huawei.com ([45.249.212.191]:14552 "EHLO szxga05-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230273AbhCZDRg (ORCPT ); Thu, 25 Mar 2021 23:17:36 -0400 Received: from DGGEMS409-HUB.china.huawei.com (unknown [172.30.72.60]) by szxga05-in.huawei.com (SkyGuard) with ESMTP id 4F66Xg5MWCzPmsB; Fri, 26 Mar 2021 11:14:59 +0800 (CST) Received: from DESKTOP-TMVL5KK.china.huawei.com (10.174.187.128) by DGGEMS409-HUB.china.huawei.com (10.3.19.209) with Microsoft SMTP Server id 14.3.498.0; Fri, 26 Mar 2021 11:17:25 +0800 From: Yanan Wang To: Marc Zyngier , Will Deacon , "Alexandru Elisei" , Catalin Marinas , , , , CC: James Morse , Julien Thierry , Suzuki K Poulose , Gavin Shan , Quentin Perret , , , , Yanan Wang Subject: [RFC PATCH v3 0/2] KVM: arm64: Improve efficiency of stage2 page table Date: Fri, 26 Mar 2021 11:16:52 +0800 Message-ID: <20210326031654.3716-1-wangyanan55@huawei.com> X-Mailer: git-send-email 2.8.4.windows.1 MIME-Version: 1.0 Content-Type: text/plain X-Originating-IP: [10.174.187.128] X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi, This is a new version of the series [1] that I have posted before. It makes some efficiency improvement of stage2 page table code and there are some test results to quantify the benefit of each patch. [1] v2: https://lore.kernel.org/lkml/20210310094319.18760-1-wangyanan55@huawei.com/ Although there hasn't been any feedback about v2, I am certain that there should be a big change for the series after plenty of discussion with Alexandru Elisei. A conclusion was drew that CMOs are still needed for the scenario of coalescing tables, and as a result the benefit of patch #3 in v2 becomes rather little judging from the test results. So drop this patch and keep the others which still remain meaningful. Changelogs: v2->v3: - drop patch #3 in v2 - retest v3 based on v5.12-rc2 v1->v2: - rebased on top of mainline v5.12-rc2 - also move CMOs of I-cache to the fault handlers - retest v2 based on v5.12-rc2 - v1: https://lore.kernel.org/lkml/20210208112250.163568-1-wangyanan55@huawei.com/ About this v3 series: Patch #1: We currently uniformly permorm CMOs of D-cache and I-cache in function user_mem_abort before calling the fault handlers. If we get concurrent guest faults(e.g. translation faults, permission faults) or some really unnecessary guest faults caused by BBM, CMOs for the first vcpu are necessary while the others later are not. By moving CMOs to the fault handlers, we can easily identify conditions where they are really needed and avoid the unnecessary ones. As it's a time consuming process to perform CMOs especially when flushing a block range, so this solution reduces much load of kvm and improve efficiency of the page table code. So let's move both clean of D-cache and invalidation of I-cache to the map path and move only invalidation of I-cache to the permission path. Since the original APIs for CMOs in mmu.c are only called in function user_mem_abort, we now also move them to pgtable.c. The following results represent the benefit of patch #1 alone, and they were tested by [2] (kvm/selftest) that I have posted recently. [2] https://lore.kernel.org/lkml/20210302125751.19080-1-wangyanan55@huawei.com/ When there are muitiple vcpus concurrently accessing the same memory region, we can test the execution time of KVM creating new mappings, updating the permissions of old mappings from RO to RW, and rebuilding the blocks after they have been split. hardware platform: HiSilicon Kunpeng920 Server host kernel: Linux mainline v5.12-rc2 cmdline: ./kvm_page_table_test -m 4 -s anonymous -b 1G -v 80 (80 vcpus, 1G memory, page mappings(normal 4K)) KVM_CREATE_MAPPINGS: before 104.35s -> after 90.42s +13.35% KVM_UPDATE_MAPPINGS: before 78.64s -> after 75.45s + 4.06% cmdline: ./kvm_page_table_test -m 4 -s anonymous_thp -b 20G -v 40 (40 vcpus, 20G memory, block mappings(THP 2M)) KVM_CREATE_MAPPINGS: before 15.66s -> after 6.92s +55.80% KVM_UPDATE_MAPPINGS: before 178.80s -> after 123.35s +31.00% KVM_REBUILD_BLOCKS: before 187.34s -> after 131.76s +30.65% cmdline: ./kvm_page_table_test -m 4 -s anonymous_hugetlb_1gb -b 20G -v 40 (40 vcpus, 20G memory, block mappings(HUGETLB 1G)) KVM_CREATE_MAPPINGS: before 104.54s -> after 3.70s +96.46% KVM_UPDATE_MAPPINGS: before 174.20s -> after 115.94s +33.44% KVM_REBUILD_BLOCKS: before 103.95s -> after 2.96s +97.15% Patch #2: A new method to distinguish cases of memcache allocations is introduced. By comparing fault_granule and vma_pagesize, cases that require allocations from memcache and cases that don't can be distinguished completely. Yanan Wang (2): KVM: arm64: Move CMOs from user_mem_abort to the fault handlers KVM: arm64: Distinguish cases of memcache allocations completely arch/arm64/include/asm/kvm_mmu.h | 31 --------------- arch/arm64/kvm/hyp/pgtable.c | 68 +++++++++++++++++++++++++------- arch/arm64/kvm/mmu.c | 48 ++++++++-------------- 3 files changed, 69 insertions(+), 78 deletions(-) -- 2.19.1