From: Yanan Wang <wangyanan55@huawei.com>
To: Marc Zyngier, Will Deacon, Quentin Perret, Alexandru Elisei
Cc: Catalin Marinas, James Morse, Julien Thierry, Suzuki K Poulose,
	Gavin Shan, Yanan Wang
Subject: [PATCH v5 0/6] KVM: arm64: Improve efficiency of stage2 page table
Date: Thu, 15 Apr 2021 19:50:26 +0800
Message-ID: <20210415115032.35760-1-wangyanan55@huawei.com>

Hi,

This series makes some efficiency improvements to the guest stage-2
page table code, and includes test results that quantify the benefit.
The code has been rebased on the latest kvmarm/next tree.

Descriptions:
We currently uniformly perform CMOs of the D-cache and I-cache in
user_mem_abort() before calling the fault handlers. If we get
concurrent guest faults (e.g. translation faults, permission faults)
or some really unnecessary guest faults caused by BBM, the CMOs for
the first vCPU are necessary while those for the later ones are not.
By moving the CMOs into the fault handlers, we can easily identify
the conditions where they are really needed and avoid the unnecessary
ones. Since performing CMOs is a time-consuming process, especially
when flushing a block range, this reduces a lot of load on KVM and
improves the efficiency of the stage-2 page table code, as the
sketch below illustrates.
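To make the intent concrete, here is a minimal standalone sketch (not
the actual kernel diff: the names user_mem_abort_old/new, flush_dcache
and the pte_t layout are invented for this illustration, and the real
code serializes the racing vCPUs via the MMU lock/cmpxchg rather than
a plain loop):

/*
 * Standalone illustration of why moving CMOs into the fault handler
 * avoids redundant cache maintenance when several vCPUs fault on the
 * same IPA concurrently. Build with: cc -o cmo_sketch cmo_sketch.c
 */
#include <stdio.h>

typedef unsigned long pte_t;
#define PTE_VALID	(1UL << 0)

static int dcache_flushes;	/* count the CMOs actually issued */

static void flush_dcache(void)
{
	dcache_flushes++;
}

/* Old scheme: the abort path flushes before it knows whether the
 * handler will actually install anything new. */
static void user_mem_abort_old(pte_t *ptep, pte_t new)
{
	flush_dcache();			/* always pays the CMO cost */
	if (!(*ptep & PTE_VALID))
		*ptep = new;		/* race losers flushed for nothing */
}

/* New scheme: the fault handler flushes only when it really installs
 * a new valid mapping; vCPUs that find the PTE already valid skip
 * the CMO entirely. */
static void user_mem_abort_new(pte_t *ptep, pte_t new)
{
	if (*ptep & PTE_VALID)
		return;			/* someone else mapped it first */
	flush_dcache();
	*ptep = new;
}

int main(void)
{
	pte_t pte;
	int i;

	pte = 0;
	dcache_flushes = 0;
	for (i = 0; i < 4; i++)		/* 4 vCPUs fault on the same page */
		user_mem_abort_old(&pte, PTE_VALID | 0x1000);
	printf("old scheme: %d flushes\n", dcache_flushes);	/* 4 */

	pte = 0;
	dcache_flushes = 0;
	for (i = 0; i < 4; i++)
		user_mem_abort_new(&pte, PTE_VALID | 0x1000);
	printf("new scheme: %d flushes\n", dcache_flushes);	/* 1 */

	return 0;
}

With N vCPUs faulting on the same page, the old scheme issues N
flushes where the new one issues a single flush, which is where the
gains in the numbers below come from.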
In this series, patches #1, #3 and #4 prepare for the movement of the
CMOs (adapting to the latest stage-2 page table framework), patches
#2 and #5 move the CMOs of the D-cache and I-cache into the fault
handlers, and patch #6 introduces a new way to distinguish the cases
of memcache allocations.

The following results, measured on v3, represent the benefit
introduced by the movement of the CMOs. They were obtained with the
kvm selftest [1] that I posted recently.
[1] https://lore.kernel.org/lkml/20210302125751.19080-1-wangyanan55@huawei.com/

When multiple vCPUs concurrently access the same memory region, we
measure the execution time of KVM creating new mappings, updating
the permissions of old mappings from RO to RW, and re-creating the
block mappings after they have been split.

hardware platform: HiSilicon Kunpeng920 Server
host kernel: Linux mainline v5.12-rc2

cmdline: ./kvm_page_table_test -m 4 -s anonymous -b 1G -v 80
           (80 vcpus, 1G memory, page mappings (normal 4K))
KVM_CREATE_MAPPINGS: before 104.35s -> after  90.42s  +13.35%
KVM_UPDATE_MAPPINGS: before  78.64s -> after  75.45s  + 4.06%

cmdline: ./kvm_page_table_test -m 4 -s anonymous_thp -b 20G -v 40
           (40 vcpus, 20G memory, block mappings (THP 2M))
KVM_CREATE_MAPPINGS: before  15.66s -> after   6.92s  +55.80%
KVM_UPDATE_MAPPINGS: before 178.80s -> after 123.35s  +31.00%
KVM_REBUILD_BLOCKS:  before 187.34s -> after 131.76s  +30.65%

cmdline: ./kvm_page_table_test -m 4 -s anonymous_hugetlb_1gb -b 20G -v 40
           (40 vcpus, 20G memory, block mappings (HUGETLB 1G))
KVM_CREATE_MAPPINGS: before 104.54s -> after   3.70s  +96.46%
KVM_UPDATE_MAPPINGS: before 174.20s -> after 115.94s  +33.44%
KVM_REBUILD_BLOCKS:  before 103.95s -> after   2.96s  +97.15%
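(For reference, each percentage is the relative reduction in
execution time, e.g. for the first line:
(104.35 - 90.42) / 104.35 ≈ 13.35%.)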
---

Changelogs:
v4->v5:
- rebased on the latest kvmarm/next tree to adapt to the new stage-2
  page-table code
- v4: https://lore.kernel.org/lkml/20210409033652.28316-1-wangyanan55@huawei.com/

v3->v4:
- perform D-cache flush if we are not mapping device memory
- rebased on top of mainline v5.12-rc6
- v3: https://lore.kernel.org/lkml/20210326031654.3716-1-wangyanan55@huawei.com/

v2->v3:
- drop patch #3 in v2
- retest v3 based on v5.12-rc2
- v2: https://lore.kernel.org/lkml/20210310094319.18760-1-wangyanan55@huawei.com/

v1->v2:
- rebased on top of mainline v5.12-rc2
- also move CMOs of I-cache to the fault handlers
- retest v2 based on v5.12-rc2
- v1: https://lore.kernel.org/lkml/20210208112250.163568-1-wangyanan55@huawei.com/

---

Yanan Wang (6):
  KVM: arm64: Introduce KVM_PGTABLE_S2_GUEST stage-2 flag
  KVM: arm64: Move D-cache flush to the fault handlers
  KVM: arm64: Add mm_ops member for structure stage2_attr_data
  KVM: arm64: Provide invalidate_icache_range at non-VHE EL2
  KVM: arm64: Move I-cache flush to the fault handlers
  KVM: arm64: Distinguish cases of memcache allocations completely

 arch/arm64/include/asm/kvm_mmu.h     | 31 -------------
 arch/arm64/include/asm/kvm_pgtable.h | 38 ++++++++++------
 arch/arm64/kvm/hyp/nvhe/cache.S      | 11 +++++
 arch/arm64/kvm/hyp/pgtable.c         | 65 +++++++++++++++++++++++-----
 arch/arm64/kvm/mmu.c                 | 51 ++++++++--------------
 5 files changed, 107 insertions(+), 89 deletions(-)

-- 
2.23.0