Date: Mon, 27 Jul 2020 13:33:24 -0700
Message-Id: <20200727203324.2614917-1-bgardon@google.com>
Subject: [PATCH 1/1] kvm: mmu: zap pages when zapping only parent
From: Ben Gardon
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-kselftest@vger.kernel.org
Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, Peter Shier, Ben Gardon

When the KVM MMU zaps a page, it will recursively zap the unsynced child
pages, but not the synced ones. This can create problems over time when
running many nested guests because it leaves unlinked pages which will
not be freed until the page quota is hit. With the default page quota of
20 shadow pages per 1000 guest pages, this looks like a memory leak and
can degrade MMU performance.

In a recent benchmark, substantial performance degradation was observed:
an L1 guest was booted with 64G of memory, and 2G nested Windows guests
were booted 10 at a time for 20 iterations (200 boots in total). Windows
was used in this benchmark because Windows guests touch all of their
memory on startup. By the end of the benchmark, the nested guests were
taking ~10% longer to boot. With this patch there is no degradation in
boot time.

Without this patch the benchmark ends with hundreds of thousands of
stale EPT02 pages cluttering up rmaps and the page hash map. As a
result, VM shutdown is also much slower: deleting memslot 0 was observed
to take over a minute. With this patch it takes just a few milliseconds.

If TDP is enabled, zap child shadow pages when zapping the only parent
shadow page.

Tested by running the kvm-unit-tests suite on an Intel Haswell machine.
No regressions or warnings versus commit c34b26b98cac ("KVM: MIPS: clean
up redundant 'kvm_run' parameters").

Reviewed-by: Peter Shier
Signed-off-by: Ben Gardon
---
 arch/x86/kvm/mmu/mmu.c | 49 +++++++++++++++++++++++++++++++++++++-----
 1 file changed, 44 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index fa506aaaf0194..c550bc3831dcc 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2626,13 +2626,52 @@ static bool mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp,
 	return false;
 }
 
-static void kvm_mmu_page_unlink_children(struct kvm *kvm,
-					 struct kvm_mmu_page *sp)
+static int kvm_mmu_page_unlink_children(struct kvm *kvm,
+					struct kvm_mmu_page *sp,
+					struct list_head *invalid_list)
 {
 	unsigned i;
+	int zapped = 0;
+
+	for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
+		u64 *sptep = sp->spt + i;
+		u64 spte = *sptep;
+		struct kvm_mmu_page *child_sp;
+
+		/*
+		 * Zap the page table entry, unlinking any potential child
+		 * page
+		 */
+		mmu_page_zap_pte(kvm, sp, sptep);
+
+		/* If there is no child page for this spte, continue */
+		if (!is_shadow_present_pte(spte) ||
+		    is_last_spte(spte, sp->role.level))
+			continue;
+
+		/*
+		 * If TDP is enabled, then any shadow pages are part of either
+		 * the EPT01 or an EPT02. In either case, do not expect the
+		 * same pattern of page reuse seen in x86 PTs for
+		 * copy-on-write and similar techniques. In this case, it is
+		 * unlikely that a parentless shadow PT will be used again in
+		 * the near future. Zap it to keep the rmaps and page hash
+		 * maps from filling up with stale EPT02 pages.
+		 */
+		if (!tdp_enabled)
+			continue;
+
+		child_sp = to_shadow_page(spte & PT64_BASE_ADDR_MASK);
+		if (WARN_ON_ONCE(!child_sp))
+			continue;
+
+		/* Zap the page if it has no remaining parent pages */
+		if (!child_sp->parent_ptes.val)
+			zapped += kvm_mmu_prepare_zap_page(kvm, child_sp,
+							   invalid_list);
+	}
 
-	for (i = 0; i < PT64_ENT_PER_PAGE; ++i)
-		mmu_page_zap_pte(kvm, sp, sp->spt + i);
+	return zapped;
 }
 
 static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp)
@@ -2678,7 +2717,7 @@ static bool __kvm_mmu_prepare_zap_page(struct kvm *kvm,
 	trace_kvm_mmu_prepare_zap_page(sp);
 	++kvm->stat.mmu_shadow_zapped;
 	*nr_zapped = mmu_zap_unsync_children(kvm, sp, invalid_list);
-	kvm_mmu_page_unlink_children(kvm, sp);
+	*nr_zapped += kvm_mmu_page_unlink_children(kvm, sp, invalid_list);
 	kvm_mmu_unlink_parents(kvm, sp);
 
 	/* Zapping children means active_mmu_pages has become unstable. */
-- 
2.28.0.rc0.142.g3c755180ce-goog