From: "Maciej S. Szmigiero"
To: Paolo Bonzini
Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
    Joerg Roedel, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v2] KVM: x86/mmu: Make HVA handler retpoline-friendly
Date: Mon, 8 Feb 2021 19:51:32 +0100
Message-Id: <732d3fe9eb68aa08402a638ab0309199fa89ae56.1612810129.git.maciej.szmigiero@oracle.com>
X-Mailer: git-send-email 2.30.0

When retpolines are enabled they have high overhead in the inner loop
inside kvm_handle_hva_range() that iterates over the provided memory
area.

Let's mark this function and its TDP MMU equivalent __always_inline so
the compiler will be able to change the call to the actual handler
function inside each of them into a direct one.
This significantly improves performance on the unmap test on the
existing kernel memslot code (tested on a Xeon 8167M machine):

30 slots in use:
Test       Before     After      Improvement
Unmap      0.0353s    0.0334s    5%
Unmap 2M   0.00104s   0.000407s  61%

509 slots in use:
Test       Before     After      Improvement
Unmap      0.0742s    0.0740s    None
Unmap 2M   0.00221s   0.00159s   28%

Looks like having an indirect call in these functions (and, so, a
retpoline) might have interfered with unrolling of the whole loop in
the CPU.

Signed-off-by: Maciej S. Szmigiero
---
Changes from v1:
* Switch from static dispatch to __always_inline annotation.
* Separate this patch from the rest of the log(n) memslot code changes.
* Redo benchmarks.

 arch/x86/kvm/mmu/mmu.c     | 21 +++++++++++----------
 arch/x86/kvm/mmu/tdp_mmu.c | 16 +++++++++++-----
 2 files changed, 22 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6d16481aa29d..38d7a38609d4 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1456,16 +1456,17 @@ static void slot_rmap_walk_next(struct slot_rmap_walk_iterator *iterator)
 	     slot_rmap_walk_okay(_iter_);				\
 	     slot_rmap_walk_next(_iter_))
 
-static int kvm_handle_hva_range(struct kvm *kvm,
-				unsigned long start,
-				unsigned long end,
-				unsigned long data,
-				int (*handler)(struct kvm *kvm,
-					       struct kvm_rmap_head *rmap_head,
-					       struct kvm_memory_slot *slot,
-					       gfn_t gfn,
-					       int level,
-					       unsigned long data))
+static __always_inline int
+kvm_handle_hva_range(struct kvm *kvm,
+		     unsigned long start,
+		     unsigned long end,
+		     unsigned long data,
+		     int (*handler)(struct kvm *kvm,
+				    struct kvm_rmap_head *rmap_head,
+				    struct kvm_memory_slot *slot,
+				    gfn_t gfn,
+				    int level,
+				    unsigned long data))
 {
 	struct kvm_memslots *slots;
 	struct kvm_memory_slot *memslot;

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index b56d604809b8..f26c2269291f 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -639,11 +639,17 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 	return ret;
 }
 
-static int kvm_tdp_mmu_handle_hva_range(struct kvm *kvm, unsigned long start,
-		unsigned long end, unsigned long data,
-		int (*handler)(struct kvm *kvm, struct kvm_memory_slot *slot,
-			       struct kvm_mmu_page *root, gfn_t start,
-			       gfn_t end, unsigned long data))
+static __always_inline int
+kvm_tdp_mmu_handle_hva_range(struct kvm *kvm,
+			     unsigned long start,
+			     unsigned long end,
+			     unsigned long data,
+			     int (*handler)(struct kvm *kvm,
+					    struct kvm_memory_slot *slot,
+					    struct kvm_mmu_page *root,
+					    gfn_t start,
+					    gfn_t end,
+					    unsigned long data))
 {
 	struct kvm_memslots *slots;
 	struct kvm_memory_slot *memslot;