Reply-To: Sean Christopherson <seanjc@google.com>
Date: Thu, 1 Apr 2021 17:56:56 -0700
In-Reply-To: <20210402005658.3024832-1-seanjc@google.com>
Message-Id: <20210402005658.3024832-9-seanjc@google.com>
References: <20210402005658.3024832-1-seanjc@google.com>
X-Mailer: git-send-email 2.31.0.208.g409f899ff0-goog
Subject: [PATCH v2 08/10] KVM: Take mmu_lock when handling MMU notifier iff the hva hits a memslot
From: Sean Christopherson <seanjc@google.com>
To: Marc Zyngier, Huacai Chen, Aleksandar Markovic, Paul Mackerras,
        Paolo Bonzini
Cc: James Morse, Julien Thierry, Suzuki K Poulose, Sean Christopherson,
        Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
        linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu,
        linux-mips@vger.kernel.org, kvm@vger.kernel.org,
        kvm-ppc@vger.kernel.org, linux-kernel@vger.kernel.org, Ben Gardon

Defer acquiring mmu_lock in the MMU notifier paths until a "hit" has
been detected in the memslots, i.e. don't take the lock for
notifications that don't affect the guest.

For small VMs, spurious locking is a minor annoyance.  And for
"volatile" setups where the majority of notifications _are_ relevant,
this barely qualifies as an optimization.  But, for large VMs (hundreds
of threads) with static setups, e.g. no page migration, no swapping,
etc., the vast majority of MMU notifier callbacks will be unrelated to
the guest, e.g. will often be in response to the userspace VMM
adjusting its own virtual address space.  In such large VMs, acquiring
mmu_lock can be painful as it blocks vCPUs from handling page faults.
In some scenarios it can even be "fatal" in the sense that it causes
unacceptable brownouts, e.g. when rebuilding huge pages after live
migration, a significant percentage of vCPUs will be attempting to
handle page faults.

x86's TDP MMU implementation is especially susceptible to spurious
locking due to it taking mmu_lock for read when handling page faults.
Because rwlock is fair, a single writer will stall future readers,
while the writer is itself stalled waiting for in-progress readers to
complete.  This is exacerbated by the MMU notifiers often firing
multiple times in quick succession, e.g. moving a page will (always?)
invoke three separate notifiers: .invalidate_range_start(),
.invalidate_range_end(), and .change_pte().  Unnecessarily taking
mmu_lock each time means even a single spurious sequence can be
problematic.

Note, this optimizes only the unpaired callbacks.
Optimizing the .invalidate_range_{start,end}() pairs is more complex
and will be done in a future patch.

Suggested-by: Ben Gardon
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 virt/kvm/kvm_main.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 25ecb5235e17..f6697ad741ed 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -482,10 +482,10 @@ static void kvm_null_fn(void)
 static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
                                                   const struct kvm_hva_range *range)
 {
+        bool ret = false, locked = false;
         struct kvm_gfn_range gfn_range;
         struct kvm_memory_slot *slot;
         struct kvm_memslots *slots;
-        bool ret = false;
         int i, idx;
 
         /* A null handler is allowed if and only if on_lock() is provided. */
@@ -493,11 +493,13 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
                  IS_KVM_NULL_FN(range->handler)))
                 return 0;
 
-        KVM_MMU_LOCK(kvm);
-
         idx = srcu_read_lock(&kvm->srcu);
 
+        /* The on_lock() path does not yet support lock elision. */
         if (!IS_KVM_NULL_FN(range->on_lock)) {
+                locked = true;
+                KVM_MMU_LOCK(kvm);
+
                 range->on_lock(kvm, range->start, range->end);
 
                 if (IS_KVM_NULL_FN(range->handler))
@@ -532,6 +534,10 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
                         gfn_range.end = hva_to_gfn_memslot(hva_end + PAGE_SIZE - 1, slot);
                         gfn_range.slot = slot;
 
+                        if (!locked) {
+                                locked = true;
+                                KVM_MMU_LOCK(kvm);
+                        }
                         ret |= range->handler(kvm, &gfn_range);
                 }
         }
@@ -540,7 +546,8 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
                 kvm_flush_remote_tlbs(kvm);
 
 out_unlock:
-        KVM_MMU_UNLOCK(kvm);
+        if (locked)
+                KVM_MMU_UNLOCK(kvm);
 
         srcu_read_unlock(&kvm->srcu, idx);
 
-- 
2.31.0.208.g409f899ff0-goog
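
P.S. For readers following along outside the kernel tree, below is a
minimal, self-contained C sketch of the lock-elision pattern the diff
implements.  It is not KVM code: vm_lock, regions[], and
handle_region() are hypothetical stand-ins for mmu_lock, the memslots,
and range->handler(), and a pthread mutex stands in for
KVM_MMU_LOCK()/KVM_MMU_UNLOCK().

/*
 * Sketch of the lazy-lock pattern: take the lock only on the first
 * range that actually overlaps a tracked region, and unlock only if
 * the lock was ever taken.  Compile with: cc -pthread sketch.c
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct region {
        unsigned long start, end;       /* [start, end) host address range */
};

static pthread_mutex_t vm_lock = PTHREAD_MUTEX_INITIALIZER;

static struct region regions[] = {
        { 0x1000, 0x5000 },
        { 0x9000, 0xa000 },
};

/* Per-region handler; returns true if a flush would be needed. */
static bool handle_region(struct region *r, unsigned long start,
                          unsigned long end)
{
        printf("handling overlap of [%#lx, %#lx) with [%#lx, %#lx)\n",
               start, end, r->start, r->end);
        return true;
}

static int handle_range(unsigned long start, unsigned long end)
{
        bool ret = false, locked = false;
        size_t i;

        for (i = 0; i < sizeof(regions) / sizeof(regions[0]); i++) {
                struct region *r = &regions[i];

                /* Skip regions the notification cannot affect. */
                if (end <= r->start || start >= r->end)
                        continue;

                /* First hit: take the lock now, not up front. */
                if (!locked) {
                        locked = true;
                        pthread_mutex_lock(&vm_lock);
                }
                ret |= handle_region(r, start, end);
        }

        /* Unlock only if some region actually required the lock. */
        if (locked)
                pthread_mutex_unlock(&vm_lock);

        return (int)ret;
}

int main(void)
{
        handle_range(0x0, 0x800);       /* miss: lock never taken    */
        handle_range(0x2000, 0x3000);   /* hit: lock taken on demand */
        return 0;
}

The design point mirrors the patch: a notification whose range misses
every tracked region completes without ever touching the lock, so
unrelated host address-space churn no longer serializes against the
fault-handling path.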