Reply-To: Sean Christopherson
Date: Sat, 26 Feb 2022 00:15:40 +0000
In-Reply-To: <20220226001546.360188-1-seanjc@google.com>
Message-Id: <20220226001546.360188-23-seanjc@google.com>
References: <20220226001546.360188-1-seanjc@google.com>
Subject: [PATCH v3 22/28] KVM: x86/mmu: Zap defunct roots via asynchronous worker
From: Sean Christopherson
To: Paolo Bonzini, Christian Borntraeger, Janosch Frank, Claudio Imbrenda
Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
    Joerg Roedel, David Hildenbrand, kvm@vger.kernel.org,
    linux-kernel@vger.kernel.org, David Matlack, Ben Gardon, Mingwei Zhang
X-Mailing-List: linux-kernel@vger.kernel.org

Zap defunct roots, a.k.a. roots that have been invalidated after their
last reference was initially dropped, asynchronously via the system work
queue instead of forcing the work upon the unfortunate task that happened
to drop the last reference.

If a vCPU task drops the last reference, the vCPU is effectively blocked
by the host for the entire duration of the zap. If the root being zapped
happens to be fully populated with 4KiB leaf SPTEs, e.g. due to dirty
logging being active, the zap can take several hundred seconds.
Unsurprisingly, most guests are unhappy if a vCPU disappears for hundreds
of seconds.

E.g. running a synthetic selftest that triggers a vCPU root zap with
~64TiB of guest memory and 4KiB SPTEs blocks the vCPU for 900+ seconds.
Offloading the zap to a worker drops the block time to <100ms.

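For a rough sense of scale, the figures above can be turned into a per-SPTE cost. The sketch below is only back-of-the-envelope arithmetic, not part of the patch. It assumes the full 64TiB is mapped with 4KiB leaf SPTEs and that the whole 900 seconds is spent on leaf entries, ignoring upper-level page tables. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of 4KiB leaf SPTEs needed to map mem_bytes of guest memory. */
static uint64_t leaf_sptes_4k(uint64_t mem_bytes)
{
	/* Assumption: the region is a whole number of 4KiB pages. */
	assert(mem_bytes % 4096 == 0);
	return mem_bytes / 4096;
}

/* Average time per leaf SPTE, in nanoseconds, for a zap of zap_seconds. */
static double ns_per_spte(double zap_seconds, uint64_t nr_sptes)
{
	return zap_seconds * 1e9 / (double)nr_sptes;
}
```

Under those assumptions, leaf_sptes_4k(64ULL << 40) is 2^34 entries (about 17 billion), and ns_per_spte(900.0, ...) works out to a little over 52ns per entry. So the long stall comes from the sheer number of entries rather than from any single slow step.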
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/mmu_internal.h |  8 +++-
 arch/x86/kvm/mmu/tdp_mmu.c      | 65 ++++++++++++++++++++++++++++-----
 2 files changed, 63 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index be063b6c91b7..1bff453f7cbe 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -65,7 +65,13 @@ struct kvm_mmu_page {
 		struct kvm_rmap_head parent_ptes; /* rmap pointers to parent sptes */
 		tdp_ptep_t ptep;
 	};
-	DECLARE_BITMAP(unsync_child_bitmap, 512);
+	union {
+		DECLARE_BITMAP(unsync_child_bitmap, 512);
+		struct {
+			struct work_struct tdp_mmu_async_work;
+			void *tdp_mmu_async_data;
+		};
+	};
 
 	struct list_head lpage_disallowed_link;
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index ec28a88c6376..4151e61245a7 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -81,6 +81,38 @@ static void tdp_mmu_free_sp_rcu_callback(struct rcu_head *head)
 static void tdp_mmu_zap_root(struct kvm *kvm, struct kvm_mmu_page *root,
 			     bool shared);
 
+static void tdp_mmu_zap_root_async(struct work_struct *work)
+{
+	struct kvm_mmu_page *root = container_of(work, struct kvm_mmu_page,
+						 tdp_mmu_async_work);
+	struct kvm *kvm = root->tdp_mmu_async_data;
+
+	read_lock(&kvm->mmu_lock);
+
+	/*
+	 * A TLB flush is not necessary as KVM performs a local TLB flush when
+	 * allocating a new root (see kvm_mmu_load()), and when migrating vCPU
+	 * to a different pCPU. Note, the local TLB flush on reuse also
+	 * invalidates any paging-structure-cache entries, i.e. TLB entries for
+	 * intermediate paging structures, that may be zapped, as such entries
+	 * are associated with the ASID on both VMX and SVM.
+	 */
+	tdp_mmu_zap_root(kvm, root, true);
+
+	/*
+	 * Drop the refcount using kvm_tdp_mmu_put_root() to test its logic for
+	 * avoiding an infinite loop. By design, the root is reachable while
+	 * it's being asynchronously zapped, thus a different task can put its
+	 * last reference, i.e. flowing through kvm_tdp_mmu_put_root() for an
+	 * asynchronously zapped root is unavoidable.
+	 */
+	kvm_tdp_mmu_put_root(kvm, root, true);
+
+	read_unlock(&kvm->mmu_lock);
+
+	kvm_put_kvm(kvm);
+}
+
 void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root,
 			  bool shared)
 {
@@ -142,15 +174,26 @@ void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root,
 	refcount_set(&root->tdp_mmu_root_count, 1);
 
 	/*
-	 * Zap the root, then put the refcount "acquired" above. Recursively
-	 * call kvm_tdp_mmu_put_root() to test the above logic for avoiding an
-	 * infinite loop by freeing invalid roots. By design, the root is
-	 * reachable while it's being zapped, thus a different task can put its
-	 * last reference, i.e. flowing through kvm_tdp_mmu_put_root() for a
-	 * defunct root is unavoidable.
+	 * Attempt to acquire a reference to KVM itself. If KVM is alive, then
+	 * zap the root asynchronously in a worker, otherwise it must be zapped
+	 * directly here. Wait to do this check until after the refcount is
+	 * reset so that tdp_mmu_zap_root() can safely yield.
+	 *
+	 * In both flows, zap the root, then put the refcount "acquired" above.
+	 * When putting the reference, use kvm_tdp_mmu_put_root() to test the
+	 * above logic for avoiding an infinite loop by freeing invalid roots.
+	 * By design, the root is reachable while it's being zapped, thus a
+	 * different task can put its last reference, i.e. flowing through
+	 * kvm_tdp_mmu_put_root() for a defunct root is unavoidable.
 	 */
-	tdp_mmu_zap_root(kvm, root, shared);
-	kvm_tdp_mmu_put_root(kvm, root, shared);
+	if (kvm_get_kvm_safe(kvm)) {
+		root->tdp_mmu_async_data = kvm;
+		INIT_WORK(&root->tdp_mmu_async_work, tdp_mmu_zap_root_async);
+		schedule_work(&root->tdp_mmu_async_work);
+	} else {
+		tdp_mmu_zap_root(kvm, root, shared);
+		kvm_tdp_mmu_put_root(kvm, root, shared);
+	}
 }
 
 enum tdp_mmu_roots_iter_type {
@@ -954,7 +997,11 @@ void kvm_tdp_mmu_zap_all(struct kvm *kvm)
 
 	/*
 	 * Zap all roots, including invalid roots, as all SPTEs must be dropped
-	 * before returning to the caller.
+	 * before returning to the caller. Zap directly even if the root is
+	 * also being zapped by a worker. Walking zapped top-level SPTEs isn't
+	 * all that expensive and mmu_lock is already held, which means the
+	 * worker has yielded, i.e. flushing the work instead of zapping here
+	 * isn't guaranteed to be any faster.
 	 *
 	 * A TLB flush is unnecessary, KVM zaps everything if and only the VM
 	 * is being destroyed or the userspace VMM has exited. In both cases,
-- 
2.35.1.574.g5d30c73bfb-goog
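
As a reading aid for the kvm_tdp_mmu_put_root() change above, here is a minimal userspace sketch of the same control flow: try to pin the owner, defer the expensive teardown to a worker if that succeeds, otherwise do it inline. It is not kernel code. Every type and function in it (struct owner, struct work, struct root, owner_get_safe(), queue_work_item(), run_pending_work(), put_last_root_ref()) is a hypothetical stand-in. The work queue is a toy single-threaded list, not the system workqueue. The sketch also assumes kvm_get_kvm_safe() behaves as an increment-unless-zero on the VM's user count.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kvm: only the user refcount matters. */
struct owner {
	atomic_int users;
};

/* Hypothetical stand-in for struct work_struct plus its data pointer. */
struct work {
	void (*fn)(struct work *w);
	void *data;
	struct work *next;
};

/* Hypothetical stand-in for a TDP MMU root. */
struct root {
	struct work async_work;
	bool zapped;
	bool zapped_by_worker;
};

/* Toy single-threaded queue standing in for the system workqueue. */
static struct work *pending;

static void queue_work_item(struct work *w)
{
	w->next = pending;
	pending = w;
}

static void run_pending_work(void)
{
	while (pending) {
		struct work *w = pending;

		pending = w->next;
		w->fn(w);
	}
}

/* Take a reference only if the owner is still alive (increment-unless-zero). */
static bool owner_get_safe(struct owner *o)
{
	int old = atomic_load(&o->users);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&o->users, &old, old + 1))
			return true;
	}
	return false;
}

static void owner_put(struct owner *o)
{
	int old = atomic_fetch_sub(&o->users, 1);

	assert(old > 0);
	(void)old;
}

/* Worker body: do the expensive teardown, then drop the owner reference. */
static void zap_root_async(struct work *w)
{
	/* Recover the enclosing root from its embedded work item. */
	struct root *r = (struct root *)((char *)w -
					 offsetof(struct root, async_work));
	struct owner *o = w->data;

	r->zapped = true;
	r->zapped_by_worker = true;
	owner_put(o);
}

/* Called by whichever task dropped the last reference to the root. */
static void put_last_root_ref(struct owner *o, struct root *r)
{
	if (owner_get_safe(o)) {
		/* Owner is pinned: hand the teardown to the worker. */
		r->async_work.fn = zap_root_async;
		r->async_work.data = o;
		queue_work_item(&r->async_work);
	} else {
		/* Owner is already going away: zap inline instead. */
		r->zapped = true;
	}
}
```

With a live owner (users == 1), put_last_root_ref() returns without touching the root, and the zap happens only when run_pending_work() runs the queued item. With a dead owner (users == 0), the zap happens inline and nothing is queued. The reference taken before queuing keeps the owner valid until the worker has finished with it, which matches the role the patch comments describe for the kvm_put_kvm() call at the end of tdp_mmu_zap_root_async().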