Received: by 2002:a05:6a10:9afc:0:0:0:0 with SMTP id t28csp1703235pxm; Fri, 4 Mar 2022 01:37:03 -0800 (PST) X-Google-Smtp-Source: ABdhPJx0Vl0EWUWwu8dUD5bEeWabUaoOPf9FYPCay+fwd4liY8dti/1Vfxk8d/Rs+Y3vcMnRDlE3 X-Received: by 2002:a63:5110:0:b0:374:2312:1860 with SMTP id f16-20020a635110000000b0037423121860mr33550053pgb.146.1646386623034; Fri, 04 Mar 2022 01:37:03 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1646386623; cv=none; d=google.com; s=arc-20160816; b=t4/HoNwcwGW/8HsfR+yv9eJSQjeB58YaCSW3Yzkz8ZFgiNqGKIvUWUuUTFEDgqIUl9 GwNfIGSkWPZ38HiIMltVYQB9ChtuQRDWmNMWzvXL1Qzgqnts75FncS8O7zWeNLXI3AxN xVX4Zgb15FDPyhV3grJwl1Z2B0EJtdSBd8ZX/gEKet+ie6FGUxeRhO16tD2gO4skx3VI LIbSecXbNiTTv0mA0UskdlUTSXlKfEP0t+Bq1ycBsvJryH5iEPYk/IVWjLgFH27SUcRc DPwxmyxPOr+SJuT/WJEAjchL8OUdL63Wab1N7VR4vWnfHufJCbBL24yp4MwfaP73hWCV Rjwg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=W/TzNiPeigL0ybLFFuG1hFcpyfv81UUM7woffq6f+EE=; b=J4cBxvULGYTgo5CviH52L++tEuZBuLcH7juwdbLnFGEXXJmTjMW9QsWbgppWUgeTHX izoS3qu8JqZ+1nXrLn8v2vOnH6V8Q/FRjDBnSW3yPU2UgaNTQnpcLZSctqWO8AxavC7u 4AiPBdpgOyWYzCoZxlNB/X9WTdEMaYOcGcITMORFxw0Hk/IIhak2DqtgMbPmGJC8GsGT oNzwuCq4g3+IAGpjx4q/I0MiB05V2ayEvywGL60lo8ZPUEVxfiFmpsHnj1Qw4aZMin0J ZS2tJJBLW0XuNzidBSn6Oksf96AbMDyH2Ksc9lWtUvl4tPWaRcuKocA98MCoQABC1GDj mkbA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=MMHEBXue; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id e14-20020a056a001a8e00b004e1c075fb86si4407898pfv.285.2022.03.04.01.36.46; Fri, 04 Mar 2022 01:37:03 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=MMHEBXue; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234893AbiCCWJC (ORCPT + 99 others); Thu, 3 Mar 2022 17:09:02 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55722 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232050AbiCCWJA (ORCPT ); Thu, 3 Mar 2022 17:09:00 -0500 Received: from mail-pl1-x631.google.com (mail-pl1-x631.google.com [IPv6:2607:f8b0:4864:20::631]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id ADA593C71B for ; Thu, 3 Mar 2022 14:08:13 -0800 (PST) Received: by mail-pl1-x631.google.com with SMTP id z2so6009418plg.8 for ; Thu, 03 Mar 2022 14:08:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=W/TzNiPeigL0ybLFFuG1hFcpyfv81UUM7woffq6f+EE=; b=MMHEBXueAzqDEsE5wiiM5H+gGRnExkrrDKWkUaSEsk8DUCsdP27CLVo1vaTeJagjSr CiSEEsg3qJYJhHI+vrnS5A90omkj8FP4rq2zSR/D4ysjtAigzqxd7qwbyD4sPZ02KMNP K1eiuAfmfwVG6l7GIMdKYvviXNx51N2HchR1qs5UwuIIDSFUegSoQvUBgRARyBt3uRwU N8Ejvgj3Eg3iYS5s35YWhFr+571WFqfNXsmYNCP/gtaJI350Ui9f6V5AQrAz34Mtarl9 WLyBVNlYcqmvorwlbfpNrLd1eejIvs5IcScpPf/nleW6qvKLVUuwBtRKlAyYSME2X5Yi 8GEQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=W/TzNiPeigL0ybLFFuG1hFcpyfv81UUM7woffq6f+EE=; b=4xRrR2pI/ROaAZ3nZJphA0/1QcKARBDReBfxwE6cMjAR8z5WAKPQsn6jviIwmQ2y2f +jnud8YX6rmtVzO+kcMi0OSWyZY4/4j1tZdwXzLUjORJ99jo53hJkizwwjsGFcKQDZjq C0q7j2NAeWi1JfmZBXGz229UjFE00S9lL4IKAdWyibjRVYp561/96VgUE16vGrBZZaxn 2bQyBrFFTXjYCp/uFigeDxdmNjQkQJi+k9QupNKS7t6RpVykUwGPv0W3eNz549oK60cW aYmh2uySxUW2a7T3Y88CmQz7IPoJXFkM7qSgt59cwgIS4ckyNfO37Mqq2uOv12Jsi1W2 vpdA== X-Gm-Message-State: AOAM532O+kyFXJIVwBOVKJHeXY1V9ZE6DZduvo1PBWLLOD/s89x2A9Qa yhX15SMgZQz15ggb8Kis7C6LXg== X-Received: by 2002:a17:90a:20a:b0:1be:e850:1a37 with SMTP id c10-20020a17090a020a00b001bee8501a37mr7569061pjc.28.1646345292897; Thu, 03 Mar 2022 14:08:12 -0800 (PST) Received: from google.com (157.214.185.35.bc.googleusercontent.com. [35.185.214.157]) by smtp.gmail.com with ESMTPSA id m20-20020a634c54000000b003739af127c9sm2889926pgl.70.2022.03.03.14.08.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 03 Mar 2022 14:08:12 -0800 (PST) Date: Thu, 3 Mar 2022 22:08:08 +0000 From: Sean Christopherson To: Paolo Bonzini Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , David Hildenbrand , David Matlack , Ben Gardon , Mingwei Zhang Subject: Re: [PATCH v4 24/30] KVM: x86/mmu: Zap defunct roots via asynchronous worker Message-ID: References: <20220303193842.370645-1-pbonzini@redhat.com> <20220303193842.370645-25-pbonzini@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20220303193842.370645-25-pbonzini@redhat.com> X-Spam-Status: No, score=-18.1 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, ENV_AND_HDR_SPF_MATCH,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE,USER_IN_DEF_DKIM_WL,USER_IN_DEF_SPF_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Mar 03, 2022, Paolo Bonzini wrote: > Zap defunct roots, a.k.a. roots that have been invalidated after their > last reference was initially dropped, asynchronously via the system work > queue instead of forcing the work upon the unfortunate task that happened > to drop the last reference. > > If a vCPU task drops the last reference, the vCPU is effectively blocked > by the host for the entire duration of the zap. If the root being zapped > happens be fully populated with 4kb leaf SPTEs, e.g. due to dirty logging > being active, the zap can take several hundred seconds. Unsurprisingly, > most guests are unhappy if a vCPU disappears for hundreds of seconds. > > E.g. running a synthetic selftest that triggers a vCPU root zap with > ~64tb of guest memory and 4kb SPTEs blocks the vCPU for 900+ seconds. > Offloading the zap to a worker drops the block time to <100ms. > > Co-developed-by: Sean Christopherson > Signed-off-by: Sean Christopherson > Reviewed-by: Ben Gardon > Message-Id: <20220226001546.360188-23-seanjc@google.com> > Signed-off-by: Paolo Bonzini > --- > arch/x86/kvm/mmu/tdp_mmu.c | 15 +++++++++++++-- > 1 file changed, 13 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c > index e24a1bff9218..2456f880508d 100644 > --- a/arch/x86/kvm/mmu/tdp_mmu.c > +++ b/arch/x86/kvm/mmu/tdp_mmu.c > @@ -170,13 +170,24 @@ void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root, > */ > if (!kvm_tdp_root_mark_invalid(root)) { > refcount_set(&root->tdp_mmu_root_count, 1); > - tdp_mmu_zap_root(kvm, root, shared); > > /* > - * Give back the reference that was added back above. We now > + * If the struct kvm is alive, we might as well zap the root > + * in a worker. The worker takes ownership of the reference we > + * just added to root and is flushed before the struct kvm dies. Not a fan of the "we might as well zap the root in a worker", IMO we should require going forward that invalidated, reachable TDP MMU roots are always zapped in a worker > + */ > + if (likely(refcount_read(&kvm->users_count))) { > + tdp_mmu_schedule_zap_root(kvm, root); Regarding the need for kvm_tdp_mmu_invalidate_all_roots() to guard against re-queueing a root for zapping, this is the point where it becomes functionally problematic. When "fast zap" was the only user of tdp_mmu_schedule_zap_root(), re-queueing was benign as the work struct was guaranteed to not be currently queued. But this code runs outside of slots_lock, and so a root that was "put" but hasn't finished zapping can be observed and re-queued by the "fast zap. I think it makes sense to create a rule/invariant that an invalidated TDP MMU root _must_ be zapped via the work queue. Then I.e. do this as fixup: diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 40bf861b622a..cff4f2102a63 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -1019,8 +1019,9 @@ void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm) * of invalidated roots; the list is effectively the list of work items in * the workqueue. * - * Skip roots that are already queued for zapping, flushing the work queue will - * ensure invalidated roots are zapped regardless of when they were queued. + * Skip roots that are already invalid and thus queued for zapping, flushing + * the work queue will ensure invalid roots are zapped regardless of when they + * were queued. * * Because mmu_lock is held for write, it should be impossible to observe a * root with zero refcount,* i.e. the list of roots cannot be stale. @@ -1034,13 +1035,12 @@ void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm) lockdep_assert_held_write(&kvm->mmu_lock); list_for_each_entry(root, &kvm->arch.tdp_mmu_roots, link) { - if (root->tdp_mmu_async_data) + if (kvm_tdp_root_mark_invalid(root)) continue; if (WARN_ON_ONCE(!kvm_tdp_mmu_get_root(root))) continue; - root->role.invalid = true; tdp_mmu_schedule_zap_root(kvm, root); } } > + return; > + } > + > + /* > + * The struct kvm is being destroyed, zap synchronously and give > + * back immediately the reference that was added above. We now > * know that the root is invalid, so go ahead and free it if > * no one has taken a reference in the meanwhile. > */ > + tdp_mmu_zap_root(kvm, root, shared); > if (!refcount_dec_and_test(&root->tdp_mmu_root_count)) > return; > } > -- > 2.31.1 > >