Received: by 2002:a05:6358:9144:b0:117:f937:c515 with SMTP id r4csp2994817rwr; Fri, 21 Apr 2023 18:59:26 -0700 (PDT) X-Google-Smtp-Source: AKy350aRkCj4ipPjoRqrxCLuJzsQdV/4KkpMFj/pOsBB+P5V5EqbI7QV2f89SLY0RSTR36LCllu1 X-Received: by 2002:a17:903:2450:b0:1a0:563e:b0c4 with SMTP id l16-20020a170903245000b001a0563eb0c4mr8315836pls.2.1682128765735; Fri, 21 Apr 2023 18:59:25 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1682128765; cv=none; d=google.com; s=arc-20160816; b=cO14Yi+aNQS+bPBvxkoZISb+pbKy2I360npWYXdixRm0CjHP9N2Nr7w5mPI8H+PDTF CkoRWcfuzlXVEXwb+/sYbFJHRLNnVK6SEZX43avgjKvOpPVdWsgoj3O7IVDHywAFxfZT VTLtgncyULquqDuqVCs1zN13uQAr4TDt4Dz+3+DATo6P8K6qpQRTL7SWmKmCwXHSRixo XKAOvyY6k+MUyfzQMavbEOfOXm+UZxS18nbOYb/SgGSJae0pobarc7GZ7mMHvaZmll7l iAekfSU7xhri/1B/GlMsd5FgwTRdJ/Jb5sHVyXErLLa8grffu43kF+1TkIlGFphJPdfg r+yQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:cc:to:from:subject :message-id:references:mime-version:in-reply-to:date:dkim-signature; bh=IQdNhpWZDysF1UahR095KnSwqZZsAIRpm1aTBOeoxU0=; b=Z9IBT2CPDAIIpY6sT5A9ENvxdExHHWQ57J9X70Mxa8ypvoOmOTKEEnFNySYiIQbUlR 7CqdO+rESmkdXJIu8pvwj63AH5UAmyMRfsDcMQmqI1PNXNJ0N6vLMzgzhj/i3ZywFXLH 7jHZdsnIhcroOp9O/MWDzvmjXHTNPd/bxU0Q7UTrVtNcHsesLfu/lrDbrcrFMoRT5lJl YHo1d7xFYSv6cr1BktmImQbh8nTwPTuqdJDegmrURnzGm2vvjbSzLImd6bIxPmFPV92v REoMdoZGHulPC6wOwRMbXn3spVuXSF3efDBRkQ7nJtK/7Bd3n8us2Le8n3o2s6HoexSw Uaqg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20221208 header.b=WvBNdsrm; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id f6-20020a170902ce8600b001a2a864bd4dsi6429741plg.396.2023.04.21.18.59.04; Fri, 21 Apr 2023 18:59:25 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20221208 header.b=WvBNdsrm; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233600AbjDVB4l (ORCPT + 99 others); Fri, 21 Apr 2023 21:56:41 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48498 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232989AbjDVB4j (ORCPT ); Fri, 21 Apr 2023 21:56:39 -0400 Received: from mail-pg1-x54a.google.com (mail-pg1-x54a.google.com [IPv6:2607:f8b0:4864:20::54a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 33EEC1719 for ; Fri, 21 Apr 2023 18:56:38 -0700 (PDT) Received: by mail-pg1-x54a.google.com with SMTP id 41be03b00d2f7-51b41410216so1628533a12.0 for ; Fri, 21 Apr 2023 18:56:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1682128597; x=1684720597; h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:from:to:cc:subject:date:message-id :reply-to; bh=IQdNhpWZDysF1UahR095KnSwqZZsAIRpm1aTBOeoxU0=; b=WvBNdsrmV3zKyzEzVd5WJbwXsxfW67XDwZajqzJsoe5/qS8aKiZC25DdwamG22qL6P pOEeHuaZK2WXX6LGpiaQNIBtapa+enyHQ3U6So6xs1wWhEwuP6g5Y87o+vFl7Ncl0FqJ aYPy5R3UpftMUJ2Et77SQTLDXmIxV+NGmjW43VWWPCwUWy6VQ0sym7Y44JghDOXDlOqh r0btZ4Wmj5NknxwgPJ8hD/0A5NAUUs1K3/nu4gi0FhzTZfOz9uhzg5kr+7AprgOPq1Ig Qak78Ut7eio8UZFGwrEoZe02RLOwNMhl/Q2sfzgRh2JM3lFX5eEtNS6gMx0wsVKF04EG uhMw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1682128597; x=1684720597; h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:x-gm-message-state:from:to:cc:subject :date:message-id:reply-to; bh=IQdNhpWZDysF1UahR095KnSwqZZsAIRpm1aTBOeoxU0=; b=TIZ72+/xIlwZ0ElDoNjMqe4eVweZ3J7rGHzH/gP/2IchmaliCV3Z5g4s35gWGBD7rw BE5qBaJMtinIGtOvbGDplYYjcKk/gSHGCdt5YfBtQqtY0zyOC8s0GCvsCsyVC6rSlIDi Ai26MliMfxMaPjERv3CeVF7p8c6oEVb0JuVhHKv/tKBZF4ZITQHsXdd2vY2jxU9PkHsY IX7bNofIgvbQSkUpd0teewkWvU3BBOSBhSWHTCjSnFGniQb0MkH1mxxQTOCEnGLHpTa4 HGm5NgX5UX6EDR36btUqcCopY+n+/G5jY9X8L0qx3uEAE5IxszpB2FJsr/0jXgeVeJzR h15Q== X-Gm-Message-State: AAQBX9f5DieSCAwsGNvLQtAcQPJUw9OAOXQGwv4seNWlgSF/MhgVQYaN ttpq7PgLiWjiMMRczb62IFIRWwS0M18= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a63:40c1:0:b0:50b:189d:d1a with SMTP id n184-20020a6340c1000000b0050b189d0d1amr1635019pga.10.1682128597618; Fri, 21 Apr 2023 18:56:37 -0700 (PDT) Date: Fri, 21 Apr 2023 18:56:35 -0700 In-Reply-To: Mime-Version: 1.0 References: <20230421214946.2571580-1-seanjc@google.com> Message-ID: Subject: Re: [PATCH v2] KVM: x86: Preserve TDP MMU roots until they are explicitly invalidated From: Sean Christopherson To: David Matlack Cc: Paolo Bonzini , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Jeremi Piotrowski , Ben Gardon Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Apr 21, 2023, David Matlack wrote: > On Fri, Apr 21, 2023 at 2:49=E2=80=AFPM Sean Christopherson wrote: > > > > void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm) > > { > > struct kvm_mmu_page *root; > > > > - lockdep_assert_held_write(&kvm->mmu_lock); > > - list_for_each_entry(root, &kvm->arch.tdp_mmu_roots, link) { > > - if (!root->role.invalid && > > - !WARN_ON_ONCE(!kvm_tdp_mmu_get_root(root))) { > > + /* > > + * Note! mmu_lock isn't held when destroying the VM! There ca= n't be > > + * other references to @kvm, i.e. nothing else can invalidate r= oots, > > + * but walking the list of roots does need to be guarded agains= t roots > > + * being deleted by the asynchronous zap worker. > > + */ > > + rcu_read_lock(); > > + > > + list_for_each_entry_rcu(root, &kvm->arch.tdp_mmu_roots, link) { >=20 > I see that roots are removed from the list with list_del_rcu(), so I > agree this should be safe. >=20 > KVM could, alternatively, acquire the mmu_lock in > kvm_mmu_uninit_tdp_mmu(), which would let us keep the lockdep > assertion and drop the rcu_read_lock() + comment. That might be worth > it in case someone accidentally adds a call to > kvm_tdp_mmu_invalidate_all_roots() without mmu_lock outside of VM > teardown. kvm_mmu_uninit_tdp_mmu() is not a particularly performance > sensitive path and adding the mmu_lock wouldn't add much overhead > anyway (it would block for at most a few milliseconds waiting for the > async work to reschedule). Heh, I actually started to ping you off-list to specifically discuss this o= ption, but then decided that not waiting those few milliseconds might be worthwhil= e for some use cases. I also couldn't quite convince myself that it would only b= e a few milliseconds, e.g. if the worker is zapping a fully populated 5-level root,= there are no other tasks scheduled on _its_ CPU, and CONFIG_PREEMPTION=3Dn (which= neuters rwlock_needbreak()). The other reason I opted for not taking mmu_lock is that, with the persiste= nt roots approach, I don't think it's actually strictly necessary for kvm_mmu_zap_al= l_fast() to invaliate roots while holding mmu_lock for write. Holding slots_lock en= sures that only a single task can be doing invalidations, versus the old model wh= ere putting the last reference to a root could happen just about anywhere. And allocating a new root and zapping from mmu_noitifiers requires holding mmu_= lock for write, so I _think_ we could getaway with holding mmu_lock for read. Maybe= . It's largely a moot point since kvm_mmu_zap_all_fast() needs to hold mmu_lo= ck for write anyways to play nice with the shadow MMU, i.e. I don't expect us to e= ver want to pursue a change in this area. But at the same time I was strugglin= g to write a comment explaining why the VM destruction path "had" to take mmu_lo= ck.