Received: by 2002:a05:6358:9144:b0:117:f937:c515 with SMTP id r4csp6257304rwr; Mon, 24 Apr 2023 16:58:21 -0700 (PDT) X-Google-Smtp-Source: AKy350aFioh4zE7giLqGiBGLzHjbpAK2D0a0AnnKEo6432P26ZHe85fT5KH/ppdA7cZXmtlmOHNC X-Received: by 2002:a05:6a20:a482:b0:db:f682:65ed with SMTP id y2-20020a056a20a48200b000dbf68265edmr14624364pzk.61.1682380700979; Mon, 24 Apr 2023 16:58:20 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1682380700; cv=none; d=google.com; s=arc-20160816; b=ntVoyX7BxoHoYsrlkbaB27LAgEARTNfexhvsyPCg37gFRwZz1PXqU++H8K+izV50w3 c40byQc88ZNtuvTg7Or2MAojdxiqlpa2RsMoaGP5W88voI38SlWIXDtAwUPGudckoxOA jP+PSqZavT8Yvv/ozblY7S43WjPRtNHe7cONcBXchHmYJWO960gvWyhlAFm+8DvEnAfD iSA5+AxPHMCZ1e9bIcIE90dUlH5NudwaNxqoBl5+ywwTMJYybX3W9kUjz/eAoKkQVfgT x4elIEH+jsKwd0nbQBDWM2mQhrcNSq834EuzLT/8LbVy2lHbrgovq9JGe5FAYPLw1l2E FOTQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:cc:to:subject :message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=rIp06ywmX0lfxxI9OT2FwiSpXuGSU6q4dMRtZXHd20U=; b=BLwbo5qZnYdQbLRhH5gmdo9IjAiZyP6Ox1oSCr6cMMAvOYP+rGD1dc/QnCLvdEZHEk d+h0neLUtH7ULnW0YAFZlyN1i4dVW6X0zv8XbB3UW2zskwXrOEEmuMgTmg+x4IMjB5+G 1LvUJwL/UdGdaTB5U9XXyuyEXud334D00HSs6ORMzDqbNrjcaqRunODhdSkbUGs/nQ+l JXB/oTRyDLBjvynXOeuMvmLoJytzbyCcLPa8yrEKzRL/aWaqdyahQjnrZqlRXlCehqTj CcewNmEYevHqNjbqfSWSJej7N0T6WRKQV7yZnvYDW3SZi605hoRrgI0nHvjrtwLxxTqQ LLpw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20221208 header.b="VSR3MR/l"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id y23-20020a63ce17000000b0051b4db5fd75si11785990pgf.850.2023.04.24.16.58.09; Mon, 24 Apr 2023 16:58:20 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20221208 header.b="VSR3MR/l"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233003AbjDXXz0 (ORCPT + 99 others); Mon, 24 Apr 2023 19:55:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60510 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232969AbjDXXzY (ORCPT ); Mon, 24 Apr 2023 19:55:24 -0400 Received: from mail-vk1-xa2f.google.com (mail-vk1-xa2f.google.com [IPv6:2607:f8b0:4864:20::a2f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5C7B13584 for ; Mon, 24 Apr 2023 16:55:23 -0700 (PDT) Received: by mail-vk1-xa2f.google.com with SMTP id 71dfb90a1353d-4404c674cefso1696406e0c.2 for ; Mon, 24 Apr 2023 16:55:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1682380522; x=1684972522; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=rIp06ywmX0lfxxI9OT2FwiSpXuGSU6q4dMRtZXHd20U=; b=VSR3MR/lpCZRqEQxMdSEB9m9wlwJlM3k+HH7FSqPbDpjJx977/r3olR7BuyQ3/16Ql s8N+sMD1s+jtsk7/62opG5ABBjstHJmiLhe2v+QPcsJ+U/zdXr2Z3t2A9kVGbazOWfL8 kINMWjdo9ocOEikko54ju2zprRaNw0/ajTtKs/Ca4dWWPsh1p+O18xXpcFOAwDP39FcO GIeojA8KleffKcEUg3LwZ7FTdsFvR24laN4xb98+5yc/Suozu8MEQzdLhgGENfd2IjMe kvwjszy1Iq/D63qaYxsx9WTv3WWHMr7J0z1cSQArazTPgyrQDy+soL0bS74x1EuUYwSC mgsQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1682380522; x=1684972522; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=rIp06ywmX0lfxxI9OT2FwiSpXuGSU6q4dMRtZXHd20U=; b=IdRJhYMaygnRBiBy9huYv8oDMzw/a7wMFkdYsVlnDxcv1rllp9rTtLEuFURDKiyEaW Tzg6+3ZzKYGjODyZREszurTbxgkC0tKAEsC719xQfvWhL8RGsqjoSqLr3+ksWKEHeSQ1 uoUtR7Q6NThnFaHDfpSlP63fP1EXBEGH393YAzvBKWCiwEDQnBw2c0Gu+1SzbeYTfXTh QG2w2rvPLT13gXguA12AGS8ipN5W3v9J9iclswsU8h3n0r8TYwCe9+wMK+1jSthOZUVF KFX8SvphAhZg7DuRizYkUiTo11MR9oC3dfgkyGvFmiEI5PIM9XwwN9v8Xee/slh5vcIB Pj+Q== X-Gm-Message-State: AAQBX9dfBs9gLjYMyUOGy/bG9qQtTA90Sw+Fbzgbiohx4hBhEVotv//+ nrqQ6HM2AQawuZPpMz6CSdD6kYfXWaGNInt+MIN2Ww== X-Received: by 2002:a1f:3fce:0:b0:43f:c5fc:b406 with SMTP id m197-20020a1f3fce000000b0043fc5fcb406mr3823429vka.5.1682380522423; Mon, 24 Apr 2023 16:55:22 -0700 (PDT) MIME-Version: 1.0 References: <20230421214946.2571580-1-seanjc@google.com> In-Reply-To: From: David Matlack Date: Mon, 24 Apr 2023 16:54:56 -0700 Message-ID: Subject: Re: [PATCH v2] KVM: x86: Preserve TDP MMU roots until they are explicitly invalidated To: Sean Christopherson Cc: Paolo Bonzini , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Jeremi Piotrowski , Ben Gardon Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-17.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, ENV_AND_HDR_SPF_MATCH,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE,USER_IN_DEF_DKIM_WL,USER_IN_DEF_SPF_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Apr 21, 2023 at 6:56=E2=80=AFPM Sean Christopherson wrote: > > On Fri, Apr 21, 2023, David Matlack wrote: > > On Fri, Apr 21, 2023 at 2:49=E2=80=AFPM Sean Christopherson wrote: > > > > > > void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm) > > > { > > > struct kvm_mmu_page *root; > > > > > > - lockdep_assert_held_write(&kvm->mmu_lock); > > > - list_for_each_entry(root, &kvm->arch.tdp_mmu_roots, link) { > > > - if (!root->role.invalid && > > > - !WARN_ON_ONCE(!kvm_tdp_mmu_get_root(root))) { > > > + /* > > > + * Note! mmu_lock isn't held when destroying the VM! There = can't be > > > + * other references to @kvm, i.e. nothing else can invalidate= roots, > > > + * but walking the list of roots does need to be guarded agai= nst roots > > > + * being deleted by the asynchronous zap worker. > > > + */ > > > + rcu_read_lock(); > > > + > > > + list_for_each_entry_rcu(root, &kvm->arch.tdp_mmu_roots, link)= { > > > > I see that roots are removed from the list with list_del_rcu(), so I > > agree this should be safe. > > > > KVM could, alternatively, acquire the mmu_lock in > > kvm_mmu_uninit_tdp_mmu(), which would let us keep the lockdep > > assertion and drop the rcu_read_lock() + comment. That might be worth > > it in case someone accidentally adds a call to > > kvm_tdp_mmu_invalidate_all_roots() without mmu_lock outside of VM > > teardown. kvm_mmu_uninit_tdp_mmu() is not a particularly performance > > sensitive path and adding the mmu_lock wouldn't add much overhead > > anyway (it would block for at most a few milliseconds waiting for the > > async work to reschedule). > > Heh, I actually started to ping you off-list to specifically discuss this= option, > but then decided that not waiting those few milliseconds might be worthwh= ile for > some use cases. I also couldn't quite convince myself that it would only= be a few > milliseconds, e.g. if the worker is zapping a fully populated 5-level roo= t, there > are no other tasks scheduled on _its_ CPU, and CONFIG_PREEMPTION=3Dn (whi= ch neuters > rwlock_needbreak()). Good point. At some point we're going to have to fix that. > > The other reason I opted for not taking mmu_lock is that, with the persis= tent roots > approach, I don't think it's actually strictly necessary for kvm_mmu_zap_= all_fast() > to invaliate roots while holding mmu_lock for write. Holding slots_lock = ensures > that only a single task can be doing invalidations, versus the old model = where > putting the last reference to a root could happen just about anywhere. A= nd > allocating a new root and zapping from mmu_noitifiers requires holding mm= u_lock for > write, so I _think_ we could getaway with holding mmu_lock for read. May= be. > > It's largely a moot point since kvm_mmu_zap_all_fast() needs to hold mmu_= lock for > write anyways to play nice with the shadow MMU, i.e. I don't expect us to= ever > want to pursue a change in this area. But at the same time I was struggl= ing to > write a comment explaining why the VM destruction path "had" to take mmu_= lock. Yeah, probably because it really isn't necessary :). It'd be nice to keep around the lockdep assertion though for the other (and future) callers. The cleanest options I can think of are: 1. Pass in a bool "vm_teardown" kvm_tdp_mmu_invalidate_all_roots() and use that to gate the lockdep assertion. 2. Take the mmu_lock for read in kvm_mmu_uninit_tdp_mmu() and pass down bool shared to kvm_tdp_mmu_invalidate_all_roots(). Both would satisfy your concern of not blocking teardown on the async worker and my concern of keeping the lockdep check. I think I prefer (1) since, as you point out, taking the mmu_lock at all is unnecessary.