Date: Tue, 13 Jul 2021 20:17:10 +0000
From: Sean Christopherson
To: Paolo Bonzini
Cc: isaku.yamahata@intel.com, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	"H. Peter Anvin", Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, erdemaktas@google.com, Connor Kuehl, x86@kernel.org,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	isaku.yamahata@gmail.com, Sean Christopherson
Subject: Re: [RFC PATCH v2 16/69] KVM: x86/mmu: Zap only leaf SPTEs for deleted/moved memslot by default
References: <78d02fee3a21741cc26f6b6b2fba258cd52f2c3c.1625186503.git.isaku.yamahata@intel.com>
 <3ef7f4e7-cfda-98fe-dd3e-1b084ef86bd4@redhat.com>
In-Reply-To: <3ef7f4e7-cfda-98fe-dd3e-1b084ef86bd4@redhat.com>

On Tue, Jul 06, 2021, Paolo Bonzini wrote:
> On 03/07/21 00:04, isaku.yamahata@intel.com wrote:
> > From: Sean Christopherson
> > 
> > Zap only leaf SPTEs when deleting/moving a memslot by default, and add a
> > module param to allow reverting to the old behavior of zapping all SPTEs
> > at all levels and memslots when any memslot is updated.
> > 
> > Signed-off-by: Sean Christopherson
> > Signed-off-by: Isaku Yamahata
> > ---
> >  arch/x86/kvm/mmu/mmu.c | 21 ++++++++++++++++++++-
> >  1 file changed, 20 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 8d5876dfc6b7..5b8a640f8042 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -85,6 +85,9 @@ __MODULE_PARM_TYPE(nx_huge_pages_recovery_ratio, "uint");
> >  static bool __read_mostly force_flush_and_sync_on_reuse;
> >  module_param_named(flush_on_reuse, force_flush_and_sync_on_reuse, bool, 0644);
> >  
> > +static bool __read_mostly memslot_update_zap_all;
> > +module_param(memslot_update_zap_all, bool, 0444);
> > +
> >  /*
> >   * When setting this variable to true it enables Two-Dimensional-Paging
> >   * where the hardware walks 2 page tables:
> > @@ -5480,11 +5483,27 @@ static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
> >  	return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages));
> >  }
> >  
> > +static void kvm_mmu_zap_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
> > +{
> > +	/*
> > +	 * Zapping non-leaf SPTEs, a.k.a. not-last SPTEs, isn't required, worst
> > +	 * case scenario we'll have unused shadow pages lying around until they
> > +	 * are recycled due to age or when the VM is destroyed.
> > +	 */
> > +	write_lock(&kvm->mmu_lock);
> > +	slot_handle_level(kvm, slot, kvm_zap_rmapp, PG_LEVEL_4K,
> > +			  KVM_MAX_HUGEPAGE_LEVEL, true);
> > +	write_unlock(&kvm->mmu_lock);
> > +}
> > +
> >  static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm,
> >  			struct kvm_memory_slot *slot,
> >  			struct kvm_page_track_notifier_node *node)
> >  {
> > -	kvm_mmu_zap_all_fast(kvm);
> > +	if (memslot_update_zap_all)
> > +		kvm_mmu_zap_all_fast(kvm);
> > +	else
> > +		kvm_mmu_zap_memslot(kvm, slot);
> >  }
> >  
> >  void kvm_mmu_init_vm(struct kvm *kvm)
> > 
> 
> This is the old patch that broke VFIO for some unknown reason.

Yes, my white whale :-/

> The commit message should at least say why memslot_update_zap_all is not true
> by default.  Also, IIUC the bug is still there with NX hugepage splits disabled,

I strongly suspect the bug is also there with hugepage splits enabled, it's just
masked and/or harder to hit.

> but what if the TDP MMU is enabled?  This should not be a module param.

IIRC, the original code I wrote had it as a per-VM flag that wasn't even exposed
to the user, i.e. TDX guests always do the partial flush and non-TDX guests
always do the full flush.
I think that's the least awful approach if we can't figure out the underlying bug before TDX is ready for inclusion.
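
For reference, a rough sketch of the per-VM flag variant being described (this is
not code from either series; the field name is made up for illustration, and the
flag would be set once at VM creation, e.g. based on whether the guest is TDX):

static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm,
			struct kvm_memory_slot *slot,
			struct kvm_page_track_notifier_node *node)
{
	/*
	 * kvm->arch.memslot_zap_leaf_only is a hypothetical per-VM flag,
	 * set once at VM creation for guests that can't tolerate the full
	 * zap (e.g. TDX), instead of the global memslot_update_zap_all
	 * module param from the patch above.
	 */
	if (kvm->arch.memslot_zap_leaf_only)
		kvm_mmu_zap_memslot(kvm, slot);
	else
		kvm_mmu_zap_all_fast(kvm);
}

That keeps the default behavior for everything that exists today and confines the
leaf-only path to VM types that actually need it.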