From: Sean Christopherson
Date: Tue, 19 Mar 2024 17:50:23 -0700
Subject: [RFC PATCH 3/4] KVM: x86/mmu: Mark page/folio accessed only when zapping leaf SPTEs
To: Paolo Bonzini, Sean Christopherson
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Hildenbrand, David Matlack, David Stevens, Matthew Wilcox
Message-ID: <20240320005024.3216282-4-seanjc@google.com>
In-Reply-To: <20240320005024.3216282-1-seanjc@google.com>

Mark folios as accessed only when zapping leaf SPTEs, which is a rough
heuristic for "only in response to an mmu_notifier invalidation". Page
aging and LRUs are tolerant of false negatives, i.e. KVM doesn't need to
be precise for correctness, and re-marking folios as accessed when zapping
entire roots or when zapping collapsible SPTEs is expensive and adds very
little value.

E.g. when a VM is dying, all of its memory is being freed; marking folios
accessed at that time provides no known value. Similarly, because KVM
marks folios as accessed when creating SPTEs, marking all folios as
accessed when userspace happens to delete a memslot doesn't add value.
The folio was marked accessed when the old SPTE was created, and will be
marked accessed yet again if a vCPU accesses the pfn again after reloading
a new root. Zapping collapsible SPTEs is a similar story; marking folios
accessed just because userspace disables dirty logging is a side effect
of KVM behavior, not a deliberate goal.

Keep marking folios accessed when the primary MMU might be invalidating
mappings, instead of dropping calls to kvm_set_pfn_accessed() entirely,
as such zaps are not KVM-initiated, i.e. they might actually be related
to page aging and LRU activity.

Note, x86 is the only KVM architecture that "double dips"; every other
arch marks pfns as accessed only when mapping into the guest, not when
mapping into the guest _and_ when removing from the guest.
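To make the policy above concrete, here is a minimal userspace sketch in C. It is not KVM code: model_spte_update(), model_zap_leaf(), struct model_folio and MODEL_SPTE_ACCESSED are hypothetical stand-ins for mmu_spte_update(), the leaf zap in tdp_mmu_zap_leafs(), the folio and the real Accessed bit. Under those assumptions, the generic update path only reports whether clearing the Accessed bit requires a TLB flush, while the leaf-zap path is the only one that forwards the Accessed bit to the (modeled) folio.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical Accessed bit; not KVM's real SPTE layout. */
#define MODEL_SPTE_ACCESSED (1ULL << 8)

/* Hypothetical stand-in for a folio's page-aging/LRU state. */
struct model_folio {
	unsigned int times_marked_accessed;
};

static bool model_is_accessed(uint64_t spte)
{
	return (spte & MODEL_SPTE_ACCESSED) != 0;
}

/*
 * Generic SPTE update (cf. mmu_spte_update() after this patch): clearing
 * the Accessed bit still requires a TLB flush, but the folio is left
 * alone. Returns true if the caller must flush.
 */
static bool model_spte_update(uint64_t *sptep, uint64_t new_spte)
{
	uint64_t old_spte;

	assert(sptep != NULL);
	old_spte = *sptep;
	*sptep = new_spte;
	return model_is_accessed(old_spte) && !model_is_accessed(new_spte);
}

/*
 * Leaf zap (cf. tdp_mmu_zap_leafs()), the rough proxy for "the primary MMU
 * is invalidating this mapping": the only path that forwards the Accessed
 * bit to the folio.
 */
static void model_zap_leaf(uint64_t *sptep, struct model_folio *folio)
{
	uint64_t old_spte;

	assert(sptep != NULL && folio != NULL);
	old_spte = *sptep;
	*sptep = 0;
	if (model_is_accessed(old_spte))
		folio->times_marked_accessed++;
}
```

For example, starting from spte = MODEL_SPTE_ACCESSED, model_spte_update(&spte, 0) returns true (flush needed) without touching the folio, whereas model_zap_leaf() bumps times_marked_accessed once, and a second zap of the now-zero SPTE does nothing.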
Signed-off-by: Sean Christopherson
---
 Documentation/virt/kvm/locking.rst | 76 +++++++++++++++---------------
 arch/x86/kvm/mmu/mmu.c             |  4 +-
 arch/x86/kvm/mmu/tdp_mmu.c         |  7 ++-
 3 files changed, 43 insertions(+), 44 deletions(-)

diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
index 02880d5552d5..8b3bb9fe60bf 100644
--- a/Documentation/virt/kvm/locking.rst
+++ b/Documentation/virt/kvm/locking.rst
@@ -138,49 +138,51 @@ Then, we can ensure the dirty bitmaps is correctly set for a gfn.
 
 2) Dirty bit tracking
 
-In the origin code, the spte can be fast updated (non-atomically) if the
+In the original code, the spte can be fast updated (non-atomically) if the
 spte is read-only and the Accessed bit has already been set since the
 Accessed bit and Dirty bit can not be lost.
 
 But it is not true after fast page fault since the spte can be marked
 writable between reading spte and updating spte. Like below case:
 
-+------------------------------------------------------------------------+
-| At the beginning::                                                     |
-|                                                                        |
-|    spte.W = 0                                                          |
-|    spte.Accessed = 1                                                   |
-+------------------------------------+-----------------------------------+
-| CPU 0:                             | CPU 1:                            |
-+------------------------------------+-----------------------------------+
-| In mmu_spte_clear_track_bits()::   |                                   |
-|                                    |                                   |
-|  old_spte = *spte;                 |                                   |
-|                                    |                                   |
-|                                    |                                   |
-|  /* 'if' condition is satisfied. */|                                   |
-|  if (old_spte.Accessed == 1 &&     |                                   |
-|       old_spte.W == 0)             |                                   |
-|     spte = 0ull;                   |                                   |
-+------------------------------------+-----------------------------------+
-|                                    | on fast page fault path::         |
-|                                    |                                   |
-|                                    |    spte.W = 1                     |
-|                                    |                                   |
-|                                    | memory write on the spte::        |
-|                                    |                                   |
-|                                    |    spte.Dirty = 1                 |
-+------------------------------------+-----------------------------------+
-|  ::                                |                                   |
-|                                    |                                   |
-|   else                             |                                   |
-|     old_spte = xchg(spte, 0ull)    |                                   |
-|   if (old_spte.Accessed == 1)      |                                   |
-|     kvm_set_pfn_accessed(spte.pfn);|                                   |
-|   if (old_spte.Dirty == 1)         |                                   |
-|     kvm_set_pfn_dirty(spte.pfn);   |                                   |
-|     OOPS!!!                        |                                   |
-+------------------------------------+-----------------------------------+
++-------------------------------------------------------------------------+
+| At the beginning::                                                      |
+|                                                                         |
+|    spte.W = 0                                                           |
+|    spte.Accessed = 1                                                    |
++-------------------------------------+-----------------------------------+
+| CPU 0:                              | CPU 1:                            |
++-------------------------------------+-----------------------------------+
+| In mmu_spte_update()::              |                                   |
+|                                     |                                   |
+|  old_spte = *spte;                  |                                   |
+|                                     |                                   |
+|                                     |                                   |
+|  /* 'if' condition is satisfied. */ |                                   |
+|  if (old_spte.Accessed == 1 &&      |                                   |
+|       old_spte.W == 0)              |                                   |
+|     spte = new_spte;                |                                   |
++-------------------------------------+-----------------------------------+
+|                                     | on fast page fault path::         |
+|                                     |                                   |
+|                                     |    spte.W = 1                     |
+|                                     |                                   |
+|                                     | memory write on the spte::        |
+|                                     |                                   |
+|                                     |    spte.Dirty = 1                 |
++-------------------------------------+-----------------------------------+
+|  ::                                 |                                   |
+|                                     |                                   |
+|   else                              |                                   |
+|     old_spte = xchg(spte, new_spte);|                                   |
+|   if (old_spte.Accessed &&          |                                   |
+|       !new_spte.Accessed)           |                                   |
+|     flush = true;                   |                                   |
+|   if (old_spte.Dirty &&             |                                   |
+|       !new_spte.Dirty)              |                                   |
+|     flush = true;                   |                                   |
+|     OOPS!!!                         |                                   |
++-------------------------------------+-----------------------------------+
 
 The Dirty bit is lost in this case.
 
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index bd2240b94ff6..0a6c6619d213 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -539,10 +539,8 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte)
 	 * to guarantee consistency between TLB and page tables.
 	 */
 
-	if (is_accessed_spte(old_spte) && !is_accessed_spte(new_spte)) {
+	if (is_accessed_spte(old_spte) && !is_accessed_spte(new_spte))
 		flush = true;
-		kvm_set_pfn_accessed(spte_to_pfn(old_spte));
-	}
 
 	if (is_dirty_spte(old_spte) && !is_dirty_spte(new_spte))
 		flush = true;
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 5866a664f46e..340d5af454c6 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -520,10 +520,6 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
 	if (was_present && !was_leaf &&
 	    (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed)))
 		handle_removed_pt(kvm, spte_to_child_pt(old_spte, level), shared);
-
-	if (was_leaf && is_accessed_spte(old_spte) &&
-	    (!is_present || !is_accessed_spte(new_spte) || pfn_changed))
-		kvm_set_pfn_accessed(spte_to_pfn(old_spte));
 }
 
 /*
@@ -841,6 +837,9 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root,
 
 		tdp_mmu_iter_set_spte(kvm, &iter, 0);
 
+		if (is_accessed_spte(iter.old_spte))
+			kvm_set_pfn_accessed(spte_to_pfn(iter.old_spte));
+
 		/*
 		 * Zappings SPTEs in invalid roots doesn't require a TLB flush,
 		 * see kvm_tdp_mmu_zap_invalidated_roots() for details.
-- 
2.44.0.291.gc1ea87d7ee-goog
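The lost-Dirty-bit race in the locking.rst table can be replayed deterministically in userspace. The following is a hedged sketch using C11 <stdatomic.h>: the bit positions (M_W, M_ACCESSED, M_DIRTY) and all function names are hypothetical, and CPU 1's fast page fault is simulated by a direct call placed inside CPU 0's window rather than by a real second thread. The shortcut variant trusts a stale snapshot and overwrites the SPTE with a plain store, so the Dirty bit set in the window is lost; the variant built on atomic_exchange() returns the value it actually replaced, Dirty bit included.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit positions for illustration; not the real SPTE layout. */
#define M_W        (1ULL << 1)
#define M_ACCESSED (1ULL << 5)
#define M_DIRTY    (1ULL << 6)

/* Simulated CPU 1: fast page fault makes the SPTE writable, then a guest write sets Dirty. */
static void cpu1_fast_fault_then_write(_Atomic uint64_t *spte)
{
	atomic_fetch_or(spte, M_W);
	atomic_fetch_or(spte, M_DIRTY);
}

/*
 * CPU 0, unsafe shortcut: the decision is based on a snapshot taken before
 * CPU 1 runs, so the plain store wipes out CPU 1's Dirty bit and the
 * returned "old" value does not contain it either.
 */
static uint64_t cpu0_update_nonatomic(_Atomic uint64_t *spte, uint64_t new_spte)
{
	uint64_t old_spte;

	assert(spte != NULL);
	old_spte = atomic_load(spte);
	cpu1_fast_fault_then_write(spte);	/* the race window */

	if ((old_spte & M_ACCESSED) && !(old_spte & M_W)) {
		atomic_store(spte, new_spte);
		return old_spte;		/* stale: Dirty is missing */
	}
	return atomic_exchange(spte, new_spte);
}

/* CPU 0, safe path: the exchange returns exactly the value it replaced. */
static uint64_t cpu0_update_xchg(_Atomic uint64_t *spte, uint64_t new_spte)
{
	assert(spte != NULL);
	cpu1_fast_fault_then_write(spte);	/* CPU 1 runs before the xchg */
	return atomic_exchange(spte, new_spte);
}
```

With both SPTEs starting as M_ACCESSED, cpu0_update_nonatomic() returns a value without M_DIRTY, while cpu0_update_xchg() returns one with both M_W and M_DIRTY set, which is the distinction the table draws between the non-atomic shortcut and the xchg() path.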