From: James Houghton
Date: Fri, 19 Apr 2024 13:41:52 -0700
Subject: Re: [PATCH v3 3/7] KVM: Add basic bitmap support into kvm_mmu_notifier_test/clear_young
To: David Matlack
Cc: Andrew Morton, Paolo Bonzini, Yu Zhao, Marc Zyngier, Oliver Upton,
    Sean Christopherson, Jonathan Corbet, James Morse, Suzuki K Poulose,
    Zenghui Yu, Catalin Marinas, Will Deacon, Thomas Gleixner, Ingo Molnar,
    Borislav Petkov, Dave Hansen, "H. Peter Anvin", Steven Rostedt,
    Masami Hiramatsu, Mathieu Desnoyers, Shaoqin Huang, Gavin Shan,
    Ricardo Koller, Raghavendra Rao Ananta, Ryan Roberts, David Rientjes,
    Axel Rasmussen, linux-kernel@vger.kernel.org,
    linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
    kvm@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org
References: <20240401232946.1837665-1-jthoughton@google.com> <20240401232946.1837665-4-jthoughton@google.com>

On Fri, Apr 12, 2024 at 1:28 PM David Matlack wrote:
>
> On 2024-04-01 11:29 PM, James Houghton wrote:
> > Add kvm_arch_prepare_bitmap_age() for architectures to indicate that
> > they support bitmap-based aging in kvm_mmu_notifier_test_clear_young()
> > and that they do not need KVM to grab the MMU lock for writing. This
> > function allows architectures to do any other locking or preparatory
> > work they need.
>
> There's a lot going on here. I know it's extra work but I think the
> series would be easier to understand and simplify if you introduced the
> KVM support for lockless test/clear_young() first, and then introduced
> support for the bitmap-based look-around.

Yeah, I think this is the right thing to do. Thanks.

>
> Specifically:
>
>  1. Make all test/clear_young() notifiers lockless. i.e.
>     Move the mmu_lock into the architecture-specific code
>     (kvm_age_gfn() and kvm_test_age_gfn()).
>
>  2. Convert KVM/x86's kvm_{test,}_age_gfn() to be lockless for the TDP
>     MMU.
>
>  4. Convert KVM/arm64's kvm_{test,}_age_gfn() to hold the mmu_lock in
>     read-mode.
>
>  5. Add bitmap-based look-around support to KVM/x86 and KVM/arm64
>     (probably 2-3 patches).

This all sounds good to me. Thanks for laying it out for me -- this
should be a lot simpler.

>
> >
> > If an architecture does not implement kvm_arch_prepare_bitmap_age() or
> > is unable to do bitmap-based aging at runtime (and marks the bitmap as
> > unreliable):
> >  1. If a bitmap was provided, we inform the caller that the bitmap is
> >     unreliable (MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE).
> >  2. If a bitmap was not provided, fall back to the old logic.
> >
> > Also add logic for architectures to easily use the provided bitmap if
> > they are able. The expectation is that the architecture's implementation
> > of kvm_gfn_test_age() will use kvm_gfn_record_young(), and
> > kvm_gfn_age() will use kvm_gfn_should_age().
> >
> > Suggested-by: Yu Zhao
> > Signed-off-by: James Houghton
> > ---
> >  include/linux/kvm_host.h | 60 ++++++++++++++++++++++++++
> >  virt/kvm/kvm_main.c      | 92 +++++++++++++++++++++++++++++-----------
> >  2 files changed, 127 insertions(+), 25 deletions(-)
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 1800d03a06a9..5862fd7b5f9b 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -1992,6 +1992,26 @@ extern const struct _kvm_stats_desc kvm_vm_stats_desc[];
> >  extern const struct kvm_stats_header kvm_vcpu_stats_header;
> >  extern const struct _kvm_stats_desc kvm_vcpu_stats_desc[];
> >
> > +/*
> > + * Architectures that support using bitmaps for kvm_age_gfn() and
> > + * kvm_test_age_gfn should return true for kvm_arch_prepare_bitmap_age()
> > + * and do any work they need to prepare. The subsequent walk will not
> > + * automatically grab the KVM MMU lock, so some architectures may opt
> > + * to grab it.
> > + *
> > + * If true is returned, a subsequent call to kvm_arch_finish_bitmap_age() is
> > + * guaranteed.
> > + */
> > +#ifndef kvm_arch_prepare_bitmap_age
> > +static inline bool kvm_arch_prepare_bitmap_age(struct mmu_notifier *mn)
>
> I find the name of these architecture callbacks misleading/confusing.
> The lockless path is used even when a bitmap is not provided. i.e.
> bitmap can be NULL in between kvm_arch_prepare/finish_bitmap_age().

Yes. I am really terrible at picking names.... I'm happy to just nix
this, following your other suggestions.

>
> > +{
> > +       return false;
> > +}
> > +#endif
> > +#ifndef kvm_arch_finish_bitmap_age
> > +static inline void kvm_arch_finish_bitmap_age(struct mmu_notifier *mn) {}
> > +#endif
>
> kvm_arch_finish_bitmap_age() seems unnecessary. I think the KVM/arm64
> code could acquire/release the mmu_lock in read-mode in
> kvm_test_age_gfn() and kvm_age_gfn(), right?

Yes, you're right, except that the way it is now, we only lock/unlock
once for the notifier instead of once for each overlapping memslot,
but that's not an issue, as you mention below.
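
For illustration only -- this is just a sketch of that direction, not
code from this series: if the arch callbacks take the MMU lock
themselves, the arm64 side could end up looking roughly like the code
below. It assumes arm64's existing kvm_pgtable_stage2_test_clear_young()
helper and its rwlock-based kvm->mmu_lock; everything else here is a
guess.

/*
 * Rough sketch only (not from the patch under review): kvm_age_gfn()
 * and kvm_test_age_gfn() take the MMU lock in read mode themselves,
 * so the generic notifier walk no longer needs to take it at all.
 */
bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
        u64 size = (range->end - range->start) << PAGE_SHIFT;
        bool young = false;

        read_lock(&kvm->mmu_lock);
        if (kvm->arch.mmu.pgt)
                young = kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
                                                            range->start << PAGE_SHIFT,
                                                            size, /* mkold = */ true);
        read_unlock(&kvm->mmu_lock);

        return young;
}

bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
        u64 size = (range->end - range->start) << PAGE_SHIFT;
        bool young = false;

        read_lock(&kvm->mmu_lock);
        if (kvm->arch.mmu.pgt)
                young = kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
                                                            range->start << PAGE_SHIFT,
                                                            size, /* mkold = */ false);
        read_unlock(&kvm->mmu_lock);

        return young;
}

With that shape, kvm_arch_prepare/finish_bitmap_age() and the
notifier-level locking both disappear; the only cost is taking the lock
once per overlapping memslot rather than once per notifier call.
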
>
> > +
> >  #ifdef CONFIG_KVM_GENERIC_MMU_NOTIFIER
> >  static inline struct kvm *mmu_notifier_to_kvm(struct mmu_notifier *mn)
> >  {
> > @@ -2076,9 +2096,16 @@ static inline bool mmu_invalidate_retry_gfn_unsafe(struct kvm *kvm,
> >         return READ_ONCE(kvm->mmu_invalidate_seq) != mmu_seq;
> >  }
> >
> > +struct test_clear_young_metadata {
> > +       unsigned long *bitmap;
> > +       unsigned long bitmap_offset_end;
>
> bitmap_offset_end is unused.

Indeed, sorry about that.

>
> > +       unsigned long end;
> > +       bool unreliable;
> > +};
> >  union kvm_mmu_notifier_arg {
> >         pte_t pte;
> >         unsigned long attributes;
> > +       struct test_clear_young_metadata *metadata;
>
> nit: Maybe s/metadata/test_clear_young/ ?

Yes, that's better.

>
> >  };
> >
> >  struct kvm_gfn_range {
> > @@ -2087,11 +2114,44 @@ struct kvm_gfn_range {
> >         gfn_t end;
> >         union kvm_mmu_notifier_arg arg;
> >         bool may_block;
> > +       bool lockless;
>
> Please document this as it's somewhat subtle. A reader might think this
> implies the entire operation runs without taking the mmu_lock.

Will do, and I'll improve the comments for the other "lockless"
variables. (In fact, it might be better to rename/adjust this one to
"mmu_lock_taken" instead -- it's a little more obvious what that
means.)

>
> >  };
> >  bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
> >  bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
> >  bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
> >  bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
> > +
> > +static inline void kvm_age_set_unreliable(struct kvm_gfn_range *range)
> > +{
> > +       struct test_clear_young_metadata *args = range->arg.metadata;
> > +
> > +       args->unreliable = true;
> > +}
> > +static inline unsigned long kvm_young_bitmap_offset(struct kvm_gfn_range *range,
> > +                                                    gfn_t gfn)
> > +{
> > +       struct test_clear_young_metadata *args = range->arg.metadata;
> > +
> > +       return hva_to_gfn_memslot(args->end - 1, range->slot) - gfn;
> > +}
> > +static inline void kvm_gfn_record_young(struct kvm_gfn_range *range, gfn_t gfn)
> > +{
> > +       struct test_clear_young_metadata *args = range->arg.metadata;
> > +
> > +       WARN_ON_ONCE(gfn < range->start || gfn >= range->end);
> > +       if (args->bitmap)
> > +               __set_bit(kvm_young_bitmap_offset(range, gfn), args->bitmap);
> > +}
> > +static inline bool kvm_gfn_should_age(struct kvm_gfn_range *range, gfn_t gfn)
> > +{
> > +       struct test_clear_young_metadata *args = range->arg.metadata;
> > +
> > +       WARN_ON_ONCE(gfn < range->start || gfn >= range->end);
> > +       if (args->bitmap)
> > +               return test_bit(kvm_young_bitmap_offset(range, gfn),
> > +                               args->bitmap);
> > +       return true;
> > +}
> >  #endif
> >
> >  #ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index d0545d88c802..7d80321e2ece 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -550,6 +550,7 @@ struct kvm_mmu_notifier_range {
> >         on_lock_fn_t on_lock;
> >         bool flush_on_ret;
> >         bool may_block;
> > +       bool lockless;
> >  };
> >
> >  /*
> > @@ -598,6 +599,8 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
> >         struct kvm_memslots *slots;
> >         int i, idx;
> >
> > +       BUILD_BUG_ON(sizeof(gfn_range.arg) != sizeof(gfn_range.arg.pte));
> > +
> >         if (WARN_ON_ONCE(range->end <= range->start))
> >                 return r;
> >
> > @@ -637,15 +640,18 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
> >                 gfn_range.start = hva_to_gfn_memslot(hva_start, slot);
> >                 gfn_range.end = hva_to_gfn_memslot(hva_end + PAGE_SIZE - 1, slot);
> >                 gfn_range.slot = slot;
> > +               gfn_range.lockless = range->lockless;
> >
> >                 if (!r.found_memslot) {
> >                         r.found_memslot = true;
> > -                       KVM_MMU_LOCK(kvm);
> > -                       if (!IS_KVM_NULL_FN(range->on_lock))
> > -                               range->on_lock(kvm);
> > -
> > -                       if (IS_KVM_NULL_FN(range->handler))
> > -                               break;
> > +                       if (!range->lockless) {
> > +                               KVM_MMU_LOCK(kvm);
> > +                               if (!IS_KVM_NULL_FN(range->on_lock))
> > +                                       range->on_lock(kvm);
> > +
> > +                               if (IS_KVM_NULL_FN(range->handler))
> > +                                       break;
> > +                       }
> >                 }
> >                 r.ret |= range->handler(kvm, &gfn_range);
> >         }
> > @@ -654,7 +660,7 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
> >         if (range->flush_on_ret && r.ret)
> >                 kvm_flush_remote_tlbs(kvm);
> >
> > -       if (r.found_memslot)
> > +       if (r.found_memslot && !range->lockless)
> >                 KVM_MMU_UNLOCK(kvm);
> >
> >         srcu_read_unlock(&kvm->srcu, idx);
> > @@ -682,19 +688,24 @@ static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn,
> >         return __kvm_handle_hva_range(kvm, &range).ret;
> >  }
> >
> > -static __always_inline int kvm_handle_hva_range_no_flush(struct mmu_notifier *mn,
> > -                                                         unsigned long start,
> > -                                                         unsigned long end,
> > -                                                         gfn_handler_t handler)
> > +static __always_inline int kvm_handle_hva_range_no_flush(
> > +               struct mmu_notifier *mn,
> > +               unsigned long start,
> > +               unsigned long end,
> > +               gfn_handler_t handler,
> > +               union kvm_mmu_notifier_arg arg,
> > +               bool lockless)
> >  {
> >         struct kvm *kvm = mmu_notifier_to_kvm(mn);
> >         const struct kvm_mmu_notifier_range range = {
> >                 .start          = start,
> >                 .end            = end,
> >                 .handler        = handler,
> > +               .arg            = arg,
> >                 .on_lock        = (void *)kvm_null_fn,
> >                 .flush_on_ret   = false,
> >                 .may_block      = false,
> > +               .lockless       = lockless,
> >         };
> >
> >         return __kvm_handle_hva_range(kvm, &range).ret;
> > @@ -909,15 +920,36 @@ static int kvm_mmu_notifier_clear_flush_young(struct mmu_notifier *mn,
> >                                         kvm_age_gfn);
> >  }
> >
> > -static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn,
> > -                                       struct mm_struct *mm,
> > -                                       unsigned long start,
> > -                                       unsigned long end,
> > -                                       unsigned long *bitmap)
> > +static int kvm_mmu_notifier_test_clear_young(struct mmu_notifier *mn,
> > +                                             struct mm_struct *mm,
> > +                                             unsigned long start,
> > +                                             unsigned long end,
> > +                                             unsigned long *bitmap,
> > +                                             bool clear)
>
> Perhaps pass in the callback (kvm_test_age_gfn/kvm_age_gfn) instead of
> true/false to avoid the naked booleans at the callsites?

Will do. Thank you.

>
> >  {
> > -       trace_kvm_age_hva(start, end);
> > +       if (kvm_arch_prepare_bitmap_age(mn)) {
> > +               struct test_clear_young_metadata args = {
> > +                       .bitmap         = bitmap,
> > +                       .end            = end,
> > +                       .unreliable     = false,
> > +               };
> > +               union kvm_mmu_notifier_arg arg = {
> > +                       .metadata = &args
> > +               };
> > +               bool young;
> > +
> > +               young = kvm_handle_hva_range_no_flush(
> > +                               mn, start, end,
> > +                               clear ? kvm_age_gfn : kvm_test_age_gfn,
> > +                               arg, true);
>
> I suspect the end result will be cleaner if we make all architectures
> lockless. i.e. Move the mmu_lock acquire/release into the
> architecture-specific code.
>
> This could result in more acquire/release calls (one per memslot that
> overlaps the provided range) but that should be a single memslot in the
> majority of cases I think?
>
> Then unconditionally pass in the metadata structure.
>
> Then you don't need any special casing for the fast path / bitmap path.
> The only thing needed is to figure out whether to return
> MMU_NOTIFIER_YOUNG vs MMU_NOTIFIER_YOUNG_LOOK_AROUND and that can be
> plumbed via test_clear_young_metadata or by changing gfn_handler_t to
> return an int instead of a bool.

Yes I think this simplification is a great idea. I agree that usually
there will only be one memslot that overlaps a virtual address range
in practice (MIN_LRU_BATCH is BITS_PER_LONG), so the theoretical
additional locking/unlocking shouldn't be an issue.

>
> > +
> > +               kvm_arch_finish_bitmap_age(mn);
> > -       /* We don't support bitmaps. Don't test or clear anything. */
> > +               if (!args.unreliable)
> > +                       return young ? MMU_NOTIFIER_YOUNG_FAST : 0;
> > +       }
> > +
> > +       /* A bitmap was passed but the architecture doesn't support bitmaps */
> >         if (bitmap)
> >                 return MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE;
> > @@ -934,7 +966,21 @@ static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn,
> >          * cadence. If we find this inaccurate, we might come up with a
> >          * more sophisticated heuristic later.
> >          */
> > -       return kvm_handle_hva_range_no_flush(mn, start, end, kvm_age_gfn);
> > +       return kvm_handle_hva_range_no_flush(
> > +                       mn, start, end, clear ? kvm_age_gfn : kvm_test_age_gfn,
> > +                       KVM_MMU_NOTIFIER_NO_ARG, false);
>
> Should this return MMU_NOTIFIER_YOUNG explicitly? This code is assuming
> MMU_NOTIFIER_YOUNG == (int)true.

Yes. Thank you for all the feedback!
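
For illustration only, here is a rough sketch of the shape the common
code could take once the handler is passed in directly and the metadata
is plumbed unconditionally. It builds on the hunks quoted above; the
test_clear_young field name follows the rename suggested earlier in
this thread, and the exact return-value policy is only a guess.

/*
 * Sketch only, not the actual patch: the notifier entry point takes the
 * arch handler (kvm_age_gfn or kvm_test_age_gfn) instead of a bool, and
 * test_clear_young_metadata is always plumbed through. Any mmu_lock is
 * now taken inside the arch handler itself.
 */
static int kvm_mmu_notifier_test_clear_young(struct mmu_notifier *mn,
                                             struct mm_struct *mm,
                                             unsigned long start,
                                             unsigned long end,
                                             unsigned long *bitmap,
                                             gfn_handler_t handler)
{
        struct test_clear_young_metadata args = {
                .bitmap         = bitmap,
                .end            = end,
                .unreliable     = false,
        };
        union kvm_mmu_notifier_arg arg = {
                .test_clear_young = &args,
        };
        bool young;

        trace_kvm_age_hva(start, end);

        young = kvm_handle_hva_range_no_flush(mn, start, end, handler,
                                              arg, /* lockless = */ true);

        /*
         * Whether to report MMU_NOTIFIER_YOUNG_FAST (or a LOOK_AROUND
         * flag) here is the open question from the review; this is one
         * guess at the policy, not the series' actual behavior.
         */
        if (args.unreliable)
                return bitmap ? MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE
                              : (young ? MMU_NOTIFIER_YOUNG : 0);

        return young ? MMU_NOTIFIER_YOUNG_FAST : 0;
}

static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn,
                                        struct mm_struct *mm,
                                        unsigned long start,
                                        unsigned long end,
                                        unsigned long *bitmap)
{
        return kvm_mmu_notifier_test_clear_young(mn, mm, start, end,
                                                 bitmap, kvm_age_gfn);
}

Changing gfn_handler_t to return an int, as suggested above, would let
the arch handler pick the returned flags itself instead of guessing
here.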