Date: Mon, 3 Jun 2024 16:03:05 -0700
X-Mailing-List: linux-kernel@vger.kernel.org
References: <20240529180510.2295118-1-jthoughton@google.com>
 <20240529180510.2295118-3-jthoughton@google.com>
Subject: Re: [PATCH v4 2/7] mm: multi-gen LRU: Have secondary MMUs participate in aging
From: Sean Christopherson
To: James Houghton
Cc: Yu Zhao, Andrew Morton, Paolo Bonzini, Albert Ou, Ankit Agrawal,
 Anup Patel, Atish Patra, Axel Rasmussen, Bibo Mao, Catalin Marinas,
 David Matlack, David Rientjes, Huacai Chen, James Morse, Jonathan Corbet,
 Marc Zyngier, Michael Ellerman, Nicholas Piggin, Oliver Upton,
 Palmer Dabbelt, Paul Walmsley, Raghavendra Rao Ananta, Ryan Roberts,
 Shaoqin Huang, Shuah Khan, Suzuki K Poulose, Tianrui Zhao, Will Deacon,
 Zenghui Yu, kvm-riscv@lists.infradead.org, kvm@vger.kernel.org,
 kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
 linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-kselftest@vger.kernel.org, linux-mips@vger.kernel.org,
 linux-mm@kvack.org, linux-riscv@lists.infradead.org,
 linuxppc-dev@lists.ozlabs.org, loongarch@lists.linux.dev

On Mon, Jun 03, 2024, James Houghton wrote:
> On Thu, May 30, 2024 at 11:06 PM Yu Zhao wrote:
> > What I don't think is acceptable is simplifying those optimizations
> > out without documenting your justifications (I would even call it a
> > design change, rather than simplification, from v3 to v4).
>
> I'll put back something similar to what you had before (like a
> test_clear_young() with a "fast" parameter instead of "bitmap"). I
> like the idea of having a new mmu notifier, like
> fast_test_clear_young(), while leaving test_young() and clear_young()
> unchanged (where "fast" means "prioritize speed over accuracy").

Those two statements are contradicting each other, aren't they? Anyways, I vote
for a "fast only" variant, e.g. test_clear_young_fast_only() or so. gup() has
already established that terminology in mm/, so hopefully it would be familiar
to readers.

We could pass a param, but then the MGLRU code would likely end up doing a bunch
of useless indirect calls into secondary MMUs, whereas a dedicated hook allows
implementations to nullify the pointer if the API isn't supported for whatever
reason.

And pulling in Oliver's comments about locking, I think it's important that the
mmu_notifier API express its requirement that the operation be "fast", not that
it be lockless. E.g. if a secondary MMU can guarantee that a lock will be
contended only in rare, slow cases, then taking a lock is a-ok. Or a secondary
MMU could do try-lock and bail if the lock is contended.

That way KVM can honor the intent of the API with an implementation that works
best for KVM _and_ for MGLRU. I'm sure there will be future adjustments and
fixes, but that's just more motivation for using something like "fast only"
instead of "lockless".

> > > I made this logic change as part of removing batching.
> > >
> > > I'd really appreciate guidance on what the correct thing to do is.
> > >
> > > In my mind, what would work great is: by default, do aging exactly
> > > when KVM can do it locklessly, and then have a Kconfig to always have
> > > MGLRU to do aging with KVM if a user really cares about proactive
> > > reclaim (when the feature bit is set). The selftest can check the
> > > Kconfig + feature bit to know for sure if aging will be done.
> >
> > I still don't see how that Kconfig helps. Or why the new static branch
> > isn't enough?
>
> Without a special Kconfig, the feature bit just tells us that aging
> with KVM is possible, not that it will necessarily be done. For the
> self-test, it'd be good to know exactly when aging is being done or
> not, so having a Kconfig like LRU_GEN_ALWAYS_WALK_SECONDARY_MMU would
> help make the self-test set the right expectations for aging.
>
> The Kconfig would also allow a user to know that, no matter what,
> we're going to get correct age data for VMs, even if, say, we're using
> the shadow MMU.

Heh, unless KVM flushes, you won't get "correct" age data.

> This is somewhat important for me/Google Cloud. Is that reasonable? Maybe
> there's a better solution.

Hmm, no? There's no reason to use a Kconfig, e.g. if we _really_ want to
prioritize accuracy over speed, then a KVM (x86?) module param to have KVM walk
nested TDP page tables would give us what we want.

But before we do that, I think we need to perform due diligence (or provide
data) showing that having KVM take mmu_lock for write in the "fast only" API
provides better total behavior. I.e. that the additional accuracy is indeed
worth the cost.
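
For illustration only, the dedicated "fast only" hook and the try-lock idea
discussed above can be modeled in plain userspace C. This is a rough sketch
under stated assumptions, not kernel or KVM code: the struct names, the
age_range() caller and the single "young" flag are hypothetical, the hook name
merely borrows the test_clear_young_fast_only() suggestion from this thread,
and a C11 atomic_flag stands in for a real lock such as mmu_lock.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct secondary_mmu;

/* Hypothetical ops table; only the optional "fast only" hook is modeled. */
struct secondary_mmu_ops {
	/*
	 * Returns true if anything in [start, end) was young, clearing it.
	 * NULL means "not supported", so the caller can skip the indirect
	 * call entirely instead of passing a "fast" parameter.
	 */
	bool (*test_clear_young_fast_only)(struct secondary_mmu *mmu,
					   unsigned long start,
					   unsigned long end);
};

struct secondary_mmu {
	const struct secondary_mmu_ops *ops;
	atomic_flag lock;	/* stand-in for a real lock (assumption) */
	bool young;		/* one accessed bit for the whole range */
};

/* Try-lock: atomic_flag_test_and_set returns true if it was already held. */
static bool mmu_trylock(struct secondary_mmu *mmu)
{
	return !atomic_flag_test_and_set_explicit(&mmu->lock,
						  memory_order_acquire);
}

static void mmu_unlock(struct secondary_mmu *mmu)
{
	atomic_flag_clear_explicit(&mmu->lock, memory_order_release);
}

/* "Fast" does not have to mean lockless: take the lock only if it is free. */
static bool trylock_test_clear_young(struct secondary_mmu *mmu,
				     unsigned long start, unsigned long end)
{
	bool young;

	(void)start;	/* the range is ignored in this toy model */
	(void)end;

	if (!mmu_trylock(mmu))
		return false;	/* contended: bail rather than wait */

	young = mmu->young;
	mmu->young = false;
	mmu_unlock(mmu);
	return young;
}

static const struct secondary_mmu_ops trylock_ops = {
	.test_clear_young_fast_only = trylock_test_clear_young,
};

/* An implementation that cannot age quickly leaves the hook NULL. */
static const struct secondary_mmu_ops unsupported_ops = {
	.test_clear_young_fast_only = NULL,
};

/* Caller side of the model: no hook, no indirect call. */
static bool age_range(struct secondary_mmu *mmu,
		      unsigned long start, unsigned long end)
{
	if (!mmu->ops->test_clear_young_fast_only)
		return false;
	return mmu->ops->test_clear_young_fast_only(mmu, start, end);
}
```

In this model a contended lock makes age_range() report "not young" and leaves
the accessed state untouched, which is the accuracy-for-speed trade a "fast
only" contract permits. Whether that trade is acceptable in practice is the
open question raised at the end of this message.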