From: James Houghton
Date: Mon, 3 Jun 2024 16:16:41 -0700
Subject: Re: [PATCH v4 2/7] mm: multi-gen LRU: Have secondary MMUs participate in aging
To: Sean Christopherson
Cc: Yu Zhao, Andrew Morton, Paolo Bonzini, Albert Ou, Ankit Agrawal,
 Anup Patel, Atish Patra, Axel Rasmussen, Bibo Mao, Catalin Marinas,
 David Matlack, David Rientjes, Huacai Chen, James Morse, Jonathan Corbet,
 Marc Zyngier, Michael Ellerman, Nicholas Piggin, Oliver Upton,
 Palmer Dabbelt, Paul Walmsley, Raghavendra Rao Ananta, Ryan Roberts,
 Shaoqin Huang, Shuah Khan, Suzuki K Poulose, Tianrui Zhao, Will Deacon,
 Zenghui Yu, kvm-riscv@lists.infradead.org, kvm@vger.kernel.org,
 kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
 linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-kselftest@vger.kernel.org, linux-mips@vger.kernel.org,
 linux-mm@kvack.org, linux-riscv@lists.infradead.org,
 linuxppc-dev@lists.ozlabs.org, loongarch@lists.linux.dev
References: <20240529180510.2295118-1-jthoughton@google.com>
 <20240529180510.2295118-3-jthoughton@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Jun 3, 2024 at 4:03 PM Sean Christopherson wrote:
>
> On Mon, Jun 03, 2024, James Houghton wrote:
> > On Thu, May 30, 2024 at 11:06 PM Yu Zhao wrote:
> > > What I don't think is acceptable is simplifying those optimizations
> > > out without documenting your justifications (I would even call it a
> > > design change, rather than simplification, from v3 to v4).
> >
> > I'll put back something similar to what you had before (like a
> > test_clear_young() with a "fast" parameter instead of "bitmap"). I
> > like the idea of having a new mmu notifier, like
> > fast_test_clear_young(), while leaving test_young() and clear_young()
> > unchanged (where "fast" means "prioritize speed over accuracy").
>
> Those two statements are contradicting each other, aren't they?

I guess it depends on how you define "similar". :)

> Anyways, I vote
> for a "fast only" variant, e.g. test_clear_young_fast_only() or so. gup() has
> already established that terminology in mm/, so hopefully it would be familiar
> to readers. We could pass a param, but then the MGLRU code would likely end up
> doing a bunch of useless indirect calls into secondary MMUs, whereas a dedicated
> hook allows implementations to nullify the pointer if the API isn't supported
> for whatever reason.
>
> And pulling in Oliver's comments about locking, I think it's important that the
> mmu_notifier API express its requirement that the operation be "fast", not that
> it be lockless. E.g. if a secondary MMU can guarantee that a lock will be
> contended only in rare, slow cases, then taking a lock is a-ok. Or a secondary
> MMU could do try-lock and bail if the lock is contended.
>
> That way KVM can honor the intent of the API with an implementation that works
> best for KVM _and_ for MGLRU. I'm sure there will be future adjustments and fixes,
> but that's just more motivation for using something like "fast only" instead of
> "lockless".

Yes, thanks, this is exactly what I meant. I really should have "only"
in the name to signify that it is a requirement that it be fast.
Thanks for wording it so clearly.

>
> > > > I made this logic change as part of removing batching.
> > > >
> > > > I'd really appreciate guidance on what the correct thing to do is.
> > > >
> > > > In my mind, what would work great is: by default, do aging exactly
> > > > when KVM can do it locklessly, and then have a Kconfig to always have
> > > > MGLRU to do aging with KVM if a user really cares about proactive
> > > > reclaim (when the feature bit is set). The selftest can check the
> > > > Kconfig + feature bit to know for sure if aging will be done.
> > > >
> > > I still don't see how that Kconfig helps. Or why the new static branch
> > > isn't enough?
> >
> > Without a special Kconfig, the feature bit just tells us that aging
> > with KVM is possible, not that it will necessarily be done. For the
> > self-test, it'd be good to know exactly when aging is being done or
> > not, so having a Kconfig like LRU_GEN_ALWAYS_WALK_SECONDARY_MMU would
> > help make the self-test set the right expectations for aging.
> >
> > The Kconfig would also allow a user to know that, no matter what,
> > we're going to get correct age data for VMs, even if, say, we're using
> > the shadow MMU.
>
> Heh, unless KVM flushes, you won't get "correct" age data.
>
> > This is somewhat important for me/Google Cloud. Is that reasonable? Maybe
> > there's a better solution.
>
> Hmm, no? There's no reason to use a Kconfig, e.g. if we _really_ want to prioritize
> accuracy over speed, then a KVM (x86?) module param to have KVM walk nested TDP
> page tables would give us what we want.
>
> But before we do that, I think we need to perform due diligence (or provide data)
> showing that having KVM take mmu_lock for write in the "fast only" API provides
> better total behavior. I.e. that the additional accuracy is indeed worth the cost.

That sounds good to me. I'll drop the Kconfig. I'm not really sure
what to do about the self-test, but that's not really all that
important.
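
The "fast only" hook discussed in this thread can be illustrated with a
small userspace sketch. It is not kernel code and not the API from the
patch series: struct secondary_mmu, trylock_test_clear_young() and
age_secondary_mmu() are hypothetical names, a single bool stands in for
the accessed state of a range, and a pthread mutex stands in for a lock
such as mmu_lock. It tries to show two points made above: a dedicated
hook can be left NULL when unsupported, so the caller skips the indirect
call, and an implementation may take a lock provided it bails when the
lock is contended.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a secondary MMU; not a kernel structure. */
struct secondary_mmu {
	pthread_mutex_t lock;	/* stands in for a lock such as mmu_lock */
	bool young;		/* one "accessed" flag for the whole range */
	/*
	 * Optional hook. NULL means "no fast path available", so the
	 * caller can skip the indirect call entirely.
	 */
	bool (*test_clear_young_fast_only)(struct secondary_mmu *mmu);
};

/*
 * One possible "fast only" implementation: try the lock and bail if it
 * is contended, trading accuracy for speed.
 */
static bool trylock_test_clear_young(struct secondary_mmu *mmu)
{
	bool young;

	if (pthread_mutex_trylock(&mmu->lock) != 0)
		return false;	/* contended: report "not young" and move on */

	young = mmu->young;
	mmu->young = false;
	pthread_mutex_unlock(&mmu->lock);
	return young;
}

/* Caller side: only call into the secondary MMU if a hook is installed. */
static bool age_secondary_mmu(struct secondary_mmu *mmu)
{
	assert(mmu != NULL);

	if (!mmu->test_clear_young_fast_only)
		return false;

	return mmu->test_clear_young_fast_only(mmu);
}
```

In this sketch a contended lock and a missing hook both look like "not
young" to the caller, which is the accuracy cost that the "fast only"
naming is meant to make explicit.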