X-Mailing-List: linux-kernel@vger.kernel.org
References: <20240529180510.2295118-1-jthoughton@google.com> <20240529180510.2295118-3-jthoughton@google.com>
From: James Houghton
Date: Wed, 29 May 2024 18:08:06 -0700
Subject: Re: [PATCH v4 2/7] mm: multi-gen LRU: Have secondary MMUs participate in aging
To: Sean Christopherson
Cc: Yu Zhao, Andrew Morton, Paolo Bonzini, Albert Ou, Ankit Agrawal, Anup Patel, Atish Patra, Axel Rasmussen,
 Bibo Mao, Catalin Marinas, David Matlack, David Rientjes, Huacai Chen, James Morse, Jonathan Corbet, Marc Zyngier, Michael Ellerman, Nicholas Piggin, Oliver Upton, Palmer Dabbelt, Paul Walmsley, Raghavendra Rao Ananta, Ryan Roberts, Shaoqin Huang, Shuah Khan, Suzuki K Poulose, Tianrui Zhao, Will Deacon, Zenghui Yu,
 kvm-riscv@lists.infradead.org, kvm@vger.kernel.org, kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mips@vger.kernel.org, linux-mm@kvack.org, linux-riscv@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, loongarch@lists.linux.dev
Content-Type: text/plain; charset="UTF-8"

On Wed, May 29, 2024 at 3:58 PM Sean Christopherson wrote:
>
> On Wed, May 29, 2024, Yu Zhao wrote:
> > On Wed, May 29, 2024 at 3:59 PM Sean Christopherson wrote:
> > >
> > > On Wed, May 29, 2024, Yu Zhao wrote:
> > > > On Wed, May 29, 2024 at 12:05 PM James Houghton wrote:
> > > > >
> > > > > Secondary MMUs are currently consulted for access/age information at
> > > > > eviction time, but before then, we don't get accurate age information.
> > > > > That is, pages that are mostly accessed through a secondary MMU (like
> > > > > guest memory, used by KVM) will always just proceed down to the oldest
> > > > > generation, and then at eviction time, if KVM reports the page to be
> > > > > young, the page will be activated/promoted back to the youngest
> > > > > generation.
> > > >
> > > > Correct, and as I explained offline, this is the only reasonable
> > > > behavior if we can't locklessly walk secondary MMUs.
> > > >
> > > > Just for the record, the (crude) analogy I used was:
> > > > Imagine a large room with many bills ($1, $5, $10, ...) on the floor,
> > > > but you are only allowed to pick up 10 of them (and put them in your
> > > > pocket). A smart move would be to survey the room *first and then*
> > > > pick up the largest ones. But if you are carrying a 500 lbs backpack,
> > > > you would just want to pick up whichever that's in front of you rather
> > > > than walk the entire room.
> > > >
> > > > MGLRU should only scan (or lookaround) secondary MMUs if it can be
> > > > done lockless. Otherwise, it should just fall back to the existing
> > > > approach, which existed in previous versions but is removed in this
> > > > version.
> > >
> > > IIUC, by "existing approach" you mean completely ignore secondary MMUs that
> > > don't implement a lockless walk?
> >
> > No, the existing approach only checks secondary MMUs for LRU folios,
> > i.e., those at the end of the LRU list. It might not find the best
> > candidates (the coldest ones) on the entire list, but it doesn't pay
> > as much for the locking. MGLRU can *optionally* scan MMUs (secondary
> > included) to find the best candidates, but it can only be a win if the
> > scanning incurs a relatively low overhead, e.g., done locklessly for
> > the secondary MMU. IOW, this is a balance between the cost of
> > reclaiming not-so-cold (warm) folios and that of finding the coldest
> > folios.
>
> Gotcha.
>
> I tend to agree with Yu, driving the behavior via a Kconfig may generate simpler
> _code_, but I think it increases the overall system complexity. E.g. distros
> will likely enable the Kconfig, and in my experience people using KVM with a
> distro kernel usually aren't kernel experts, i.e. likely won't know that there's
> even a decision to be made, let alone be able to make an informed decision.
>
> Having an mmu_notifier hook that is conditionally implemented doesn't seem overly
> complex, e.g. even if there's a runtime aspect at play, it'd be easy enough for
> KVM to nullify its mmu_notifier hook during initialization. The hardest part is
> likely going to be figuring out the threshold for how much overhead is too much.
Hi Yu, Sean,

Perhaps I "simplified" this bit of the series a little bit too much.
Being able to opportunistically do aging with KVM (even without
setting the Kconfig) is valuable.

IIUC, we have the following possibilities:
- v4: aging with KVM is done if the new Kconfig is set.
- v3: aging with KVM is always done.
- v2: aging with KVM is done when the architecture reports that it can
  probably be done locklessly, set at KVM MMU init time.
- Another possibility?: aging with KVM is done exactly when it can be
  done locklessly (i.e., mmu_notifier_test_young() or
  mmu_notifier_clear_young() is called in a way that will not grab
  any locks).

I like the v4 approach because:
1. We can choose whether or not to do aging with KVM no matter what
   architecture we're using (without requiring userspace to know that
   it should disable the feature at runtime through sysfs to avoid
   regressing performance when it doesn't care about proactive
   reclaim).
2. If we check the new feature bit (0x8) in sysfs, we can know for sure
   whether aging is meant to be working. The selftest changes I made
   won't work properly unless there is a way to be sure that aging is
   working with KVM.

For look-around at eviction time:
- v4: done if the main mm PTE was young and no MMU notifiers are
  subscribed.
- v2/v3: done if the main mm PTE was young or (the SPTE was young and
  the MMU notifier was lockless/fast).

I made this logic change as part of removing batching.

I'd really appreciate guidance on what the correct thing to do is.

In my mind, what would work great is: by default, do aging exactly
when KVM can do it locklessly, and then have a Kconfig to always have
MGLRU do aging with KVM if a user really cares about proactive
reclaim (when the feature bit is set). The selftest can check the
Kconfig + feature bit to know for sure if aging will be done.

I'm not sure what the exact right thing to do for look-around is.

Thanks for the quick feedback.
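
For reference, the aging and look-around options listed above can be
written down as plain predicates. This is only a rough userspace sketch
of the conditions as stated in this mail, not kernel code: every name
in it (enum aging_policy, struct mmu_state, should_age_secondary(),
look_around_v4(), look_around_v2_v3(), should_age_proposed()) is
hypothetical, and the "proposed" variant is just one possible reading
of the last suggestion, assuming the feature bit gates only the
Kconfig-forced case.

```c
#include <assert.h>
#include <stdbool.h>

/* All names here are hypothetical; none of them exist in the kernel. */
enum aging_policy {
	AGING_V2_ARCH_HINT,	/* arch reports walks are probably lockless */
	AGING_V3_ALWAYS,	/* always age through KVM */
	AGING_V4_KCONFIG,	/* age only if the new Kconfig is set */
	AGING_LOCKLESS_ONLY,	/* age exactly when no lock would be taken */
};

struct mmu_state {
	bool kconfig_enabled;		/* stand-in for the new Kconfig */
	bool arch_probably_fast;	/* decided once at KVM MMU init (v2) */
	bool walk_is_lockless;		/* this walk would take no locks */
	bool has_notifiers;		/* MMU notifiers subscribed to the mm */
	bool feature_bit_set;		/* stand-in for the 0x8 sysfs bit */
};

/* Should MGLRU consult the secondary MMU while aging? */
bool should_age_secondary(enum aging_policy p, const struct mmu_state *s)
{
	switch (p) {
	case AGING_V2_ARCH_HINT:
		return s->arch_probably_fast;
	case AGING_V3_ALWAYS:
		return true;
	case AGING_V4_KCONFIG:
		return s->kconfig_enabled;
	case AGING_LOCKLESS_ONLY:
		return s->walk_is_lockless;
	}
	assert(!"unknown aging policy");
	return false;
}

/* v4: look around only if the PTE was young and nobody is subscribed. */
bool look_around_v4(bool pte_young, const struct mmu_state *s)
{
	return pte_young && !s->has_notifiers;
}

/* v2/v3: a young SPTE also counts, but only via a lockless/fast notifier. */
bool look_around_v2_v3(bool pte_young, bool spte_young, bool notifier_fast)
{
	return pte_young || (spte_young && notifier_fast);
}

/*
 * One reading of the proposed default: age whenever the walk is
 * lockless, and otherwise only when the Kconfig forces it and the
 * feature bit is set. The exact role of the feature bit is an
 * assumption of this sketch.
 */
bool should_age_proposed(const struct mmu_state *s)
{
	bool forced = s->kconfig_enabled && s->feature_bit_set;

	return s->walk_is_lockless || forced;
}
```

Written this way, the v4 look-around rule is strictly narrower than the
v2/v3 rule: any mm with a subscribed MMU notifier skips look-around
entirely, even when its main mm PTE was young.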