From: David Matlack
Date: Fri, 31 May 2024 14:09:49 -0700
Subject: Re: [PATCH v4 2/7] mm: multi-gen LRU: Have secondary MMUs participate in aging
To: Yu Zhao
Cc: Oliver Upton, James Houghton, Andrew Morton, Paolo Bonzini,
 Albert Ou, Ankit Agrawal, Anup Patel, Atish Patra, Axel Rasmussen,
 Bibo Mao, Catalin Marinas, David Rientjes, Huacai Chen, James Morse,
 Jonathan Corbet, Marc Zyngier, Michael Ellerman, Nicholas Piggin,
 Palmer Dabbelt, Paul Walmsley, Raghavendra Rao Ananta, Ryan Roberts,
 Sean Christopherson, Shaoqin Huang, Shuah Khan, Suzuki K Poulose,
 Tianrui Zhao, Will Deacon, Zenghui Yu, kvm-riscv@lists.infradead.org,
 kvm@vger.kernel.org, kvmarm@lists.linux.dev,
 linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
 linux-mips@vger.kernel.org, linux-mm@kvack.org,
 linux-riscv@lists.infradead.org, linuxppc-dev@lists.ozlabs.org,
 loongarch@lists.linux.dev
References: <20240529180510.2295118-1-jthoughton@google.com>
 <20240529180510.2295118-3-jthoughton@google.com>

On Fri, May 31, 2024 at 2:06 PM David Matlack wrote:
>
> On Fri, May 31, 2024 at 1:31 PM Yu Zhao wrote:
> >
> > On Fri, May 31, 2024 at 1:24 AM Oliver Upton wrote:
> > >
> > > On Wed, May 29, 2024 at 03:03:21PM -0600, Yu Zhao wrote:
> > > > On Wed, May 29, 2024 at 12:05 PM James Houghton wrote:
> > > > >
> > > > > Secondary MMUs are currently consulted for access/age information at
> > > > > eviction time, but before then, we don't get accurate age information.
> > > > > That is, pages that are mostly accessed through a secondary MMU (like
> > > > > guest memory, used by KVM) will always just proceed down to the oldest
> > > > > generation, and then at eviction time, if KVM reports the page to be
> > > > > young, the page will be activated/promoted back to the youngest
> > > > > generation.
> > > >
> > > > Correct, and as I explained offline, this is the only reasonable
> > > > behavior if we can't locklessly walk secondary MMUs.
> > > >
> > > > Just for the record, the (crude) analogy I used was:
> > > > Imagine a large room with many bills ($1, $5, $10, ...)
> > > > on the floor,
> > > > but you are only allowed to pick up 10 of them (and put them in your
> > > > pocket). A smart move would be to survey the room *first and then*
> > > > pick up the largest ones. But if you are carrying a 500 lbs backpack,
> > > > you would just want to pick up whichever is in front of you rather
> > > > than walk the entire room.
> > > >
> > > > MGLRU should only scan (or lookaround) secondary MMUs if it can be
> > > > done lockless. Otherwise, it should just fall back to the existing
> > > > approach, which existed in previous versions but is removed in this
> > > > version.
> > >
> > > Grabbing the MMU lock for write to scan sucks, no argument there. But
> > > can you please be specific about the impact of read lock v. RCU in the
> > > case of arm64? I had asked about this before and you never replied.
> > >
> > > My concern remains that adding support for software table walkers
> > > outside of the MMU lock entirely requires more work than just deferring
> > > the deallocation to an RCU callback. Walkers that previously assumed
> > > 'exclusive' access while holding the MMU lock for write must now cope
> > > with volatile PTEs.
> > >
> > > Yes, this problem already exists when hardware sets the AF, but the
> > > lock-free walker implementation needs to be generic so it can be applied
> > > for other PTE bits.
> >
> > Direct reclaim is multi-threaded and each reclaimer can take the mmu
> > lock for read (testing the A-bit) or write (unmapping before paging
> > out) on arm64. The fundamental problem of using the readers-writer
> > lock in this case is priority inversion: the readers have lower
> > priority than the writers, so ideally, we don't want the readers to
> > block the writers at all.
> >
> > Using my previous (crude) analogy: putting the bill right in front of
> > you (the writers) profits immediately whereas searching for the
> > largest bill (the readers) can be futile.
> >
> > As I said earlier, I prefer we drop the arm64 support for now, but I
> > will not object to taking the mmu lock for read when clearing the
> > A-bit, as long as we fully understand the problem here and document it
> > clearly.
>
> FWIW, Google Cloud has been doing proactive reclaim and kstaled-based
> aging (a Google-internal page aging daemon, for those outside of
> Google) for many years on x86 VMs with the A-bit harvesting
> under the write-lock. So I'm skeptical that making ARM64 lockless is
> necessary to allow Secondary MMUs to participate in MGLRU aging with
> acceptable performance for Cloud usecases. I don't even think it's
> necessary on x86 but it's a simple enough change that we might as well
> just do it.

The obvious caveat here: If MGLRU aging and kstaled aging are
substantially different in how frequently they trigger mmu_notifiers,
then my analysis may not be correct. Yu, I'm hoping you can shed some
light on that. I'm also operating under the assumption that Secondary
MMUs are only participating in aging, and not look-around (i.e. what
is implemented in v4).

>
> I suspect under pathological conditions (host under intense memory
> pressure and high rate of reclaim occurring) making A-bit harvesting
> lockless will perform better. But under such conditions VM performance
> is likely going to suffer regardless. In a Cloud environment we deal
> with that through other mechanisms to reduce the rate of reclaim and
> make the host healthy.
>
> For these reasons, I think there's value in giving users the option to
> enable Secondary MMU participation in MGLRU aging even when A-bit
> test/clearing is not done locklessly. I believe this was James' intent
> with the Kconfig. Perhaps a default-off writable module parameter
> would be better to avoid distros accidentally turning it on?
>
> If and when there is a usecase for optimizing VM performance under
> pathological reclaim conditions on ARM, we can make it lockless then.
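
For readers following the thread, the lockless test/clear of the A-bit
discussed above can be pictured with a minimal userspace sketch. This is
not KVM or MGLRU code: the bit position DEMO_PTE_ACCESSED and the demo_*
helpers are hypothetical names chosen for illustration, and C11 atomics
stand in for the kernel's own primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accessed-bit position; real page-table formats differ. */
#define DEMO_PTE_ACCESSED (UINT64_C(1) << 5)

/*
 * Atomically clear the accessed bit of one entry and report whether it
 * was set. No lock is taken: a concurrent updater (standing in for
 * hardware or another walker) may change other bits at any time, and
 * the atomic AND leaves those bits untouched.
 */
static bool demo_test_and_clear_young(_Atomic uint64_t *pte)
{
    uint64_t old = atomic_fetch_and_explicit(pte, ~DEMO_PTE_ACCESSED,
                                             memory_order_relaxed);
    return (old & DEMO_PTE_ACCESSED) != 0;
}

/* Age a run of entries; returns how many of them were young. */
static size_t demo_age_range(_Atomic uint64_t *ptes, size_t n)
{
    size_t young = 0;

    assert(ptes != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (demo_test_and_clear_young(&ptes[i]))
            young++;
    }
    return young;
}
```

The sketch only covers the bit update itself. It does not model
page-table lifetime, which is the part Oliver points out needs more work
than deferring deallocation to an RCU callback.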