X-Mailing-List: linux-kernel@vger.kernel.org
References: <20240529180510.2295118-1-jthoughton@google.com>
 <20240529180510.2295118-3-jthoughton@google.com>
From: David Matlack
Date: Fri, 31 May 2024 14:06:49 -0700
Subject: Re: [PATCH v4 2/7] mm: multi-gen LRU: Have secondary MMUs participate in aging
To: Yu Zhao
Cc: Oliver Upton, James Houghton, Andrew Morton, Paolo Bonzini,
 Albert Ou, Ankit Agrawal, Anup Patel, Atish Patra, Axel Rasmussen,
 Bibo Mao, Catalin Marinas, David Rientjes, Huacai Chen, James Morse,
 Jonathan Corbet, Marc Zyngier, Michael Ellerman, Nicholas Piggin,
 Palmer Dabbelt, Paul Walmsley, Raghavendra Rao Ananta, Ryan Roberts,
 Sean Christopherson, Shaoqin Huang, Shuah Khan, Suzuki K Poulose,
 Tianrui Zhao, Will Deacon, Zenghui Yu,
 kvm-riscv@lists.infradead.org, kvm@vger.kernel.org,
 kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
 linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-kselftest@vger.kernel.org, linux-mips@vger.kernel.org,
 linux-mm@kvack.org, linux-riscv@lists.infradead.org,
 linuxppc-dev@lists.ozlabs.org, loongarch@lists.linux.dev

On Fri, May 31, 2024 at 1:31 PM Yu Zhao wrote:
>
> On Fri, May 31, 2024 at 1:24 AM Oliver Upton wrote:
> >
> > On Wed, May 29, 2024 at 03:03:21PM -0600, Yu Zhao wrote:
> > > On Wed, May 29, 2024 at 12:05 PM James Houghton wrote:
> > > >
> > > > Secondary MMUs are currently consulted for access/age information at
> > > > eviction time, but before then, we don't get accurate age information.
> > > > That is, pages that are mostly accessed through a secondary MMU (like
> > > > guest memory, used by KVM) will always just proceed down to the oldest
> > > > generation, and then at eviction time, if KVM reports the page to be
> > > > young, the page will be activated/promoted back to the youngest
> > > > generation.
> > >
> > > Correct, and as I explained offline, this is the only reasonable
> > > behavior if we can't locklessly walk secondary MMUs.
> > >
> > > Just for the record, the (crude) analogy I used was:
> > > Imagine a large room with many bills ($1, $5, $10, ...) on the floor,
> > > but you are only allowed to pick up 10 of them (and put them in your
> > > pocket). A smart move would be to survey the room *first and then*
> > > pick up the largest ones. But if you are carrying a 500 lbs backpack,
> > > you would just want to pick up whichever that's in front of you rather
> > > than walk the entire room.
> > >
> > > MGLRU should only scan (or lookaround) secondary MMUs if it can be
> > > done lockless. Otherwise, it should just fall back to the existing
> > > approach, which existed in previous versions but is removed in this
> > > version.
> >
> > Grabbing the MMU lock for write to scan sucks, no argument there. But
> > can you please be specific about the impact of read lock v. RCU in the
> > case of arm64? I had asked about this before and you never replied.
> >
> > My concern remains that adding support for software table walkers
> > outside of the MMU lock entirely requires more work than just deferring
> > the deallocation to an RCU callback. Walkers that previously assumed
> > 'exclusive' access while holding the MMU lock for write must now cope
> > with volatile PTEs.
> >
> > Yes, this problem already exists when hardware sets the AF, but the
> > lock-free walker implementation needs to be generic so it can be applied
> > for other PTE bits.
>
> Direct reclaim is multi-threaded and each reclaimer can take the mmu
> lock for read (testing the A-bit) or write (unmapping before paging
> out) on arm64. The fundamental problem of using the readers-writer
> lock in this case is priority inversion: the readers have lower
> priority than the writers, so ideally, we don't want the readers to
> block the writers at all.
>
> Using my previous (crude) analogy: puting the bill right in front of
> you (the writers) profits immediately whereas searching for the
> largest bill (the readers) can be futile.
>
> As I said earlier, I prefer we drop the arm64 support for now, but I
> will not object to taking the mmu lock for read when clearing the
> A-bit, as long as we fully understand the problem here and document it
> clearly.

FWIW, Google Cloud has been doing proactive reclaim and kstaled-based
aging (kstaled is a Google-internal page aging daemon, for those
outside of Google) for many years on x86 VMs, with A-bit harvesting
done under the write lock. So I'm skeptical that making ARM64 lockless
is necessary to allow secondary MMUs to participate in MGLRU aging with
acceptable performance for Cloud use cases. I don't even think it's
necessary on x86, but it's a simple enough change that we might as well
just do it.

I suspect that under pathological conditions (host under intense memory
pressure and a high rate of reclaim), lockless A-bit harvesting will
perform better. But under such conditions VM performance is likely
going to suffer regardless. In a Cloud environment we deal with that
through other mechanisms that reduce the rate of reclaim and make the
host healthy.

For these reasons, I think there's value in giving users the option to
enable secondary MMU participation in MGLRU aging even when A-bit
test/clearing is not done locklessly. I believe this was James' intent
with the Kconfig. Perhaps a default-off writable module parameter would
be better, to avoid distros accidentally turning it on?

If and when there is a use case for optimizing VM performance under
pathological reclaim conditions on ARM, we can make it lockless then.
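
For readers following along, here is a rough userspace sketch of what
"lockless A-bit test/clearing" means at the level of a single entry.
This is not KVM or MGLRU code. It assumes a made-up PTE layout with the
accessed bit at bit 0, and PTE_ACCESSED, pte_test_and_clear_young() and
age_range() are hypothetical names chosen for illustration. The point is
only that the test and the clear happen in one atomic read-modify-write,
so the walker needs no lock to avoid losing a concurrent update to the
same entry.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical PTE layout for illustration only: bit 0 is the accessed bit. */
#define PTE_ACCESSED ((uint64_t)1 << 0)

/*
 * Test and clear the accessed bit in one atomic read-modify-write.
 * No lock is held; other bits of the entry are left untouched.
 */
static bool pte_test_and_clear_young(_Atomic uint64_t *pte)
{
    uint64_t old = atomic_fetch_and(pte, ~PTE_ACCESSED);

    return (old & PTE_ACCESSED) != 0;
}

/* Age a range of entries and return how many of them were young. */
static size_t age_range(_Atomic uint64_t *ptes, size_t n)
{
    size_t young = 0;

    for (size_t i = 0; i < n; i++)
        young += pte_test_and_clear_young(&ptes[i]);

    return young;
}
```

The sketch leaves out the hard part raised earlier in the thread: keeping
the page tables themselves alive while they are walked without the MMU
lock (for example by deferring deallocation to an RCU callback).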