From: Yosry Ahmed
Date: Thu, 18 Jan 2024 09:30:12 -0800
Subject: Re: [PATCH 0/2] mm/zswap: optimize the scalability of zswap rb-tree
To: Johannes Weiner
Cc: Chengming Zhou, Andrew Morton, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, Chris Li, Nhat Pham
References: <20240117-b4-zswap-lock-optimize-v1-0-23f6effe5775@bytedance.com>
    <20240118153425.GI939255@cmpxchg.org>
In-Reply-To: <20240118153425.GI939255@cmpxchg.org>

On Thu, Jan 18, 2024 at 7:34 AM Johannes Weiner wrote:
>
> On Wed, Jan 17, 2024 at 10:37:22AM -0800, Yosry Ahmed wrote:
> > On Wed, Jan 17, 2024 at 1:23 AM Chengming Zhou wrote:
> > >
> > > When testing the zswap performance by using kernel build -j32 in a tmpfs
> > > directory, I found the scalability of zswap rb-tree is not good, which
> > > is protected by the only spinlock. That would cause heavy lock contention
> > > if multiple tasks zswap_store/load concurrently.
> > >
> > > So a simple solution is to split the only one zswap rb-tree into multiple
> > > rb-trees, each corresponds to SWAP_ADDRESS_SPACE_PAGES (64M). This idea is
> > > from the commit 4b3ef9daa4fc ("mm/swap: split swap cache into 64MB trunks").
> > >
> > > Although this method can't solve the spinlock contention completely, it
> > > can mitigate much of that contention. Below is the results of kernel build
> > > in tmpfs with zswap shrinker enabled:
> > >
> > >         linux-next      zswap-lock-optimize
> > > real    1m9.181s        1m3.820s
> > > user    17m44.036s      17m40.100s
> > > sys     7m37.297s       4m54.622s
> > >
> > > So there are clearly improvements. And it's complementary with the ongoing
> > > zswap xarray conversion by Chris. Anyway, I think we can also merge this
> > > first, it's complementary IMHO. So I just refresh and resend this for
> > > further discussion.
> >
> > The reason why I think we should wait for the xarray patch(es) is
> > there is a chance we may see less improvements from splitting the tree
> > if it was an xarray. If we merge this series first, there is no way to
> > know.
>
> I mentioned this before, but I disagree quite strongly with this
> general sentiment.
>
> Chengming's patches are simple, mature, and have convincing
> numbers. IMO it's poor form to hold something like that for "let's see
> how our other experiment works out". The only exception would be if we
> all agree that the earlier change flies in the face of the overall
> direction we want to pursue, which I don't think is the case here.

My intention was not to delay merging these patches until the xarray
patches are merged in. It was only to wait until the xarray patches
are *posted*, so that we can redo the testing on top of them and
verify that the gains are still there. That should have been around
now, but the xarray patches were posted in a form that does not allow
this testing (because we still have a lock on the read path), so I am
less inclined to wait.

My rationale was that if the gains from splitting the tree become
minimal after we switch to an xarray, we won't know. It's more
difficult to remove optimizations than to add them, because we may
cause a regression. I am somewhat paranoid about carrying code when we
don't have full information about how much it is needed.

In this case, I suppose we can redo the testing (1 tree vs. split
trees) once the xarray patches are in a testable form, and before we
have formed any strong dependencies on the split trees (we have time
until v6.9 is released, I assume).

How about that?

>
> With the xarray we'll still have a per-swapfile lock for writes. That
> lock is the reason SWAP_ADDRESS_SPACE segmentation was introduced for
> the swapcache in the first place. Lockless reads help of course, but
> read-only access to swap are in the minority - stores will write, and
> loads are commonly followed by invalidations. Somebody already went
> through the trouble of proving that xarrays + segmentation are worth
> it for swap load and store access patterns. Why dismiss that?

Fair point, although I think the swapcache lock may be more contended
than the zswap tree lock.

> So my vote is that we follow the usual upstreaming process here:
> merge the ready patches now, and rebase future work on top of it.

No objections given the current state of the xarray patches, as I
mentioned earlier, but I would prefer that we redo the testing with
the xarray once that is possible.
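
For reference, the split discussed in this thread can be sketched in a
few lines of userspace C. This is only an illustration under stated
assumptions, not the code from Chengming's series: the helper names
nr_trees_for() and tree_index_for() are hypothetical, and 4 KiB pages
are assumed. SWAP_ADDRESS_SPACE_SHIFT is 14 in the kernel's swap cache
code, which with 4 KiB pages gives the 64M per tree mentioned in the
cover letter.

```c
#include <assert.h>

/* Assumption: 4 KiB pages. The name is local to this sketch. */
#define SKETCH_PAGE_SHIFT 12

/* Same values the swap cache uses for its address space segmentation. */
#define SWAP_ADDRESS_SPACE_SHIFT 14
#define SWAP_ADDRESS_SPACE_PAGES (1UL << SWAP_ADDRESS_SPACE_SHIFT)

/* 16384 pages * 4 KiB = 64 MiB of swap covered by each tree. */
static_assert((SWAP_ADDRESS_SPACE_PAGES << SKETCH_PAGE_SHIFT) ==
	      64UL * 1024 * 1024,
	      "each tree should cover 64 MiB of swap");

/* Hypothetical helper: number of trees (and locks) for a swapfile. */
static unsigned long nr_trees_for(unsigned long nr_pages)
{
	/* Round up so a partial trailing 64 MiB range still gets a tree. */
	return (nr_pages + SWAP_ADDRESS_SPACE_PAGES - 1) /
	       SWAP_ADDRESS_SPACE_PAGES;
}

/* Hypothetical helper: which tree a given swap offset falls into. */
static unsigned long tree_index_for(unsigned long offset)
{
	return offset >> SWAP_ADDRESS_SPACE_SHIFT;
}
```

Two stores contend on the same lock only when their swap offsets fall
in the same 64M range, which is why the split mitigates the contention
without removing it.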