X-Mailing-List: linux-kernel@vger.kernel.org
References: <20240117-zswap-xarray-v1-0-6daa86c08fae@kernel.org>
 <7f52ad78-e10b-438a-b380-49451bf6f64f@bytedance.com>
 <3a1b124d-4a97-4400-9714-0cceac53bd34@bytedance.com>
 <9b2f8385-735b-4341-b521-a42c9a9cb04c@bytedance.com>
From: Chris Li
Date: Fri, 19 Jan 2024 03:59:07 -0800
Subject: Re: [PATCH 0/2] RFC: zswap tree use xarray instead of RB tree
To: Chengming Zhou
Cc: Yosry Ahmed, Andrew Morton, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, Wei Xu, Yu Zhao, Greg Thelen, Chun-Tse Shao,
 Suren Baghdasaryan, Brain Geffon, Minchan Kim, Michal Hocko,
 Mel Gorman, Huang Ying, Nhat Pham, Johannes Weiner, Kairui Song,
 Zhongkun He, Kemeng Shi, Barry Song, "Matthew Wilcox (Oracle)",
 "Liam R. Howlett", Joel Fernandes

On Fri, Jan 19, 2024 at 3:12 AM Chengming Zhou wrote:
>
> On 2024/1/19 18:26, Chris Li wrote:
> > On Thu, Jan 18, 2024 at 10:19 PM Chengming Zhou wrote:
> >>
> >> On 2024/1/19 12:59, Chris Li wrote:
> >>> On Wed, Jan 17, 2024 at 11:35 PM Chengming Zhou wrote:
> >>>
> >>>>>>>         mm-stable    zswap-split-tree   zswap-xarray
> >>>>>>> real    1m10.442s    1m4.157s           1m9.962s
> >>>>>>> user    17m48.232s   17m41.477s         17m45.887s
> >>>>>>> sys     8m13.517s    5m2.226s           7m59.305s
> >>>>>>>
> >>>>>>> Looks like the contention of concurrency is still there, I haven't
> >>>>>>> look into the code yet, will review it later.
> >>>>>
> >>>>> Thanks for the quick test. Interesting to see the sys usage drop for
> >>>>> the xarray case even with the spin lock.
> >>>>> Not sure if the 13 second saving is statistically significant or not.
> >>>>>
> >>>>> We might need to have both xarray and split trees for the zswap. It is
> >>>>> likely removing the spin lock wouldn't be able to make up the 35%
> >>>>> difference. That is just my guess. There is only one way to find out.
> >>>>
> >>>> Yes, I totally agree with this! IMHO, concurrent zswap_store paths still
> >>>> have to contend for the xarray spinlock even though we would have converted
> >>>> the rb-tree to the xarray structure at last. So I think we should have both.
> >>>>
> >>>>>
> >>>>> BTW, do you have a script I can run to replicate your results?
> >>>
> >>> Hi Chengming,
> >>>
> >>> Thanks for your script.
> >>>
> >>>>
> >>>> ```
> >>>> #!/bin/bash
> >>>>
> >>>> testname="build-kernel-tmpfs"
> >>>> cgroup="/sys/fs/cgroup/$testname"
> >>>>
> >>>> tmpdir="/tmp/vm-scalability-tmp"
> >>>> workdir="$tmpdir/$testname"
> >>>>
> >>>> memory_max="$((2 * 1024 * 1024 * 1024))"
> >>>>
> >>>> linux_src="/root/zcm/linux-6.6.tar.xz"
> >>>> NR_TASK=32
> >>>>
> >>>> swapon ~/zcm/swapfile
> >>>
> >>> How big is your swapfile here?
> >>
> >> The swapfile is big enough here, I use a 50GB swapfile.
> >
> > Thanks,
> >
> >>
> >>>
> >>> It seems you have only one swapfile there. That can explain the contention.
> >>> Have you tried multiple swapfiles for the same test?
> >>> That should reduce the contention without using your patch.
> >> Do you mean to have many 64MB swapfiles to swapon at the same time?
> >
> > 64MB is too small. There are limits to MAX_SWAPFILES. It is less than
> > (32 - n) swap files.
> > If you want to use 50G swap space, you can have MAX_SWAPFILES, each
> > swapfile 50GB / MAX_SWAPFILES.
>
> Right.
>
> >
> >> Maybe it's feasible to test,
> >
> > Of course it is testable, I am curious to see the test results.
> >
> >> I'm not sure how swapout will choose.
> >
> > It will rotate through the same priority swap files first.
> > swapfile.c: get_swap_pages().
> >
> >> But in our usecase, we normally have only one swapfile.
> >
> > Is there a good reason why you can't use more than one swapfile?
>
> I think no, but it seems an unneeded change/burden to our admin.
> So I just tested and optimized for the normal case.

I understand. Just saying it is not really a kernel limitation per se.
I blame the user space :-)

>
> > One swapfile will not take the full advantage of the existing code.
> > Even if you split the zswap trees within a swapfile. With only one
> > swapfile, you will still be having lock contention on "(struct
> > swap_info_struct).lock".
> > It is one lock per swapfile.
> > Using more than one swap file should get you better results.
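
A rough sketch of the multi-swapfile setup suggested in the quoted text
above: split the 50GB of swap into several files and enable them all at
the same priority, so swapout rotates across them. The file count (8),
the /swap directory, priority 10, and the DRY_RUN switch are assumptions
made for illustration; none of them come from the test script. By
default it only prints the commands it would run.

```shell
#!/bin/bash
# Sketch: replace one big swapfile with several equal-priority ones.
# All defaults below are illustrative assumptions, not values from the thread
# (except the 50GB total). Keep NR_SWAPFILES below the kernel's MAX_SWAPFILES.

total_gb=${TOTAL_GB:-50}
nr_swapfiles=${NR_SWAPFILES:-8}
swap_dir=${SWAP_DIR:-/swap}
prio=${PRIO:-10}
dry_run=${DRY_RUN:-1}

# Size of each swapfile in MiB (integer division, rounded down).
per_file_mb() {
    echo $(( $1 * 1024 / $2 ))
}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$dry_run" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_swapfiles() {
    local size_mb i f
    size_mb=$(per_file_mb "$total_gb" "$nr_swapfiles")
    for i in $(seq 1 "$nr_swapfiles"); do
        f="$swap_dir/swapfile.$i"
        # fallocate is used for brevity; some filesystems may need dd instead.
        run fallocate -l "${size_mb}M" "$f"
        run chmod 600 "$f"
        run mkswap "$f"
        # Same priority for every file, so allocation rotates between them.
        run swapon -p "$prio" "$f"
    done
}

setup_swapfiles
```

Running it with DRY_RUN=0 as root would actually create and enable the
files; `swapon --show` should then list them with identical priority.
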
>
> IIUC, we already have the per-cpu swap entry cache to not contend for
> this lock? And I don't see much hot of this lock in the testing.

Yes, the swap entry cache helps. The cache batching also causes other
problems, e.g. the long tail in swap fault handling.

Shameless plug: I posted a patch earlier to address the swap fault
long-tail latencies.

https://lore.kernel.org/linux-mm/20231221-async-free-v1-1-94b277992cb0@kernel.org/T/

Chris
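
P.S. For anyone rechecking the sys-time numbers quoted at the top of the
thread, here is a small helper that converts `time`-style values to
seconds and prints the saving against the mm-stable baseline. The
function names are made up for this sketch; the inputs are the sys
values from the quoted table.

```shell
#!/bin/bash
# Convert a `time`-style value such as 8m13.517s into seconds.
to_seconds() {
    awk -v t="$1" 'BEGIN {
        split(t, p, "m")            # p[1] = minutes, p[2] = seconds + "s"
        sub(/s$/, "", p[2])         # drop the trailing "s"
        printf "%.3f\n", p[1] * 60 + p[2]
    }'
}

# Print the absolute and relative saving of $2 against baseline $1.
saving() {
    local base new
    base=$(to_seconds "$1")
    new=$(to_seconds "$2")
    awk -v b="$base" -v n="$new" 'BEGIN {
        printf "%.3fs %.1f%%\n", b - n, (b - n) * 100 / b
    }'
}

saving 8m13.517s 7m59.305s   # mm-stable vs zswap-xarray
saving 8m13.517s 5m2.226s    # mm-stable vs zswap-split-tree
```

With those inputs the xarray saving comes out at roughly 14 seconds
(about 3%) and the split-tree saving at roughly 191 seconds (about 39%),
both from a single run, so the smaller one could well be noise.
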