Date: Thu, 22 Feb 2024
11:54:44 +0800
X-Mailing-List: linux-kernel@vger.kernel.org
Subject: Re: [RFC] Analyzing zpool allocators / Removing zbud and z3fold
To: Yosry Ahmed, Andrew Morton
Cc: Vitaly Wool, Miaohe Lin, Johannes Weiner, Nhat Pham, Linux-MM, Linux Kernel Mailing List, Christoph Hellwig, Sergey Senozhatsky, Minchan Kim, Chris Down, Seth Jennings, Dan Streetman, Chris Li
From: Chengming Zhou

On 2024/2/9 11:27, Yosry Ahmed wrote:
> Hey folks,
>
> This is a follow up on my previously sent RFC patch to deprecate
> z3fold [1]. This is an RFC without code, I thought I could get some
> discussion going before writing (or rather deleting) more code. I went
> back to do some analysis on the 3 zpool allocators: zbud, zsmalloc,
> and z3fold.

This is a great analysis! Sorry for being late to see it.

I want to vote for this direction. zram already uses zsmalloc directly,
and zswap could do the same, which is simpler and means we only have to
maintain and optimize one allocator. The only evident downside is the
dependence on MMU, right?

I'm also trying to improve zsmalloc's scalability now. It is bad enough
that zswap has to use 32 pools to work around it. (zram only uses one
pool, so it should have the same scalability problem on big servers, and
may need many zram block devices to work around it too.) But too many
pools cause more memory waste and more fragmentation, so the resulting
compression ratio is not good enough.

As for the MMU dependence, can we actually avoid it? Maybe I missed
something, but we could get an object's memory vecs from zsmalloc and
then send them to decompress, which should support
length(memory vecs) > 1?
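
To illustrate the idea of decompressing an object that is handed over as
more than one memory vec, here is a minimal userspace sketch in Python.
It is only an analogy: zlib's streaming decompressor stands in for the
kernel's compression backend, and decompress_vecs() and the two-way
split are hypothetical names and choices made up for this example, not
zsmalloc or zswap interfaces.

```python
import zlib


def decompress_vecs(vecs):
    """Decompress one object whose compressed bytes are split across
    several buffers, without first copying them into one contiguous
    buffer (hypothetical helper, userspace analogy only)."""
    stream = zlib.decompressobj()
    out = bytearray()
    for vec in vecs:
        # Feed each piece in order; the stream keeps state between calls.
        out += stream.decompress(vec)
    out += stream.flush()
    return bytes(out)


# A 4096-byte stand-in for a page, compressed and then split in two,
# mimicking an object that spans two physical pages.
page = bytes(range(256)) * 16
compressed = zlib.compress(page)
split = len(compressed) // 2
vecs = [compressed[:split], compressed[split:]]

assert decompress_vecs(vecs) == page
```

The point of the sketch is only that a decompressor with a streaming or
scatter-gather style input does not need the compressed object to be
contiguously mapped. Whether the in-kernel compression backends can be
driven this way is the open question raised above.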
>
> [1]https://lore.kernel.org/linux-mm/20240112193103.3798287-1-yosryahmed@google.com/
>
> In this analysis, for each of the allocators I ran a kernel build test
> on tmpfs in a limit cgroup 5 times and captured:
> (a) The build times.
> (b) zswap_load() and zswap_store() latencies using bpftrace.
> (c) The maximum size of the zswap pool from /proc/meminfo::Zswapped.

This should use /proc/meminfo::Zswap, right? Zswap is the total size of
the pool pages, while Zswapped is the size of the pages that were
swapped out and compressed.

Thanks!

>
> Here are the results I have. I am using zsmalloc as the base for all
> comparisons.
>
> -------------------------------- --------------------------------
>
> (a) Build times
>
> *** zsmalloc ***
> ──────────────────────────────────────────────────────────────
>  LABEL │ MIN      │ MAX      │ MEAN     │ MEDIAN   │ STDDEV
> ───────┼──────────┼──────────┼──────────┼──────────┼──────────
>  real  │ 108.890  │ 116.160  │ 111.304  │ 110.310  │ 2.719
>  sys   │ 6838.860 │ 7137.830 │ 6936.414 │ 6862.160 │ 114.860
>  user  │ 2838.270 │ 2859.050 │ 2850.116 │ 2852.590 │ 7.388
> ──────────────────────────────────────────────────────────────
>
> *** zbud ***
> ──────────────────────────────────────────────────────────────
>  LABEL │ MIN      │ MAX      │ MEAN     │ MEDIAN   │ STDDEV
> ───────┼──────────┼──────────┼──────────┼──────────┼──────────
>  real  │ 105.540  │ 114.430  │ 108.738  │ 108.140  │ 3.027
>  sys   │ 6553.680 │ 6794.330 │ 6688.184 │ 6661.840 │ 86.471
>  user  │ 2836.390 │ 2847.850 │ 2842.952 │ 2843.450 │ 3.721
> ──────────────────────────────────────────────────────────────
>
> *** z3fold ***
> ──────────────────────────────────────────────────────────────
>  LABEL │ MIN      │ MAX      │ MEAN     │ MEDIAN   │ STDDEV
> ───────┼──────────┼──────────┼──────────┼──────────┼──────────
>  real  │ 113.020  │ 118.110  │ 114.642  │ 114.010  │ 1.803
>  sys   │ 7168.860 │ 7284.900 │ 7243.930 │ 7265.290 │ 42.254
>  user  │ 2865.630 │ 2869.840 │ 2868.208 │ 2868.710 │ 1.625
> ──────────────────────────────────────────────────────────────
>
> Comparing the means, zbud is 2.3% faster, and z3fold is 3% slower.
>
> (b) zswap_load() and zswap_store() latencies
>
> *** zsmalloc ***
>
> @load_ns:
> [128, 256)          377 |                                                    |
> [256, 512)          772 |                                                    |
> [512, 1K)           923 |                                                    |
> [1K, 2K)          22141 |                                                    |
> [2K, 4K)          88297 |                                                    |
> [4K, 8K)        1685833 |@@@@@                                               |
> [8K, 16K)      17087712 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [16K, 32K)     10875077 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                   |
> [32K, 64K)       777656 |@@                                                  |
> [64K, 128K)      127239 |                                                    |
> [128K, 256K)      50301 |                                                    |
> [256K, 512K)       1669 |                                                    |
> [512K, 1M)           37 |                                                    |
> [1M, 2M)              3 |                                                    |
>
> @store_ns:
> [512, 1K)           279 |                                                    |
> [1K, 2K)          15969 |                                                    |
> [2K, 4K)         193446 |                                                    |
> [4K, 8K)         823283 |                                                    |
> [8K, 16K)      14209844 |@@@@@@@@@@@                                         |
> [16K, 32K)     62040863 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [32K, 64K)      9737713 |@@@@@@@@                                            |
> [64K, 128K)     1278302 |@                                                   |
> [128K, 256K)     487285 |                                                    |
> [256K, 512K)       4406 |                                                    |
> [512K, 1M)          117 |                                                    |
> [1M, 2M)             24 |                                                    |
>
> *** zbud ***
>
> @load_ns:
> [128, 256)          452 |                                                    |
> [256, 512)          834 |                                                    |
> [512, 1K)           998 |                                                    |
> [1K, 2K)          22708 |                                                    |
> [2K, 4K)         171247 |                                                    |
> [4K, 8K)        2853227 |@@@@@@@@                                            |
> [8K, 16K)      17727445 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [16K, 32K)      9523050 |@@@@@@@@@@@@@@@@@@@@@@@@@@@                         |
> [32K, 64K)       752423 |@@                                                  |
> [64K, 128K)      135560 |                                                    |
> [128K, 256K)      52360 |                                                    |
> [256K, 512K)       4071 |                                                    |
> [512K, 1M)           57 |                                                    |
>
> @store_ns:
> [512, 1K)           518 |                                                    |
> [1K, 2K)          13337 |                                                    |
> [2K, 4K)         193043 |                                                    |
> [4K, 8K)         846118 |                                                    |
> [8K, 16K)      15240682 |@@@@@@@@@@@@@                                       |
> [16K, 32K)     60945786 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [32K, 64K)     10230719 |@@@@@@@@                                            |
> [64K, 128K)     1612647 |@                                                   |
> [128K, 256K)     498344 |                                                    |
> [256K, 512K)       8550 |                                                    |
> [512K, 1M)          199 |                                                    |
> [1M, 2M)              1 |                                                    |
>
> *** z3fold ***
>
> @load_ns:
> [128, 256)          344 |                                                    |
> [256, 512)          999 |                                                    |
> [512, 1K)           859 |                                                    |
> [1K, 2K)          21069 |                                                    |
> [2K, 4K)          53704 |                                                    |
> [4K, 8K)        1351571 |@@@@                                                |
> [8K, 16K)      14142680 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [16K, 32K)     11788684 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@         |
> [32K, 64K)      1133377 |@@@@                                                |
> [64K, 128K)      121670 |                                                    |
> [128K, 256K)      68663 |                                                    |
> [256K, 512K)        120 |                                                    |
> [512K, 1M)           21 |                                                    |
>
> @store_ns:
> [512, 1K)           257 |                                                    |
> [1K, 2K)          10162 |                                                    |
> [2K, 4K)         149599 |                                                    |
> [4K, 8K)         648121 |                                                    |
> [8K, 16K)       9115497 |@@@@@@@@                                            |
> [16K, 32K)     56467456 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [32K, 64K)     16235236 |@@@@@@@@@@@@@@                                      |
> [64K, 128K)     1397437 |@                                                   |
> [128K, 256K)     705916 |                                                    |
> [256K, 512K)       3087 |                                                    |
> [512K, 1M)           62 |                                                    |
> [1M, 2M)              1 |                                                    |
>
> I did not perform any sophisticated analysis on these histograms, but
> eyeballing them makes it clear that all allocators have somewhat
> similar latencies. zbud is slightly better than zsmalloc, and z3fold
> is slightly worse than zsmalloc. This corresponds naturally to the
> build times in (a).
>
> (c) Maximum size of the zswap pool
>
> *** zsmalloc ***
> 1,137,659,904 bytes = ~1.13G
>
> *** zbud ***
> 1,535,741,952 bytes = ~1.5G
>
> *** z3fold ***
> 1,151,303,680 bytes = ~1.15G
>
> zbud consumes ~32.7% more memory, and z3fold consumes ~1.8% more
> memory. This makes sense because zbud only stores a maximum of two
> compressed pages on each order-0 page, regardless of the compression
> ratio, so it is bound to consume more memory.
>
> -------------------------------- --------------------------------
>
> According to those results, it seems like zsmalloc is superior to
> z3fold in both efficiency and latency. Zbud has a small latency
> advantage, but that comes with a huge cost in terms of memory
> consumption. Moreover, most known users of zswap are currently using
> zsmalloc. Perhaps some folks are using zbud because it was the default
> allocator up until recently. The only known disadvantage of zsmalloc
> is the dependency on MMU.
>
> Based on that, I think it doesn't make sense to keep all 3 allocators
> going forward. I believe we should start with removing either zbud or
> z3fold, leaving only one allocator supporting MMU. Once zsmalloc
> supports !MMU (if possible), we can keep zsmalloc as the only
> allocator.
>
> Thoughts and feedback are highly appreciated. I tried to CC all the
> interested folks, but others feel free to chime in.
>
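
As a sanity check on the percentages in the quoted results, the short
Python sketch below recomputes them from the mean "real" build times in
(a) and the pool sizes in (c), with zsmalloc as the base. The helper
name pct_vs_base() is made up for this example. The build-time figures
come out as quoted (about 2.3% faster for zbud, about 3.0% slower for
z3fold). For the pool sizes, the exact byte counts give roughly 35.0%
more memory for zbud and 1.2% more for z3fold; the quoted ~32.7% and
~1.8% match the ratios of the rounded figures (1.5G and 1.15G against
1.13G), so they were presumably derived from those.

```python
# Mean "real" build times (seconds) and maximum zswap pool sizes (bytes),
# copied from the quoted results in (a) and (c).
mean_real = {"zsmalloc": 111.304, "zbud": 108.738, "z3fold": 114.642}
pool_bytes = {
    "zsmalloc": 1_137_659_904,
    "zbud": 1_535_741_952,
    "z3fold": 1_151_303_680,
}


def pct_vs_base(values, name, base="zsmalloc"):
    """Percentage difference of `name` relative to `base`.

    Positive means larger than the base (slower build, bigger pool).
    """
    return (values[name] - values[base]) / values[base] * 100.0


for name in ("zbud", "z3fold"):
    print(
        f"{name}: build time {pct_vs_base(mean_real, name):+.1f}%, "
        f"pool size {pct_vs_base(pool_bytes, name):+.1f}%"
    )
```

Either way the conclusion in the quoted text is unchanged: zbud buys a
small latency advantage at a large memory cost, and z3fold is slightly
worse than zsmalloc on both counts.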