Date: Tue, 12 Mar 2024 14:43:29 -0400
From: Johannes Weiner
To: Chris Li
Cc: Andrew Morton, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Yosry Ahmed, Nhat Pham, "Matthew Wilcox (Oracle)",
	Chengming Zhou, Barry Song
Subject: Re: [PATCH v6] zswap: replace RB tree with xarray
Message-ID: <20240312184329.GA3501@cmpxchg.org>
In-Reply-To: <20240312-zswap-xarray-v6-1-1b82027d7082@kernel.org>

On Tue, Mar 12, 2024 at 10:31:12AM -0700, Chris Li wrote:
> A very deep RB tree requires rebalancing at times, which
> contributes to zswap fault latencies. An xarray does not
> need to perform tree rebalancing, so replacing the RB tree
> with an xarray yields a small performance gain.
>
> One small difference is that an xarray insert might fail
> with ENOMEM, while an RB tree insert does not allocate
> additional memory.
>
> The zswap_entry size shrinks a bit from removing the RB
> node, which has two pointers and a color field. The xarray
> stores the pointer in the xarray tree rather than in the
> zswap_entry, so every entry costs one pointer in the tree.
> Overall, switching to an xarray should save some memory if
> the swap entries are densely packed.
>
> Notice that zswap_rb_search and zswap_rb_insert are always
> followed by zswap_rb_erase. Using xa_erase and xa_store
> directly saves one tree lookup as well.
>
> Remove zswap_invalidate_entry, since there is no need to
> call zswap_rb_erase any more; use zswap_free_entry instead.
> The "struct zswap_tree" has been replaced by "struct xarray".
> The tree spin lock has been transferred to the xarray lock.
>
> Run the kernel build test 10 times for each version; averages:
> (memory.max=2GB, zswap shrinker and writeback enabled,
> one 50GB swapfile, 24 HT cores, 32 jobs)
>
>          mm-9a0181a3710eb   xarray v5
> user     3532.385           3535.658
> sys       536.231            530.083
> real      200.431            200.176

This is a great improvement, code- and complexity-wise. I have a
few questions and comments below.

What kernel version is this based on? It doesn't apply to
mm-everything, and I can't find 9a0181a3710eb anywhere.

> @@ -1555,28 +1473,35 @@ bool zswap_store(struct folio *folio)
>  insert_entry:
>  	entry->swpentry = swp;
>  	entry->objcg = objcg;
> -	if (objcg) {
> -		obj_cgroup_charge_zswap(objcg, entry->length);
> -		/* Account before objcg ref is moved to tree */
> -		count_objcg_event(objcg, ZSWPOUT);
> -	}
>
> -	/* map */
> -	spin_lock(&tree->lock);
>  	/*
>  	 * The folio may have been dirtied again, invalidate the
>  	 * possibly stale entry before inserting the new entry.
>  	 */

The comment is now somewhat stale and somewhat out of place. It
should be above that `if (old)` part... see below.

> -	if (zswap_rb_insert(&tree->rbroot, entry, &dupentry) == -EEXIST) {
> -		zswap_invalidate_entry(tree, dupentry);
> -		WARN_ON(zswap_rb_insert(&tree->rbroot, entry, &dupentry));
> +	old = xa_store(tree, offset, entry, GFP_KERNEL);
> +	if (xa_is_err(old)) {
> +		int err = xa_err(old);
> +		if (err == -ENOMEM)
> +			zswap_reject_alloc_fail++;
> +		else
> +			WARN_ONCE(err, "%s: xa_store failed: %d\n",
> +				  __func__, err);
> +		goto store_failed;

No need to complicate it. If we have a bug there, an incorrect
fail stat bump is the least of our concerns.
Also, no need for __func__ since that information is included in
the WARN:

	if (xa_is_err(old)) {
		WARN_ONCE(err != -ENOMEM, "unexpected xarray error: %d\n", err);
		zswap_reject_alloc_fail++;
		goto store_failed;
	}

I think here is where that comment above should go:

	/*
	 * We may have had an existing entry that became stale when
	 * the folio was redirtied and now the new version is being
	 * swapped out. Get rid of the old.
	 */

> +	if (old)
> +		zswap_entry_free(old);
> +
> +	if (objcg) {
> +		obj_cgroup_charge_zswap(objcg, entry->length);
> +		/* Account before objcg ref is moved to tree */
> +		count_objcg_event(objcg, ZSWPOUT);
> 	}
> +
> 	if (entry->length) {
> 		INIT_LIST_HEAD(&entry->lru);
> 		zswap_lru_add(&zswap.list_lru, entry);
> 		atomic_inc(&zswap.nr_stored);
> 	}
> -	spin_unlock(&tree->lock);

We previously relied on the tree lock to finish initializing the
entry while it's already in the tree. Now we rely on something else:

1. Concurrent stores and invalidations are excluded by the folio
   lock.

2. Writeback is excluded by the entry not being on the LRU yet.

The publishing order matters to prevent writeback from seeing an
incoherent entry. I think this deserves a comment.

> 	/* update stats */
> 	atomic_inc(&zswap_stored_pages);
> @@ -1585,6 +1510,12 @@ bool zswap_store(struct folio *folio)
>
> 	return true;
>
> +store_failed:
> +	if (!entry->length) {
> +		atomic_dec(&zswap_same_filled_pages);
> +		goto freepage;
> +	}

It'd be good to avoid the nested goto. Why not make the pool
operations conditional on entry->length instead:

	store_failed:
		if (!entry->length)
			atomic_dec(&zswap_same_filled_pages);
		else {
			zpool_free(zswap_find_zpool(...));
	put_pool:
			zswap_pool_put(entry->pool);
		}
	freepage:

Not super pretty either, but it's a linear flow at least.