Received: by 2002:a05:7412:ba23:b0:fa:4c10:6cad with SMTP id jp35csp1266213rdb; Fri, 19 Jan 2024 13:32:43 -0800 (PST) X-Google-Smtp-Source: AGHT+IFYqGq49FMQtx8t+Pk7261TrlyhBmo7a6/m7ItFoO2cjYjk5jpSbJN3gU8+zOGY3YQ0Q3xD X-Received: by 2002:a17:906:f903:b0:a29:28f6:6aca with SMTP id lc3-20020a170906f90300b00a2928f66acamr165023ejb.123.1705699963611; Fri, 19 Jan 2024 13:32:43 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1705699963; cv=pass; d=google.com; s=arc-20160816; b=TVxHR5SQqq6cr5oefh+qs6L1+FUd62tD9BXi+v8No/qs2cS09uY8mECR2gkngJ8gSI czWTCapTCeWhB4UlMzuDKfFiqZTtIQgG/0gEAzRYlV2Lr24c+8/8y8E36FupheuMiBg1 ej3ManC8VO9f8bMAf1Hn2Mldoqz6/4fhsZDquipZwUYCnMq91hsZCtVroSJ74OaNVvbW 9v4U7dDLQRe6wAjKdax5m6OPT9KdBwlpjwAUlE+E9JQT60NUH7sfItlQfjA4Le9vhbg4 C7KB0RardR+ySkLQYESVYHmncBwTPAefqVZUB/Ges2UXSAIMntM+QRN722HffYA/N7Dj ItnQ== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:list-unsubscribe:list-subscribe :list-id:precedence:dkim-signature; bh=/OBs+QAUNZ4/erD6EAAfbaSZZimQa6EvqCj2fXM7aGs=; fh=3G+LrlaXkaxjxsos+peGn2UxPridobGTAuxKHlBj55Y=; b=dAGVHzVz4hvQk6+8DnuqsinWg7beq/HzFZMbGgPUQR3I41Yq8JPt8+RXYXULPdvZzZ 0y/1PxHPCIrHjGpcTib/tuZHu0zHvhf1o/HP2qsd6KGSxAp0Y29RuOgHZcld43rQn52d XtyeS0ZC0mckwt0QXXk0ujb6uJVp4L94yltqkkcVsg7Vjg2nt8+RyOFmA29ggpORDBRh JDVx2IagOc1hHZV7ubzqgyQFV+5qBgMuHduJ0sLLIgF0wr1h+KhveiyEoVG+vDzzcVje gTNUHyK2TIU0/IKbHKvryQI4RaiebCG7X7zW8ZwZF7C/8OVxa48qQWMyX9CueF74vgcN Vagw== ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=d4mWa9Em; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-31580-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) smtp.mailfrom="linux-kernel+bounces-31580-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from am.mirrors.kernel.org (am.mirrors.kernel.org. [2604:1380:4601:e00::3]) by mx.google.com with ESMTPS id xo2-20020a170907bb8200b00a2ebddf8e75si3180729ejc.520.2024.01.19.13.32.43 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 19 Jan 2024 13:32:43 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-31580-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) client-ip=2604:1380:4601:e00::3; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=d4mWa9Em; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-31580-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) smtp.mailfrom="linux-kernel+bounces-31580-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by am.mirrors.kernel.org (Postfix) with ESMTPS id 3A17F1F236C1 for ; Fri, 19 Jan 2024 21:32:43 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 8379C59150; Fri, 19 Jan 2024 21:32:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="d4mWa9Em" Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 90A8158123 for ; Fri, 19 Jan 2024 21:31:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1705699919; cv=none; b=JA88Et58JiZIIGVbnmhKyFeqvVm881QNLnh7o+xp61gzyxJ+UOJ5VR1c5xn1sInFnBBAmbjZRualr9OAvMnlSKGp1KS7MKk2bSR/4Cj3QsxCOZ/qy3iq3/VoiEUbQj0nKpubAiL/p0s3kpJ3SfEFF+asTOfVuYYJDdJmKTkgdvM= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1705699919; c=relaxed/simple; bh=FFuVjU3mY+IzP8h5t5jS3m765EvdKkoQi/+p+qZCu84=; h=MIME-Version:References:In-Reply-To:From:Date:Message-ID:Subject: To:Cc:Content-Type; b=i9Bbosh8YBBZZLiTp90Gf5utdKCAalXc3Nsiae39j9U+0zwdTxYZwiSvMhfA2Kekx8lD3BHLy50oVuvv1SjnlO9vPCK/lF2Jsg4cF1CYZ8Mv70lO6Tc6/BBk1BxsQ50tnECrJB0FTtfNESj2P3ZIapon0hTXL9b8zjkH+yCifik= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=d4mWa9Em; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id 1ABEEC43394 for ; Fri, 19 Jan 2024 21:31:59 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1705699919; bh=FFuVjU3mY+IzP8h5t5jS3m765EvdKkoQi/+p+qZCu84=; h=References:In-Reply-To:From:Date:Subject:To:Cc:From; b=d4mWa9EmQX4yuKB4EXi5g51uWDIi2sdzsdWZbaRuxcfzcH8VJuuklFBO+SA6fMQs2 xxgPsDsc7/TeEiaZKfsUFn0T1yIQ2hJmHneKa3m5psG+BwhPd0v2aI9/tDM4OIyD2v lDExuJ9Wcz1nTtEA4NNQtYhLTeJfeJM7SS61NAF1lGCjR1REb/utepZk+W9YCnrkeo Qzd3Q8DZ/3xZma2JFIHBI5IWLlAU7RYQLXCm9v5VBtzsdoYfUb7XRHdWQQd/DKtAJq EwbM8eb7unZtBGk3ULLQLVbHhD+Kk8qE6FNqrpIld4zXJ8OIMmn0JYa6TFgOqXqSLY vuZm+MUk5unjw== Received: by mail-pl1-f175.google.com with SMTP id d9443c01a7336-1d5f5a2e828so9477925ad.3 for ; Fri, 19 Jan 2024 13:31:59 -0800 (PST) X-Gm-Message-State: AOJu0YyXveSeTou+tz9TIVUZnz9QnhTr/EKNFc5+nkQcSgeDpLGUgmZb b4uo3HtdYDIyX8XSTJdkt8OuN1HRZtyetFBmxqIlwHSJy7SlVezd3I3p1nvaRbwC5ZXIoF85oEE XyG2pEwzs75F2S+KQj9CrDHC3mL/Sd4GxyxPZ X-Received: by 2002:a17:90b:108c:b0:28a:f0bc:2a9f with SMTP id gj12-20020a17090b108c00b0028af0bc2a9fmr342582pjb.21.1705699918523; Fri, 19 Jan 2024 13:31:58 -0800 (PST) Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 References: <20240117-zswap-xarray-v1-0-6daa86c08fae@kernel.org> <20240117-zswap-xarray-v1-2-6daa86c08fae@kernel.org> In-Reply-To: From: Chris Li Date: Fri, 19 Jan 2024 13:31:46 -0800 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH 2/2] mm: zswap.c: remove RB tree To: Yosry Ahmed Cc: Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org, =?UTF-8?B?V2VpIFh177+8?= , Yu Zhao , Greg Thelen , Chun-Tse Shao , =?UTF-8?Q?Suren_Baghdasaryan=EF=BF=BC?= , Brain Geffon , Minchan Kim , Michal Hocko , Mel Gorman , Huang Ying , Nhat Pham , Johannes Weiner , Kairui Song , Zhongkun He , Kemeng Shi , Barry Song , "Matthew Wilcox (Oracle)" , "Liam R. Howlett" , Joel Fernandes , Chengming Zhou Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Fri, Jan 19, 2024 at 11:37=E2=80=AFAM Yosry Ahmed wrote: > > > I think using the xas_* APIs can be avoided here. The only reason we > > > need it is that we want to check if there's an existing entry first, > > > and return -EEXIST. However, in that case, the caller will replace it > > > anyway (and do some operations on the dupentry): > > > > We might be able to for the insert case if we don't mind changing the > > code behavior a bit. My original intent is to keep close to the > > original zswap code and not stir the pot too much for the xarray > > replacement. We can always make more adjustment once the RB tree is > > gone. > > I don't see how this changes code behavior though. The current code in > zswap_store() will do the following: I am referring to the log and update counter happening after the zswap mapping was updated. Maybe nobody actually cares about that behavior difference. In my mind, there is a difference. > > - Hold the tree lock to make sure no one else modifies it. > - Try to insert, check if there is already a dupentry at the index and > return -EEXIST. > - Warn, increment zswap_duplicate_entry, and invalidate the dupentry. > - Try to insert again (this should always succeed since we are holding > the lock). > > What I am proposing is: > - zswap_xa_insert() is a thin wrapper around xa_store() (or we can > remove it completely). > - zswap_store() does the following: > - Use zswap_xa_insert() and check if there is a returned dupentry. > - Warn, increment zswap_duplicate_entry, and invalidate the dupentry. > > Either way, we always place the entry we have in the tree, and if > there is a dupentry we warn and invalidate it. If anything, the latter > is more straightforward. > > Am I missing something? No, that is what I would do if I have to use the xa_* API. > > > > > > > > } > > > > > > > > static bool zswap_erase(struct zswap_tree *tree, struct zswap_entr= y *entry) > > > > { > > > > + struct zswap_entry *e; > > > > pgoff_t offset =3D swp_offset(entry->swpentry); > > > > - if (!RB_EMPTY_NODE(&entry->rbnode)) { > > > > - struct zswap_entry *old; > > > > - old =3D xa_erase(&tree->xarray, offset); > > > > - BUG_ON(old !=3D entry); > > > > - rb_erase(&entry->rbnode, &tree->rbroot); > > > > - RB_CLEAR_NODE(&entry->rbnode); > > > > - return true; > > > > - } > > > > - return false; > > > > + XA_STATE(xas, &tree->xarray, offset); > > > > + > > > > + do { > > > > + xas_lock_irq(&xas); > > > > + do { > > > > + e =3D xas_load(&xas); > > > > + } while (xas_retry(&xas, e)); > > > > + if (xas_valid(&xas) && e !=3D entry) { > > > > + xas_unlock_irq(&xas); > > > > + return false; > > > > + } > > > > + xas_store(&xas, NULL); > > > > + xas_unlock_irq(&xas); > > > > + } while (xas_nomem(&xas, GFP_KERNEL)); > > > > + return !xas_error(&xas); > > > > } > > > > > > Same here, I think we just want: > > > > > > return !!xa_erase(..); > > > > For the erase case it is tricky. > > The current zswap code does not erase an entry if the tree entry at > > the same offset has been changed. It should be fine if the new entry > > is NULL. Basically some race to remove the entry already. However, if > > the entry is not NULL, then force resetting it to NULL will change > > behavior compared to the current. > > I see, very good point. I think we can use xa_cmpxchg() and pass in NULL? > That is certainly possible. Thanks for bringing it up. Let me try to combine the tree->lock with xarray lock first. If xa_cmpxchg() can simplify the result there, I will use it. > Handling large folios in zswap is a much larger topic that involves a > lot more than this xa_* vs. xas_* apis dispute. Let's not worry about > this for now. Ack. One more reason to use the XAS interface is that zswap currently does multiple lookups on typical zswap_load(). It finds entries by offset, for the entry (lookup one). Then after folio install to swap cache, it deletes the entry, it will performan another lookup to delete the entry (look up two). Using XAS might be able to cache the node location for the second lookup to avoid the full node walk. That is not in my current patch and can be a later improvement patch as well. Chris