From: Zhongkun He
Date: Sat, 23 Mar 2024 10:40:38 +0800
Subject: Re: [External] Re: [RFC PATCH] mm: add folio in swapcache if swapin from zswap
References: <20240322163939.17846-1-chengming.zhou@linux.dev> <20240322234826.GA448621@cmpxchg.org> <20240323015543.GB448621@cmpxchg.org>
X-Mailing-List: linux-kernel@vger.kernel.org
To: Yosry Ahmed
Cc: Johannes Weiner,
Barry Song <21cnbao@gmail.com>, chengming.zhou@linux.dev, nphamcs@gmail.com, akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Sat, Mar 23, 2024 at 10:03 AM Yosry Ahmed wrote:
>
> On Fri, Mar 22, 2024 at 6:55 PM Johannes Weiner wrote:
> >
> > On Fri, Mar 22, 2024 at 05:14:37PM -0700, Yosry Ahmed wrote:
> > > [..]
> > > > > > I don't think we want to stop doing exclusive loads in zswap due to this
> > > > > > interaction with zram, which shouldn't be common.
> > > > > >
> > > > > > I think we can solve this by just writing the folio back to zswap upon
> > > > > > failure as I mentioned.
> > > > >
> > > > > Instead of storing again, can we avoid invalidating the entry in the
> > > > > first place if the load is not "exclusive"?
> > > > >
> > > > > The reason for exclusive loads is that the ownership is transferred to
> > > > > the swapcache, so there is no point in keeping our copy. With an
> > > > > optimistic read that doesn't transfer ownership, this doesn't
> > > > > apply. And we can easily tell inside zswap_load() if we're dealing
> > > > > with a swapcache read or not by testing the folio.
> > > > >
> > > > > The synchronous read already has to pin the swp_entry_t to be safe,
> > > > > using swapcache_prepare(). That blocks __read_swap_cache_async() which
> > > > > means no other (exclusive) loads and no invalidates can occur.
> > > > >
> > > > > The zswap entry is freed during the regular swap_free() path, which
> > > > > the sync fault calls on success. Otherwise we keep it.
> > > >
> > > > I thought about this, but I was particularly worried about the need to
> > > > bring back the refcount that was removed when we switched to only
> > > > supporting exclusive loads:
> > > > https://lore.kernel.org/lkml/20240201-b4-zswap-invalidate-entry-v2-6-99d4084260a0@bytedance.com/
> > > >
> > > > It seems to be that we don't need it, because swap_free() will free
> > > > the entry as you mentioned before anyone else has the chance to load
> > > > it or invalidate it. Writeback used to grab a reference as well, but
> > > > it removes the entry from the tree anyway and takes full ownership of
> > > > it then frees it, so that should be okay.
> > > >
> > > > It makes me nervous though to be honest. For example, not long ago
> > > > swap_free() didn't call zswap_invalidate() directly (used to happen to
> > > > swap slots cache draining). Without it, a subsequent load could race
> > > > with writeback without refcount protection, right? We would need to
> > > > make sure to backport 0827a1fb143f ("mm/zswap: invalidate zswap entry
> > > > when swap entry free") with the fix to stable for instance.
> > > >
> > > > I can't find a problem with your diff, but it just makes me nervous to
> > > > have non-exclusive loads without a refcount.
> > > >
> > > > >
> > > > > diff --git a/mm/zswap.c b/mm/zswap.c
> > > > > index 535c907345e0..686364a6dd86 100644
> > > > > --- a/mm/zswap.c
> > > > > +++ b/mm/zswap.c
> > > > > @@ -1622,6 +1622,7 @@ bool zswap_load(struct folio *folio)
> > > > >  	swp_entry_t swp = folio->swap;
> > > > >  	pgoff_t offset = swp_offset(swp);
> > > > >  	struct page *page = &folio->page;
> > > > > +	bool swapcache = folio_test_swapcache(folio);
> > > > >  	struct zswap_tree *tree = swap_zswap_tree(swp);
> > > > >  	struct zswap_entry *entry;
> > > > >  	u8 *dst;
> > > > > @@ -1634,7 +1635,8 @@ bool zswap_load(struct folio *folio)
> > > > >  		spin_unlock(&tree->lock);
> > > > >  		return false;
> > > > >  	}
> > > > > -	zswap_rb_erase(&tree->rbroot, entry);
> > > > > +	if (swapcache)
> > > > > +		zswap_rb_erase(&tree->rbroot, entry);
> > >
> > > On second thought, if we don't remove the entry from the tree here,
> > > writeback could free the entry from under us after we drop the lock
> > > here, right?
> >
> > The sync-swapin does swapcache_prepare() and holds SWAP_HAS_CACHE, so
> > racing writeback would loop on the -EEXIST in __read_swap_cache_async().
> > (Or, if writeback wins the race, sync-swapin fails on swapcache_prepare()
> > instead and bails on the fault.)
> >
> > This isn't coincidental. The sync-swapin needs to, and does, serialize
> > against the swap entry moving into swapcache or being invalidated for
> > it to be safe. Which is the same requirement that zswap ops have.
>
> You are right. Even if swap_free() isn't called under SWAP_HAS_CACHE's
> protection, a subsequent load will also be protected by SWAP_HAS_CACHE
> (whether it's swapped in with sync swapin or through the swapcache)
> -- so it would be protected against writeback as well. Now it seems
> like we may have been able to drop the refcount even without exclusive
> loads..?
>
> Anyway, I think your fix is sound. Zhongkun, do you mind confirming
> that the diff Johannes sent fixes the problem for you?
OK, I will try it and come back in a few hours. Thanks for the solution, it sounds great.