From: Chengming Zhou
To: Yosry Ahmed, Nhat Pham
Cc: Matthew Wilcox, Andrew Morton, Johannes Weiner, linux-mm@kvack.org, linux-kernel@vger.kernel.org, David Hildenbrand, Barry Song <21cnbao@gmail.com>, Chris Li, Ryan Roberts, Kairui Song
Subject: Re: [PATCH 0/3] mm: zswap: trivial folio conversions
Date: Mon, 3 Jun 2024 14:19:17 +0800
Message-ID: <9de0ce63-3815-4c1a-91a2-11cb3d526672@linux.dev>
References: <20240524033819.1953587-1-yosryahmed@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2024/5/29 03:32, Yosry Ahmed wrote:
> On Tue, May 28, 2024 at 12:08 PM Nhat Pham wrote:
>>
>> On Fri, May 24, 2024 at 4:13 PM Yosry Ahmed wrote:
>>>
>>> On Fri, May 24, 2024 at 12:53 PM Yosry Ahmed wrote:
>>>>
>>>> On Thu, May 23, 2024 at 8:59 PM Matthew Wilcox wrote:
>>>>>
>>>>> On Fri, May 24, 2024 at 03:38:15AM +0000, Yosry Ahmed wrote:
>>>>>> Some trivial folio conversions in zswap code.
>>>>>
>>>>> The three patches themselves look good.
>>>>>
>>>>>> The main reason I included a cover letter is that I wanted to get
>>>>>> feedback on what other trivial conversions can/should be done in
>>>>>> mm/zswap.c (keeping in mind that only order-0 folios are supported
>>>>>> anyway).
>>>>>> These are the things I came across while searching for 'page'
>>>>>> in mm/zswap.c, and chose not to do anything about for now:
>>>>>
>>>>> I think there's a deeper question to answer before answering these
>>>>> questions, which is what we intend to do with large folios and zswap in
>>>>> the future. Do we intend to split them? Compress them as a large
>>>>> folio? Compress each page in a large folio separately? I can see an
>>>>> argument for choices 2 and 3, but I think choice 1 is going to be
>>>>> increasingly untenable.
>>>>
>>>> Yeah I was kinda getting the small things out of the way so that zswap
>>>> is fully folio-ized, before we think about large folios. I haven't
>>>> given it a lot of thought, but here's what I have in mind.
>>>>
>>>> Right now, I think most configs that enable zswap will disable
>>>> CONFIG_THP_SWAP (otherwise all THPs will go straight to disk), so
>>>> let's assume that today we are splitting large folios before they go
>>>> to zswap (i.e. choice 1).
>>>>
>>>> What we do next depends on how the core swap intends to deal with
>>>> large folios. My understanding based on recent developments is that we
>>>> intend to swapout large folios as a whole, but I saw some discussions
>>>> about splitting all large folios before swapping them out, or leaving
>>>> them whole but swapping them out in order-0 chunks.
>>>>
>>>> I assume the rationale is that there is little benefit to keeping the
>>>> folios whole because they will most likely be freed soon anyway, but I
>>>> understand not wanting to spend time on splitting them, so swapping
>>>> them out in order-0 chunks makes some sense to me. It also dodges the
>>>> whole fragmentation issue.
>>>>
>>>> If we do either of these things in the core swap code, then I think
>>>> zswap doesn't need to do anything to support large folios. If not,
>>>> then we need to make a choice between 2 (compress large folios) &
>>>> choice 3 (compress each page separately) as you mentioned.
>>>>
>>>> Compressing large folios as a whole means that we need to decompress
>>>> them as a whole to read a single page, which I think could be very
>>>> inefficient in some cases or force us to swapin large folios. Unless
>>>> of course we end up in a world where we mostly swapin the same large
>>>> folios that we swapped out. Although there can be additional
>>>> compression savings from compressing large folios as a whole.
>>>>
>>>> Hence, I think choice 3 is the most reasonable one, at least for the
>>>> short-term. I also think this is what zram does, but I haven't
>>>> checked. Even if we all agree on this, there are still questions that
>>>> we need to answer. For example, do we allocate zswap_entry's for each
>>>> order-0 chunk right away, or do we allocate a single zswap_entry for
>>>> the entire folio, and then "split" it during swapin if we only need to
>>>> read part of the folio?
>>>>
>>>> Wondering what others think here.
>>>
>>> More thoughts that came to mind here:
>>>
>>> - Whether we go with choice 2 or 3, we may face a latency issue. Zswap
>>> compression happens synchronously in the context of reclaim, so if we
>>> start handling large folios in zswap, it may be more efficient to do
>>> it asynchronously like swap to disk.
>>
>> We've been discussing this in private as well :)
>>
>> It doesn't have to be these two extremes, right? I'm perfectly happy
>> with starting with compressing each subpage separately, but perhaps we
>> can consider managing larger folios in bigger chunks (say 64KB). That
>> way, on swap-in, we just have to bring a whole chunk in, not the
>> entire folio, and still take advantage of compression efficiencies on
>> bigger-than-one-page chunks. I'd also check with other filesystems
>> that leverage compression, to see what their unit of compression is.
>
> Right. But I think it will be a clearer win to start with compressing
> each subpage separately, and it avoids splitting folios during reclaim
> to zswap.
> It also doesn't depend on the zsmalloc work.
>
> Once we have that, we can experiment with compressing folios in larger
> chunks. The tradeoffs become less clear at that point, and the number
> of variables you can tune goes up :)

Agree, it's a good approach! And it doesn't have any decompression
amplification problem.

Thanks.
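The decompression amplification discussed in this thread can be made concrete with a little arithmetic. Below is a minimal C sketch, not zswap code: the helper `unit_for_page()` and the 4 KiB `BASE_PAGE_SIZE` are hypothetical, chosen only for illustration. It assumes a folio's contents are compressed in independent units of a fixed size, and computes which base pages must be decompressed to read back a single page: per-page units model choice 3, 64KB units model the chunked approach suggested above, and a folio-sized unit models choice 2.

```c
#include <assert.h>

/* Assumption for illustration: 4 KiB base pages. */
#define BASE_PAGE_SIZE 4096UL

/*
 * Hypothetical helper, not part of mm/zswap.c. If a folio's contents are
 * compressed in independent units of unit_bytes, reading back page idx
 * requires decompressing the whole unit that contains it. Returns the
 * index of the first page in that unit and stores the number of base
 * pages that must be decompressed in *nr_pages.
 */
static unsigned long unit_for_page(unsigned long idx, unsigned long unit_bytes,
				   unsigned long *nr_pages)
{
	unsigned long n;

	/* A unit must cover a whole, non-zero number of base pages. */
	assert(unit_bytes >= BASE_PAGE_SIZE && unit_bytes % BASE_PAGE_SIZE == 0);
	n = unit_bytes / BASE_PAGE_SIZE;
	*nr_pages = n;
	/* Round idx down to the start of its unit. */
	return idx - (idx % n);
}
```

For example, reading page 100 of a 2MB (512-page) folio decompresses only page 100 with per-page units, pages 96 to 111 (16 pages) with 64KB units, and all 512 pages when the folio is compressed as a whole.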