References: <20230919171447.2712746-1-nphamcs@gmail.com> <20230919171447.2712746-2-nphamcs@gmail.com> <20230926182436.GB348484@cmpxchg.org>
In-Reply-To: <20230926182436.GB348484@cmpxchg.org>
From: Yosry Ahmed
Date: Tue, 26 Sep 2023 11:37:17 -0700
Subject: Re: [PATCH v2 1/2] zswap: make shrinking memcg-aware
To: Johannes Weiner
Cc: Nhat Pham, akpm@linux-foundation.org, cerasuolodomenico@gmail.com,
  sjenning@redhat.com, ddstreet@ieee.org, vitaly.wool@konsulko.com,
  mhocko@kernel.org, roman.gushchin@linux.dev, shakeelb@google.com,
  muchun.song@linux.dev, linux-mm@kvack.org, kernel-team@meta.com,
  linux-kernel@vger.kernel.org,
  cgroups@vger.kernel.org, Chris Li

On Tue, Sep 26, 2023 at 11:24 AM Johannes Weiner wrote:
>
> On Mon, Sep 25, 2023 at 01:17:04PM -0700, Yosry Ahmed wrote:
> > +Chris Li
> >
> > On Tue, Sep 19, 2023 at 10:14 AM Nhat Pham wrote:
> > >
> > > From: Domenico Cerasuolo
> > >
> > > Currently, we only have a single global LRU for zswap. This makes it
> > > impossible to perform workload-specific shrinking - an memcg cannot
> > > determine which pages in the pool it owns, and often ends up writing
> > > pages from other memcgs. This issue has been previously observed in
> > > practice and mitigated by simply disabling memcg-initiated shrinking:
> > >
> > > https://lore.kernel.org/all/20230530232435.3097106-1-nphamcs@gmail.com/T/#u
> > >
> > > This patch fully resolves the issue by replacing the global zswap LRU
> > > with memcg- and NUMA-specific LRUs, and modifies the reclaim logic:
> > >
> > > a) When a store attempt hits an memcg limit, it now triggers a
> > >    synchronous reclaim attempt that, if successful, allows the new
> > >    hotter page to be accepted by zswap.
> > > b) If the store attempt instead hits the global zswap limit, it will
> > >    trigger an asynchronous reclaim attempt, in which an memcg is
> > >    selected for reclaim in a round-robin-like fashion.
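A minimal userspace model of the two reclaim paths in a) and b), assuming the per-(memcg, node) LRUs can be reduced to plain counters. All names here (lru_len, next_memcg, shrink_memcg(), reclaim_for_memcg_limit(), reclaim_for_global_limit()) are hypothetical and are not the patch's code; writeback is modeled as decrementing a counter, and the asynchronous hand-off in path b) is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model sizes: a handful of memcgs and NUMA nodes. */
#define NR_MEMCGS 4
#define NR_NODES  2

/* Number of zswap entries on each (memcg, node) LRU. */
static int lru_len[NR_MEMCGS][NR_NODES];

/* Round-robin cursor: the memcg to try first on the next global reclaim. */
static int next_memcg;

static int memcg_pool_size(int memcg)
{
	int total = 0;

	assert(memcg >= 0 && memcg < NR_MEMCGS);
	for (int nid = 0; nid < NR_NODES; nid++)
		total += lru_len[memcg][nid];
	return total;
}

/* Reclaim one entry from a memcg's LRUs; decrementing stands in for writeback. */
static bool shrink_memcg(int memcg)
{
	assert(memcg >= 0 && memcg < NR_MEMCGS);
	for (int nid = 0; nid < NR_NODES; nid++) {
		if (lru_len[memcg][nid] > 0) {
			lru_len[memcg][nid]--;
			return true;
		}
	}
	return false;
}

/* a) Store hit the memcg limit: reclaim synchronously from that memcg only. */
static bool reclaim_for_memcg_limit(int memcg)
{
	return shrink_memcg(memcg);
}

/*
 * b) Store hit the global limit: pick a victim memcg round-robin,
 * starting after the last one reclaimed from, skipping empty pools.
 * Returns the memcg reclaimed from, or -1 if every pool is empty.
 */
static int reclaim_for_global_limit(void)
{
	for (int i = 0; i < NR_MEMCGS; i++) {
		int memcg = (next_memcg + i) % NR_MEMCGS;

		if (memcg_pool_size(memcg) > 0) {
			shrink_memcg(memcg);
			next_memcg = (memcg + 1) % NR_MEMCGS;
			return memcg;
		}
	}
	return -1;
}
```

With entries only in memcgs 0 and 2, successive global reclaims pick memcg 0 and then memcg 2, skipping the empty ones, while a memcg-limit reclaim only ever touches the charged memcg's own lists.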
> > >
> >
> > Hey Nhat,
> >
> > I didn't take a very close look as I am currently swamped, but going
> > through the patch I have some comments/questions below.
> >
> > I am not very familiar with list_lru, but it seems like the existing
> > API derives the node and memcg from the list item itself. Seems like
> > we can avoid a lot of changes if we allocate struct zswap_entry from
> > the same node as the page, and account it to the same memcg. Would
> > this be too much of a change or too strong of a restriction? It's a
> > slab allocation and we will free memory on that node/memcg right
> > after.
>
> My 2c, but I kind of hate that assumption made by list_lru.
>
> We ran into problems with it with the THP shrinker as well. That one
> strings up 'struct page', and virt_to_page(page) results in really fun
> to debug issues.
>
> IMO it would be less error-prone to have memcg and nid as part of the
> regular list_lru_add() function signature. And then have an explicit
> list_lru_add_obj() that does a documented memcg lookup.

I also didn't like/understand that assumption, but again, I don't have
enough familiarity with the code to judge, and I don't know why it was
done that way. Adding memcg and nid as arguments to the standard list_lru
API makes the pill easier to swallow. In any case, this should be done in
a separate patch to keep the diff here focused on the zswap changes.

>
> Because of the overhead, we've been selective about the memory we
> charge. I'd hesitate to do it just to work around list_lru.

On the other hand, I am worried about the continuous growth of struct
zswap_entry. It's now at ~10 words on 64-bit? That's ~80 bytes, or ~2% of
a 4 KiB page getting compressed, if I am not mistaken. So I am skeptical
about storing the nid there.

A middle ground would be allocating struct zswap_entry on the correct
node without charging it. We don't need to store the nid and we don't
need to charge struct zswap_entry. It doesn't get rid of virt_to_page()
though.
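A sketch of the API shape Johannes suggests, written as a toy userspace model rather than the kernel's list_lru: toy_lru_add() takes the node and memcg explicitly, and toy_lru_add_obj() is the documented convenience that derives them from the object. struct toy_obj, its slab_nid/slab_memcg fields (standing in for a virt_to_page()-style lookup on the object's own memory), and every toy_* name are hypothetical.

```c
#include <assert.h>

#define NR_NODES  2
#define NR_MEMCGS 3

/* Minimal doubly linked list, modeled on the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *item, struct list_head *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/* Toy LRU with one list per (node, memcg) pair. */
struct toy_lru {
	struct list_head lists[NR_NODES][NR_MEMCGS];
	long nr_items[NR_NODES][NR_MEMCGS];
};

/*
 * Toy object. slab_nid/slab_memcg simulate what a lookup on the object's
 * own backing memory would report, i.e. where the struct itself lives.
 */
struct toy_obj {
	struct list_head lru;
	int slab_nid;
	int slab_memcg;
};

static void toy_lru_init(struct toy_lru *lru)
{
	for (int nid = 0; nid < NR_NODES; nid++) {
		for (int memcg = 0; memcg < NR_MEMCGS; memcg++) {
			list_init(&lru->lists[nid][memcg]);
			lru->nr_items[nid][memcg] = 0;
		}
	}
}

/* Explicit variant: the caller names the node and memcg. */
static void toy_lru_add(struct toy_lru *lru, struct list_head *item,
			int nid, int memcg)
{
	assert(nid >= 0 && nid < NR_NODES);
	assert(memcg >= 0 && memcg < NR_MEMCGS);
	list_add_tail(item, &lru->lists[nid][memcg]);
	lru->nr_items[nid][memcg]++;
}

/* Documented convenience variant: derive node and memcg from the object. */
static void toy_lru_add_obj(struct toy_lru *lru, struct toy_obj *obj)
{
	toy_lru_add(lru, &obj->lru, obj->slab_nid, obj->slab_memcg);
}
```

With the explicit variant, an entry struct allocated uncharged on node 0 can still be filed on the list for the node and memcg of the page it describes (node 1, memcg 2 in the test), whereas toy_lru_add_obj() would file it under node 0.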