Date: Wed, 27 Sep 2023 16:39:29 -0400
From: Johannes Weiner
To: Yosry Ahmed
Cc: Nhat Pham, akpm@linux-foundation.org, cerasuolodomenico@gmail.com,
	sjenning@redhat.com, ddstreet@ieee.org, vitaly.wool@konsulko.com,
	mhocko@kernel.org, roman.gushchin@linux.dev, shakeelb@google.com,
	muchun.song@linux.dev, linux-mm@kvack.org, kernel-team@meta.com,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Chris Li
Subject: Re: [PATCH v2 1/2] zswap: make shrinking memcg-aware
Message-ID: <20230927203929.GA399644@cmpxchg.org>
References: <20230919171447.2712746-1-nphamcs@gmail.com>
 <20230919171447.2712746-2-nphamcs@gmail.com>
 <20230926182436.GB348484@cmpxchg.org>

On Tue, Sep 26, 2023 at 11:37:17AM -0700, Yosry Ahmed wrote:
> On Tue, Sep 26, 2023 at 11:24 AM Johannes Weiner wrote:
> >
> > On Mon, Sep 25, 2023 at 01:17:04PM -0700, Yosry Ahmed wrote:
> > > +Chris Li
> > >
> > > On Tue, Sep 19, 2023 at 10:14 AM Nhat Pham wrote:
> > > >
> > > > From: Domenico Cerasuolo
> > > >
> > > > Currently, we only have a single global LRU for zswap. This makes it
> > > > impossible to perform workload-specific shrinking - a memcg cannot
> > > > determine which pages in the pool it owns, and often ends up writing
> > > > pages from other memcgs.
> > > > This issue has been previously observed in
> > > > practice and mitigated by simply disabling memcg-initiated shrinking:
> > > >
> > > > https://lore.kernel.org/all/20230530232435.3097106-1-nphamcs@gmail.com/T/#u
> > > >
> > > > This patch fully resolves the issue by replacing the global zswap LRU
> > > > with memcg- and NUMA-specific LRUs, and modifying the reclaim logic:
> > > >
> > > > a) When a store attempt hits a memcg limit, it now triggers a
> > > >    synchronous reclaim attempt that, if successful, allows the new
> > > >    hotter page to be accepted by zswap.
> > > > b) If the store attempt instead hits the global zswap limit, it will
> > > >    trigger an asynchronous reclaim attempt, in which a memcg is
> > > >    selected for reclaim in a round-robin-like fashion.
> > >
> > > Hey Nhat,
> > >
> > > I didn't take a very close look as I am currently swamped, but going
> > > through the patch I have some comments/questions below.
> > >
> > > I am not very familiar with list_lru, but it seems like the existing
> > > API derives the node and memcg from the list item itself. Seems like
> > > we can avoid a lot of changes if we allocate struct zswap_entry from
> > > the same node as the page, and account it to the same memcg. Would
> > > this be too much of a change or too strong of a restriction? It's a
> > > slab allocation and we will free memory on that node/memcg right
> > > after.
> >
> > My 2c, but I kind of hate that assumption made by list_lru.
> >
> > We ran into problems with it with the THP shrinker as well. That one
> > strings up 'struct page', and virt_to_page(page) results in really
> > fun-to-debug issues.
> >
> > IMO it would be less error-prone to have memcg and nid as part of the
> > regular list_lru_add() function signature. And then have an explicit
> > list_lru_add_obj() that does a documented memcg lookup.
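For concreteness, the split being proposed might look something like the sketch below. This is a user-space mock-up with stand-in types, not the actual kernel interface; the real structures, helper names (virt_to_page(), mem_cgroup_from_slab_obj()), and signatures differ, and the "derived" lookup here is hardcoded purely for illustration:

```c
#include <stdio.h>

/* Stand-in types so the sketch compiles in user space;
 * the real kernel structures are different. */
struct mem_cgroup { int id; };
struct list_head { struct list_head *next, *prev; };
struct list_lru  { int last_nid; int last_memcg; };

/* Proposed explicit form: the caller states which per-node, per-memcg
 * LRU the item belongs on, so list_lru assumes nothing about where
 * the item itself was allocated. */
static void list_lru_add(struct list_lru *lru, struct list_head *item,
                         int nid, struct mem_cgroup *memcg)
{
	(void)item;
	lru->last_nid = nid;
	lru->last_memcg = memcg->id;
	printf("add: nid=%d memcg=%d\n", nid, memcg->id);
}

/* Convenience form with the documented lookup: in the kernel, this is
 * where the virt_to_page()-style derivation would live, under the
 * stated assumption that 'item' is part of a charged slab object. */
static void list_lru_add_obj(struct list_lru *lru, struct list_head *item)
{
	struct mem_cgroup derived = { .id = 0 };  /* mocked-up lookup result */
	list_lru_add(lru, item, 0, &derived);
}
```

The point of the shape is that the assumption-free variant becomes the primitive, and the object-derived variant is an opt-in wrapper whose precondition is explicit in its name.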
>
> I also didn't like/understand that assumption, but again I don't have
> enough familiarity with the code to judge, and I don't know why it was
> done that way. Adding memcg and nid as arguments to the standard
> list_lru API makes the pill easier to swallow. In any case, this
> should be done in a separate patch to make the diff here more focused
> on zswap changes.

Just like the shrinker, it was initially written for slab objects like
dentries and inodes. Since all the users then passed in charged slab
objects, it ended up hardcoding that assumption in the interface.

Once that assumption no longer holds, IMO we should update the
interface. But agreed, a separate patch makes sense.

> > Because of the overhead, we've been selective about the memory we
> > charge. I'd hesitate to do it just to work around list_lru.
>
> On the other hand I am worried about the continuous growth of struct
> zswap_entry. It's now at ~10 words on 64-bit? That's ~2% of the size
> of the page getting compressed if I am not mistaken. So I am skeptical
> about storing the nid there.
>
> A middle ground would be allocating struct zswap_entry on the correct
> node without charging it. We don't need to store the nid and we don't
> need to charge struct zswap_entry. It doesn't get rid of
> virt_to_page() though.

I like that idea. The current list_lru_add() would do the
virt_to_page() too, so from that POV it's a lateral move. But we'd
save the charging and the extra nid member.