Date: Wed, 27 Sep 2023 17:02:06 -0400
From: Johannes Weiner
To: Domenico Cerasuolo
Cc: Yosry Ahmed, Nhat Pham, akpm@linux-foundation.org, sjenning@redhat.com,
    ddstreet@ieee.org, vitaly.wool@konsulko.com, mhocko@kernel.org,
    roman.gushchin@linux.dev, shakeelb@google.com, muchun.song@linux.dev,
    linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org,
    cgroups@vger.kernel.org, Chris Li
Subject: Re: [PATCH v2 1/2] zswap: make shrinking memcg-aware
Message-ID: <20230927210206.GC399644@cmpxchg.org>
References: <20230919171447.2712746-1-nphamcs@gmail.com>
 <20230919171447.2712746-2-nphamcs@gmail.com>

On Wed, Sep 27, 2023 at 09:48:10PM +0200, Domenico Cerasuolo wrote:
> > > @@ -485,6 +487,17 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
> > >      __folio_set_locked(folio);
> > >      __folio_set_swapbacked(folio);
> > >
> > > +    /*
> > > +     * Page fault might itself trigger reclaim, on a zswap object that
> > > +     * corresponds to the same swap entry. However, as the swap entry has
> > > +     * previously been pinned, the task will run into an infinite loop trying
> > > +     * to pin the swap entry again.
> > > +     *
> > > +     * To prevent this from happening, we remove it from the zswap
> > > +     * LRU to prevent its reclamation.
> > > +     */
> > > +    zswap_lru_removed = zswap_remove_swpentry_from_lru(entry);
> > > +
> >
> > This will add a zswap lookup (and potentially an insertion below) in
> > every single swap fault path, right? Doesn't this introduce latency
> > regressions? I am also not a fan of having zswap-specific details in
> > this path.
> >
> > When you say "pinned", do you mean the call to swapcache_prepare()
> > above (i.e. setting SWAP_HAS_CACHE)?
> > IIUC, the scenario you are worried about is that the following call to
> > charge the page may invoke reclaim, go into zswap, and try to writeback
> > the same page we are swapping in here. The writeback call will recurse
> > into __read_swap_cache_async(), call swapcache_prepare() and get EEXIST,
> > and keep looping indefinitely. Is this correct?
>
> Yeah, exactly.
>
> > If yes, can we handle this by adding a flag to
> > __read_swap_cache_async() that basically says "don't wait for
> > SWAP_HAS_CACHE and the swapcache to be consistent, if
> > swapcache_prepare() returns EEXIST just fail and return"? The zswap
> > writeback path can pass in this flag and skip such pages. We might
> > want to modify the writeback code to put back those pages at the end
> > of the lru instead of in the beginning.
>
> Thanks for the suggestion, this actually works and it seems cleaner so I think
> we'll go for your solution.

That sounds like a great idea.

It should be pointed out that these aren't perfectly equivalent. Removing
the entry from the LRU eliminates the lock recursion scenario on that very
specific entry. Having writeback skip on -EEXIST will make it skip *any*
pages that are concurrently entering the swapcache, even when it *could*
wait for them to finish.

However, pages that are concurrently read back into memory are a poor
choice for writeback anyway, and likely to be removed from swap soon. So
it happens to work out just fine in this case. I'd just add a comment
that explains the recursion deadlock, as well as the implication of
skipping any busy entry and why that's okay.
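The following is a rough illustration of the trade-off discussed above. It is a userspace C model only. It is not kernel code, and it is not the eventual patch. Every name in it is hypothetical, including the skip_if_exists parameter, model_read_swap_cache() and model_shrink_one(). Some modelling choices are assumptions made only for this sketch:

- A bool array stands in for the SWAP_HAS_CACHE bit.
- Index 0 of the array-based LRU is assumed to be the next writeback candidate.
- The bounded spin count exists only so the model terminates where the real path would loop forever.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Userspace model only; every name here is hypothetical.
 * slot_has_cache[] stands in for the SWAP_HAS_CACHE bit of each swap slot.
 */
#define MODEL_NR_SLOTS	8
#define MODEL_MAX_SPINS	1000

static bool slot_has_cache[MODEL_NR_SLOTS];

/* Stand-in for swapcache_prepare(): claim the slot or report -EEXIST. */
static int model_swapcache_prepare(int slot)
{
	if (slot_has_cache[slot])
		return -EEXIST;
	slot_has_cache[slot] = true;
	return 0;
}

/*
 * Stand-in for __read_swap_cache_async(). By default it keeps retrying while
 * the slot is busy. If the busy owner is the caller itself (the recursion in
 * question), that never ends, so the model gives up after MODEL_MAX_SPINS and
 * returns -EDEADLK. With skip_if_exists (the proposed flag) it fails at once.
 */
static int model_read_swap_cache(int slot, bool skip_if_exists)
{
	for (int spins = 0; spins < MODEL_MAX_SPINS; spins++) {
		int err = model_swapcache_prepare(slot);

		if (err != -EEXIST)
			return err;		/* slot claimed */
		if (skip_if_exists)
			return -EEXIST;		/* fail fast, caller skips */
	}
	return -EDEADLK;
}

/* LRU as an array of slot numbers; index 0 is the next writeback candidate. */
struct model_lru {
	int slots[MODEL_NR_SLOTS];
	int nr;
};

/* Move the head entry to the far end so it is tried last. */
static void model_lru_rotate_head(struct model_lru *lru)
{
	int head = lru->slots[0];

	for (int i = 0; i < lru->nr - 1; i++)
		lru->slots[i] = lru->slots[i + 1];
	lru->slots[lru->nr - 1] = head;
}

/* Drop the head entry from the LRU. */
static void model_lru_del_head(struct model_lru *lru)
{
	for (int i = 0; i < lru->nr - 1; i++)
		lru->slots[i] = lru->slots[i + 1];
	lru->nr--;
}

/*
 * Write back one entry. Any busy slot is skipped and rotated, whether it is
 * busy because of the recursive fault or an unrelated concurrent swapin.
 * Returns the slot written back, or -1 if every entry was busy.
 */
static int model_shrink_one(struct model_lru *lru)
{
	for (int tries = lru->nr; tries > 0; tries--) {
		int slot = lru->slots[0];

		if (model_read_swap_cache(slot, true) == -EEXIST) {
			model_lru_rotate_head(lru);
			continue;
		}
		/* "Write back": drop the entry and release the slot claim. */
		model_lru_del_head(lru);
		slot_has_cache[slot] = false;
		return slot;
	}
	return -1;
}
```

In this model, slot 3 is held by an in-flight swapin and the LRU holds {3, 5}. model_shrink_one() skips 3, rotates it to the end, and writes back 5. Calling model_read_swap_cache(3, false) on the same slot instead returns -EDEADLK, which stands in for the endless loop. The skip does not care why slot 3 is busy. That is the "skip any busy entry" behaviour the reply says is acceptable.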