Received: by 2002:a05:7412:40d:b0:e2:908c:2ebd with SMTP id 13csp130272rdf; Mon, 20 Nov 2023 19:35:54 -0800 (PST) X-Google-Smtp-Source: AGHT+IGqWW1C+NTOqOJmtAeK47qBfrvKSNNYKeS1nsst6JfnwpVqda1DJgsQTu/CRjzcjCyryzg/ X-Received: by 2002:a17:903:41c3:b0:1ca:a290:4c0c with SMTP id u3-20020a17090341c300b001caa2904c0cmr8592597ple.16.1700537754203; Mon, 20 Nov 2023 19:35:54 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1700537754; cv=none; d=google.com; s=arc-20160816; b=a2tE37Fg4VlIxPBToSLfNYMxf6rGTgicHhurKnNggzVUTlr1lr93m81iydtkdzKxki RuB1nxuWwxwm42p8z2oI7BLxYHM/BvEaE+IAX5toyfkFRPDuEkoKHBnKa/9avaZOm9Hs dodoBYtvodLKN3DiXmi5bhpO7mW20PuOD+ynbFSdjRiGdUL3D++CBD/DVP0CibVrpAMA CFsaN2BIQNQ2umT9ma5Rd1/DpVpUNezfefwnNItToLgKnr5tFLg4vkzb19g+ApgdhrpL g01v9jjKUuzuVeWCSiKkv8ICM3Uybd0oZfcXRvL4XHPKQCKJU2LWG6Si5GADwBhw5m7K PBEQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :user-agent:message-id:date:references:in-reply-to:subject:cc:to :from:dkim-signature; bh=TW5Y1PGU61acO0Ys5tuhfUcH3Plxyjuvd5N7WVyM5PE=; fh=0+oFXG7Ieo/IcxmdXcShUNq/5q7JRjuCkRn8APn3s9Q=; b=Tf5FifVM1/4c51IkjEvMk8oc7Rlm89P9zTqIPMEJY2/bSpEE/Y/9nzbNvkHApIz5Bv HnBnzZ9+p2OjNVcXf4paD9TZ8wgwielR1mOnURmZusMQDH8IXGNMgcZKdXmFKXLAV6ll PA1yodEUwgkh/w425kEEuXGAXlLZLiEASkFhbladkFOHXnA+JFQ+sD9tVN1XLh7VQyJm CRrR/KE0lCh4VTbRx94fiH18dlB4PsuLcTt6fxIjx2uVYERRlZiysxjafUrvZXmyc99v hscN79X/JaYpY1m20R8pigMrDgJJFJQq3NGgLpxVhhYOUm2WhNnjtIrVjgl8QRdvc76e 4Evw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=LHjzP56f; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.32 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from agentk.vger.email (agentk.vger.email. [23.128.96.32]) by mx.google.com with ESMTPS id g15-20020a1709029f8f00b001cc54202425si9310922plq.409.2023.11.20.19.35.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 20 Nov 2023 19:35:54 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.32 as permitted sender) client-ip=23.128.96.32; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=LHjzP56f; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.32 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by agentk.vger.email (Postfix) with ESMTP id 616C58076022; Mon, 20 Nov 2023 19:35:15 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at agentk.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229853AbjKUDfF (ORCPT + 99 others); Mon, 20 Nov 2023 22:35:05 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50670 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229522AbjKUDfD (ORCPT ); Mon, 20 Nov 2023 22:35:03 -0500 Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 02833A7 for ; Mon, 20 Nov 2023 19:34:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1700537699; x=1732073699; h=from:to:cc:subject:in-reply-to:references:date: message-id:mime-version:content-transfer-encoding; bh=n0R9KcRqMiNMDOv9C7sCQJS2G5NNQeFlnHxrrCEZSuA=; b=LHjzP56fXchs1y8id3c+YsB6i80SBFpS1KjMB4aufsKrUIu9YT78JDay fcY+jYITDmR6q0EZhMyWYRYDKVDKhFOUlQuGjO/v/cKwdusXOgcNYA2RV j76nDO7OLX1lind5EGOxZRJdMx6w/dwojhpYnwK6xX3kSl5ukLScsohaL WFnzewswyrxuLOZ6hUtslvdDgn5smlwq/69oI968uAORTx/XqtuuKlZ8f LoWLQ5AENFH+pSLph5EHGCorWRcVvWnuAiA9anZ01/OuWWy+eEA1Cu+Jm d0J03VYcjKb/AJ0OpIQG5BH0tB97qk+ELkvs7WTdtW8Kl+PhREN82ZiEv g==; X-IronPort-AV: E=McAfee;i="6600,9927,10900"; a="390613818" X-IronPort-AV: E=Sophos;i="6.04,215,1695711600"; d="scan'208";a="390613818" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Nov 2023 19:34:59 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10900"; a="836924044" X-IronPort-AV: E=Sophos;i="6.04,215,1695711600"; d="scan'208";a="836924044" Received: from yhuang6-desk2.sh.intel.com (HELO yhuang6-desk2.ccr.corp.intel.com) ([10.238.208.55]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Nov 2023 19:34:56 -0800 From: "Huang, Ying" To: Yosry Ahmed Cc: Chris Li , Zhongkun He , Andrew Morton , Johannes Weiner , Nhat Pham , Seth Jennings , Dan Streetman , Vitaly Wool , linux-mm , LKML Subject: Re: [PATCH] mm:zswap: fix zswap entry reclamation failure in two scenarios In-Reply-To: (Yosry Ahmed's message of "Mon, 20 Nov 2023 18:46:13 -0800") References: <20231113130601.3350915-1-hezhongkun.hzk@bytedance.com> <8734x1cdtr.fsf@yhuang6-desk2.ccr.corp.intel.com> <87edgkapsz.fsf@yhuang6-desk2.ccr.corp.intel.com> <875y1vc1n7.fsf@yhuang6-desk2.ccr.corp.intel.com> Date: Tue, 21 Nov 2023 11:32:55 +0800 Message-ID: <871qcjbx20.fsf@yhuang6-desk2.ccr.corp.intel.com> User-Agent: Gnus/5.13 (Gnus v5.13) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-0.9 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on agentk.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (agentk.vger.email [0.0.0.0]); Mon, 20 Nov 2023 19:35:15 -0800 (PST) Yosry Ahmed writes: > On Mon, Nov 20, 2023 at 5:55=E2=80=AFPM Huang, Ying wrote: >> >> Yosry Ahmed writes: >> >> > On Mon, Nov 20, 2023 at 4:57=E2=80=AFPM Huang, Ying wrote: >> >> >> >> Yosry Ahmed writes: >> >> >> >> > On Sun, Nov 19, 2023 at 7:20=E2=80=AFPM Huang, Ying wrote: >> >> >> >> >> >> Chris Li writes: >> >> >> >> >> >> > On Thu, Nov 16, 2023 at 12:19=E2=80=AFPM Yosry Ahmed wrote: >> >> >> >> >> >> >> >> Not bypassing the swap slot cache, just make the callbacks to >> >> >> >> invalidate the zswap entry, do memg uncharging, etc when the sl= ot is >> >> >> >> no longer used and is entering the swap slot cache (i.e. when >> >> >> >> free_swap_slot() is called), instead of when draining the swap = slot >> >> >> >> cache (i.e. when swap_range_free() is called). For all parts of= MM >> >> >> >> outside of swap, the swap entry is freed when free_swap_slot() = is >> >> >> >> called. We don't free it immediately because of caching, but th= is >> >> >> >> should be transparent to other parts of MM (e.g. zswap, memcg, = etc). >> >> >> > >> >> >> > That will cancel the batching effect on the swap slot free, maki= ng the >> >> >> > common case for swapping faults take longer to complete, righ? >> >> >> > If I recall correctly, the uncharge is the expensive part of the= swap >> >> >> > slot free operation. >> >> >> > I just want to figure out what we are trading off against. This = is not >> >> >> > one side wins all situations. >> >> >> >> >> >> Per my understanding, we don't batch memcg uncharging in >> >> >> swap_entry_free() now. Although it's possible and may improve >> >> >> performance. >> >> > >> >> > Yes. It actually causes a long tail in swapin fault latency as Chris >> >> > discovered in our prod. I am wondering if doing the memcg uncharging >> >> > outside the slots cache will actually amortize the cost instead. >> >> > >> >> > Regardless of memcg charging, which is more complicated, I think we >> >> > should at least move the call to zswap_invalidate() before the slots >> >> > cache. I would prefer that we move everything non-swapfile specific >> >> > outside the slots cache layer (zswap_invalidate(), >> >> > arch_swap_invalidate_page(), clear_shadow_from_swap_cache(), >> >> > mem_cgroup_uncharge_swap(), ..). However, if some of those are >> >> > controversial, we can move some of them for now. >> >> >> >> That makes sense for me. >> >> >> >> > When draining free swap slots from the cache, swap_range_free() is >> >> > called with nr_entries =3D=3D 1 anyway, so I can't see how any batc= hing is >> >> > going on. If anything it should help amortize the cost. >> >> >> >> In swapcache_free_entries(), the sis->lock will be held to free multi= ple >> >> swap slots via swap_info_get_cont() if possible. This can reduce >> >> sis->lock contention. >> > >> > Ah yes that's a good point. Since most of these callbacks don't >> > actually access sis, but use the swap entry value itself, I am >> > guessing the reason we need to hold the lock for all these callbacks >> > is to prevent swapoff and swapon reusing the same swap entry on a >> > different swap device, right? >> >> In, >> >> swapcache_free_entries() >> swap_entry_free() >> swap_range_free() >> >> Quite some sis fields will be accessed. > > I wasn't referring to this code. I was what's preventing us from > moving the callbacks I mentioned outside the lock (zswap_invalidate(), > arch_swap_invalidate_page(), clear_shadow_from_swap_cache(), > mem_cgroup_uncharge_swap(), ..). I think most or all of them don't > really access sis, but perhaps they need the lock to ensure the swap > entry value does not get reused? In fact, the swap entries to be freed by swapcache_free_entries() is in a state that can not be freed by other path (including swapoff()). It's swap_map value is SWAP_HAS_CACHE, but we can not find folio in swap_address_space(). To be honest, I don't know whether there are dependencies on sis->lock in these callbacks. You need to investigate them one by one. -- Best Regards, Huang, Ying