From: "Huang, Ying"
To: Yosry Ahmed
Cc: Chris Li, Zhongkun He, Andrew Morton, Johannes Weiner, Nhat Pham, Seth Jennings, Dan Streetman, Vitaly Wool, linux-mm, LKML
Subject: Re: [PATCH] mm:zswap: fix zswap entry reclamation failure in two scenarios
In-Reply-To: (Yosry Ahmed's message of "Mon, 20 Nov 2023 17:15:15 -0800")
References: <20231113130601.3350915-1-hezhongkun.hzk@bytedance.com> <8734x1cdtr.fsf@yhuang6-desk2.ccr.corp.intel.com> <87edgkapsz.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Tue, 21 Nov 2023 09:53:48 +0800
Message-ID: <875y1vc1n7.fsf@yhuang6-desk2.ccr.corp.intel.com>
List-ID: linux-kernel@vger.kernel.org

Yosry Ahmed writes:

> On Mon, Nov 20, 2023 at 4:57 PM Huang, Ying wrote:
>>
>> Yosry Ahmed writes:
>>
>> > On Sun, Nov 19, 2023 at 7:20 PM Huang, Ying wrote:
>> >>
>> >> Chris Li writes:
>> >>
>> >> > On Thu, Nov 16, 2023 at 12:19 PM Yosry Ahmed wrote:
>> >> >>
>> >> >> Not bypassing the swap slot cache, just make the callbacks to
>> >> >> invalidate the zswap entry, do memcg uncharging, etc when the slot is
>> >> >> no longer used and is entering the swap slot cache (i.e. when
>> >> >> free_swap_slot() is called), instead of when draining the swap slot
>> >> >> cache (i.e. when swap_range_free() is called). For all parts of MM
>> >> >> outside of swap, the swap entry is freed when free_swap_slot() is
>> >> >> called. We don't free it immediately because of caching, but this
>> >> >> should be transparent to other parts of MM (e.g. zswap, memcg, etc).
>> >> >
>> >> > That will cancel the batching effect on the swap slot free, making the
>> >> > common case for swapping faults take longer to complete, right?
>> >> > If I recall correctly, the uncharge is the expensive part of the swap
>> >> > slot free operation.
>> >> > I just want to figure out what we are trading off against. This is not
>> >> > one side wins all situations.
>> >>
>> >> Per my understanding, we don't batch memcg uncharging in
>> >> swap_entry_free() now.  Although it's possible and may improve
>> >> performance.
>> >
>> > Yes. It actually causes a long tail in swapin fault latency as Chris
>> > discovered in our prod. I am wondering if doing the memcg uncharging
>> > outside the slots cache will actually amortize the cost instead.
>> >
>> > Regardless of memcg charging, which is more complicated, I think we
>> > should at least move the call to zswap_invalidate() before the slots
>> > cache. I would prefer that we move everything non-swapfile specific
>> > outside the slots cache layer (zswap_invalidate(),
>> > arch_swap_invalidate_page(), clear_shadow_from_swap_cache(),
>> > mem_cgroup_uncharge_swap(), ..). However, if some of those are
>> > controversial, we can move some of them for now.
>>
>> That makes sense for me.
>>
>> > When draining free swap slots from the cache, swap_range_free() is
>> > called with nr_entries == 1 anyway, so I can't see how any batching is
>> > going on. If anything it should help amortize the cost.
>>
>> In swapcache_free_entries(), the sis->lock will be held to free multiple
>> swap slots via swap_info_get_cont() if possible.  This can reduce
>> sis->lock contention.
>
> Ah yes that's a good point. Since most of these callbacks don't
> actually access sis, but use the swap entry value itself, I am
> guessing the reason we need to hold the lock for all these callbacks
> is to prevent swapoff and swapon reusing the same swap entry on a
> different swap device, right?

In the call chain

  swapcache_free_entries()
    swap_entry_free()
      swap_range_free()

quite a few sis fields are accessed.

--
Best Regards,
Huang, Ying
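
For readers following the sis->lock batching point, here is a minimal userspace sketch, not kernel code, of the pattern described for swapcache_free_entries(): the device lock stays held while consecutive entries belong to the same device and is only switched when the device changes. Every name in it (struct toy_dev, struct toy_entry, toy_lock(), toy_unlock(), toy_free_entries()) is hypothetical, and the "lock" is a plain counter rather than a real spinlock.

```c
#include <stddef.h>

/* Hypothetical stand-in for a swap device; not the kernel's swap_info_struct. */
struct toy_dev {
    int locked;         /* 1 while the toy lock is held */
    int lock_acquires;  /* number of times the toy lock was taken */
    int freed;          /* number of entries freed on this device */
};

/* Hypothetical stand-in for a swap entry: it only names its device. */
struct toy_entry {
    struct toy_dev *dev;
};

static void toy_lock(struct toy_dev *d)
{
    d->locked = 1;
    d->lock_acquires++;
}

static void toy_unlock(struct toy_dev *d)
{
    d->locked = 0;
}

/*
 * Free n entries. The lock of the current device is kept across
 * consecutive entries of that device and is dropped only when the
 * next entry belongs to a different device.
 */
static void toy_free_entries(struct toy_entry *entries, size_t n)
{
    struct toy_dev *held = NULL;

    for (size_t i = 0; i < n; i++) {
        struct toy_dev *d = entries[i].dev;

        if (d != held) {
            if (held)
                toy_unlock(held);
            toy_lock(d);
            held = d;
        }
        d->freed++;     /* per-entry work, done under the lock */
    }
    if (held)
        toy_unlock(held);
}
```

With five entries where the first three belong to one device and the last two to another, each toy lock is taken once rather than once per entry. That is the contention reduction the batching provides, and it only helps when entries for the same device arrive adjacent to each other.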