Received: by 2002:a05:7412:40d:b0:e2:908c:2ebd with SMTP id 13csp114904rdf; Mon, 20 Nov 2023 18:48:27 -0800 (PST) X-Google-Smtp-Source: AGHT+IHlOZE0LLOFCCIfM/bWGXyCoFno8d6ScZv06ZXBo2v+bClbTKBhY7SGlQXALMVaLJ5tgMRf X-Received: by 2002:a05:6830:14c2:b0:6d6:53fe:2181 with SMTP id t2-20020a05683014c200b006d653fe2181mr9914818otq.26.1700534907405; Mon, 20 Nov 2023 18:48:27 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1700534907; cv=none; d=google.com; s=arc-20160816; b=xPwPynnQZ4Ay9DugZcYz0MZU1JUWnQFNrRrv9tQUWopTm1MaMcO9O2puuVy1ycwoJz MtUpiUbAE+n7Zm9WgkI259whF3C30IQKTBE6f9KDEt7mFtCnqgcrCTVkc4Ow2nD6blX8 A5W7WxXJK0HoadOPREb759JpQWnDtaGN8QhxWoKtGmr+QI5HfLM3rKJH7qr1LQapSkBH cYIi5MYk+6dkma66Nil1WxEhzIvx+uqNsocOb/blG81uIHk5mhGrkUsqge/MdZrJpWRb Y4FgEXmuGI4oeM5+ZeEE2bilEzSVP8YGq65B++TM0FCSnwRtrSdNX7ADatCdlJ3Ye47J wjiA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:cc:to:subject :message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=IT8XHAvcEhECV25AWmpJElN+FUmGpxVcHQIxuxXBKgE=; fh=wvfcfNmB/0livP+I1tI8F1oO+6bhIix/c5rNjWy6LhY=; b=sGh1DUvPt/146gDmyZa7PxF9bETt2X3VXMDq9p/MA+EEYRbVldKJzcQODxXhBzlOou 1e9WhyuDrA9XQF3mOOfEXcR760rLhST3cHIGv179XDToKyHOCClELqb2oCRmYLlIo5Jn ONeS7jIyPogTGhW4XguMlkQcC+OlbHR10mde3Wb1d68rrxP2zJEkcsNKH4WbwquM83Py 9wIqw50GRPKd87VBUD8ivgSRDDYTj8Jwy1BYeKo/hw2NNCDKeRbCRhGSsG8krr24W0fq iZwT+VAZGCCS/vUgl4b8l1RalvHNMKt+mrSUYVT4uADM3WWbGMiFwbMrI1BoVqvoT6jZ 0p9g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b="hl4rA/iS"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.32 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from agentk.vger.email (agentk.vger.email. [23.128.96.32]) by mx.google.com with ESMTPS id y187-20020a6364c4000000b005b9b68add85si9456412pgb.254.2023.11.20.18.48.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 20 Nov 2023 18:48:27 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.32 as permitted sender) client-ip=23.128.96.32; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b="hl4rA/iS"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.32 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by agentk.vger.email (Postfix) with ESMTP id 095E48083B04; Mon, 20 Nov 2023 18:47:18 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at agentk.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229830AbjKUCq6 (ORCPT + 99 others); Mon, 20 Nov 2023 21:46:58 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42722 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229522AbjKUCq5 (ORCPT ); Mon, 20 Nov 2023 21:46:57 -0500 Received: from mail-ej1-x633.google.com (mail-ej1-x633.google.com [IPv6:2a00:1450:4864:20::633]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 01337C4 for ; Mon, 20 Nov 2023 18:46:53 -0800 (PST) Received: by mail-ej1-x633.google.com with SMTP id a640c23a62f3a-a013d22effcso62020666b.2 for ; Mon, 20 Nov 2023 18:46:53 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1700534812; x=1701139612; darn=vger.kernel.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=IT8XHAvcEhECV25AWmpJElN+FUmGpxVcHQIxuxXBKgE=; b=hl4rA/iScYUS/ZpABmXD+D508ZjQ2mMPVc4v6mk1DCgkUBn4wqrY5Y5ZIlNwylTkKj KYO825BzZz9RhL7Q1TPlMRm2FbsCoYEn9uPvWRp+0oM1h/rY3ylEUN0cMj3qqMoKjq0A 1uVTDVVsGQaPw+nV5jFxn1Ey6Y6PtLYjEVhQu/DcL53WAuR/dQ63Fm2KUwmaYhoO8s2Z trQA+9E0v+x6GoCBTqxeSIG+DG2OwjmNpkaFIXHurkL1KdMxBIsy8/OstkIrYC+ezsdl NtJWD5Y9yCcugBQHXvJ9OrA3T1N6wapaHwv2XwVY33+WgBGXqlOBI0uIbbMsF5vRfkAC Ij0A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1700534812; x=1701139612; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=IT8XHAvcEhECV25AWmpJElN+FUmGpxVcHQIxuxXBKgE=; b=bGVoSwLY2/OAL+NadAfEVoA3zBXPXdev5foytbAynbmXPfYgyvKlLP01kFjvVpzDlS Xc/8HDWrR0dhoV60jkd+RGIZq1H40FPaRzH1Hi/j4xc+WUnFFULn98QmAeexhKwGMOth L4hASHWJ2otZbhVPhhjmdTqGRE2Oc4wbEu6VA0/0KAIEnqX38xqq5b+ge5g6erfAM0Ql DdocJtdBkrWhrgwD7LLAeWz1hDkAr4HjvFvzQJJDA6MHQW2csg948fVCC6Zyiuht8eaz jz5/nJcv8iSkB3mUNqY6Qj3FVaLnnCnEi47Pif2znUCm/XfSLK+/OL5T2S5vsBkOB5sb mhIg== X-Gm-Message-State: AOJu0YxHvqM1/qCZnNNHHSXfzBXAuIyLMQ9YkqZoF4WL8wjI+cLdm3UM I8+4ilX5kV40531Dsmtof1hE2CssZEd7aMzqrL5vtQ== X-Received: by 2002:a17:906:414a:b0:9dd:9380:bb15 with SMTP id l10-20020a170906414a00b009dd9380bb15mr6688701ejk.50.1700534812389; Mon, 20 Nov 2023 18:46:52 -0800 (PST) MIME-Version: 1.0 References: <20231113130601.3350915-1-hezhongkun.hzk@bytedance.com> <8734x1cdtr.fsf@yhuang6-desk2.ccr.corp.intel.com> <87edgkapsz.fsf@yhuang6-desk2.ccr.corp.intel.com> <875y1vc1n7.fsf@yhuang6-desk2.ccr.corp.intel.com> In-Reply-To: <875y1vc1n7.fsf@yhuang6-desk2.ccr.corp.intel.com> From: Yosry Ahmed Date: Mon, 20 Nov 2023 18:46:13 -0800 Message-ID: Subject: Re: [PATCH] mm:zswap: fix zswap entry reclamation failure in two scenarios To: "Huang, Ying" Cc: Chris Li , Zhongkun He , Andrew Morton , Johannes Weiner , Nhat Pham , Seth Jennings , Dan Streetman , Vitaly Wool , linux-mm , LKML Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-8.4 required=5.0 tests=DKIMWL_WL_MED,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE, USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on agentk.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (agentk.vger.email [0.0.0.0]); Mon, 20 Nov 2023 18:47:18 -0800 (PST) On Mon, Nov 20, 2023 at 5:55=E2=80=AFPM Huang, Ying = wrote: > > Yosry Ahmed writes: > > > On Mon, Nov 20, 2023 at 4:57=E2=80=AFPM Huang, Ying wrote: > >> > >> Yosry Ahmed writes: > >> > >> > On Sun, Nov 19, 2023 at 7:20=E2=80=AFPM Huang, Ying wrote: > >> >> > >> >> Chris Li writes: > >> >> > >> >> > On Thu, Nov 16, 2023 at 12:19=E2=80=AFPM Yosry Ahmed wrote: > >> >> >> > >> >> >> Not bypassing the swap slot cache, just make the callbacks to > >> >> >> invalidate the zswap entry, do memg uncharging, etc when the slo= t is > >> >> >> no longer used and is entering the swap slot cache (i.e. when > >> >> >> free_swap_slot() is called), instead of when draining the swap s= lot > >> >> >> cache (i.e. when swap_range_free() is called). For all parts of = MM > >> >> >> outside of swap, the swap entry is freed when free_swap_slot() i= s > >> >> >> called. We don't free it immediately because of caching, but thi= s > >> >> >> should be transparent to other parts of MM (e.g. zswap, memcg, e= tc). > >> >> > > >> >> > That will cancel the batching effect on the swap slot free, makin= g the > >> >> > common case for swapping faults take longer to complete, righ? > >> >> > If I recall correctly, the uncharge is the expensive part of the = swap > >> >> > slot free operation. > >> >> > I just want to figure out what we are trading off against. This i= s not > >> >> > one side wins all situations. > >> >> > >> >> Per my understanding, we don't batch memcg uncharging in > >> >> swap_entry_free() now. Although it's possible and may improve > >> >> performance. > >> > > >> > Yes. It actually causes a long tail in swapin fault latency as Chris > >> > discovered in our prod. I am wondering if doing the memcg uncharging > >> > outside the slots cache will actually amortize the cost instead. > >> > > >> > Regardless of memcg charging, which is more complicated, I think we > >> > should at least move the call to zswap_invalidate() before the slots > >> > cache. I would prefer that we move everything non-swapfile specific > >> > outside the slots cache layer (zswap_invalidate(), > >> > arch_swap_invalidate_page(), clear_shadow_from_swap_cache(), > >> > mem_cgroup_uncharge_swap(), ..). However, if some of those are > >> > controversial, we can move some of them for now. > >> > >> That makes sense for me. > >> > >> > When draining free swap slots from the cache, swap_range_free() is > >> > called with nr_entries =3D=3D 1 anyway, so I can't see how any batch= ing is > >> > going on. If anything it should help amortize the cost. > >> > >> In swapcache_free_entries(), the sis->lock will be held to free multip= le > >> swap slots via swap_info_get_cont() if possible. This can reduce > >> sis->lock contention. > > > > Ah yes that's a good point. Since most of these callbacks don't > > actually access sis, but use the swap entry value itself, I am > > guessing the reason we need to hold the lock for all these callbacks > > is to prevent swapoff and swapon reusing the same swap entry on a > > different swap device, right? > > In, > > swapcache_free_entries() > swap_entry_free() > swap_range_free() > > Quite some sis fields will be accessed. I wasn't referring to this code. I was what's preventing us from moving the callbacks I mentioned outside the lock (zswap_invalidate(), arch_swap_invalidate_page(), clear_shadow_from_swap_cache(), mem_cgroup_uncharge_swap(), ..). I think most or all of them don't really access sis, but perhaps they need the lock to ensure the swap entry value does not get reused? > > -- > Best Regards, > Huang, Ying