From: Yosry Ahmed
Date: Mon, 27 Nov 2023 20:13:38 -0800
Subject: Re: [PATCH v10] mm: vmscan: try to reclaim swapcache pages if no swap space
To: "Huang, Ying"
Cc: Minchan Kim, Chris Li, Michal Hocko, Liu Shixin, Yu Zhao, Andrew Morton,
    Sachin Sant, Johannes Weiner, Kefeng Wang, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org

On Mon, Nov 27, 2023 at 8:05 PM Huang, Ying wrote:
>
> Yosry Ahmed writes:
>
> > On Mon, Nov 27, 2023 at 7:21 PM Huang, Ying wrote:
> >>
> >> Yosry Ahmed writes:
> >>
> >> > On Mon, Nov 27, 2023 at 1:32 PM Minchan Kim wrote:
> >> >>
> >> >> On Mon, Nov 27, 2023 at 12:22:59AM -0800, Chris Li wrote:
> >> >> > On Mon, Nov 27, 2023 at 12:14 AM Huang, Ying wrote:
> >> >> > > > I agree with Ying that anonymous pages typically have different page
> >> >> > > > access patterns than file pages, so we might want to treat them
> >> >> > > > differently to reclaim them effectively.
> >> >> > > > One random idea:
> >> >> > > > How about we put the anonymous pages that are in the swap cache on a
> >> >> > > > different LRU than the rest of the anonymous pages? Then shrinking
> >> >> > > > against those pages in the swap cache would be more effective. Instead
> >> >> > > > of having [anon, file] LRUs, we would have [anon not in swap cache,
> >> >> > > > anon in swap cache, file] LRUs.
> >> >> > >
> >> >> > > I don't think that it is necessary. The patch is only for a special use
> >> >> > > case, where the swap device is used up while some pages are still in the
> >> >> > > swap cache. The patch will kill performance, but it is meant to avoid
> >> >> > > OOM only, not to improve performance. Per my understanding, we will not
> >> >> > > use up the swap device space in most cases. This may be true for ZRAM,
> >> >> > > but will we keep pages in the swap cache for long when we use ZRAM?
> >> >> >
> >> >> > I asked in this email thread as well how many pages can be freed by this
> >> >> > patch, but haven't got an answer from the author yet. That is one
> >> >> > important aspect for evaluating how valuable the patch is.
> >> >>
> >> >> Exactly. Since the swap cache has a different lifetime than the page
> >> >> cache, its pages are usually dropped when they are unmapped (unless they
> >> >> are shared with others, but anon is usually exclusively private), so I
> >> >> wonder how much memory we can actually save.
> >> >
> >> > I think the point of this patch is not saving memory, but rather
> >> > avoiding an OOM condition that will happen if we have no swap space
> >> > left, but some pages left in the swap cache. Of course, the OOM
> >> > avoidance will come at the cost of extra work in reclaim to swap those
> >> > pages out.
> >> >
> >> > The only case where I think this might be harmful is if there are plenty
> >> > of pages to reclaim on the file LRU, and instead we opt to chase down
> >> > the few swap cache pages. So perhaps we can add a check to only set
> >> > sc->swapcache_only if the number of pages in the swap cache is larger
> >> > than the number of pages on the file LRU, or similar? Just to make sure
> >> > we don't chase the swapcache pages down if there's plenty to scan on the
> >> > file LRU?
> >>
> >> The swap cache pages can be divided into 3 groups:
> >>
> >> - group 1: pages that have been written out and sit at the tail of the
> >> inactive LRU, but have not been reclaimed yet.
> >>
> >> - group 2: pages that have been written out, but failed to be reclaimed
> >> (e.g., they were accessed before reclaiming).
> >>
> >> - group 3: pages that have been swapped in, but were kept in the swap
> >> cache. These pages may be on the active LRU.
> >>
> >> The main target of the original patch should be group 1, and those pages
> >> may be cheaper to reclaim than file pages.
> >>
> >> Group 2 is hard to reclaim if swap_count() isn't 0.
> >>
> >> Group 3 should be reclaimed in theory, but the overhead may be high. And
> >> we may need to reclaim the swap entries instead of the pages if the pages
> >> are hot. But we can start to reclaim the swap entries before the swap
> >> space runs out.
> >>
> >> So, if we can count group 1, we may use that as an indicator to scan anon
> >> pages. And we may add code to reclaim group 3 earlier.
> >>
> >
> > My point was not that reclaiming the pages in the swap cache is more
> > expensive than reclaiming the pages in the file LRU. In a lot of
> > cases, as you point out, the pages in the swap cache can just be
> > dropped, so they may be as cheap or cheaper to reclaim than the pages
> > in the file LRU.
> >
> > My point was that scanning the anon LRU when swap space is exhausted
> > to get to the pages in the swap cache may be much more expensive,
> > because there may be a lot of pages on the anon LRU that are not in
> > the swap cache, and hence are not reclaimable, unlike pages in the
> > file LRU, which should mostly be reclaimable.
> >
> > So what I am saying is that maybe we should not spend the effort of
> > scanning the anon LRU in the swapcache_only case unless there aren't a
> > lot of pages to reclaim on the file LRU (relatively). For example, if
> > we have 100 pages in the swap cache out of 10000 pages in the anon
> > LRU, and there are 10000 pages in the file LRU, it's probably not
> > worth scanning the anon LRU.
>
> For group 1 pages, they are at the tail of the anon inactive LRU, so the
> scan overhead is low too. For example, if the number of group 1 pages is
> 100, we just need to scan 100 pages to reclaim them. We can choose to
> stop scanning when the number of non-group-1 pages reaches some
> threshold.
>

We should still try to reclaim pages in groups 2 & 3 before OOMing,
though. Maybe the motivation for this patch is group 1, but I don't see
why we should special-case them. Pages in groups 2 & 3 should be roughly
equally cheap to reclaim. They may have a higher refault cost, but IIUC
we should still try to reclaim them before OOMing.
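
To make the heuristic above concrete, here is a minimal, self-contained
userspace sketch of the "only set sc->swapcache_only when the swap cache
isn't dwarfed by the file LRU" idea. This is not the kernel code from the
patch: the function name, the 2x threshold, and the sample numbers are
assumptions for illustration only.

#include <stdbool.h>
#include <stdio.h>

/*
 * Toy userspace model of the heuristic discussed above -- not the actual
 * kernel code from the patch.  should_set_swapcache_only(), the 2x
 * threshold, and the inputs are made up for illustration.  Idea: once
 * swap space is exhausted, only scan the anon LRU for swap cache pages
 * when the swap cache is not tiny compared to what is reclaimable on the
 * file LRU.
 */
static bool should_set_swapcache_only(unsigned long swapcache_pages,
                                      unsigned long file_lru_pages)
{
        /* Nothing in the swap cache: scanning anon buys us nothing. */
        if (!swapcache_pages)
                return false;

        /*
         * Plenty of file pages relative to the swap cache: reclaim those
         * first and skip the anon scan (the 2x factor is arbitrary here).
         */
        if (file_lru_pages > 2 * swapcache_pages)
                return false;

        return true;
}

int main(void)
{
        /* The example from the thread: 100 swap cache pages, 10000 file pages. */
        printf("100 swapcache vs 10000 file -> %d\n",
               should_set_swapcache_only(100, 10000));

        /* Little left on the file LRU: chasing the swap cache pages pays off. */
        printf("100 swapcache vs    50 file -> %d\n",
               should_set_swapcache_only(100, 50));
        return 0;
}

The only tunable here is the ratio; the open question in the thread is
whether any such threshold makes the extra anon scan worthwhile, or
whether counting only "group 1" pages is the better signal.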