From: Barry Song <21cnbao@gmail.com>
Date: Mon, 4 Mar 2024 18:42:58 +1300
Subject: Re: [PATCH v3 1/4] mm: swap: Remove CLUSTER_FLAG_HUGE from swap_cluster_info:flags
To: Ryan Roberts
Cc: Matthew Wilcox, David Hildenbrand, Andrew Morton, Huang Ying, Gao Xiang, Yu Zhao, Yang Shi, Michal Hocko, Kefeng Wang, linux-kernel@vger.kernel.org, linux-mm@kvack.org

On Mon, Mar 4, 2024 at 5:52 PM Barry Song <21cnbao@gmail.com> wrote:
>
> On Sat, Mar 2, 2024 at 6:08 AM Ryan Roberts wrote:
> >
> > On 01/03/2024 16:44, Ryan Roberts wrote:
> > > On 01/03/2024 16:31, Matthew Wilcox wrote:
> > >> On Fri, Mar 01, 2024 at 04:27:32PM +0000, Ryan Roberts wrote:
> > >>> I've implemented the batching as David suggested, and I'm pretty confident it's
> > >>> correct. The only problem is that during testing I can't provoke the code to
> > >>> take the path. I've been pouring through the code but struggling to figure out
> > >>> under what situation you would expect the swap entry passed to
> > >>> free_swap_and_cache() to still have a cached folio? Does anyone have any idea?
> > >>>
> > >>> This is the original (unbatched) function, after my change, which caused David's
> > >>> concern that we would end up calling __try_to_reclaim_swap() far too much:
> > >>>
> > >>> int free_swap_and_cache(swp_entry_t entry)
> > >>> {
> > >>>         struct swap_info_struct *p;
> > >>>         unsigned char count;
> > >>>
> > >>>         if (non_swap_entry(entry))
> > >>>                 return 1;
> > >>>
> > >>>         p = _swap_info_get(entry);
> > >>>         if (p) {
> > >>>                 count = __swap_entry_free(p, entry);
> > >>>                 if (count == SWAP_HAS_CACHE)
> > >>>                         __try_to_reclaim_swap(p, swp_offset(entry),
> > >>>                                               TTRS_UNMAPPED | TTRS_FULL);
> > >>>         }
> > >>>         return p != NULL;
> > >>> }
> > >>>
> > >>> The trouble is, whenever its called, count is always 0, so
> > >>> __try_to_reclaim_swap() never gets called.
> > >>>
> > >>> My test case is allocating 1G anon memory, then doing madvise(MADV_PAGEOUT) over
> > >>> it. Then doing either a munmap() or madvise(MADV_FREE), both of which cause this
> > >>> function to be called for every PTE, but count is always 0 after
> > >>> __swap_entry_free() so __try_to_reclaim_swap() is never called. I've tried for
> > >>> order-0 as well as PTE- and PMD-mapped 2M THP.
> > >>
> > >> I think you have to page it back in again, then it will have an entry in
> > >> the swap cache. Maybe. I know little about anon memory ;-)
> > >
> > > Ahh, I was under the impression that the original folio is put into the swap
> > > cache at swap out, then (I guess) its removed once the IO is complete? I'm sure
> > > I'm miles out... what exactly is the lifecycle of a folio going through swap out?
> > >
> > > I guess I can try forking after swap out, then fault it back in in the child and
> > > exit. Then do the munmap in the parent. I guess that could force it? Thanks for
> > > the tip - I'll have a play.
> >
> > That has sort of solved it, the only problem now is that all the folios in the
> > swap cache are small (because I don't have Barry's large swap-in series). So
> > really I need to figure out how to avoid removing the folio from the cache in
> > the first place...
>
> I am quite sure we have a chance to hit a large swapcache even using zRAM -
> a sync swapfile and even during swap-out.
>
> I have a test case as below,
> 1. two threads to run MADV_PAGEOUT
> 2. two threads to read data being swapped-out
>
> in do_swap_page, from time to time, I can get a large swapcache.
>
> We have a short time window after add_to_swap() and before
> __removing_mapping() of
> vmscan, a large folio is still in swapcache.
>
> So Ryan, I guess you can trigger this by adding one more thread of
> MADV_DONTNEED to do zap_pte_range?

Ryan, I have modified my test case to use 4 threads:
1. MADV_PAGEOUT
2. MADV_DONTNEED
3. write data
4. read data

I have pushed the code here so that you can get it:
https://github.com/BarrySong666/swaptest/blob/main/swptest.c

I can reproduce the issue in zap_pte_range() in just a couple of minutes.

>
> >
> > >
> > >>
> > >> If that doesn't work, perhaps use tmpfs, and use some memory pressure to
> > >> force that to swap?
> > >>
> > >>> I'm guessing the swapcache was already reclaimed as part of MADV_PAGEOUT? I'm
> > >>> using a block ram device as my backing store - I think this does synchronous IO
> > >>> so perhaps if I have a real block device with async IO I might have more luck?
> > >>> Just a guess...
> > >>>
> > >>> Or perhaps this code path is a corner case? In which case, perhaps its not worth
> > >>> adding the batching optimization after all?
> > >>>
> > >>> Thanks,
> > >>> Ryan
> > >>>
> >
>

Thanks
Barry
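
For anyone reading along without the repository to hand, the four-thread arrangement described above could look roughly like the sketch below. This is not the swptest.c at the link; it is a minimal userspace illustration, and the names run_swap_stress, struct stress_ctx and the per-thread helpers are made up for this example. The fallback value for MADV_PAGEOUT is the one in the Linux uapi headers. Whether this actually reaches a large folio in the swap cache depends on the kernel, the swap device and the THP settings, so nothing here should be read as a guaranteed reproducer.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21	/* Linux >= 5.4; value taken from the uapi headers */
#endif

/* Hypothetical context shared by the four threads. */
struct stress_ctx {
	unsigned char *buf;	/* anonymous private mapping under test */
	size_t len;		/* length of the mapping in bytes */
	size_t page;		/* system page size */
	int iterations;		/* passes each thread makes over the buffer */
	unsigned long sum;	/* written only by the reader thread */
};

/* Thread 1: ask the kernel to reclaim (swap out) the range. */
static void *pageout_thread(void *arg)
{
	struct stress_ctx *c = arg;

	/* Errors are ignored: the call may fail without swap or on old kernels. */
	for (int i = 0; i < c->iterations; i++)
		madvise(c->buf, c->len, MADV_PAGEOUT);
	return NULL;
}

/* Thread 2: zap the PTEs, which is what ends up in zap_pte_range(). */
static void *dontneed_thread(void *arg)
{
	struct stress_ctx *c = arg;

	for (int i = 0; i < c->iterations; i++)
		madvise(c->buf, c->len, MADV_DONTNEED);
	return NULL;
}

/* Thread 3: touch one byte per page so the pages are populated and dirty. */
static void *writer_thread(void *arg)
{
	struct stress_ctx *c = arg;

	for (int i = 0; i < c->iterations; i++)
		for (size_t off = 0; off < c->len; off += c->page)
			__atomic_store_n(&c->buf[off], (unsigned char)i,
					 __ATOMIC_RELAXED);
	return NULL;
}

/* Thread 4: read one byte per page to fault swapped-out pages back in. */
static void *reader_thread(void *arg)
{
	struct stress_ctx *c = arg;
	unsigned long sum = 0;

	for (int i = 0; i < c->iterations; i++)
		for (size_t off = 0; off < c->len; off += c->page)
			sum += __atomic_load_n(&c->buf[off], __ATOMIC_RELAXED);
	c->sum = sum;
	return NULL;
}

/* Returns 0 if the mapping and all four threads were set up, -1 otherwise. */
int run_swap_stress(size_t len, int iterations)
{
	void *(*fns[4])(void *) = {
		pageout_thread, dontneed_thread, writer_thread, reader_thread,
	};
	struct stress_ctx c = { .len = len, .iterations = iterations };
	pthread_t tid[4];
	int started = 0, ret = 0;

	assert(len > 0 && iterations > 0);
	c.page = (size_t)sysconf(_SC_PAGESIZE);
	c.buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (c.buf == MAP_FAILED)
		return -1;

	for (int i = 0; i < 4; i++) {
		if (pthread_create(&tid[i], NULL, fns[i], &c) != 0) {
			ret = -1;
			break;
		}
		started++;
	}
	for (int i = 0; i < started; i++)
		pthread_join(tid[i], NULL);

	munmap(c.buf, len);
	return ret;
}
```

Calling run_swap_stress(1UL << 30, 1000) from a main() would roughly match the 1G anon region mentioned earlier in the thread. The return value only says that the threads ran to completion; observing the large-swapcache case still needs instrumentation on the kernel side.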