From: Kairui Song
Date: Tue, 21 Nov 2023 16:32:45 +0800
Subject: Re: [PATCH 05/24] mm/swap: move readahead policy checking into swapin_readahead
To: Chris Li
Cc: linux-mm@kvack.org, Andrew Morton, "Huang, Ying", David Hildenbrand, Hugh Dickins, Johannes Weiner, Matthew Wilcox, Michal Hocko, linux-kernel@vger.kernel.org
References: <20231119194740.94101-1-ryncsn@gmail.com> <20231119194740.94101-6-ryncsn@gmail.com>

On Tue, Nov 21, 2023 at 15:41, Chris Li wrote:
>
> On Mon, Nov 20, 2023 at 10:35 PM Kairui Song wrote:
> >
> > On Tue, Nov 21, 2023 at 14:18, Chris Li wrote:
> > >
> > > On Sun, Nov 19, 2023 at 11:48 AM Kairui Song wrote:
> > > >
> > > > From: Kairui Song
> > > >
> > > > This makes swapin_readahead a main entry for swapin pages,
> > > > prepare for optimizations in later commits.
> > > >
> > > > This also makes swapoff able to make use of readahead checking
> > > > based on entry. Swapping off a 10G ZRAM (lzo-rle) is faster:
> > > >
> > > > Before:
> > > > time swapoff /dev/zram0
> > > > real 0m12.337s
> > > > user 0m0.001s
> > > > sys 0m12.329s
> > > >
> > > > After:
> > > > time swapoff /dev/zram0
> > > > real 0m9.728s
> > > > user 0m0.001s
> > > > sys 0m9.719s
> > > >
> > > > And what's more, because now swapoff will also make use of no-readahead
> > > > swapin helper, this also fixed a bug for no-readahead case (eg. ZRAM):
> > > > when a process that swapped out some memory previously was moved to a new
> > > > cgroup, and the original cgroup is dead, swapoff the swap device will
> > > > make the swapped in pages accounted into the process doing the swapoff
> > > > instead of the new cgroup the process was moved to.
> > > >
> > > > This can be easily reproduced by:
> > > > - Setup a ramdisk (eg. ZRAM) swap.
> > > > - Create memory cgroup A, B and C.
> > > > - Spawn process P1 in cgroup A and make it swap out some pages.
> > > > - Move process P1 to memory cgroup B.
> > > > - Destroy cgroup A.
> > > > - Do a swapoff in cgroup C.
> > > > - Swapped in pages is accounted into cgroup C.
>
> In a strange way it makes sense to charge to C.
> Swap out == free up memory.
> Swap in == consume memory.
> C turn off swap, effectively this behavior will consume a lot of memory.
> C gets charged, so if the C is out of memory, it will punish C.
> C will not be able to continue swap in memory. The problem gets under control.

Yes, I think charging either C or B makes sense in its own way. To me,
the current behavior is kind of counter-intuitive.

Imagine there is a cgroup PC1 with child cgroups CC1 and CC2. If a process
swaps out some memory in CC1 and then moves to CC2, and CC1 is dying,
then on swapoff the charge will be moved out of PC1...

And swapoff often happens in some unlimited admin cgroup or some cgroup for
management agents.

If PC1 has a memory limit, a process in it can breach the limit easily:
we will see a process that never left PC1 with a much higher RSS
than the limit of PC1/CC1/CC2.
And if the management agent cgroup has a limit, the agent
will hit OOM instead of PC1.

Simply moving a process between child cgroups of the same parent
cgroup won't cause a similar issue; things only get weird when swapoff is
involved.

And actually, with multiple layers of swap, it's less risky to swapoff
a device since other swap devices can catch the overcommitted memory.

Oh, and there is one more case I forgot to cover in this series:
Moving a process is indeed not something that happens very frequently,
but a process that runs in a cgroup, then exits and leaves some shmem
swapped out could be a common case.
The current behavior on swapoff will move these charges out of the
original parent cgroup too.

So maybe a better solution for swapoff is to simply always charge a
dying cgroup's parent cgroup?
Maybe a sysctl/cmdline could be introduced to control the behavior.
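
To make the PC1/CC1/CC2 scenario above concrete, here is a toy Python
model of the two charging choices. It is only a sketch and not kernel
code: the Cgroup class, the policy names "caller" and "parent_of_dying",
and the "admin" cgroup are all hypothetical names made up for
illustration, and charging is assumed to propagate to every ancestor.

```python
class Cgroup:
    """Toy memory cgroup; usage is hierarchical (includes descendants)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.dead = False   # True once the cgroup has been removed
        self.usage = 0      # pages charged here and in descendants

    def charge(self, pages):
        # Assumption: a charge is propagated to all ancestors.
        node = self
        while node is not None:
            node.usage += pages
            node = node.parent


def nearest_live_ancestor(cg):
    """Walk up from a dead cgroup to the first one that is still alive."""
    node = cg
    while node.dead and node.parent is not None:
        node = node.parent
    return node


def swapoff_target(policy, recorded, caller):
    """Pick the cgroup to charge for a page swapped in by swapoff.

    recorded: cgroup the swap entry was recorded against at swap-out.
    caller:   cgroup of the task running swapoff.
    Both policy names are hypothetical labels for this sketch.
    """
    if policy == "caller":
        return caller
    if policy == "parent_of_dying":
        return nearest_live_ancestor(recorded)
    raise ValueError(f"unknown policy: {policy}")


def simulate(policy, pages=100):
    root = Cgroup("root")
    pc1 = Cgroup("PC1", root)
    cc1 = Cgroup("CC1", pc1)
    cc2 = Cgroup("CC2", pc1)   # the process moved here; it never left PC1
    admin = Cgroup("admin", root)

    # The process swapped out `pages` while in CC1, moved to CC2,
    # and CC1 was then removed.
    cc1.dead = True

    # swapoff runs in the admin cgroup and swaps everything back in.
    swapoff_target(policy, cc1, admin).charge(pages)
    return {"PC1": pc1.usage, "admin": admin.usage}


if __name__ == "__main__":
    for policy in ("caller", "parent_of_dying"):
        print(policy, simulate(policy))
```

With "caller" the pages land in the admin cgroup and PC1's usage stays
at zero, so PC1's limit is bypassed. With "parent_of_dying" the charge
stays inside PC1, which is the behavior suggested above.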