From: Chris Li
Date: Wed, 31 Jan 2024 16:43:05 -0800
Subject: Re: [PATCH] mm: swap: async free swap slot cache entries
To: "Huang, Ying"
Cc: Andrew Morton, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Wei Xu, Yu Zhao, Greg Thelen, Chun-Tse Shao, Suren Baghdasaryan, Yosry Ahmed, Brain Geffon, Minchan Kim, Michal Hocko, Mel Gorman, Nhat Pham, Johannes Weiner, Kairui Song, Zhongkun He, Kemeng Shi, Barry Song, Hugh Dickins, Tim Chen
References: <20231221-async-free-v1-1-94b277992cb0@kernel.org> <20231222115208.ab4d2aeacdafa4158b14e532@linux-foundation.org> <87o7eeg3ow.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <87o7eeg3ow.fsf@yhuang6-desk2.ccr.corp.intel.com>

Hi Ying,

Sorry for the late reply.

On Sun, Dec 24, 2023 at 11:10 PM Huang, Ying wrote:
>
> Chris Li writes:
>
> > On Fri, Dec 22, 2023 at 11:52:08AM -0800, Andrew Morton wrote:
> >> On Thu, 21 Dec 2023 22:25:39 -0800 Chris Li wrote:
> >>
> >> > We discovered that 1% of swap page faults take 100us+ while 50% of
> >> > the swap faults are under 20us.
> >> >
> >> > Further investigation shows that a large portion of the time is
> >> > spent in the free_swap_slots() function in the long-tail case.
> >> >
> >> > The percpu cache of swap slots is freed in a batch of 64 entries
> >> > inside free_swap_slots(). These cache entries are accumulated
> >> > from previous page faults, which may not be related to the current
> >> > process.
> >> >
> >> > Doing the batch free in the page fault handler causes longer
> >> > tail latencies and penalizes the current process.
> >> >
> >> > Move free_swap_slots() outside of the swapin page fault handler into an
> >> > async work queue to avoid such long tail latencies.
> >>
> >> This will require a larger amount of total work than the current
> >
> > Yes, there will be a tiny bit of extra overhead to schedule the job
> > onto the other work queue.
> >
> >> scheme. So we're trading that off against better latency.
> >>
> >> Why is this a good tradeoff?
> >
> > That is a very good question. Both Hugh and Wei had asked me similar questions
> > before. +Hugh.
> >
> > The TL;DR is that it makes the swap more parallelizable.
> >
> > Modern computers typically have more than one CPU, and CPU utilization
> > rarely reaches 100%. So we are not actually trading latency for someone
> > else running slower. Most of the time the real impact is that the current swapin page fault
> > can return quicker, so more work can be submitted to the kernel sooner, while at the same time
> > another idle CPU can pick up the non-latency-critical work of freeing the
> > swap slot cache entries. The net effect is that we speed things up and increase
> > the overall system utilization rather than slow things down.
>
> Your solution depends on there being enough idle time in the system. This
> isn't always true.
>
> In general, all async solutions have 2 possible issues.
>
> a) Unrelated applications may be punished, because they may wait for
> a CPU which is running the async operations. In the original solution,
> the application that swaps more will be punished.

The async free typically takes very little time, at about the 100us
level, so the amount of punishment would be small.
The original behavior was already delaying the freeing of swap slots
due to batching. Adding a tiny bit of time does not change the overall
behavior much.

Another thing is that, if the async free is pending, the free will go
through the direct free path.

> b) The CPU time cannot be charged to the appropriate applications. The
> original behavior isn't perfect either. But it's better than an async worker.

Yes, the original behavior will free other cgroups' swap entries.

> Given that the runtime of the worker is at the 100us level, these issues may not be
> severe. But I think that you may need to explain them at least.

Thanks for the suggestion. Will do in V2.

>
> And, when swap slots freeing batching was introduced, it was mainly used
> to reduce the lock contention of sis->lock (via swap_info_get_cont()).
> So, we may move some operations (e.g., mem_cgroup_uncharge_swap,
> clear_shadow_from_swap_cache(), etc.) out of the batched operation (before
> calling free_swap_slot()) to reduce the latency impact.

That is good to know. Thanks for the explanation.

Chris

>
> > The test results from Chromebooks and Google production servers should be able to show
> > that it is beneficial to both laptop and server workloads, making them more responsive
> > in swap-related workloads.
>
> --
> Best Regards,
> Huang, Ying
>
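
For illustration only (this is not the patch and not kernel code), the trade-off discussed in this thread can be pictured with a small userspace model in Python. It assumes a cache that hands a full batch of 64 entries to a deferred worker and falls back to freeing inline when a deferred job is still pending. The names SlotCache, free_slot and run_worker, the counters, and the exact fallback rule are all hypothetical.

```python
from collections import deque

BATCH = 64  # batch size mentioned in the patch description


class SlotCache:
    """Toy model of one per-CPU swap slot cache (all names hypothetical)."""

    def __init__(self, batch=BATCH):
        self.batch = batch
        self.slots = []            # entries cached, waiting to be freed
        self.work_pending = False  # True while a deferred free is queued
        self.workqueue = deque()   # stands in for the kernel work queue
        self.freed_inline = 0      # entries freed on the caller's path
        self.freed_async = 0       # entries freed by the worker

    def free_slot(self, entry):
        """Called on the latency-sensitive path (the swapin page fault)."""
        self.slots.append(entry)
        if len(self.slots) < self.batch:
            return  # fast path: only cache the entry
        if not self.work_pending:
            # Defer the batch free so the caller returns quickly.
            self.workqueue.append(self.slots)
            self.work_pending = True
        else:
            # A deferred free is still pending: free directly instead.
            self.freed_inline += len(self.slots)
        self.slots = []

    def run_worker(self):
        """Stands in for another CPU running the queued work."""
        while self.workqueue:
            self.freed_async += len(self.workqueue.popleft())
        self.work_pending = False


cache = SlotCache()
for entry in range(64):
    cache.free_slot(entry)   # the 64th call queues the batch
for entry in range(64, 128):
    cache.free_slot(entry)   # worker has not run yet: direct free
cache.run_worker()
print(cache.freed_async, cache.freed_inline)  # prints "64 64"
```

In this model the first full batch costs the caller nothing, while the second is paid for inline because the worker has not run yet. That is the case where there is not enough idle CPU time for the async path to help.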