From: "Huang, Ying"
To: Huan Yang
Cc: Michal Hocko, Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet,
 Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
 David Hildenbrand, Matthew Wilcox, Kefeng Wang, Peter Xu,
 Vishal Moola (Oracle), Yosry Ahmed, Liu Shixin, Hugh Dickins
Subject: Re: [RFC 0/4] Introduce unbalance proactive reclaim
In-Reply-To: <97a3dbb3-9e73-4dcc-877d-f491ff47363b@vivo.com> (Huan Yang's message of "Mon, 13 Nov 2023 16:26:00 +0800")
References: <87msvniplj.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <1e699ff2-0841-490b-a8e7-bb87170d5604@vivo.com>
 <6b539e16-c835-49ff-9fae-a65960567657@vivo.com>
 <87edgufakm.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87a5rif58s.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <97a3dbb3-9e73-4dcc-877d-f491ff47363b@vivo.com>
Date: Wed, 15 Nov 2023 14:52:24 +0800
Message-ID: <87jzqjecev.fsf@yhuang6-desk2.ccr.corp.intel.com>
User-Agent: Gnus/5.13 (Gnus v5.13)
X-Mailing-List: linux-kernel@vger.kernel.org

Huan Yang writes:

> On 2023/11/13 16:05, Huang, Ying wrote:
>> Huan Yang writes:
>>
>>> On 2023/11/13 14:10, Huang, Ying wrote:
>>>> Huan Yang writes:
>>>>
>>>>> On 2023/11/10 20:24, Michal Hocko wrote:
>>>>>> On Fri 10-11-23 11:48:49, Huan Yang wrote:
>>>>>> [...]
>>>>>>> Also, when the application enters the foreground, the startup
>>>>>>> speed may be slower, and traces show a lot of block I/O (usually
>>>>>>> 1000+ I/O count and 200+ ms I/O time). We usually observe very
>>>>>>> little block I/O caused by zram refault (read: 1698.39 MB/s,
>>>>>>> write: 995.109 MB/s); it is usually faster than random disk reads
>>>>>>> (read: 48.1907 MB/s, write: 49.1654 MB/s). This was tested with
>>>>>>> zram-perf, which I changed a little to test UFS.
>>>>>>>
>>>>>>> Therefore, if proactive reclamation encounters many file pages,
>>>>>>> the application may become slow when it is opened.
>>>>>> OK, this is interesting information. From the above it seems that
>>>>>> storage-based I/O refaults are an order of magnitude more expensive
>>>>>> than swap (zram in this case). That means that memory reclaim
>>>>>> should _in general_ prefer anonymous memory reclaim over refaulted
>>>>>> page cache, right? Or is there any reason why "frozen" applications
>>>>>> are any different in this case?
>>>>> Frozen applications mean that the application process is no longer
>>>>> active, so once its private anonymous page data is swapped out, the
>>>>> anonymous pages will not be refaulted until the application becomes
>>>>> active again.
>>>>>
>>>>> On the contrary, page caches are usually shared.
>>>>> Even if the application that first read the file is no longer
>>>>> active, other processes may still read the file. Therefore, it is
>>>>> not reasonable to use the proactive reclamation interface to reclaim
>>>>> page caches without considering memory pressure.
>>>> No. Not all page caches are shared. For example, the page caches used
>>>> for use-once streaming I/O. And they should be reclaimed first.
>>> Yes, but this part is done very well by MGLRU and does not require our
>>> intervention. Moreover, the reclaim speed of clean files is very fast;
>>> compared to it, the reclaim speed of anonymous pages is a bit slower.
>>>> So, your solution may work well for your specific use cases, but it's
>>> Yes, this approach is not universal.
>>>> not a general solution. Per my understanding, you want to reclaim
>>>> only private pages to avoid impacting the performance of other
>>>> applications. Privately mapped anonymous pages are easy to identify
>>>> (and I suggest that you find a way to avoid reclaiming shared mapped
>>>> anonymous pages).
>>> Yes, it is not good to reclaim shared anonymous pages, and they need
>>> to be identified. In the future, we will consider how to filter them.
>>> Thanks.
>>>> There are heuristics to identify use-once page caches in the reclaim
>>>> code. Why don't they work for your situation?
>>> As mentioned above, the default reclaim algorithm is well suited to
>>> reclaiming file pages, and we do not need to intervene in it. Direct
>>> reclaim or kswapd of these use-once file pages is very fast and will
>>> not cause lag or other effects.
>>> Our overall goal is to actively and reasonably compress unused
>>> anonymous pages based on certain strategies, in order to increase
>>> available memory to a certain extent, avoid lag, and prevent
>>> applications from being killed.
>>> Therefore, using the proactive reclaim interface, combined with the
>>> LRU algorithm and reclaim tendencies, is a good way to achieve our
>>> goal.
>> If so, why can't you just use proactive reclaim with a large enough
>> swappiness? That will reclaim use-once page caches and compress
> This works very well for proactive memory reclaim that is executed only
> once. However, considering that we need to perform proactive reclaim in
> batches, suppose that only 5% of the use-once page cache in this memcg
> can be reclaimed, but we need to call proactive memory reclaim step by
> step, such as 5%, 10%, 15% ... 100%. Then page cache may be reclaimed
> due to the balancing adjustment of reclamation, even after that 5% of
> use-once pages has been reclaimed. We may still touch shared file
> pages. (If I misunderstood anything, please correct me.)

If the proactive reclaim amount is less than the size of anonymous
pages, I think that you are safe. For example, if the size of anonymous
pages is 100MB, the size of use-once file pages is 10MB, and the size of
shared file pages is 20MB, then if you reclaim 100MB proactively with
swappiness=200, you will reclaim the 10MB of use-once file pages and
90MB of anonymous pages. The next time, if you reclaim 10MB proactively,
you will still not reclaim shared file pages.

> We previously used the two values of swappiness, 200 and 0, to adjust
> reclaim tendencies. However, the debug interface showed that some file
> pages were reclaimed, and after proactive reclaim, some applications,
> and the reopened applications that had been reclaimed, showed some
> block I/O and startup lag.

If so, please research why the use-once file page heuristics do not
work, and try to fix that or raise the issue.

> This way of having incomplete control over the process may not be
> suitable for proactive memory reclaim.
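The worked example above can be checked with a small toy model. This is illustrative only: the split mirrors the numbers given in this mail (use-once file pages reclaimed first, then anonymous pages, shared file pages untouched while anonymous memory can satisfy the target), not actual kernel reclaim behavior, and the function name is made up for the sketch:

```python
def proactive_reclaim(target_mb, anon_mb, use_once_file_mb, shared_file_mb):
    """Toy model of one proactive reclaim pass at a high swappiness.

    Assumes the use-once heuristic takes cheap use-once file pages
    first, then anonymous pages, and never touches shared file pages
    while anonymous memory remains to satisfy the target.
    """
    reclaimed_file = min(use_once_file_mb, target_mb)
    reclaimed_anon = min(anon_mb, target_mb - reclaimed_file)
    return {
        "use_once_file": reclaimed_file,
        "anon": reclaimed_anon,
        "shared_file": 0,  # untouched as long as anon covers the rest
        "remaining_anon": anon_mb - reclaimed_anon,
    }

# The numbers from the mail: 100MB anon, 10MB use-once, 20MB shared.
first = proactive_reclaim(100, anon_mb=100, use_once_file_mb=10,
                          shared_file_mb=20)
# Follow-up 10MB pass: use-once pages are gone, 10MB anon remains.
second = proactive_reclaim(10, anon_mb=first["remaining_anon"],
                           use_once_file_mb=0, shared_file_mb=20)
```

Run against the mail's numbers, the first 100MB pass takes 10MB of use-once file pages plus 90MB of anonymous pages, and the follow-up 10MB pass still leaves the 20MB of shared file pages untouched.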
> Instead, with a proactive reclaim interface with tendencies, we can
> issue a 5% page cache trim once and then gradually reclaim anonymous
> pages.
>> anonymous pages. So, more applications can be kept in memory before
>> passive reclaiming or killing background applications?

--
Best Regards,
Huang, Ying
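For readers following the thread: the batched proactive reclaim under discussion is driven through the cgroup v2 `memory.reclaim` file (available since kernel 5.19), which accepts a byte count to reclaim from a memcg. A minimal sketch of the stepwise 5%, 10%, ... 100% scheme mentioned above; the cgroup path and the per-step schedule are illustrative assumptions, not part of the original proposal:

```python
import os

def stepwise_reclaim(cgroup_path, anon_bytes, step_pct=5, dry_run=True):
    """Request proactive reclaim in growing batches via memory.reclaim.

    cgroup_path and the percentage schedule are assumptions for
    illustration; writing "<bytes>" to memory.reclaim is the real
    cgroup v2 mechanism for proactive reclaim.
    """
    requests = []
    reclaimed = 0
    for pct in range(step_pct, 101, step_pct):
        target = anon_bytes * pct // 100
        delta = target - reclaimed  # only the increment for this step
        if delta <= 0:
            continue
        requests.append(delta)
        if not dry_run:
            with open(os.path.join(cgroup_path, "memory.reclaim"), "w") as f:
                f.write(str(delta))
        reclaimed = target
    return requests

# Dry run against a hypothetical memcg with 100MB of anonymous memory.
steps = stepwise_reclaim("/sys/fs/cgroup/app", 100 * 1024 * 1024)
```

Each step requests only the increment over the previous target, which matches the "step by step" batching described in the thread; whether file or anonymous pages actually satisfy each request is up to the kernel's reclaim balancing, which is exactly the control problem the RFC is about.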