Subject: Re: Reclaim regression after 1c30844d2dfe
From: Rik van Riel
To: Ivan Babrou, linux-mm@kvack.org, Mel Gorman
Cc: linux-kernel, kernel-team, Andrew Morton, Mel Gorman, Vlastimil Babka
Date: Fri, 07 Feb 2020 18:05:08 -0500

On Fri, 2020-02-07 at 14:54 -0800, Ivan Babrou wrote:
> This change from 5.5 times:
>
> * https://github.com/torvalds/linux/commit/1c30844d2dfe
>
> > mm: reclaim small amounts of memory when an external fragmentation
> > event occurs
>
> Introduced undesired effects in our environment.
>
> * NUMA with 2 x CPU
> * 128GB of RAM
> * THP disabled
> * Upgraded from 4.19 to 5.4
>
> Before we saw free memory hover at around 1.4GB with no spikes. After
> the upgrade we saw some machines decide that they need a lot more than
> that, with frequent spikes above 10GB, often only on a single numa
> node.
>
> We can see kswapd quite active in balance_pgdat (it didn't look like
> it slept at all):
>
> $ ps uax | fgrep kswapd
> root      1850 23.0  0.0      0     0 ?        R    Jan30 1902:24 [kswapd0]
> root      1851  1.8  0.0      0     0 ?        S    Jan30 152:16 [kswapd1]
>
> This in turn massively increased pressure on page cache, which did not
> go well to services that depend on having a quick response from a
> local cache backed by solid storage.
>
> Here's how it looked like when I zeroed vm.watermark_boost_factor:

We have observed the same thing, even on single node systems.

I have some hacky patches to apply the watermark_boost thing
on a per pgdat basis, which seems to resolve the issue, but
I have not yet found the time to get the locking for that
correct.

Given how rare the watermark boosting is, maybe the answer
is just to use atomics? Not sure :)

-- 
All Rights Reversed.
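
For readers following the thread, the "just use atomics" idea for a
per-node boost value could look roughly like the sketch below. It is a
user-space C11 illustration under stated assumptions, not the patches
mentioned in the message and not kernel code: struct node_boost,
boost_raise() and boost_consume() are hypothetical names, and the cap
passed as max_boost is an assumed policy.

```c
#include <stdatomic.h>

/*
 * Hypothetical stand-in for a per-node boost counter. This is NOT the
 * kernel's pg_data_t; it only models one lock-free counter per node.
 */
struct node_boost {
	atomic_ulong boost;	/* extra pages requested on top of the watermark */
};

/*
 * Raise the boost by `step` pages, saturating at `max_boost`.
 * A compare-and-swap loop replaces a lock: if another thread changed
 * the value in between, `old` is refreshed and the update is retried.
 * Returns the boost value in effect after the call.
 */
static unsigned long boost_raise(struct node_boost *nb,
				 unsigned long step,
				 unsigned long max_boost)
{
	unsigned long old = atomic_load(&nb->boost);
	unsigned long new;

	do {
		if (old >= max_boost || step == 0)
			return old;	/* already at the cap, or nothing to add */
		/* Saturating add, written so that it cannot overflow. */
		new = (step > max_boost - old) ? max_boost : old + step;
	} while (!atomic_compare_exchange_weak(&nb->boost, &old, new));

	return new;
}

/*
 * Reclaim side: fetch the current boost and reset it to zero in a
 * single atomic step, so a concurrent raise is never lost or counted
 * twice.
 */
static unsigned long boost_consume(struct node_boost *nb)
{
	return atomic_exchange(&nb->boost, 0UL);
}
```

In this sketch a caller would initialise the counter with
atomic_init(&nb.boost, 0), call boost_raise() from the path that detects
a fragmentation event, and call boost_consume() from the reclaim side.
Whether a plain atomic is enough in the real code depends on what else
has to be updated together with the boost value, which is the open
locking question raised in the message.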