From: "Huang, Ying"
To: Byungchul Park
Subject: Re: [PATCH v2] mm: let kswapd work again for node that used to be hopeless but may not now
In-Reply-To: <20240604084533.GA68919@system.software.com> (Byungchul Park's message of "Tue, 4 Jun 2024 17:45:33 +0900")
References: <20240604072323.10886-1-byungchul@sk.com> <87bk4hcf7h.fsf@yhuang6-desk2.ccr.corp.intel.com> <20240604084533.GA68919@system.software.com>
Date: Tue, 04 Jun 2024 16:57:17 +0800
Message-ID: <8734ptccgi.fsf@yhuang6-desk2.ccr.corp.intel.com>
User-Agent: Gnus/5.13 (Gnus v5.13)
X-Mailing-List: linux-kernel@vger.kernel.org

Byungchul Park writes:

> On Tue, Jun 04, 2024 at 03:57:54PM +0800, Huang, Ying wrote:
>> Byungchul Park writes:
>>
>> > Changes from v1:
>> > 	1. Don't allow to resume kswapd if the system is under memory
>> > 	   pressure that might affect direct reclaim by any chance, like
>> > 	   if NR_FREE_PAGES is less than (low wmark + min wmark)/2.
>> >
>> > --->8---
>> > From 6c73fc16b75907f5da9e6b33aff86bf7d7c9dd64 Mon Sep 17 00:00:00 2001
>> > From: Byungchul Park
>> > Date: Tue, 4 Jun 2024 15:27:56 +0900
>> > Subject: [PATCH v2] mm: let kswapd work again for node that used to be hopeless but may not now
>> >
>> > A system should run with kswapd running in background when under memory
>> > pressure, such as when the available memory level is below the low water
>> > mark and there are reclaimable folios.
>> >
>> > However, the current code let the system run with kswapd stopped if
>> > kswapd has been stopped due to more than MAX_RECLAIM_RETRIES failures
>> > until direct reclaim will do for that, even if there are reclaimable
>> > folios that can be reclaimed by kswapd.  This case was observed in the
>> > following scenario:
>> >
>> >    CONFIG_NUMA_BALANCING enabled
>> >    sysctl_numa_balancing_mode set to NUMA_BALANCING_MEMORY_TIERING
>> >    numa node0 (500GB local DRAM, 128 CPUs)
>> >    numa node1 (100GB CXL memory, no CPUs)
>> >    swap off
>> >
>> >    1) Run a workload with big anon pages e.g. mmap(200GB).
>> >    2) Continue adding the same workload to the system.
>> >    3) The anon pages are placed in node0 by promotion/demotion.
>> >    4) kswapd0 stops because of the unreclaimable anon pages in node0.
>> >    5) Kill the memory hoggers to restore the system.
>> >
>> > After restoring the system at 5), the system starts to run without
>> > kswapd.  Even worse, tiering mechanism is no longer able to work since
>> > the mechanism relies on kswapd for demotion.
>>
>> We have run into the situation that kswapd is kept in failure state for
>> long in a multiple tiers system.  I think that your solution is too
>
> My solution just gives a chance for kswapd to work again even if
> kswapd_failures >= MAX_RECLAIM_RETRIES, if there are potential
> reclaimable folios.  That's it.
>
>> limited, because OOM killing may not happen, while the access pattern of
>
> I don't get this.  OOM will happen as is, through direct reclaim.

A system that fails to reclaim via kswapd may succeed in reclaiming via
direct reclaim, because more CPUs are used to scan the page tables.

In a system with NUMA balancing based page promotion and page demotion
enabled, page promotion will wake up kswapd, but kswapd may fail in some
situations.  But page promotion will not trigger direct reclaim or OOM.

>> the workloads may change.  We have a preliminary and simple solution for
>> this as follows,
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/vishal/tiering.git/commit/?h=tiering-0.8&id=17a24a354e12d4d4675d78481b358f668d5a6866
>
> Whether tiering is involved or not, the same problem can arise if
> kswapd gets stopped due to kswapd_failures >= MAX_RECLAIM_RETRIES.

Your description is about tiering too.  Can you describe a situation
without tiering?

--
Best Regards,
Huang, Ying

> 	Byungchul
>
>> where we will try to wake up kswapd to check every 10 seconds if kswapd
>> is in failure state.  This is another possible solution.
>>
>> > However, the node0 has pages newly allocated after 5), that might or
>> > might not be reclaimable.  Since those are potentially reclaimable, it's
>> > worth hopefully trying reclaim by allowing kswapd to work again.
>> >
>>
>> [snip]
>>
>> --
>> Best Regards,
>> Huang, Ying
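
For readers following the thread, the two approaches discussed above can
be compared in a small userspace sketch.  This is not the patch and not
kernel code.  struct node_state, its fields, the helper names and
KSWAPD_RETRY_INTERVAL_SEC are all hypothetical and exist only for
illustration.  MAX_RECLAIM_RETRIES is assumed to be 16 here.  The first
helper models the v2 changelog rule (do not resume kswapd when
NR_FREE_PAGES is below (low wmark + min wmark)/2, otherwise resume if
there are potentially reclaimable folios).  The second models the
alternative of retrying a wakeup every 10 seconds while kswapd is in
failure state.

```c
#include <stdbool.h>

/* Assumption: the retry limit is 16; the thread only names the macro. */
#define MAX_RECLAIM_RETRIES 16

/* Hypothetical name for the 10 second retry period mentioned in the thread. */
#define KSWAPD_RETRY_INTERVAL_SEC 10

/* Hypothetical, simplified per-node snapshot; not the kernel's pg_data_t. */
struct node_state {
	int kswapd_failures;                /* consecutive failed kswapd runs */
	unsigned long nr_free_pages;        /* stands in for NR_FREE_PAGES */
	unsigned long wmark_min;            /* min watermark, in pages */
	unsigned long wmark_low;            /* low watermark, in pages */
	unsigned long nr_maybe_reclaimable; /* potentially reclaimable folios */
};

/* kswapd is considered stopped once the failure count reaches the limit. */
static bool node_is_hopeless(const struct node_state *n)
{
	return n->kswapd_failures >= MAX_RECLAIM_RETRIES;
}

/* Sketch of the v2 rule as worded in the changelog. */
static bool may_resume_kswapd(const struct node_state *n)
{
	unsigned long threshold = (n->wmark_low + n->wmark_min) / 2;

	if (!node_is_hopeless(n))
		return true;    /* kswapd was never stopped */
	if (n->nr_free_pages < threshold)
		return false;   /* too little free memory: leave it to direct reclaim */
	return n->nr_maybe_reclaimable > 0;
}

/* Sketch of the periodic alternative: retry a wakeup every 10 seconds. */
static bool should_retry_wakeup(const struct node_state *n,
				unsigned long now_sec,
				unsigned long last_try_sec)
{
	return node_is_hopeless(n) &&
	       now_sec - last_try_sec >= KSWAPD_RETRY_INTERVAL_SEC;
}
```

With wmark_min = 40 and wmark_low = 80 the threshold is 60 pages, so a
hopeless node with 100 free pages and some potentially reclaimable
folios would be allowed to resume kswapd, while the same node with 59
free pages would not.  The periodic variant ignores the free page count
and depends only on the elapsed time.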