Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755317AbZFHNCI (ORCPT ); Mon, 8 Jun 2009 09:02:08 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1754191AbZFHNBc (ORCPT ); Mon, 8 Jun 2009 09:01:32 -0400
Received: from gir.skynet.ie ([193.1.99.77]:53626 "EHLO gir.skynet.ie"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752151AbZFHNBb (ORCPT ); Mon, 8 Jun 2009 09:01:31 -0400
From: Mel Gorman
To: Mel Gorman, KOSAKI Motohiro, Rik van Riel, Christoph Lameter,
	yanmin.zhang@intel.com, Wu Fengguang, linuxram@us.ibm.com
Cc: linux-mm, LKML
Subject: [PATCH 1/3] Reintroduce zone_reclaim_interval for when zone_reclaim() scans and fails to avoid CPU spinning at 100% on NUMA
Date: Mon, 8 Jun 2009 14:01:28 +0100
Message-Id: <1244466090-10711-2-git-send-email-mel@csn.ul.ie>
X-Mailer: git-send-email 1.5.6.5
In-Reply-To: <1244466090-10711-1-git-send-email-mel@csn.ul.ie>
References: <1244466090-10711-1-git-send-email-mel@csn.ul.ie>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 6909
Lines: 177

On NUMA machines, the administrator can configure zone_reclaim_mode,
which is a more targeted form of direct reclaim. On machines with large
NUMA distances, zone_reclaim_mode defaults to 1, meaning that clean
unmapped pages will be reclaimed if the zone watermarks are not being
met. The problem is that zone_reclaim() can be in a situation where it
scans excessively without making progress.

One such situation is where a large tmpfs mount is occupying a large
percentage of memory overall. The pages do not get cleaned or reclaimed
by zone_reclaim(), but the lists are uselessly scanned frequently,
making the CPU spin at 100%. The scanning occurs because zone_reclaim()
cannot tell in advance that the scan is pointless: the counters do not
distinguish between pagecache pages backed by disk and by RAM.
The observation in the field is that malloc() stalls for a long time
(minutes in some cases) when this situation occurs.

Accounting for RAM-backed file pages was considered but not implemented
on the grounds that it would introduce new branches and expensive checks
into the page cache add/remove paths and increase the number of
statistics needed in the zone. As zone_reclaim() failing is currently
considered a corner case, this seemed like overkill. Note, if there are
a large number of reports about CPU spinning at 100% on NUMA that is
fixed by disabling zone_reclaim, then this assumption is false and
zone_reclaim() scanning and failing is not a corner case but a common
occurrence.

This patch reintroduces zone_reclaim_interval, which was removed by
commit 34aa1330f9b3c5783d269851d467326525207422 [zoned vm counters:
zone_reclaim: remove /proc/sys/vm/zone_reclaim_interval] because the
zone counters were considered sufficient to determine in advance if the
scan would succeed. As unsuccessful scans can still occur,
zone_reclaim_interval is still required.

Signed-off-by: Mel Gorman
[...] min_slab_pages)
 		return 0;
 
+	/* Do not attempt a scan if scanning failed recently */
+	if (time_before(jiffies,
+		zone->zone_reclaim_failure + zone_reclaim_interval))
+		return 0;
+
 	if (zone_is_all_unreclaimable(zone))
 		return 0;
 
@@ -2414,6 +2426,16 @@ int zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 	ret = __zone_reclaim(zone, gfp_mask, order);
 	zone_clear_flag(zone, ZONE_RECLAIM_LOCKED);
 
+	if (!ret) {
+		/*
+		 * We were unable to reclaim enough pages to stay on node and
+		 * unable to detect in advance that the scan would fail. Allow
+		 * off node accesses for zone_reclaim_interval jiffies before
+		 * trying zone_reclaim() again
+		 */
+		zone->zone_reclaim_failure = jiffies;
+	}
+
 	return ret;
 }
 #endif
-- 
1.5.6.5
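
The back-off that the two hunks implement can be tried outside the kernel.
The following is a minimal user-space sketch, not the kernel code: ticks_t,
ticks_before(), struct sketch_zone, sketch_zone_reclaim() and the two scan
callbacks are hypothetical names invented for illustration, and the tick
value is passed in explicitly instead of being read from jiffies. It
assumes the usual two's-complement conversion from unsigned long to long,
which the kernel's time_before() also relies on.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the type of the kernel's jiffies counter. */
typedef unsigned long ticks_t;

/*
 * Wraparound-safe "a is earlier than b". The unsigned subtraction wraps,
 * and the signed view of the difference gives the ordering as long as the
 * two values are less than half the counter range apart.
 */
static bool ticks_before(ticks_t a, ticks_t b)
{
	return (long)(a - b) < 0;
}

/* Hypothetical miniature of the per-zone state added by the patch. */
struct sketch_zone {
	ticks_t reclaim_failure;	/* tick at which a scan last failed */
};

/* Counts how often a scan callback actually ran (illustration only). */
static int scan_calls;

static int scan_fails(void)    { scan_calls++; return 0; }
static int scan_succeeds(void) { scan_calls++; return 1; }

/*
 * Skip the scan while a recent failure is still within the interval.
 * Otherwise run the scan and, if it fails, record when it failed.
 */
static int sketch_zone_reclaim(struct sketch_zone *z, ticks_t now,
			       ticks_t interval, int (*scan)(void))
{
	int ret;

	/* Do not attempt a scan if scanning failed recently */
	if (ticks_before(now, z->reclaim_failure + interval))
		return 0;

	ret = scan();
	if (!ret)
		z->reclaim_failure = now;

	return ret;
}
```

In this sketch a zero-initialised reclaim_failure suppresses scans until
the tick value reaches the interval, so a caller that wants the first scan
to run immediately would need to seed the timestamp accordingly.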