Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756665Ab3CTSUE (ORCPT ); Wed, 20 Mar 2013 14:20:04 -0400
Received: from cantor2.suse.de ([195.135.220.15]:51271 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1755269Ab3CTSUB (ORCPT ); Wed, 20 Mar 2013 14:20:01 -0400
Date: Wed, 20 Mar 2013 18:19:57 +0000
From: Mel Gorman
To: Andrew Morton
Cc: Michal Hocko, Hedi Berriche, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH] mm: page_alloc: Avoid marking zones full prematurely after zone_reclaim()
Message-ID: <20130320181957.GA1878@suse.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2836
Lines: 74

The following problem was reported against a distribution kernel when
zone_reclaim was enabled but the same problem applies to the mainline
kernel. The reproduction case was as follows:

1. Run numactl -m +0 dd if=largefile of=/dev/null
   This allocates a large number of clean pages in node 0.

2. Run numactl -N +0 memhog 0.5*Mg
   This starts a memory-using application in node 0.

The expected behaviour is that the clean pages get reclaimed and the
application uses node 0 for its memory. The observed behaviour was that
the memory for the memhog application was allocated off-node since commits
cd38b11 (mm: page allocator: initialise ZLC for first zone eligible for
zone_reclaim) and 76d3fbf (mm: page allocator: reconsider zones for
allocation after direct reclaim).

The assumption of those patches was that it was always preferable to
allocate quickly rather than stall for long periods of time, and they were
meant to take care that the zone was only marked full when necessary, but
an important case was missed.

In the allocator fast path, only the low watermarks are checked.
If the zone's free pages are between the low and min watermarks then
allocations from the allocator's slow path will succeed. However,
zone_reclaim will only reclaim SWAP_CLUSTER_MAX or 1<

Signed-off-by: Mel Gorman
---
 mm/page_alloc.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8fcced7..adce823 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1940,9 +1940,24 @@ zonelist_scan:
 				continue;
 			default:
 				/* did we reclaim enough */
-				if (!zone_watermark_ok(zone, order, mark,
+				if (zone_watermark_ok(zone, order, mark,
 						classzone_idx, alloc_flags))
+					goto try_this_zone;
+
+				/*
+				 * Failed to reclaim enough to meet watermark.
+				 * Only mark the zone full if checking the min
+				 * watermark or if we failed to reclaim just
+				 * 1<