Date: Wed, 20 Mar 2013 19:45:08 +0100
From: Michal Hocko
To: Mel Gorman
Cc: Andrew Morton, Hedi Berriche, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: page_alloc: Avoid marking zones full prematurely after zone_reclaim()
Message-ID: <20130320184508.GB970@dhcp22.suse.cz>
References: <20130320181957.GA1878@suse.de>
In-Reply-To: <20130320181957.GA1878@suse.de>

On Wed 20-03-13 18:19:57, Mel Gorman wrote:
> The following problem was reported against a distribution kernel when
> zone_reclaim was enabled but the same problem applies to the mainline
> kernel. The reproduction case was as follows
>
> 1. Run numactl -m +0 dd if=largefile of=/dev/null
>    This allocates a large number of clean pages in node 0
>
> 2. numactl -N +0 memhog 0.5*Mg
>    This start a memory-using application in node 0.
>
> The expected behaviour is that the clean pages get reclaimed and the
> application uses node 0 for its memory. The observed behaviour was that
> the memory for the memhog application was allocated off-node since commits
> cd38b11 (mm: page allocator: initialise ZLC for first zone eligible for
> zone_reclaim) and commit 76d3fbf (mm: page allocator: reconsider zones
> for allocation after direct reclaim).
>
> The assumption of those patches was that it was always preferable to
> allocate quickly than stall for long periods of time and they were
> meant to take care that the zone was only marked full when necessary but
> an important case was missed.
>
> In the allocator fast path, only the low watermarks are checked. If the
> zones free pages are between the low and min watermark then allocations
> from the allocators slow path will succeed. However, zone_reclaim
> will only reclaim SWAP_CLUSTER_MAX or 1<<order pages. There is no
> guarantee that this will meet the low watermark causing the zone to be
> marked full prematurely.
>
> This patch will only mark the zone full after zone_reclaim if it the min
> watermarks are checked or if page reclaim failed to make sufficient
> progress.
>
> Reported-and-tested-by: Hedi Berriche
> Signed-off-by: Mel Gorman

Reviewed-by: Michal Hocko

> ---
>  mm/page_alloc.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 8fcced7..adce823 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1940,9 +1940,24 @@ zonelist_scan:
>  				continue;
>  			default:
>  				/* did we reclaim enough */
> -				if (!zone_watermark_ok(zone, order, mark,
> +				if (zone_watermark_ok(zone, order, mark,
>  						classzone_idx, alloc_flags))
> +					goto try_this_zone;
> +
> +				/*
> +				 * Failed to reclaim enough to meet watermark.
> +				 * Only mark the zone full if checking the min
> +				 * watermark or if we failed to reclaim just
> +				 * 1<<order pages or else the page allocator
> +				 * fastpath will prematurely mark zones full
> +				 * when the watermark is between the low and
> +				 * min watermarks.
> +				 */
> +				if ((alloc_flags & ALLOC_WMARK_MIN) ||
> +				    ret == ZONE_RECLAIM_SOME)
>  					goto this_zone_full;
> +
> +				continue;
>  			}
>  		}
>

-- 
Michal Hocko
SUSE Labs
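
For anyone following the thread, the decision made by the patched default: branch can be modelled in a few lines of standalone C. This is only a sketch under stated assumptions: the MODEL_* names, the enum values and both functions are hypothetical stand-ins invented for illustration, not kernel code, and zone_watermark_ok() is reduced to a boolean supplied by the caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the ALLOC_WMARK_* bits; values are illustrative. */
enum { MODEL_WMARK_MIN = 0x1, MODEL_WMARK_LOW = 0x2 };

/* The two zone_reclaim() outcomes that reach the default: branch. */
enum model_reclaim_ret { MODEL_RECLAIM_SOME, MODEL_RECLAIM_SUCCESS };

/* What the allocator does with the zone next. */
enum model_action {
	MODEL_TRY_THIS_ZONE,	/* goto try_this_zone */
	MODEL_THIS_ZONE_FULL,	/* goto this_zone_full */
	MODEL_SKIP_ZONE		/* continue: skip the zone, do not mark it full */
};

/* Behaviour before the patch: any failed watermark check marks the zone full. */
static enum model_action after_zone_reclaim_old(bool watermark_ok)
{
	return watermark_ok ? MODEL_TRY_THIS_ZONE : MODEL_THIS_ZONE_FULL;
}

/*
 * Behaviour after the patch: mark the zone full only when the min
 * watermark was being checked or reclaim made insufficient progress.
 */
static enum model_action after_zone_reclaim(bool watermark_ok, int alloc_flags,
					     enum model_reclaim_ret ret)
{
	if (watermark_ok)
		return MODEL_TRY_THIS_ZONE;

	if ((alloc_flags & MODEL_WMARK_MIN) || ret == MODEL_RECLAIM_SOME)
		return MODEL_THIS_ZONE_FULL;

	return MODEL_SKIP_ZONE;
}

/* Walks through the fast path case described in the changelog. */
static void model_demo(void)
{
	/* Low watermark still not met although reclaim succeeded. */
	assert(after_zone_reclaim_old(false) == MODEL_THIS_ZONE_FULL);
	assert(after_zone_reclaim(false, MODEL_WMARK_LOW,
				  MODEL_RECLAIM_SUCCESS) == MODEL_SKIP_ZONE);
}
```

The case in model_demo() is the one from the changelog: the fast path checks the low watermark, reclaim succeeds but frees too little to meet it, and the zone is now skipped for this pass instead of being marked full.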