Date: Wed, 16 Feb 2011 13:38:57 +0100
From: Johannes Weiner <hannes@cmpxchg.org>
To: Mel Gorman <mel@csn.ul.ie>
Cc: Andrew Morton <akpm@linux-foundation.org>,
        Andrea Arcangeli <aarcange@redhat.com>, Rik van Riel <riel@redhat.com>,
        Michal Hocko <mhocko@suse.cz>,
        Kent Overstreet <kent.overstreet@gmail.com>, linux-mm@kvack.org,
        linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: vmscan: Stop reclaim/compaction earlier due to
 insufficient progress if !__GFP_REPEAT
Message-ID: <20110216123857.GE2380@cmpxchg.org>
References: <20110209154606.GJ27110@cmpxchg.org>
 <20110209164656.GA1063@csn.ul.ie>
 <20110209182846.GN3347@random.random>
 <20110210102109.GB17873@csn.ul.ie>
 <20110210124838.GU3347@random.random>
 <20110210133323.GH17873@csn.ul.ie>
 <20110210141447.GW3347@random.random>
 <20110210145813.GK17873@csn.ul.ie>
 <20110216095048.GA4473@csn.ul.ie>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20110216095048.GA4473@csn.ul.ie>
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 5796
Lines: 116

On Wed, Feb 16, 2011 at 09:50:49AM +0000, Mel Gorman wrote:
> should_continue_reclaim() for reclaim/compaction allows scanning to continue
> even if pages are not being reclaimed until the full list is scanned. In
> terms of allocation success, this makes sense but potentially it introduces
> unwanted latency for high-order allocations such as transparent hugepages
> and network jumbo frames that would prefer to fail the allocation attempt
> and fall back to order-0 pages.  Worse, there is a potential that the full
> LRU scan will clear all the young bits, distort page aging information and
> potentially push pages into swap that would have otherwise remained resident.
> 
> This patch will stop reclaim/compaction if no pages were reclaimed in the
> last SWAP_CLUSTER_MAX pages that were considered. For allocations such as
> hugetlbfs that use GFP_REPEAT and have fewer fallback options, the full LRU
> list may still be scanned.
> 
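
For reference, the stop rule described above can be modelled roughly as
follows. This is a standalone sketch, not the vmscan.c code: the function
name and its parameters are made up for illustration, and it assumes the
caller passes the reclaimed/scanned counts for the last SWAP_CLUSTER_MAX
batch. The real should_continue_reclaim() has further checks that are
left out here.

```c
#include <stdbool.h>

/*
 * Simplified model of the stop condition (names are illustrative,
 * not the kernel's).
 *
 * nr_reclaimed: pages freed from the last batch that was considered
 * nr_scanned:   pages considered in that batch; assumed to be 0 once
 *               the full LRU list has been scanned
 */
static bool model_should_continue(bool gfp_repeat,
				  unsigned long nr_reclaimed,
				  unsigned long nr_scanned)
{
	if (gfp_repeat) {
		/*
		 * Few fallback options (e.g. hugetlbfs): give up only
		 * when the full list was scanned and nothing was freed.
		 */
		if (!nr_reclaimed && !nr_scanned)
			return false;
	} else {
		/*
		 * Caller can fall back to order-0 pages: stop as soon
		 * as a batch reclaims nothing.
		 */
		if (!nr_reclaimed)
			return false;
	}
	return true;
}
```

So a THP-style caller bails out after the first unproductive batch, while
a __GFP_REPEAT caller keeps going until there is nothing left to scan.
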
> To test this, a tool was developed based on ftrace that tracked the latency of
> high-order allocations while transparent hugepage support was enabled and three
> benchmarks were run. The "fix-infinite" figures are 2.6.38-rc4 with Johannes's
> patch "vmscan: fix zone shrinking exit when scan work is done" applied.
> 
> STREAM Highorder Allocation Latency Statistics
> 	       fix-infinite	break-early
> 1 :: Count            10298           10229
> 1 :: Min             0.4560          0.4640
> 1 :: Mean            1.0589          1.0183
> 1 :: Max            14.5990         11.7510
> 1 :: Stddev          0.5208          0.4719
> 2 :: Count                2               1
> 2 :: Min             1.8610          3.7240
> 2 :: Mean            3.4325          3.7240
> 2 :: Max             5.0040          3.7240
> 2 :: Stddev          1.5715          0.0000
> 9 :: Count           111696          111694
> 9 :: Min             0.5230          0.4110
> 9 :: Mean           10.5831         10.5718
> 9 :: Max            38.4480         43.2900
> 9 :: Stddev          1.1147          1.1325
> 
> Mean time for order-1 allocations is reduced. order-2 looks increased
> but with so few allocations, it's not particularly significant. THP mean
> allocation latency is also reduced. That said, allocation time varies so
> significantly that the reductions are within noise.
> 
> Max allocation time is reduced by a significant amount for low-order
> allocations but increased for THP allocations which presumably are now
> breaking before reclaim has done enough work.
> 
> SysBench Highorder Allocation Latency Statistics
> 	       fix-infinite	break-early
> 1 :: Count            15745           15677
> 1 :: Min             0.4250          0.4550
> 1 :: Mean            1.1023          1.0810
> 1 :: Max            14.4590         10.8220
> 1 :: Stddev          0.5117          0.5100
> 2 :: Count                1               1
> 2 :: Min             3.0040          2.1530
> 2 :: Mean            3.0040          2.1530
> 2 :: Max             3.0040          2.1530
> 2 :: Stddev          0.0000          0.0000
> 9 :: Count             2017            1931
> 9 :: Min             0.4980          0.7480
> 9 :: Mean           10.4717         10.3840
> 9 :: Max            24.9460         26.2500
> 9 :: Stddev          1.1726          1.1966
> 
> Again, mean time for order-1 allocations is reduced while order-2 allocations
> are too few to draw conclusions from. The mean time for THP allocations is
> also slightly reduced albeit the reductions are within variance.
> 
> Once again, our maximum allocation time is significantly reduced for
> low-order allocations and slightly increased for THP allocations.
> 
> Anon stream mmap reference Highorder Allocation Latency Statistics
> 	       fix-infinite	break-early
> 1 :: Count             1376            1790
> 1 :: Min             0.4940          0.5010
> 1 :: Mean            1.0289          0.9732
> 1 :: Max             6.2670          4.2540
> 1 :: Stddev          0.4142          0.2785
> 2 :: Count                1               -
> 2 :: Min             1.9060               -
> 2 :: Mean            1.9060               -
> 2 :: Max             1.9060               -
> 2 :: Stddev          0.0000               -
> 9 :: Count            11266           11257
> 9 :: Min             0.4990          0.4940
> 9 :: Mean        27250.4669      24256.1919
> 9 :: Max      11439211.0000    6008885.0000
> 9 :: Stddev     226427.4624     186298.1430
> 
> This benchmark creates one thread per CPU which references an amount of
> anonymous memory 1.5 times the size of physical RAM. This pounds swap quite
> heavily and is intended to exercise THP a bit.
> 
> Mean allocation time for order-1 is reduced as before. It's also reduced
> for THP allocations but the variations here are pretty massive due to swap.
> As before, maximum allocation times are significantly reduced.
> 
> Overall, the patch reduces the mean and maximum allocation latencies for
> the smaller high-order allocations. This was with SLAB configured, so the
> effect would be expected to be more significant with SLUB, which uses
> allocations of these sizes more aggressively.
> 
> The mean allocation times for THP allocations are also slightly reduced.
> The maximum latency was slightly increased as predicted by the comments due
> to reclaim/compaction breaking early. However, workloads care more about the
> latency of lower-order allocations than THP so it's an acceptable trade-off.
> Please consider merging for 2.6.38.
> 
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>