Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932599AbXHIVG3 (ORCPT ); Thu, 9 Aug 2007 17:06:29 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1756088AbXHIVGT (ORCPT ); Thu, 9 Aug 2007 17:06:19 -0400 Received: from gir.skynet.ie ([193.1.99.77]:38843 "EHLO gir.skynet.ie" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755613AbXHIVGS (ORCPT ); Thu, 9 Aug 2007 17:06:18 -0400 From: Mel Gorman To: Lee.Schermerhorn@hp.com, ak@suse.de, clameter@sgi.com Cc: Mel Gorman , linux-kernel@vger.kernel.org, linux-mm@kvack.org Message-Id: <20070809210616.14702.73376.sendpatchset@skynet.skynet.ie> Subject: [PATCH 0/4] Use one zonelist per node instead of multiple zonelists v3 Date: Thu, 9 Aug 2007 22:06:16 +0100 (IST) Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4023 Lines: 83 Changelog since V2 o shrink_zones() uses zonelist instead of zonelist->zones o hugetlb uses zonelist iterator o zone_idx information is embedded in zonelist pointers o replace NODE_DATA(nid)->node_zonelist with node_zonelist(nid) Changelog since V1 o Break up the patch into 3 patches o Introduce iterators for zonelists o Performance regression test The following patches replace multiple zonelists per node with one zonelist that is filtered based on the GFP flags. The patches as a set fix a bug with regard to the use of MPOL_BIND and ZONE_MOVABLE. With this patchset, the MPOL_BIND will apply to the two highest zones when the highest zone is ZONE_MOVABLE. This should be considered as an alternative fix for the MPOL_BIND+ZONE_MOVABLE in 2.6.23 to the previously discussed hack that filters only custom zonelists. As a bonus, the patchset reduces the cache footprint of the kernel and should improve performance in a number of cases. The first patch cleans up an inconsitency where direct reclaim uses zonelist->zones where other places use zonelist. The second patch replaces multiple zonelists with one zonelist that is filtered. The final patch is a fix that depends on the previous two patches. The patch changes policy zone so that the MPOL_BIND policy gets applied to the two highest populated zones when the highest populated zone is ZONE_MOVABLE. Otherwise, MPOL_BIND only applies to the highest populated zone. The tests passed regression tests with numactltest. Performance results varied depending on the machine configuration but were usually small performance gains. The new algorithm relies heavily on the implementation of zone_idx which is currently pretty expensive. Experiments to optimise this have shown significant improvements for this algorithm, but is beyond the scope of this patchset. Due to the nature of the change, the results for other people are likely to vary - it'll usually win but occasionally lose. In real workloads the gain/loss will depend on how much the userspace portion of the benchmark benefits from having more cache available due to reduced referencing of zonelists. I expect it'll be more noticable on x86_64 with many zones than on IA64 which typically would only have one active zonelist-per-node. These are the range of performance losses/gains I found when running against 2.6.23-rc1-mm2. The set and these machines are a mix of i386, x86_64 and ppc64 both NUMA and non-NUMA. Total CPU time on Kernbench: -0.02% to 0.27% Elapsed time on Kernbench: -0.21% to 1.26% page_test from aim9: -3.41% to 3.90% brk_test from aim9: -0.20% to 40.94% fork_test from aim9: -0.42% to 4.59% exec_test from aim9: -0.78% to 1.95% Size reduction of pg_dat_t: 0 to 7808 bytes (depends on alignment) The TBench figures were too variable between runs to draw conclusions from but there didn't appear to be any regressions there. The hackbench results for both sockets and pipes was within noise. I haven't gone though lmbench. These four patches are a standalone set which address the MPOL_BIND problem with ZONE_MOVABLE as well as reducing memory usage and in many cases the cache footprint of the kernel. They should be considered as a bug fix due to the MPOL_BIND fixup. If these patches are accepted, the follow-on work would entail; o Encode zone_id in the zonelist pointers to avoid zone_idx() (Christoph's idea) o If zone_id works out, eliminate z_to_n from the zonelist cache as unnecessary o Remove bind_zonelist() (Patch in progress, very messy right now) o Eliminate policy_zone (Trickier) Comments? -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/