Date: Wed, 20 Feb 2008 17:56:02 +0900
From: "minchan Kim"
To: "KOSAKI Motohiro"
Cc: "KAMEZAWA Hiroyuki", "Balbir Singh", "Rik van Riel",
    "Lee Schermerhorn", linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC][PATCH] the proposal of improve page reclaim by throttle
Message-ID: <44c63dc40802200056va847417v1cfc847341bb8cc0@mail.gmail.com>
In-Reply-To: <20080219134715.7E90.KOSAKI.MOTOHIRO@jp.fujitsu.com>

Hi, KOSAKI.

I am very interested in your patch, so I want to test it with the exact
same method you used. I will test it in an embedded environment (ARM920T,
32M RAM) and on my desktop machine (Core2Duo 2.2GHz, 2G RAM).

I guess this patch won't be efficient in the embedded environment, since
many embedded boards have just one processor and no swap device.
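To test with the same method, I would rather read the child's rusage
directly than rely on top or freemem. Here is a minimal sketch (the
`measure()` helper and the stand-in command `true` are my own; a real run
would pass the hackbench command line instead) of collecting the
/usr/bin/time-style numbers:

```python
import resource
import subprocess
import time

def measure(cmd):
    """Run cmd as a child process and report wall-clock time plus
    the rusage fields that /usr/bin/time derives its output from."""
    start = time.time()
    subprocess.run(cmd, check=True)
    elapse = time.time() - start
    # RUSAGE_CHILDREN accumulates over all waited-for children,
    # so call this once per workload for clean numbers.
    ru = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "elapse": elapse,           # wall-clock seconds
        "user": ru.ru_utime,        # user CPU seconds
        "sys": ru.ru_stime,         # system CPU seconds
        "major fault": ru.ru_majflt # faults that required I/O
    }

# 'true' is only a placeholder; the real test would be
# measure(["hackbench", "120", "process", "1000"])
print(measure(["true"]))
```

On a loaded system this keeps working even when interactive tools stall,
because the numbers come straight from the kernel's per-process accounting.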
What I want to know is whether this patch has any regression on a UP
system with no swap device, like embedded. I don't think I can evaluate
the fields below with only top or freemem, because top and freemem won't
work well when the system is under heavy page reclaim and swapping
overhead. So, how do I evaluate the following fields as you did?

* elapse (what do you mean by it?)
* major fault
* max parallel reclaim tasks
* max consumption time of try_to_free_pages()

If you have a patch for testing, please send it to me.

On Feb 19, 2008 2:44 PM, KOSAKI Motohiro wrote:
> background
> ========================================
> The current VM implementation has no limit on the number of parallel
> reclaims. Under heavy workload, this brings two bad things:
>   - heavy lock contention
>   - unnecessary swap out
>
> About 2 months ago, KAMEZAWA Hiroyuki proposed a page reclaim throttle
> patch and explained that it improves reclaim time.
> http://marc.info/?l=linux-mm&m=119667465917215&w=2
>
> But unfortunately it works only for memcgroup reclaim.
> Today, I implemented it again to support global reclaim and measured it.
>
>
> test machine, method and result
> ==================================================
>
> CPU:  IA64 x8
> MEM:  8GB
> SWAP: 2GB
>
> I got hackbench from
> http://people.redhat.com/mingo/cfs-scheduler/tools/hackbench.c
>
> $ /usr/bin/time hackbench 120 process 1000
>
> These parameters consume all physical memory and about 1GB of swap
> space on my test environment.
>
>
> before:
>   hackbench result: 282.30
>   /usr/bin/time result
>     user: 14.16
>     sys: 1248.47
>     elapse: 432.93
>     major fault: 29026
>   max parallel reclaim tasks: 1298
>   max consumption time of try_to_free_pages(): 70394
>
> after:
>   hackbench result: 30.36
>   /usr/bin/time result
>     user: 14.26
>     sys: 294.44
>     elapse: 118.01
>     major fault: 3064
>   max parallel reclaim tasks: 4
>   max consumption time of try_to_free_pages(): 12234
>
>
> conclusion
> =========================================
> This patch improves 4 things:
> 1. reduces unnecessary swap
>    (see the major fault numbers above: about 90% reduced)
> 2. improves throughput
>    (see the hackbench result above: about 90% reduced)
> 3. improves interactive performance
>    (see max consumption time of try_to_free_pages() above:
>    about 80% reduced)
> 4. reduces lock contention
>    (see the sys time above: about 80% reduced)
>
> Now, we got nearly a 10x performance improvement on hackbench
> (282.30s -> 30.36s) :)
>
>
> future works
> ==========================================================
> - more discussion with the memory controller guys.
>
>
> Signed-off-by: KOSAKI Motohiro
> CC: KAMEZAWA Hiroyuki
> CC: Balbir Singh
> CC: Rik van Riel
> CC: Lee Schermerhorn
>
> ---
>  include/linux/nodemask.h |    1
>  mm/vmscan.c              |   49 +++++++++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 48 insertions(+), 2 deletions(-)
>
> Index: b/include/linux/nodemask.h
> ===================================================================
> --- a/include/linux/nodemask.h	2008-02-19 13:58:05.000000000 +0900
> +++ b/include/linux/nodemask.h	2008-02-19 13:58:23.000000000 +0900
> @@ -431,6 +431,7 @@ static inline int num_node_state(enum no
>
>  #define num_online_nodes()	num_node_state(N_ONLINE)
>  #define num_possible_nodes()	num_node_state(N_POSSIBLE)
> +#define num_highmem_nodes()	num_node_state(N_HIGH_MEMORY)
>  #define node_online(node)	node_state((node), N_ONLINE)
>  #define node_possible(node)	node_state((node), N_POSSIBLE)
>
> Index: b/mm/vmscan.c
> ===================================================================
> --- a/mm/vmscan.c	2008-02-19 13:58:05.000000000 +0900
> +++ b/mm/vmscan.c	2008-02-19 14:04:06.000000000 +0900
> @@ -127,6 +127,11 @@ long vm_total_pages;	/* The total number
>  static LIST_HEAD(shrinker_list);
>  static DECLARE_RWSEM(shrinker_rwsem);
>
> +static atomic_t nr_reclaimers = ATOMIC_INIT(0);
> +static DECLARE_WAIT_QUEUE_HEAD(reclaim_throttle_waitq);
> +#define RECLAIM_LIMIT	(2 * num_highmem_nodes())
> +
> +
>  #ifdef CONFIG_CGROUP_MEM_CONT
>  #define scan_global_lru(sc)	(!(sc)->mem_cgroup)
>  #else
> @@ -1421,6 +1426,46 @@ out:
>  	return ret;
>  }
>
> +static unsigned long try_to_free_pages_throttled(struct zone **zones,
> +						 int order,
> +						 gfp_t gfp_mask,
> +						 struct scan_control *sc)
> +{
> +	unsigned long nr_reclaimed = 0;
> +	unsigned long start_time;
> +	int i;
> +
> +	start_time = jiffies;
> +
> +	wait_event(reclaim_throttle_waitq,
> +		   atomic_add_unless(&nr_reclaimers, 1, RECLAIM_LIMIT));
> +
> +	/* is more reclaim still needed? */
> +	if (unlikely(time_after(jiffies, start_time + HZ))) {
> +		for (i = 0; zones[i] != NULL; i++) {
> +			struct zone *zone = zones[i];
> +			int classzone_idx = zone_idx(zones[0]);
> +
> +			if (!populated_zone(zone))
> +				continue;
> +
> +			if (zone_watermark_ok(zone, order, 4*zone->pages_high,
> +					      classzone_idx, 0)) {
> +				nr_reclaimed = 1;
> +				goto out;
> +			}
> +		}
> +	}
> +
> +	nr_reclaimed = do_try_to_free_pages(zones, gfp_mask, sc);
> +
> +out:
> +	atomic_dec(&nr_reclaimers);
> +	wake_up_all(&reclaim_throttle_waitq);
> +
> +	return nr_reclaimed;
> +}
> +
>  unsigned long try_to_free_pages(struct zone **zones, int order, gfp_t gfp_mask)
>  {
>  	struct scan_control sc = {
> @@ -1434,7 +1479,7 @@ unsigned long try_to_free_pages(struct z
>  		.isolate_pages = isolate_pages_global,
>  	};
>
> -	return do_try_to_free_pages(zones, gfp_mask, &sc);
> +	return try_to_free_pages_throttled(zones, order, gfp_mask, &sc);
>  }
>
>  #ifdef CONFIG_CGROUP_MEM_CONT
> @@ -1456,7 +1501,7 @@ unsigned long try_to_free_mem_cgroup_pag
>  	int target_zone = gfp_zone(GFP_HIGHUSER_MOVABLE);
>
>  	zones = NODE_DATA(numa_node_id())->node_zonelists[target_zone].zones;
> -	if (do_try_to_free_pages(zones, sc.gfp_mask, &sc))
> +	if (try_to_free_pages_throttled(zones, 0, sc.gfp_mask, &sc))
>  		return 1;
>  	return 0;
> }
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: email@kvack.org

--
Thanks,
barrios
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
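P.S. To convince myself I understand the throttle before putting it on the
board, I wrote a small userspace model of the patch's scheme. This is only
a sketch, not kernel code: the `ReclaimThrottle` class is my own invention,
the limit of 2 merely stands in for `2 * num_highmem_nodes()`, and
`threading.Condition` plays the role of the `wait_event()` /
`wake_up_all()` pair around the `atomic_add_unless()` counter.

```python
import threading
import time

RECLAIM_LIMIT = 2  # stands in for 2 * num_highmem_nodes()

class ReclaimThrottle:
    """Caps the number of concurrent 'reclaimers' at a fixed limit."""
    def __init__(self, limit):
        self.limit = limit
        self.nr_reclaimers = 0   # kernel: atomic_t nr_reclaimers
        self.max_parallel = 0    # observed peak, for verification
        self.cond = threading.Condition()

    def enter(self):
        with self.cond:
            # wait_event(..., atomic_add_unless(&nr_reclaimers, 1, LIMIT))
            while self.nr_reclaimers >= self.limit:
                self.cond.wait()
            self.nr_reclaimers += 1
            self.max_parallel = max(self.max_parallel, self.nr_reclaimers)

    def leave(self):
        with self.cond:
            self.nr_reclaimers -= 1        # atomic_dec(&nr_reclaimers)
            self.cond.notify_all()         # wake_up_all(&..._waitq)

def run(nthreads=8):
    """Launch nthreads fake reclaimers; return the peak parallelism."""
    throttle = ReclaimThrottle(RECLAIM_LIMIT)
    def reclaimer():
        throttle.enter()
        time.sleep(0.01)  # pretend to scan and free pages
        throttle.leave()
    threads = [threading.Thread(target=reclaimer) for _ in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return throttle.max_parallel

print("max parallel reclaim tasks:", run())
```

However many tasks pile in, the observed peak never exceeds the limit,
which is exactly the "max parallel reclaim tasks: 1298 -> 4" effect in
your measurements.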