Date: Mon, 23 May 2011 09:25:57 +0900
From: KAMEZAWA Hiroyuki
To: Andrew Morton
Cc: Hiroyuki Kamezawa, "linux-mm@kvack.org", "linux-kernel@vger.kernel.org", "nishimura@mxp.nes.nec.co.jp", "balbir@linux.vnet.ibm.com", Ying Han, hannes@cmpxchg.org, Michal Hocko
Subject: Re: [PATCH 8/8] memcg asyncrhouns reclaim workqueue
Message-Id: <20110523092557.30d322aa.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <20110520182640.7e71af33.akpm@linux-foundation.org>
Organization: FUJITSU Co. LTD.

On Fri, 20 May 2011 18:26:40 -0700
Andrew Morton wrote:

> On Sat, 21 May 2011 09:41:50 +0900 Hiroyuki Kamezawa wrote:
>
> > 2011/5/21 Andrew Morton:
> > > On Fri, 20 May 2011 12:48:37 +0900
> > > KAMEZAWA Hiroyuki wrote:
> > >
> > >> workqueue for memory cgroup asynchronous memory shrinker.
> > >>
> > >> This patch implements the workqueue of the async shrinker routine. Each
> > >> memcg has a work and only one work can be scheduled at the same time.
> > >>
> > >> If shrinking memory doesn't go well, a delay will be added to the work.
> > >>
> > >
> > > When this code explodes (as it surely will), users will see large
> > > amounts of CPU consumption in the work queue thread.  We want to make
> > > this as easy to debug as possible, so we should try to make the
> > > workqueue's names mappable back onto their memcg's.  And anything else
> > > we can think of to help?
> > >
> >
> > I had a patch for showing per-memcg reclaim latency stats. It will be helpful.
> > I'll add it again to this set. I just dropped it because there are many patches
> > to memory.stat in flight.
>
> Will that patch help us when users report the memcg equivalent of
> "kswapd uses 99% of CPU"?
>
I think so. Each memcg shows how much CPU is used.

But maybe it's not an easy interface. I have several ideas.

One idea is to rename task->comm, overwriting kworker/u:%d with memcg/%d
when the work is scheduled. I think this can be implemented with a very
simple interface and flags for the workqueue. Then ps -elf can show what was
going on. If necessary, I'll add a hard limit on CPU usage for a work, or
I'll limit the number of threads for the memcg workqueue.

Considering there are users who use 2000+ memcgs on a system, a thread per
memcg was not a choice for me. Another idea was a thread pool or a workqueue.
Because a thread pool can be a poor reimplementation of workqueue, I used
workqueue.

I'll implement some of the ideas above in the next version.

> > >
> > >> +	limit = res_counter_read_u64(&mem->res, RES_LIMIT);
> > >> +	shrink_to = limit - MEMCG_ASYNC_MARGIN - PAGE_SIZE;
> > >> +	usage = res_counter_read_u64(&mem->res, RES_USAGE);
> > >> +	if (shrink_to <= usage) {
> > >> +		required = usage - shrink_to;
> > >> +		required = (required >> PAGE_SHIFT) + 1;
> > >> +		/*
> > >> +		 * This scans some number of pages and returns whether memory
> > >> +		 * reclaim was slow or not.
> > >> +		 * If slow, we add a delay as
> > >> +		 * congestion_wait() in vmscan.c
> > >> +		 */
> > >> +		congested = mem_cgroup_shrink_static_scan(mem, (long)required);
> > >> +	}
> > >> +	if (test_bit(ASYNC_NORESCHED, &mem->async_flags)
> > >> +	    || mem_cgroup_async_should_stop(mem))
> > >> +		goto finish_scan;
> > >> +	/* If memory reclaim couldn't go well, add delay */
> > >> +	if (congested)
> > >> +		delay = HZ/10;
> > >
> > > Another magic number.
> > >
> > > If Moore's law holds, we need to reduce this number by 1.4 each year.
> > > Is this good?
> > >
> >
> > Not good. I just used the same magic number now used with wait_iff_congested.
> > Other than a timer, I can use the pagein/pageout event counter. If we have
> > dirty_ratio, I may be able to link this to dirty_ratio and wait until
> > dirty_ratio is low enough. Or, wake up again when hitting the limit.
> >
> > Do you have a suggestion?
> >
>
> mm..  It would be pretty easy to generate an estimate of "pages scanned
> per second" from the contents of (and changes in) the scan_control. Hmm.
> Knowing that datum and knowing the number of pages in the memcg, we
> should be able to come up with a delay period which scales
> appropriately with CPU speed and with memory size?
>
> Such a thing could be used to rationalise magic delays in other places,
> hopefully.
>
Ok, I'll consider that. Thank you for the nice idea.
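
To make the idea concrete, here is a rough userspace sketch of how such a
delay could be derived. It is only an illustration under my own assumptions:
the helper names (scan_rate_pps, congestion_delay_ms) and the tunables
(SCAN_FRACTION and the 1..100 ms clamp) are hypothetical and do not exist in
the patch or in vmscan.c. The delay is the time needed to scan a fixed
fraction of the memcg at the measured rate, so it shrinks on faster CPUs and
grows with memcg size.

```c
#include <stdint.h>

/* Hypothetical tunables for illustration; none of them come from the patch. */
#define DELAY_MIN_MS	1u
#define DELAY_MAX_MS	100u	/* same ceiling as the current HZ/10 */
#define SCAN_FRACTION	64u	/* back off for the time to scan 1/64 of the memcg */

/* Estimate "pages scanned per second" from one reclaim pass. */
static uint64_t scan_rate_pps(uint64_t pages_scanned, uint64_t elapsed_us)
{
	if (elapsed_us == 0)
		elapsed_us = 1;	/* avoid dividing by zero on very short passes */
	return pages_scanned * 1000000u / elapsed_us;
}

/*
 * Delay in milliseconds: the time needed to scan 1/SCAN_FRACTION of the
 * memcg at the measured rate, clamped to [DELAY_MIN_MS, DELAY_MAX_MS].
 */
static unsigned int congestion_delay_ms(uint64_t memcg_pages, uint64_t rate_pps)
{
	uint64_t ms;

	if (rate_pps == 0)
		return DELAY_MAX_MS;	/* no progress measured: back off fully */

	ms = (memcg_pages / SCAN_FRACTION) * 1000u / rate_pps;
	if (ms < DELAY_MIN_MS)
		ms = DELAY_MIN_MS;
	if (ms > DELAY_MAX_MS)
		ms = DELAY_MAX_MS;
	return (unsigned int)ms;
}
```

In the kernel, the result would presumably go through msecs_to_jiffies()
before being passed to queue_delayed_work(), and the scan counts would have to
be sampled from scan_control around the reclaim call. Both points are
assumptions about a future version, not something the current patch does.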
> >
> > >> +	queue_delayed_work(memcg_async_shrinker, &mem->async_work, delay);
> > >> +	return;
> > >> +finish_scan:
> > >> +	cgroup_release_and_wakeup_rmdir(&mem->css);
> > >> +	clear_bit(ASYNC_RUNNING, &mem->async_flags);
> > >> +	return;
> > >> +}
> > >> +
> > >> +static void run_mem_cgroup_async_shrinker(struct mem_cgroup *mem)
> > >> +{
> > >> +	if (test_bit(ASYNC_NORESCHED, &mem->async_flags))
> > >> +		return;
> > >
> > > I can't work out what ASYNC_NORESCHED does.  Is its name well-chosen?
> > >
> > How about BLOCK/STOP_ASYNC_RECLAIM?
>
> I can't say - I don't know what it does!  Or maybe I did, and immediately
> forgot ;)
>
I'll find a better name ;)

Thanks,
-Kame