Date: Mon, 5 Oct 2009 16:07:33 +0530
From: Balbir Singh
To: KAMEZAWA Hiroyuki
Cc: "linux-mm@kvack.org", "linux-kernel@vger.kernel.org",
    "akpm@linux-foundation.org", "nishimura@mxp.nes.nec.co.jp"
Subject: Re: [PATCH 0/2] memcg: improving scalability by reducing lock contention at charge/uncharge
Message-ID: <20091005103733.GC3036@balbir.in.ibm.com>
Reply-To: balbir@linux.vnet.ibm.com
References: <20091002135531.3b5abf5c.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <20091002135531.3b5abf5c.kamezawa.hiroyu@jp.fujitsu.com>
X-Mailing-List: linux-kernel@vger.kernel.org

* KAMEZAWA Hiroyuki [2009-10-02 13:55:31]:

> Hi,
>
> This patch is against mmotm + softlimit fix patches.
> (which are now in -rc git tree.)
>
> In the latest -rc series, the kernel avoids accessing res_counter when
> the cgroup is the root cgroup. This helps scalability when memcg is not used.
>
> It's necessary to improve scalability even when memcg is used. This patch
> is for that. Balbir's previous work shows that the biggest obstacle to
> better scalability is memcg's res_counter. Then, there are 2 ways.
>
> (1) make the counter scale well.
> (2) avoid accessing the core counter as much as possible.
>
> My first direction was (1). But no, there is no counter which is free
> from false sharing when it needs system-wide fine-grained synchronization.
> And res_counter has several functions...this makes (1) difficult.
> spin_lock (in the slow path) around the counter means tons of invalidation
> will happen even when we just access the counter without modification.
>
> This patch series is for (2). This implements charge/uncharge in a batched
> manner. This coalesces accesses to res_counter at charge/uncharge by using
> the nature of access locality.
>
> Tested for a month. And I got good reports from Balbir and Nishimura, thanks.
> One concern is that this adds some members to the bottom of task_struct.
> Better ideas are welcome.
>
> Following is the test result of continuous page faults on my 8cpu box (x86-64).
>
> A loop like this runs on all cpus in parallel for 60secs.
> ==
> while (1) {
>         x = mmap(NULL, MEGA, PROT_READ|PROT_WRITE,
>                  MAP_PRIVATE|MAP_ANONYMOUS, 0, 0);
>
>         for (off = 0; off < MEGA; off += PAGE_SIZE)
>                 x[off]=0;
>         munmap(x, MEGA);
> }
> ==
> Please see the # of page faults. I think this is a good improvement.
>
>
> [Before]
>  Performance counter stats for './runpause.sh' (5 runs):
>
>   474539.756944  task-clock-msecs   #     7.890 CPUs    ( +-   0.015% )
>           10284  context-switches   #     0.000 M/sec   ( +-   0.156% )
>              12  CPU-migrations     #     0.000 M/sec   ( +-   0.000% )
>        18425800  page-faults        #     0.039 M/sec   ( +-   0.107% )
>   1486296285360  cycles             #  3132.080 M/sec   ( +-   0.029% )
>    380334406216  instructions       #     0.256 IPC     ( +-   0.058% )
>      3274206662  cache-references   #     6.900 M/sec   ( +-   0.453% )
>      1272947699  cache-misses       #     2.682 M/sec   ( +-   0.118% )
>
>    60.147907341  seconds time elapsed   ( +-   0.010% )
>
> [After]
>  Performance counter stats for './runpause.sh' (5 runs):
>
>   474658.997489  task-clock-msecs   #     7.891 CPUs    ( +-   0.006% )
>           10250  context-switches   #     0.000 M/sec   ( +-   0.020% )
>              11  CPU-migrations     #     0.000 M/sec   ( +-   0.000% )
>        33177858  page-faults        #     0.070 M/sec   ( +-   0.152% )
>   1485264748476  cycles             #  3129.120 M/sec   ( +-   0.021% )
>    409847004519  instructions       #     0.276 IPC     ( +-   0.123% )
>      3237478723  cache-references   #     6.821 M/sec   ( +-   0.574% )
>      1182572827  cache-misses       #     2.491 M/sec   ( +-   0.179% )
>
>    60.151786309  seconds time elapsed   ( +-   0.014% )
>

I agree. I liked the previous patchset, so let me re-review this one!
Definitely a good candidate for -mm.

--
	Balbir
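
For readers following the thread, approach (2) in the quoted cover letter (coalescing accesses to the shared counter by charging in batches) can be illustrated with a small user-space sketch. This is not the code from the patch series: the names `shared_counter`, `local_stock`, `charge_page`, `drain_stock` and the batch size `CHARGE_BATCH` are all hypothetical, a pthread mutex stands in for the res_counter spinlock, and only the charge side is batched here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define CHARGE_BATCH 32UL /* hypothetical batch size, in pages */

/* Stand-in for the shared, lock-protected res_counter. */
struct shared_counter {
	pthread_mutex_t lock;
	unsigned long usage;             /* pages currently charged */
	unsigned long limit;             /* maximum pages allowed */
	unsigned long lock_acquisitions; /* instrumentation for this demo only */
};

/* Per-thread stock of pages already charged to the shared counter. */
struct local_stock {
	unsigned long precharged;
};

/* Slow path: charge nr pages under the lock; fails if over the limit. */
static bool counter_charge(struct shared_counter *c, unsigned long nr)
{
	bool ok = false;

	pthread_mutex_lock(&c->lock);
	c->lock_acquisitions++;
	if (c->usage + nr <= c->limit) {
		c->usage += nr;
		ok = true;
	}
	pthread_mutex_unlock(&c->lock);
	return ok;
}

/* Slow path: give nr pages back under the lock. */
static void counter_uncharge(struct shared_counter *c, unsigned long nr)
{
	pthread_mutex_lock(&c->lock);
	c->lock_acquisitions++;
	assert(c->usage >= nr);
	c->usage -= nr;
	pthread_mutex_unlock(&c->lock);
}

/*
 * Charge one page. The shared counter is touched only when the local
 * stock is empty, so most calls take no lock at all.
 */
static bool charge_page(struct shared_counter *c, struct local_stock *s)
{
	if (s->precharged == 0) {
		if (counter_charge(c, CHARGE_BATCH))
			s->precharged = CHARGE_BATCH;
		else if (counter_charge(c, 1)) /* near the limit: fall back to one page */
			s->precharged = 1;
		else
			return false;
	}
	s->precharged--;
	return true;
}

/* Return unused pre-charged pages, e.g. when the thread exits. */
static void drain_stock(struct shared_counter *c, struct local_stock *s)
{
	if (s->precharged) {
		counter_uncharge(c, s->precharged);
		s->precharged = 0;
	}
}
```

In this sketch each thread owns one `local_stock`, so the fast path needs no synchronization. The trade-off is that `usage` in the shared counter can overstate real consumption by up to one batch per thread until `drain_stock` runs.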