Date: Mon, 28 Dec 2009 13:42:45 +0900
From: Daisuke Nishimura
To: "Kirill A. Shutemov"
Cc: containers@lists.linux-foundation.org, linux-mm@kvack.org,
    Paul Menage, Li Zefan, Andrew Morton, KAMEZAWA Hiroyuki,
    Balbir Singh, Pavel Emelyanov, Dan Malek, Vladislav Buzov,
    Alexander Shishkin, linux-kernel@vger.kernel.org, Daisuke Nishimura
Subject: Re: [PATCH v4 4/4] memcg: implement memory thresholds
Message-Id: <20091228134245.8db992d1.nishimura@mxp.nes.nec.co.jp>
In-Reply-To: <7a4e1d758b98ca633a0be06e883644ad8813c077.1261858972.git.kirill@shutemov.name>
References: <3f29ccc3c93e2defd70fc1c4ca8c133908b70b0b.1261858972.git.kirill@shutemov.name>
    <59a7f92356bf1508f06d12c501a7aa4feffb1bbc.1261858972.git.kirill@shutemov.name>
    <7a4e1d758b98ca633a0be06e883644ad8813c077.1261858972.git.kirill@shutemov.name>
Organization: NEC Soft, Ltd.

On Sun, 27 Dec 2009 04:09:02 +0200, "Kirill A. Shutemov" wrote:
> It allows to register multiple memory and memsw thresholds and gets
> notifications when it crosses.
>
> To register a threshold application need:
>  - create an eventfd;
>  - open memory.usage_in_bytes or memory.memsw.usage_in_bytes;
>  - write string like " " to
>    cgroup.event_control.
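For reference, the registration steps quoted above could look roughly like
this from user space. This is only a sketch under assumptions: the
placeholder names in the quoted control string did not survive, so the
field order used here (eventfd, then the fd of the usage file, then the
threshold in bytes) is an assumption, and format_threshold_request(),
register_threshold() and the cgroup_dir argument are hypothetical names
chosen for illustration.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Build the string written to cgroup.event_control.
 * ASSUMPTION: the fields are "<eventfd> <usage file fd> <threshold>".
 * Returns the string length, or -1 if it does not fit in buf.
 */
int format_threshold_request(char *buf, size_t len, int event_fd,
                             int usage_fd, unsigned long long threshold)
{
    int n = snprintf(buf, len, "%d %d %llu", event_fd, usage_fd, threshold);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/*
 * Register one threshold on the memory cgroup directory cgroup_dir
 * (hypothetical path supplied by the caller).
 * On success returns the eventfd and stores the still-open usage file
 * descriptor in *usage_fd_out; on failure returns -1.
 */
int register_threshold(const char *cgroup_dir, unsigned long long threshold,
                       int *usage_fd_out)
{
    char path[4096], req[64];
    int efd, ufd = -1, cfd = -1, n;

    /* Step 1: create an eventfd. */
    efd = eventfd(0, 0);
    if (efd < 0)
        return -1;

    /* Step 2: open the usage file the threshold applies to. */
    n = snprintf(path, sizeof(path), "%s/memory.usage_in_bytes", cgroup_dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        goto fail;
    ufd = open(path, O_RDONLY);
    if (ufd < 0)
        goto fail;

    /* Step 3: write the request string to cgroup.event_control. */
    n = snprintf(path, sizeof(path), "%s/cgroup.event_control", cgroup_dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        goto fail;
    cfd = open(path, O_WRONLY);
    if (cfd < 0)
        goto fail;

    n = format_threshold_request(req, sizeof(req), efd, ufd, threshold);
    if (n < 0 || write(cfd, req, (size_t)n) != (ssize_t)n)
        goto fail;

    close(cfd);
    *usage_fd_out = ufd;
    return efd;

fail:
    if (cfd >= 0)
        close(cfd);
    if (ufd >= 0)
        close(ufd);
    close(efd);
    return -1;
}
```

After a successful call, a blocking 8-byte read() on the returned eventfd
returns once a notification has been delivered. For a memsw threshold the
same steps would apply with memory.memsw.usage_in_bytes in step 2.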
>
> Application will be notified through eventfd when memory usage crosses
> threshold in any direction.
>
> It's applicable for root and non-root cgroup.
>
> It uses stats to track memory usage, simmilar to soft limits. It checks
> if we need to send event to userspace on every 100 page in/out. I guess
> it's good compromise between performance and accuracy of thresholds.
>
> Signed-off-by: Kirill A. Shutemov
> ---
>  Documentation/cgroups/memory.txt |   19 +++-
>  mm/memcontrol.c                  |  275 ++++++++++++++++++++++++++++++++++++++
>  2 files changed, 293 insertions(+), 1 deletions(-)
>
> diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
> index b871f25..195af07 100644
> --- a/Documentation/cgroups/memory.txt
> +++ b/Documentation/cgroups/memory.txt
> @@ -414,7 +414,24 @@ NOTE1: Soft limits take effect over a long period of time, since they involve
>  NOTE2: It is recommended to set the soft limit always below the hard limit,
>         otherwise the hard limit will take precedence.
>
> -8. TODO
> +8. Memory thresholds
> +
> +Memory controler implements memory thresholds using cgroups notification
> +API (see cgroups.txt). It allows to register multiple memory and memsw
> +thresholds and gets notifications when it crosses.
> +
> +To register a threshold application need:
> + - create an eventfd using eventfd(2);
> + - open memory.usage_in_bytes or memory.memsw.usage_in_bytes;
> + - write string like " " to
> +   cgroup.event_control.
> +
> +Application will be notified through eventfd when memory usage crosses
> +threshold in any direction.
> +
> +It's applicable for root and non-root cgroup.
> +
> +9. TODO
>
>  1. Add support for accounting huge pages (as a separate controller)
>  2.
Make per-cgroup scanner reclaim not-shared pages first
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 36eb7af..3a0a6a1 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c

This is a nitpick, but my patch
(http://marc.info/?l=linux-mm-commits&m=126152804420992&w=2)
has already modified this part of the file. I think it would be better
for you to apply my patches by hand, or to wait for the next mmotm to
be released, to avoid bothering Andrew.
(There is enough time left until the next merge window :))

(snip)

> +static void __mem_cgroup_threshold(struct mem_cgroup *memcg, bool swap)
> +{
> +	struct mem_cgroup_threshold_ary *thresholds;
> +	u64 usage = mem_cgroup_usage(memcg, swap);
> +	int i, cur;
> +

I think calling mem_cgroup_usage() after the "if (!thresholds)" check
would reduce the overhead a little when no thresholds are set.
I've confirmed that the change makes the assembler output different.

> +	rcu_read_lock();
> +	if (!swap) {
> +		thresholds = rcu_dereference(memcg->thresholds);
> +	} else {
> +		thresholds = rcu_dereference(memcg->memsw_thresholds);
> +	}
> +
> +	if (!thresholds)
> +		goto unlock;
> +
> +	cur = atomic_read(&thresholds->cur);
> +
> +	/* Check if a threshold crossed in any direction */
> +
> +	for(i = cur; i >= 0 &&
> +		unlikely(thresholds->entries[i].threshold > usage); i--) {
> +		atomic_dec(&thresholds->cur);
> +		eventfd_signal(thresholds->entries[i].eventfd, 1);
> +	}
> +
> +	for(i = cur + 1; i < thresholds->size &&
> +		unlikely(thresholds->entries[i].threshold <= usage); i++) {
> +		atomic_inc(&thresholds->cur);
> +		eventfd_signal(thresholds->entries[i].eventfd, 1);
> +	}
> +unlock:
> +	rcu_read_unlock();
> +}
> +

Thanks,
Daisuke Nishimura.
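
To illustrate the ordering suggested for __mem_cgroup_threshold(), here
is a small user-space model of the crossing check. It is a sketch only,
not kernel code: struct threshold_ary, struct usage_source, read_usage()
and check_thresholds() are hypothetical stand-ins, and RCU, atomics and
eventfd_signal() are left out. The point it shows is that usage is read
only after the "no thresholds registered" check has passed.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of a sorted threshold array (hypothetical, not the kernel struct). */
struct threshold_ary {
    int cur;                 /* index of last threshold <= usage, or -1 */
    size_t size;             /* number of entries */
    const uint64_t *entries; /* thresholds, sorted ascending */
};

/* Stand-in for the usage counter; counts how often it is read. */
struct usage_source {
    uint64_t value;
    unsigned int reads;
};

static uint64_t read_usage(struct usage_source *src)
{
    src->reads++;
    return src->value;
}

/*
 * Walk the thresholds crossed since the last check, in either direction.
 * Returns how many notifications would have been sent.
 */
int check_thresholds(struct threshold_ary *t, struct usage_source *src)
{
    uint64_t usage;
    int i, cur, fired = 0;

    /* Nothing registered: return before paying for the usage read. */
    if (!t)
        return 0;

    usage = read_usage(src);
    cur = t->cur;

    /* Usage dropped below thresholds at or under the current index. */
    for (i = cur; i >= 0 && t->entries[i] > usage; i--) {
        t->cur--;
        fired++;
    }

    /* Usage rose to or above thresholds past the current index. */
    for (i = cur + 1; i < (int)t->size && t->entries[i] <= usage; i++) {
        t->cur++;
        fired++;
    }

    return fired;
}
```

With thresholds at 100, 200 and 300 and cur starting at -1, a usage of
250 crosses the first two thresholds upward; a later usage of 50 crosses
the same two downward. Calling check_thresholds() with a NULL array
leaves the read counter in usage_source untouched.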