Date: Tue, 14 Jul 2009 09:47:29 +0900
From: KAMEZAWA Hiroyuki
To: Paul Menage
Cc: Vladislav Buzov, Linux Kernel Mailing List, Linux Containers Mailing List, Linux memory management list, Dan Malek, Andrew Morton, Balbir Singh
Subject: Re: [PATCH 1/2] Resource usage threshold notification addition to res_counter (v3)
Message-Id: <20090714094729.45d4dff4.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <6599ad830907131736w4397d336xad733f274c812690@mail.gmail.com>
Organization: FUJITSU Co. LTD.
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 13 Jul 2009 17:36:40 -0700 Paul Menage wrote:

> As I mentioned in another thread, I think that associating the
> threshold with the res_counter rather than with each individual waiter
> is a mistake, since it creates global state and makes it hard to have
> multiple waiters on the same cgroup.
>

Ah, hmm... maybe yes. But the problem is "hierarchy" (even if this usage
notifier doesn't handle it).

While we charge as follows with res_counter + hierarchy:

	res_counter_A + PAGE_SIZE
	res_counter_B + PAGE_SIZE
	res_counter_C + PAGE_SIZE

checking "where we exceed" in a smart way is not very easy.
Balbir's soft limit does a similar check, but it's not very smart either, I think.

If there are plural thresholds (notifier, softlimit, etc...), this is worth trying.
Hmm... if not, the size of res_counter exceeds 128 bytes and we'll see a terrible counter.

Any idea?

Thanks,
-Kame

> Paul
>
> On Mon, Jul 13, 2009 at 5:16 PM, Vladislav
> Buzov wrote:
> > This patch updates the Resource Counter to add a configurable resource usage
> > threshold notification mechanism.
> >
> > Signed-off-by: Vladislav Buzov
> > Signed-off-by: Dan Malek
> > ---
> >  Documentation/cgroups/resource_counter.txt |   21 ++++++++-
> >  include/linux/res_counter.h                |   69 ++++++++++++++++++++++++++++
> >  kernel/res_counter.c                       |    7 +++
> >  3 files changed, 95 insertions(+), 2 deletions(-)
> >
> > diff --git a/Documentation/cgroups/resource_counter.txt b/Documentation/cgroups/resource_counter.txt
> > index 95b24d7..1369dff 100644
> > --- a/Documentation/cgroups/resource_counter.txt
> > +++ b/Documentation/cgroups/resource_counter.txt
> > @@ -39,7 +39,20 @@ to work with it.
> >        The failcnt stands for "failures counter". This is the number of
> >        resource allocation attempts that failed.
> >
> > - c. spinlock_t lock
> > + e. unsigned long long threshold
> > +
> > +       The resource usage threshold to notify the resouce controller. This is
> > +       the minimal difference between the resource limit and current usage
> > +       to fire a notification.
> > +
> > + f. void (*threshold_notifier)(struct res_counter *counter)
> > +
> > +       The threshold notification callback installed by the resource
> > +       controller. Called when the usage reaches or exceeds the threshold.
> > +       Should be fast and not sleep because called when interrupts are
> > +       disabled.
> > +
> > + g. spinlock_t lock
> >
> >        Protects changes of the above values.
> >
> > @@ -140,6 +153,7 @@ counter fields. They are recommended to adhere to the following rules:
> >        usage           usage_in_
> >        max_usage       max_usage_in_
> >        limit           limit_in_
> > +       threshold       notify_threshold_in_
> >        failcnt         failcnt
> >        lock            no file :)
> >
> > @@ -153,9 +167,12 @@ counter fields. They are recommended to adhere to the following rules:
> >        usage           prohibited
> >        max_usage       reset to usage
> >        limit           set the limit
> > +       threshold       set the threshold
> >        failcnt         reset to zero
> >
> > -
> > + d. Notification is enabled by installing the threshold notifier callback. It
> > +    is up to the resouce controller to communicate the notification to user
> > +    space tasks.
> >
> >  5. Usage example
> >
> > diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
> > index 511f42f..5ec98d7 100644
> > --- a/include/linux/res_counter.h
> > +++ b/include/linux/res_counter.h
> > @@ -9,6 +9,11 @@
> >  *
> >  * Author: Pavel Emelianov
> >  *
> > + * Resouce usage threshold notification update
> > + * Copyright 2009 CE Linux Forum and Embedded Alley Solutions, Inc.
> > + * Author: Dan Malek
> > + * Author: Vladislav Buzov
> > + *
> >  * See Documentation/cgroups/resource_counter.txt for more
> >  * info about what this counter is.
> >  */
> > @@ -35,6 +40,19 @@ struct res_counter {
> >         */
> >        unsigned long long limit;
> >        /*
> > +        * the resource usage threshold to notify the resouce controller. This
> > +        * is the minimal difference between the resource limit and current
> > +        * usage to fire a notification.
> > +        */
> > +       unsigned long long threshold;
> > +       /*
> > +        * the threshold notification callback installed by the resource
> > +        * controller. Called when the usage reaches or exceeds the threshold.
> > +        * Should be fast and not sleep because called when interrupts are
> > +        * disabled.
> > +        */
> > +       void (*threshold_notifier)(struct res_counter *counter);
> > +       /*
> >         * the number of unsuccessful attempts to consume the resource
> >         */
> >        unsigned long long failcnt;
> > @@ -87,6 +105,7 @@ enum {
> >        RES_MAX_USAGE,
> >        RES_LIMIT,
> >        RES_FAILCNT,
> > +       RES_THRESHOLD,
> >  };
> >
> >  /*
> > @@ -132,6 +151,21 @@ static inline bool res_counter_limit_check_locked(struct res_counter *cnt)
> >        return false;
> >  }
> >
> > +static inline bool res_counter_threshold_check_locked(struct res_counter *cnt)
> > +{
> > +       if (cnt->usage + cnt->threshold < cnt->limit)
> > +               return true;
> > +
> > +       return false;
> > +}
> > +
> > +static inline void res_counter_threshold_notify_locked(struct res_counter *cnt)
> > +{
> > +       if (!res_counter_threshold_check_locked(cnt) &&
> > +           cnt->threshold_notifier)
> > +               cnt->threshold_notifier(cnt);
> > +}
> > +
> >  /*
> >  * Helper function to detect if the cgroup is within it's limit or
> >  * not. It's currently called from cgroup_rss_prepare()
> > @@ -147,6 +181,21 @@ static inline bool res_counter_check_under_limit(struct res_counter *cnt)
> >        return ret;
> >  }
> >
> > +/*
> > + * Helper function to detect if the cgroup usage is under it's threshold or
> > + * not.
> > + */
> > +static inline bool res_counter_check_under_threshold(struct res_counter *cnt)
> > +{
> > +       bool ret;
> > +       unsigned long flags;
> > +
> > +       spin_lock_irqsave(&cnt->lock, flags);
> > +       ret = res_counter_threshold_check_locked(cnt);
> > +       spin_unlock_irqrestore(&cnt->lock, flags);
> > +       return ret;
> > +}
> > +
> >  static inline void res_counter_reset_max(struct res_counter *cnt)
> >  {
> >        unsigned long flags;
> > @@ -174,6 +223,26 @@ static inline int res_counter_set_limit(struct res_counter *cnt,
> >        spin_lock_irqsave(&cnt->lock, flags);
> >        if (cnt->usage <= limit) {
> >                cnt->limit = limit;
> > +               if (limit <= cnt->threshold)
> > +                       cnt->threshold = 0;
> > +               else
> > +                       res_counter_threshold_notify_locked(cnt);
> > +               ret = 0;
> > +       }
> > +       spin_unlock_irqrestore(&cnt->lock, flags);
> > +       return ret;
> > +}
> > +
> > +static inline int res_counter_set_threshold(struct res_counter *cnt,
> > +               unsigned long long threshold)
> > +{
> > +       unsigned long flags;
> > +       int ret = -EINVAL;
> > +
> > +       spin_lock_irqsave(&cnt->lock, flags);
> > +       if (cnt->limit > threshold) {
> > +               cnt->threshold = threshold;
> > +               res_counter_threshold_notify_locked(cnt);
> >                ret = 0;
> >        }
> >        spin_unlock_irqrestore(&cnt->lock, flags);
> > diff --git a/kernel/res_counter.c b/kernel/res_counter.c
> > index e1338f0..9b36748 100644
> > --- a/kernel/res_counter.c
> > +++ b/kernel/res_counter.c
> > @@ -5,6 +5,10 @@
> >  *
> >  * Author: Pavel Emelianov
> >  *
> > + * Resouce usage threshold notification update
> > + * Copyright 2009 CE Linux Forum and Embedded Alley Solutions, Inc.
> > + * Author: Dan Malek
> > + * Author: Vladislav Buzov
> >  */
> >
> >  #include
> > @@ -32,6 +36,7 @@ int res_counter_charge_locked(struct res_counter *counter, unsigned long val)
> >        counter->usage += val;
> >        if (counter->usage > counter->max_usage)
> >                counter->max_usage = counter->usage;
> > +       res_counter_threshold_notify_locked(counter);
> >        return 0;
> >  }
> >
> > @@ -101,6 +106,8 @@ res_counter_member(struct res_counter *counter, int member)
> >                return &counter->limit;
> >        case RES_FAILCNT:
> >                return &counter->failcnt;
> > +       case RES_THRESHOLD:
> > +               return &counter->threshold;
> >        };
> >
> >        BUG();
> > --
> > 1.5.6.3
> >
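
For illustration only, here is a rough user-space sketch of the hierarchy concern raised in the reply: one charge is applied to a counter and to each of its ancestors, and the threshold comparison from the quoted patch (usage + threshold < limit) is repeated at every level to find where it was first crossed. The names struct counter, under_threshold() and charge_hierarchy() are hypothetical and are not part of the patch or of res_counter; locking and limit enforcement are left out on purpose.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct res_counter (no lock, no failcnt). */
struct counter {
	const char *name;
	unsigned long long usage;
	unsigned long long limit;
	unsigned long long threshold;	/* minimal gap between limit and usage */
	struct counter *parent;		/* NULL at the root of the hierarchy */
};

/* Same comparison as res_counter_threshold_check_locked() in the quoted patch. */
static bool under_threshold(const struct counter *c)
{
	return c->usage + c->threshold < c->limit;
}

/*
 * Charge val to c and to every ancestor. Returns the lowest level whose
 * threshold is reached after the charge, or NULL if every level is still
 * under its threshold. Limits are NOT enforced in this sketch.
 */
static struct counter *charge_hierarchy(struct counter *c, unsigned long long val)
{
	struct counter *crossed = NULL;

	for (; c != NULL; c = c->parent) {
		c->usage += val;
		if (crossed == NULL && !under_threshold(c))
			crossed = c;
	}
	return crossed;
}
```

Every charge walks the whole chain and does one comparison per level, which is the per-level cost the reply is worried about. Supporting several thresholds per counter (notifier, soft limit, ...) would multiply that check and grow the structure further.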