2011-02-15 22:11:21

by Jacob Pan

[permalink] [raw]
Subject: [PATCH 0/1, v9] cgroup/freezer: add per freezer duty ratio control

From: Jacob Pan <[email protected]>

Fixing subject in cover letter, should be v4 instead of v3

Changelog:
v9: - merged duty ratio related data structs
- removed toggling bit, replaced with freezer->state
- consolidated toggling mode cancellation function
- use spin lock to protect duty ratio parameter writes
- use second as period
- allow 0% and 100% frozen time, aligned behavior with
freezer_write()
v8: - minor clean up
v7: - use delayed workqueue instead of per cgroup kernel thread
to manage toggling
- make freezer.state higher priority than duty ratio setting
- small correction in documentation

v6: - remove check for duty ratio == 100 case
v5: - improve error handling in duty ratio/ period writes
- fix the logic to avoid kthread creation when ratio and
period are invalid.
- 100% frozen is not permitted in duty ratio set up

v4: - limit kthread creation only when valid duty ratio
parameters are given.
- changed period back to milisecond
- consolidated paramter write functions

v3: - added documentation

v2:
- use read_u64/write_u64 instead of string read/write
- use period in second instead of ms
- renamed duty_ratio to duty_ratio_pct, be more specific
- removed locking around period and duty_ratio reads


Jacob Pan (1):
cgroup/freezer: add per freezer duty ratio control

Documentation/cgroups/freezer-subsystem.txt | 23 ++++++
kernel/cgroup_freezer.c | 109 ++++++++++++++++++++++++++-
2 files changed, 130 insertions(+), 2 deletions(-)


2011-02-15 22:11:27

by Jacob Pan

[permalink] [raw]
Subject: [PATCH 1/1, v9] cgroup/freezer: add per freezer duty ratio control

From: Jacob Pan <[email protected]>

Freezer subsystem is used to manage batch jobs which can start
stop at the same time. However, sometime it is desirable to let
the kernel manage the freezer state automatically with a given
duty ratio.
For example, if we want to reduce the time that backgroup apps
are allowed to run we can put them into a freezer subsystem and
set the kernel to turn them THAWED/FROZEN at given period and
percentage of frozen time.

This patch introduces two file nodes under cgroup
freezer.period_second
freezer.frozen_time_percentage

Usage example: set period to be 5 seconds and frozen duty ratio 90%
[root@localhost aoa]# echo 90 > freezer.frozen_time_percentage
[root@localhost aoa]# echo 5 > freezer.period_second

Signed-off-by: Jacob Pan <[email protected]>
---
Documentation/cgroups/freezer-subsystem.txt | 33 ++++++
kernel/cgroup_freezer.c | 155 ++++++++++++++++++++++++++-
2 files changed, 185 insertions(+), 3 deletions(-)

diff --git a/Documentation/cgroups/freezer-subsystem.txt b/Documentation/cgroups/freezer-subsystem.txt
index 41f37fe..df0f998 100644
--- a/Documentation/cgroups/freezer-subsystem.txt
+++ b/Documentation/cgroups/freezer-subsystem.txt
@@ -100,3 +100,36 @@ things happens:
and returns EINVAL)
3) The tasks that blocked the cgroup from entering the "FROZEN"
state disappear from the cgroup's set of tasks.
+
+In embedded systems, it is desirable to manage group of applications
+for power saving. E.g. tasks that are not in the foreground may be
+frozen and unfrozen periodically to save power without affecting user
+experience. In this case, user/management software can attach tasks
+into freezer cgroup then specify a period and percentage of time the
+tasks within the cgroup shall be frozen.
+
+100% frozen will result in the same behavior as setting freezer.state
+to FROZEN. Similarly, 0% is equivalent to setting freezer.state to
+THAWED.
+The granularity of the period is second, due to the fact that small
+period may result in greater overhead and causes stability issue since
+common workqueue is used here to perform the management.
+
+Zero value of period_second is not allowed, the default period is 1
+second.
+
+Usage example:
+Assuming freezer cgroup is already mounted, application being managed
+are included the "tasks" file node of the given freezer cgroup.
+To make the tasks frozen at 90% of the time every 5 seconds, do:
+
+[root@localhost ]# echo 90 > freezer.frozen_time_percentage
+[root@localhost ]# echo 5 > freezer.period_second
+
+After that, the application in this freezer cgroup will only be
+allowed to run at the following pattern.
+ __ __ __
+ | |<-- 90% frozen -->| | | |
+____| |__________________| |__________________| |_____
+
+ |<---- 5 seconds ---->|
diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index e7bebb7..32beacb 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -28,12 +28,31 @@ enum freezer_state {
CGROUP_FROZEN,
};

+enum duty_ratio_params {
+ FREEZER_DUTY_RATIO = 0,
+ FREEZER_PERIOD,
+};
+
+struct freezer_duty {
+ u32 ratio; /* percentage of time frozen */
+ u32 period_pct_ms; /* one percent of the period in miliseconds */
+ u32 enabled;
+};
+
struct freezer {
struct cgroup_subsys_state css;
enum freezer_state state;
+ struct freezer_duty duty;
+ struct delayed_work freezer_work; /* work to duty-cycle a cgroup */
spinlock_t lock; /* protects _writes_ to state */
};

+static int freezer_change_state(struct cgroup *cgroup,
+ enum freezer_state goal_state);
+static void unfreeze_cgroup(struct cgroup *cgroup, struct freezer *freezer);
+static int try_to_freeze_cgroup(struct cgroup *cgroup, struct freezer *freezer);
+static void update_if_frozen(struct cgroup *cgroup, struct freezer *freezer);
+
static inline struct freezer *cgroup_freezer(
struct cgroup *cgroup)
{
@@ -63,6 +82,35 @@ int cgroup_freezing_or_frozen(struct task_struct *task)
return result;
}

+static void freezer_work_fn(struct work_struct *work)
+{
+ struct freezer *freezer;
+ unsigned long delay_jiffies = 0;
+
+ freezer = container_of(work, struct freezer, freezer_work.work);
+ spin_lock_irq(&freezer->lock);
+ /* toggle between THAWED and FROZEN state. */
+ update_if_frozen(freezer->css.cgroup, freezer);
+ switch (freezer->state) {
+ case CGROUP_FROZEN:
+ case CGROUP_FREEZING:
+ delay_jiffies = msecs_to_jiffies((100 - freezer->duty.ratio) *
+ freezer->duty.period_pct_ms);
+ unfreeze_cgroup(freezer->css.cgroup, freezer);
+ break;
+ case CGROUP_THAWED:
+ delay_jiffies = msecs_to_jiffies(freezer->duty.ratio *
+ freezer->duty.period_pct_ms);
+ try_to_freeze_cgroup(freezer->css.cgroup, freezer);
+ break;
+ default:
+ BUG();
+ }
+
+ schedule_delayed_work(&freezer->freezer_work, delay_jiffies);
+ spin_unlock_irq(&freezer->lock);
+}
+
/*
* cgroups_write_string() limits the size of freezer state strings to
* CGROUP_LOCAL_BUFFER_SIZE
@@ -144,13 +192,19 @@ static struct cgroup_subsys_state *freezer_create(struct cgroup_subsys *ss,

spin_lock_init(&freezer->lock);
freezer->state = CGROUP_THAWED;
+ freezer->duty.period_pct_ms = 10; /* default to 1 second period */
return &freezer->css;
}

static void freezer_destroy(struct cgroup_subsys *ss,
struct cgroup *cgroup)
{
- kfree(cgroup_freezer(cgroup));
+ struct freezer *freezer;
+
+ freezer = cgroup_freezer(cgroup);
+ if (freezer->duty.enabled)
+ cancel_delayed_work_sync(&freezer->freezer_work);
+ kfree(freezer);
}

/*
@@ -282,6 +336,16 @@ static int freezer_read(struct cgroup *cgroup, struct cftype *cft,
return 0;
}

+static u64 freezer_read_duty_ratio(struct cgroup *cgroup, struct cftype *cft)
+{
+ return cgroup_freezer(cgroup)->duty.ratio;
+}
+
+static u64 freezer_read_period(struct cgroup *cgroup, struct cftype *cft)
+{
+ return cgroup_freezer(cgroup)->duty.period_pct_ms / 10;
+}
+
static int try_to_freeze_cgroup(struct cgroup *cgroup, struct freezer *freezer)
{
struct cgroup_iter it;
@@ -347,20 +411,56 @@ out:
return retval;
}

+static void freezer_check_duty_cycling(struct freezer *freezer, int set_state)
+{
+ if (freezer->duty.enabled) {
+ if (freezer->duty.ratio && freezer->duty.ratio < 100)
+ return;
+ pr_debug("stopping duty ratio mode\n");
+ cancel_delayed_work_sync(&freezer->freezer_work);
+ freezer->duty.enabled = 0;
+ if (set_state) {
+ update_if_frozen(freezer->css.cgroup, freezer);
+ if (freezer->duty.ratio == 100)
+ try_to_freeze_cgroup(freezer->css.cgroup,
+ freezer);
+ else if (freezer->duty.ratio == 0)
+ unfreeze_cgroup(freezer->css.cgroup, freezer);
+ else
+ BUG();
+ }
+ } else if (freezer->duty.ratio < 100 && freezer->duty.ratio) {
+ pr_debug("starting duty ratio mode\n");
+ INIT_DELAYED_WORK(&freezer->freezer_work, freezer_work_fn);
+ freezer->duty.enabled = 1;
+ schedule_delayed_work(&freezer->freezer_work, 0);
+ }
+}
+
static int freezer_write(struct cgroup *cgroup,
struct cftype *cft,
const char *buffer)
{
int retval;
enum freezer_state goal_state;
+ struct freezer *freezer;

- if (strcmp(buffer, freezer_state_strs[CGROUP_THAWED]) == 0)
+ freezer = cgroup_freezer(cgroup);
+ if (strcmp(buffer, freezer_state_strs[CGROUP_THAWED]) == 0) {
goal_state = CGROUP_THAWED;
- else if (strcmp(buffer, freezer_state_strs[CGROUP_FROZEN]) == 0)
+ freezer->duty.ratio = 0;
+ } else if (strcmp(buffer, freezer_state_strs[CGROUP_FROZEN]) == 0) {
+ freezer->duty.ratio = 100;
goal_state = CGROUP_FROZEN;
+ }
else
return -EINVAL;

+ /* we should stop duty ratio toggling if user wants to
+ * force change to a valid state.
+ */
+ freezer_check_duty_cycling(freezer, 0);
+
if (!cgroup_lock_live_group(cgroup))
return -ENODEV;
retval = freezer_change_state(cgroup, goal_state);
@@ -368,12 +468,61 @@ static int freezer_write(struct cgroup *cgroup,
return retval;
}

+static int freezer_write_param(struct cgroup *cgroup, struct cftype *cft,
+ u64 val)
+{
+ struct freezer *freezer;
+ int ret = 0;
+
+ freezer = cgroup_freezer(cgroup);
+ if (!cgroup_lock_live_group(cgroup))
+ return -ENODEV;
+
+ spin_lock_irq(&freezer->lock);
+ switch (cft->private) {
+ case FREEZER_DUTY_RATIO:
+ if (val > 100) {
+ ret = -EINVAL;
+ goto exit;
+ }
+ freezer->duty.ratio = val;
+ break;
+ case FREEZER_PERIOD:
+ if (!val) {
+ ret = -EINVAL;
+ goto exit;
+ }
+ freezer->duty.period_pct_ms = val * 10;
+ break;
+ default:
+ BUG();
+ }
+ spin_unlock_irq(&freezer->lock);
+ freezer_check_duty_cycling(freezer, 1);
+exit:
+ cgroup_unlock();
+ return ret;
+}
+
static struct cftype files[] = {
{
.name = "state",
.read_seq_string = freezer_read,
.write_string = freezer_write,
},
+ {
+ .name = "frozen_time_percentage",
+ .private = FREEZER_DUTY_RATIO,
+ .read_u64 = freezer_read_duty_ratio,
+ .write_u64 = freezer_write_param,
+ },
+ {
+ .name = "period_second",
+ .private = FREEZER_PERIOD,
+ .read_u64 = freezer_read_period,
+ .write_u64 = freezer_write_param,
+ },
+
};

static int freezer_populate(struct cgroup_subsys *ss, struct cgroup *cgroup)
--
1.7.0.4

2011-02-16 00:00:37

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 1/1, v9] cgroup/freezer: add per freezer duty ratio control

On Tuesday, February 15, 2011, [email protected] wrote:
> From: Jacob Pan <[email protected]>
>
> Freezer subsystem is used to manage batch jobs which can start
> stop at the same time. However, sometime it is desirable to let
> the kernel manage the freezer state automatically with a given
> duty ratio.
> For example, if we want to reduce the time that backgroup apps
> are allowed to run we can put them into a freezer subsystem and
> set the kernel to turn them THAWED/FROZEN at given period and
> percentage of frozen time.
>
> This patch introduces two file nodes under cgroup
> freezer.period_second
> freezer.frozen_time_percentage
>
> Usage example: set period to be 5 seconds and frozen duty ratio 90%
> [root@localhost aoa]# echo 90 > freezer.frozen_time_percentage
> [root@localhost aoa]# echo 5 > freezer.period_second
>
> Signed-off-by: Jacob Pan <[email protected]>
> ---
> Documentation/cgroups/freezer-subsystem.txt | 33 ++++++
> kernel/cgroup_freezer.c | 155 ++++++++++++++++++++++++++-
> 2 files changed, 185 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/cgroups/freezer-subsystem.txt b/Documentation/cgroups/freezer-subsystem.txt
> index 41f37fe..df0f998 100644
> --- a/Documentation/cgroups/freezer-subsystem.txt
> +++ b/Documentation/cgroups/freezer-subsystem.txt
> @@ -100,3 +100,36 @@ things happens:
> and returns EINVAL)
> 3) The tasks that blocked the cgroup from entering the "FROZEN"
> state disappear from the cgroup's set of tasks.
> +
> +In embedded systems, it is desirable to manage group of applications
> +for power saving. E.g. tasks that are not in the foreground may be
> +frozen and unfrozen periodically to save power without affecting user
> +experience. In this case, user/management software can attach tasks
> +into freezer cgroup then specify a period and percentage of time the
> +tasks within the cgroup shall be frozen.
> +
> +100% frozen will result in the same behavior as setting freezer.state
> +to FROZEN. Similarly, 0% is equivalent to setting freezer.state to
> +THAWED.
> +The granularity of the period is second, due to the fact that small
> +period may result in greater overhead and causes stability issue since
> +common workqueue is used here to perform the management.
> +
> +Zero value of period_second is not allowed, the default period is 1
> +second.
> +
> +Usage example:
> +Assuming freezer cgroup is already mounted, application being managed
> +are included the "tasks" file node of the given freezer cgroup.
> +To make the tasks frozen at 90% of the time every 5 seconds, do:
> +
> +[root@localhost ]# echo 90 > freezer.frozen_time_percentage
> +[root@localhost ]# echo 5 > freezer.period_second

Well, are you trying to work around a problem with some applications that
cause too many wakeups to occur?

Why do you need the kernel to do that for you on a per-cgroup basis?
What about simply sending SIGSTOP/SIGCONT periodically from user space to the
applications in question?

Moreover, what about having a user space process that will freeze/unfreeze the
entire cgroup as needed?

> +After that, the application in this freezer cgroup will only be
> +allowed to run at the following pattern.
> + __ __ __
> + | |<-- 90% frozen -->| | | |
> +____| |__________________| |__________________| |_____
> +
> + |<---- 5 seconds ---->|

How are you going to handle dependencies between the "periodically frozen"
applications and other applications that may need them to do some work?

> diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
> index e7bebb7..32beacb 100644
> --- a/kernel/cgroup_freezer.c
> +++ b/kernel/cgroup_freezer.c
> @@ -28,12 +28,31 @@ enum freezer_state {
> CGROUP_FROZEN,
> };
>
> +enum duty_ratio_params {
> + FREEZER_DUTY_RATIO = 0,
> + FREEZER_PERIOD,
> +};
> +
> +struct freezer_duty {
> + u32 ratio; /* percentage of time frozen */
> + u32 period_pct_ms; /* one percent of the period in miliseconds */

What period?

> + u32 enabled;
> +};
> +
> struct freezer {
> struct cgroup_subsys_state css;
> enum freezer_state state;
> + struct freezer_duty duty;
> + struct delayed_work freezer_work; /* work to duty-cycle a cgroup */
> spinlock_t lock; /* protects _writes_ to state */
> };
>
> +static int freezer_change_state(struct cgroup *cgroup,
> + enum freezer_state goal_state);
> +static void unfreeze_cgroup(struct cgroup *cgroup, struct freezer *freezer);
> +static int try_to_freeze_cgroup(struct cgroup *cgroup, struct freezer *freezer);
> +static void update_if_frozen(struct cgroup *cgroup, struct freezer *freezer);
> +
> static inline struct freezer *cgroup_freezer(
> struct cgroup *cgroup)
> {
> @@ -63,6 +82,35 @@ int cgroup_freezing_or_frozen(struct task_struct *task)
> return result;
> }
>
> +static void freezer_work_fn(struct work_struct *work)
> +{
> + struct freezer *freezer;
> + unsigned long delay_jiffies = 0;
> +
> + freezer = container_of(work, struct freezer, freezer_work.work);
> + spin_lock_irq(&freezer->lock);

So you switch interrupts off every once a while simply to freeze/unfreeze
your cgroup. That's going to affect the scheduler quite a bit, isn't it?

> + /* toggle between THAWED and FROZEN state. */
> + update_if_frozen(freezer->css.cgroup, freezer);

What about using a local variable cgroup = freezer->css.cgroup ?

> + switch (freezer->state) {
> + case CGROUP_FROZEN:
> + case CGROUP_FREEZING:
> + delay_jiffies = msecs_to_jiffies((100 - freezer->duty.ratio) *
> + freezer->duty.period_pct_ms);
> + unfreeze_cgroup(freezer->css.cgroup, freezer);
> + break;
> + case CGROUP_THAWED:
> + delay_jiffies = msecs_to_jiffies(freezer->duty.ratio *
> + freezer->duty.period_pct_ms);
> + try_to_freeze_cgroup(freezer->css.cgroup, freezer);
> + break;
> + default:
> + BUG();

Don't crash the kernel if you don't have to.

> + }
> +
> + schedule_delayed_work(&freezer->freezer_work, delay_jiffies);

What if this work happens to run during system suspend/resume?

> + spin_unlock_irq(&freezer->lock);
> +}
> +
> /*
> * cgroups_write_string() limits the size of freezer state strings to
> * CGROUP_LOCAL_BUFFER_SIZE
> @@ -144,13 +192,19 @@ static struct cgroup_subsys_state *freezer_create(struct cgroup_subsys *ss,
>
> spin_lock_init(&freezer->lock);
> freezer->state = CGROUP_THAWED;
> + freezer->duty.period_pct_ms = 10; /* default to 1 second period */

Why this number?

> return &freezer->css;
> }
>
> static void freezer_destroy(struct cgroup_subsys *ss,
> struct cgroup *cgroup)
> {
> - kfree(cgroup_freezer(cgroup));
> + struct freezer *freezer;
> +
> + freezer = cgroup_freezer(cgroup);
> + if (freezer->duty.enabled)
> + cancel_delayed_work_sync(&freezer->freezer_work);
> + kfree(freezer);
> }
>
> /*
> @@ -282,6 +336,16 @@ static int freezer_read(struct cgroup *cgroup, struct cftype *cft,
> return 0;
> }
>
> +static u64 freezer_read_duty_ratio(struct cgroup *cgroup, struct cftype *cft)
> +{
> + return cgroup_freezer(cgroup)->duty.ratio;
> +}
> +
> +static u64 freezer_read_period(struct cgroup *cgroup, struct cftype *cft)
> +{
> + return cgroup_freezer(cgroup)->duty.period_pct_ms / 10;
> +}
> +
> static int try_to_freeze_cgroup(struct cgroup *cgroup, struct freezer *freezer)
> {
> struct cgroup_iter it;
> @@ -347,20 +411,56 @@ out:
> return retval;
> }
>

It would be _nice_ to add a kerneldoc comment explaining the purpose of the
new function, right?

> +static void freezer_check_duty_cycling(struct freezer *freezer, int set_state)
> +{
> + if (freezer->duty.enabled) {
> + if (freezer->duty.ratio && freezer->duty.ratio < 100)
> + return;
> + pr_debug("stopping duty ratio mode\n");
> + cancel_delayed_work_sync(&freezer->freezer_work);
> + freezer->duty.enabled = 0;
> + if (set_state) {
> + update_if_frozen(freezer->css.cgroup, freezer);
> + if (freezer->duty.ratio == 100)
> + try_to_freeze_cgroup(freezer->css.cgroup,
> + freezer);
> + else if (freezer->duty.ratio == 0)
> + unfreeze_cgroup(freezer->css.cgroup, freezer);
> + else
> + BUG();

Again, don't crash the kernel if you don't have to.

> + }
> + } else if (freezer->duty.ratio < 100 && freezer->duty.ratio) {
> + pr_debug("starting duty ratio mode\n");
> + INIT_DELAYED_WORK(&freezer->freezer_work, freezer_work_fn);
> + freezer->duty.enabled = 1;
> + schedule_delayed_work(&freezer->freezer_work, 0);
> + }
> +}
> +
> static int freezer_write(struct cgroup *cgroup,
> struct cftype *cft,
> const char *buffer)
> {
> int retval;
> enum freezer_state goal_state;
> + struct freezer *freezer;
>
> - if (strcmp(buffer, freezer_state_strs[CGROUP_THAWED]) == 0)
> + freezer = cgroup_freezer(cgroup);
> + if (strcmp(buffer, freezer_state_strs[CGROUP_THAWED]) == 0) {
> goal_state = CGROUP_THAWED;
> - else if (strcmp(buffer, freezer_state_strs[CGROUP_FROZEN]) == 0)
> + freezer->duty.ratio = 0;
> + } else if (strcmp(buffer, freezer_state_strs[CGROUP_FROZEN]) == 0) {
> + freezer->duty.ratio = 100;
> goal_state = CGROUP_FROZEN;
> + }
> else
> return -EINVAL;
>
> + /* we should stop duty ratio toggling if user wants to
> + * force change to a valid state.
> + */
> + freezer_check_duty_cycling(freezer, 0);

Doesn't that need to be under the lock?

> +
> if (!cgroup_lock_live_group(cgroup))
> return -ENODEV;
> retval = freezer_change_state(cgroup, goal_state);
> @@ -368,12 +468,61 @@ static int freezer_write(struct cgroup *cgroup,
> return retval;
> }
>

A kerneldoc comment would also be nice to have here.

> +static int freezer_write_param(struct cgroup *cgroup, struct cftype *cft,
> + u64 val)
> +{
> + struct freezer *freezer;
> + int ret = 0;
> +
> + freezer = cgroup_freezer(cgroup);
> + if (!cgroup_lock_live_group(cgroup))
> + return -ENODEV;
> +
> + spin_lock_irq(&freezer->lock);
> + switch (cft->private) {
> + case FREEZER_DUTY_RATIO:
> + if (val > 100) {
> + ret = -EINVAL;
> + goto exit;
> + }
> + freezer->duty.ratio = val;
> + break;
> + case FREEZER_PERIOD:
> + if (!val) {
> + ret = -EINVAL;
> + goto exit;
> + }
> + freezer->duty.period_pct_ms = val * 10;
> + break;
> + default:
> + BUG();

Rule: Use BUG() only if the consequences of not doing so would be _worse_
than a kernel crash.

> + }
> + spin_unlock_irq(&freezer->lock);
> + freezer_check_duty_cycling(freezer, 1);
> +exit:
> + cgroup_unlock();
> + return ret;
> +}
> +
> static struct cftype files[] = {
> {
> .name = "state",
> .read_seq_string = freezer_read,
> .write_string = freezer_write,
> },
> + {
> + .name = "frozen_time_percentage",
> + .private = FREEZER_DUTY_RATIO,
> + .read_u64 = freezer_read_duty_ratio,
> + .write_u64 = freezer_write_param,
> + },
> + {
> + .name = "period_second",
> + .private = FREEZER_PERIOD,
> + .read_u64 = freezer_read_period,
> + .write_u64 = freezer_write_param,
> + },
> +
> };
>
> static int freezer_populate(struct cgroup_subsys *ss, struct cgroup *cgroup)

Well, quite frankly, I'm not going to take this patch unless it gets an ACK
from the scheduler people (which I'm guessing is not going to happen before
hell freezes over).

IOW, please find a better way to address the issue at hand.

Thanks,
Rafael

2011-02-16 00:40:30

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH 1/1, v9] cgroup/freezer: add per freezer duty ratio control

On Wed, 16 Feb 2011 01:00:15 +0100
"Rafael J. Wysocki" <[email protected]> wrote:

> On Tuesday, February 15, 2011, [email protected] wrote:
> > From: Jacob Pan <[email protected]>
> >
> > Freezer subsystem is used to manage batch jobs which can start
> > stop at the same time. However, sometime it is desirable to let
> > the kernel manage the freezer state automatically with a given
> > duty ratio.
> > For example, if we want to reduce the time that backgroup apps
> > are allowed to run we can put them into a freezer subsystem and
> > set the kernel to turn them THAWED/FROZEN at given period and
> > percentage of frozen time.
> >
> > This patch introduces two file nodes under cgroup
> > freezer.period_second
> > freezer.frozen_time_percentage
> >
> > Usage example: set period to be 5 seconds and frozen duty ratio 90%
> > [root@localhost aoa]# echo 90 > freezer.frozen_time_percentage
> > [root@localhost aoa]# echo 5 > freezer.period_second
> >
> > Signed-off-by: Jacob Pan <[email protected]>
> > ---
> > Documentation/cgroups/freezer-subsystem.txt | 33 ++++++
> > kernel/cgroup_freezer.c | 155
> > ++++++++++++++++++++++++++- 2 files changed, 185 insertions(+), 3
> > deletions(-)
> >
> > diff --git a/Documentation/cgroups/freezer-subsystem.txt
> > b/Documentation/cgroups/freezer-subsystem.txt index
> > 41f37fe..df0f998 100644 ---
> > a/Documentation/cgroups/freezer-subsystem.txt +++
> > b/Documentation/cgroups/freezer-subsystem.txt @@ -100,3 +100,36 @@
> > things happens: and returns EINVAL)
> > 3) The tasks that blocked the cgroup from entering the
> > "FROZEN" state disappear from the cgroup's set of tasks.
> > +
> > +In embedded systems, it is desirable to manage group of
> > applications +for power saving. E.g. tasks that are not in the
> > foreground may be +frozen and unfrozen periodically to save power
> > without affecting user +experience. In this case, user/management
> > software can attach tasks +into freezer cgroup then specify a
> > period and percentage of time the +tasks within the cgroup shall be
> > frozen. +
> > +100% frozen will result in the same behavior as setting
> > freezer.state +to FROZEN. Similarly, 0% is equivalent to setting
> > freezer.state to +THAWED.
> > +The granularity of the period is second, due to the fact that small
> > +period may result in greater overhead and causes stability issue
> > since +common workqueue is used here to perform the management.
> > +
> > +Zero value of period_second is not allowed, the default period is 1
> > +second.
> > +
> > +Usage example:
> > +Assuming freezer cgroup is already mounted, application being
> > managed +are included the "tasks" file node of the given freezer
> > cgroup. +To make the tasks frozen at 90% of the time every 5
> > seconds, do: +
> > +[root@localhost ]# echo 90 > freezer.frozen_time_percentage
> > +[root@localhost ]# echo 5 > freezer.period_second
>
> Well, are you trying to work around a problem with some applications
> that cause too many wakeups to occur?
>
not exactly. I am trying to use this feature to manage power thermal
aspect as well.
> Why do you need the kernel to do that for you on a per-cgroup basis?
> What about simply sending SIGSTOP/SIGCONT periodically from user
> space to the applications in question?
>
I thought Documentation/cgroups/freezer-subsystem.tx explains the
reason why "SIGSTOP and SIGCONT are not always sufficient for stopping
and resuming tasks in userspace".
> Moreover, what about having a user space process that will
> freeze/unfreeze the entire cgroup as needed?
>
Yes, it is doable in the userspace, but managing in userspace has
greater overhead.

> > +After that, the application in this freezer cgroup will only be
> > +allowed to run at the following pattern.
> > + __ __ __
> > + | |<-- 90% frozen -->| | | |
> > +____| |__________________| |__________________| |_____
> > +
> > + |<---- 5 seconds ---->|
>
> How are you going to handle dependencies between the "periodically
> frozen" applications and other applications that may need them to do
> some work?
>
I don't have an answer, but first of all when we launch an app, it
makes sure all its tasks (children) are in the same cgroup and the
common services are allowed to run all the time, this should reduce the
dependencies. Secondly, my goal is to keep the apps alive such that
certain power/thermal targets can be met, under some cases
such as when the screen is off or an app is not in the foreground.
Sometimes, the only goal is to keep them alive, perhaps we need some
tuning to find that frozen time percentage limit.

IMHO, w/o this patch, the dependency problem still exists, true?

> > diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
> > index e7bebb7..32beacb 100644
> > --- a/kernel/cgroup_freezer.c
> > +++ b/kernel/cgroup_freezer.c
> > @@ -28,12 +28,31 @@ enum freezer_state {
> > CGROUP_FROZEN,
> > };
> >
> > +enum duty_ratio_params {
> > + FREEZER_DUTY_RATIO = 0,
> > + FREEZER_PERIOD,
> > +};
> > +
> > +struct freezer_duty {
> > + u32 ratio; /* percentage of time frozen */
> > + u32 period_pct_ms; /* one percent of the period in
> > miliseconds */
>
> What period?
>
> > + u32 enabled;
> > +};
> > +
> > struct freezer {
> > struct cgroup_subsys_state css;
> > enum freezer_state state;
> > + struct freezer_duty duty;
> > + struct delayed_work freezer_work; /* work to duty-cycle a
> > cgroup */ spinlock_t lock; /* protects _writes_ to state */
> > };
> >
> > +static int freezer_change_state(struct cgroup *cgroup,
> > + enum freezer_state goal_state);
> > +static void unfreeze_cgroup(struct cgroup *cgroup, struct freezer
> > *freezer); +static int try_to_freeze_cgroup(struct cgroup *cgroup,
> > struct freezer *freezer); +static void update_if_frozen(struct
> > cgroup *cgroup, struct freezer *freezer); +
> > static inline struct freezer *cgroup_freezer(
> > struct cgroup *cgroup)
> > {
> > @@ -63,6 +82,35 @@ int cgroup_freezing_or_frozen(struct task_struct
> > *task) return result;
> > }
> >
> > +static void freezer_work_fn(struct work_struct *work)
> > +{
> > + struct freezer *freezer;
> > + unsigned long delay_jiffies = 0;
> > +
> > + freezer = container_of(work, struct freezer,
> > freezer_work.work);
> > + spin_lock_irq(&freezer->lock);
>
> So you switch interrupts off every once a while simply to
> freeze/unfreeze your cgroup. That's going to affect the scheduler
> quite a bit, isn't it?
>
consider we are only doing this at long interval, perhaps the impact is
less. on the other side, the race is harmless as I mentioned in the
previous patch review, i could do without the lock.
> > + /* toggle between THAWED and FROZEN state. */
> > + update_if_frozen(freezer->css.cgroup, freezer);
>
> What about using a local variable cgroup = freezer->css.cgroup ?
>
sure, i can change that.
> > + switch (freezer->state) {
> > + case CGROUP_FROZEN:
> > + case CGROUP_FREEZING:
> > + delay_jiffies = msecs_to_jiffies((100 -
> > freezer->duty.ratio) *
> > +
> > freezer->duty.period_pct_ms);
> > + unfreeze_cgroup(freezer->css.cgroup, freezer);
> > + break;
> > + case CGROUP_THAWED:
> > + delay_jiffies =
> > msecs_to_jiffies(freezer->duty.ratio *
> > +
> > freezer->duty.period_pct_ms);
> > + try_to_freeze_cgroup(freezer->css.cgroup, freezer);
> > + break;
> > + default:
> > + BUG();
>
> Don't crash the kernel if you don't have to.
>
ok. good point.
> > + }
> > +
> > + schedule_delayed_work(&freezer->freezer_work,
> > delay_jiffies);
>
> What if this work happens to run during system suspend/resume?
>
> > + spin_unlock_irq(&freezer->lock);
> > +}
> > +
> > /*
> > * cgroups_write_string() limits the size of freezer state strings
> > to
> > * CGROUP_LOCAL_BUFFER_SIZE
> > @@ -144,13 +192,19 @@ static struct cgroup_subsys_state
> > *freezer_create(struct cgroup_subsys *ss,
> > spin_lock_init(&freezer->lock);
> > freezer->state = CGROUP_THAWED;
> > + freezer->duty.period_pct_ms = 10; /* default to 1 second
> > period */
>
> Why this number?
>
just a default value, keep it non-zero. also in the range that not too
much overhead in managing the toggling between FROZEN and THAWED.

> > return &freezer->css;
> > }
> >
> > static void freezer_destroy(struct cgroup_subsys *ss,
> > struct cgroup *cgroup)
> > {
> > - kfree(cgroup_freezer(cgroup));
> > + struct freezer *freezer;
> > +
> > + freezer = cgroup_freezer(cgroup);
> > + if (freezer->duty.enabled)
> > + cancel_delayed_work_sync(&freezer->freezer_work);
> > + kfree(freezer);
> > }
> >
> > /*
> > @@ -282,6 +336,16 @@ static int freezer_read(struct cgroup *cgroup,
> > struct cftype *cft, return 0;
> > }
> >
> > +static u64 freezer_read_duty_ratio(struct cgroup *cgroup, struct
> > cftype *cft) +{
> > + return cgroup_freezer(cgroup)->duty.ratio;
> > +}
> > +
> > +static u64 freezer_read_period(struct cgroup *cgroup, struct
> > cftype *cft) +{
> > + return cgroup_freezer(cgroup)->duty.period_pct_ms / 10;
> > +}
> > +
> > static int try_to_freeze_cgroup(struct cgroup *cgroup, struct
> > freezer *freezer) {
> > struct cgroup_iter it;
> > @@ -347,20 +411,56 @@ out:
> > return retval;
> > }
> >
>
> It would be _nice_ to add a kerneldoc comment explaining the purpose
> of the new function, right?
>
> > +static void freezer_check_duty_cycling(struct freezer *freezer,
> > int set_state) +{
> > + if (freezer->duty.enabled) {
> > + if (freezer->duty.ratio && freezer->duty.ratio <
> > 100)
> > + return;
> > + pr_debug("stopping duty ratio mode\n");
> > + cancel_delayed_work_sync(&freezer->freezer_work);
> > + freezer->duty.enabled = 0;
> > + if (set_state) {
> > + update_if_frozen(freezer->css.cgroup,
> > freezer);
> > + if (freezer->duty.ratio == 100)
> > +
> > try_to_freeze_cgroup(freezer->css.cgroup,
> > + freezer);
> > + else if (freezer->duty.ratio == 0)
> > +
> > unfreeze_cgroup(freezer->css.cgroup, freezer);
> > + else
> > + BUG();
>
> Again, don't crash the kernel if you don't have to.
>
ditto.
> > + }
> > + } else if (freezer->duty.ratio < 100 &&
> > freezer->duty.ratio) {
> > + pr_debug("starting duty ratio mode\n");
> > + INIT_DELAYED_WORK(&freezer->freezer_work,
> > freezer_work_fn);
> > + freezer->duty.enabled = 1;
> > + schedule_delayed_work(&freezer->freezer_work, 0);
> > + }
> > +}
> > +
> > static int freezer_write(struct cgroup *cgroup,
> > struct cftype *cft,
> > const char *buffer)
> > {
> > int retval;
> > enum freezer_state goal_state;
> > + struct freezer *freezer;
> >
> > - if (strcmp(buffer, freezer_state_strs[CGROUP_THAWED]) == 0)
> > + freezer = cgroup_freezer(cgroup);
> > + if (strcmp(buffer, freezer_state_strs[CGROUP_THAWED]) ==
> > 0) { goal_state = CGROUP_THAWED;
> > - else if (strcmp(buffer, freezer_state_strs[CGROUP_FROZEN])
> > == 0)
> > + freezer->duty.ratio = 0;
> > + } else if (strcmp(buffer,
> > freezer_state_strs[CGROUP_FROZEN]) == 0) {
> > + freezer->duty.ratio = 100;
> > goal_state = CGROUP_FROZEN;
> > + }
> > else
> > return -EINVAL;
> >
> > + /* we should stop duty ratio toggling if user wants to
> > + * force change to a valid state.
> > + */
> > + freezer_check_duty_cycling(freezer, 0);
>
> Doesn't that need to be under the lock?
>
i guess so.
> > +
> > if (!cgroup_lock_live_group(cgroup))
> > return -ENODEV;
> > retval = freezer_change_state(cgroup, goal_state);
> > @@ -368,12 +468,61 @@ static int freezer_write(struct cgroup
> > *cgroup, return retval;
> > }
> >
>
> A kerneldoc comment would also be nice to have here.
>
ok, i will add that.
> > +static int freezer_write_param(struct cgroup *cgroup, struct
> > cftype *cft,
> > + u64 val)
> > +{
> > + struct freezer *freezer;
> > + int ret = 0;
> > +
> > + freezer = cgroup_freezer(cgroup);
> > + if (!cgroup_lock_live_group(cgroup))
> > + return -ENODEV;
> > +
> > + spin_lock_irq(&freezer->lock);
> > + switch (cft->private) {
> > + case FREEZER_DUTY_RATIO:
> > + if (val > 100) {
> > + ret = -EINVAL;
> > + goto exit;
> > + }
> > + freezer->duty.ratio = val;
> > + break;
> > + case FREEZER_PERIOD:
> > + if (!val) {
> > + ret = -EINVAL;
> > + goto exit;
> > + }
> > + freezer->duty.period_pct_ms = val * 10;
> > + break;
> > + default:
> > + BUG();
>
> Rule: Use BUG() only if the consequences of not doing so would be
> _worse_ than a kernel crash.
>
thanks for the tip, will fix.
> > + }
> > + spin_unlock_irq(&freezer->lock);
> > + freezer_check_duty_cycling(freezer, 1);
> > +exit:
> > + cgroup_unlock();
> > + return ret;
> > +}
> > +
> > static struct cftype files[] = {
> > {
> > .name = "state",
> > .read_seq_string = freezer_read,
> > .write_string = freezer_write,
> > },
> > + {
> > + .name = "frozen_time_percentage",
> > + .private = FREEZER_DUTY_RATIO,
> > + .read_u64 = freezer_read_duty_ratio,
> > + .write_u64 = freezer_write_param,
> > + },
> > + {
> > + .name = "period_second",
> > + .private = FREEZER_PERIOD,
> > + .read_u64 = freezer_read_period,
> > + .write_u64 = freezer_write_param,
> > + },
> > +
> > };
> >
> > static int freezer_populate(struct cgroup_subsys *ss, struct
> > cgroup *cgroup)
>
> Well, quite frankly, I'm not going to take this patch unless it gets
> an ACK from the scheduler people (which I'm guessing is not going to
> happen before hell freezes over).
>
> IOW, please find a better way to address the issue at hand.
>
We do have a real need that there is no exist feature in the kernel can
provide solution for. You want ACK from scheduler people because it has
impact on disabling irq? or you think scheduler should be the one that
provide the solution. I did try cpu subsystem, but it seems to be
limited to RT and certain scheduling policy e.g. RR and FIFO.

thanks,

Jacob

2011-02-16 03:23:34

by Matt Helsley

[permalink] [raw]
Subject: Re: [PATCH 1/1, v9] cgroup/freezer: add per freezer duty ratio control

On Tue, Feb 15, 2011 at 04:38:12PM -0800, Jacob Pan wrote:
> On Wed, 16 Feb 2011 01:00:15 +0100
> "Rafael J. Wysocki" <[email protected]> wrote:

<snip>

> > Well, quite frankly, I'm not going to take this patch unless it gets
> > an ACK from the scheduler people (which I'm guessing is not going to
> > happen before hell freezes over).
> >
> > IOW, please find a better way to address the issue at hand.
> >
> We do have a real need that there is no exist feature in the kernel can
> provide solution for. You want ACK from scheduler people because it has
> impact on disabling irq? or you think scheduler should be the one that
> provide the solution. I did try cpu subsystem, but it seems to be
> limited to RT and certain scheduling policy e.g. RR and FIFO.

I agree with Rafael. I think the scheduler should provide the solution
and it can be done via modifications to the cpu cgroup subsystem.
Yes, it only has the shares and rt-related files *right now*. However,
Kame replied earlier with a link to some patches for extending it
that introduce files with similar (granted: not the same) semantics:

http://linux.derkeiler.com/Mailing-Lists/Kernel/2010-10/msg04335.html

I think you could build on that and help push those patches towards
mainline along with your enhancements for reducing wakeups rather
than modify the freezer cgroup subsystem.

Cheers,
-Matt Helsley

2011-02-16 18:20:06

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH 1/1, v9] cgroup/freezer: add per freezer duty ratio control

On Tue, 15 Feb 2011 19:23:21 -0800
Matt Helsley <[email protected]> wrote:

> On Tue, Feb 15, 2011 at 04:38:12PM -0800, Jacob Pan wrote:
> > On Wed, 16 Feb 2011 01:00:15 +0100
> > "Rafael J. Wysocki" <[email protected]> wrote:
>
> <snip>
>
> > > Well, quite frankly, I'm not going to take this patch unless it
> > > gets an ACK from the scheduler people (which I'm guessing is not
> > > going to happen before hell freezes over).
> > >
> > > IOW, please find a better way to address the issue at hand.
> > >
> > We do have a real need that there is no exist feature in the kernel
> > can provide solution for. You want ACK from scheduler people
> > because it has impact on disabling irq? or you think scheduler
> > should be the one that provide the solution. I did try cpu
> > subsystem, but it seems to be limited to RT and certain scheduling
> > policy e.g. RR and FIFO.
>
> I agree with Rafael. I think the scheduler should provide the solution
> and it can be done via modifications to the cpu cgroup subsystem.
> Yes, it only has the shares and rt-related files *right now*. However,
> Kame replied earlier with a link to some patches for extending it
> that introduce files with similar (granted: not the same) semantics:
>
> http://linux.derkeiler.com/Mailing-Lists/Kernel/2010-10/msg04335.html
>
> I think you could build on that and help push those patches towards
> mainline along with your enhancements for reducing wakeups rather
> than modify the freezer cgroup subsystem.
>
I will give CFS bandwidth control patches a try. See if I can be any
help. Thanks.

Jacob

2011-02-24 23:45:28

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH 1/1, v9] cgroup/freezer: add per freezer duty ratio control

On Tue, 15 Feb 2011 19:23:21 -0800
Matt Helsley <[email protected]> wrote:

> On Tue, Feb 15, 2011 at 04:38:12PM -0800, Jacob Pan wrote:
> > On Wed, 16 Feb 2011 01:00:15 +0100
> > "Rafael J. Wysocki" <[email protected]> wrote:
>
> <snip>
>
> > > Well, quite frankly, I'm not going to take this patch unless it
> > > gets an ACK from the scheduler people (which I'm guessing is not
> > > going to happen before hell freezes over).
> > >
> > > IOW, please find a better way to address the issue at hand.
> > >
> > We do have a real need that there is no exist feature in the kernel
> > can provide solution for. You want ACK from scheduler people
> > because it has impact on disabling irq? or you think scheduler
> > should be the one that provide the solution. I did try cpu
> > subsystem, but it seems to be limited to RT and certain scheduling
> > policy e.g. RR and FIFO.
>
> I agree with Rafael. I think the scheduler should provide the solution
> and it can be done via modifications to the cpu cgroup subsystem.
> Yes, it only has the shares and rt-related files *right now*. However,
> Kame replied earlier with a link to some patches for extending it
> that introduce files with similar (granted: not the same) semantics:
>
> http://linux.derkeiler.com/Mailing-Lists/Kernel/2010-10/msg04335.html
>
> I think you could build on that and help push those patches towards
> mainline along with your enhancements for reducing wakeups rather
> than modify the freezer cgroup subsystem.
>
I played with v3 and v4 of the CFS bandwidth patch. When the cpu
cgroup exceeds its cfs_quota, it does have the same effect as this
patch in terms of freeze/thaw at given period and allowed runtime. But
when the cgroup cpu usage is below cfs_quota, it is not throttled.
Therefore, it cannot reduce wakeups.
I will continue look into the possibility of enhancing CFS BW patch for
our needs, But again, the fundamental difference here is bandwidth vs
idle time control. What we need for power saving is idle time not
bandwidth.

Thanks,

Jacob

2011-02-25 17:54:23

by Paul Menage

[permalink] [raw]
Subject: Re: [PATCH 1/1, v9] cgroup/freezer: add per freezer duty ratio control

On Thu, Feb 24, 2011 at 3:45 PM, jacob pan
<[email protected]> wrote:
> I played with v3 and v4 of the CFS bandwidth patch. When the cpu
> cgroup exceeds its cfs_quota, it does have the same effect as this
> patch in terms of freeze/thaw at given period and allowed runtime. But
> when the cgroup cpu usage is below cfs_quota, it is not throttled.
> Therefore, it cannot reduce wakeups.

How about a userspace daemon that periodically flips the CPU quota for
the cgroup between zero and the group's runnable level? Wouldn't that
achieve what you need pretty easily without having to introduce
additional complexity and threads into the kernel?

Paul

2011-02-25 18:25:10

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH 1/1, v9] cgroup/freezer: add per freezer duty ratio control

On Fri, 25 Feb 2011 09:53:59 -0800
Paul Menage <[email protected]> wrote:

> On Thu, Feb 24, 2011 at 3:45 PM, jacob pan
> <[email protected]> wrote:
> > I played with v3 and v4 of the CFS bandwidth patch. When the cpu
> > cgroup exceeds its cfs_quota, it does have the same effect as this
> > patch in terms of freeze/thaw at given period and allowed runtime.
> > But when the cgroup cpu usage is below cfs_quota, it is not
> > throttled. Therefore, it cannot reduce wakeups.
>
> How about a userspace daemon that periodically flips the CPU quota for
> the cgroup between zero and the group's runnable level? Wouldn't that
> achieve what you need pretty easily without having to introduce
> additional complexity and threads into the kernel?
I think it should work but with little bit more overhead than
doing the same in the kernel. It will also need one periodic timer per
cgroup. Two extra timer wake ups for each time slice to run.
Thanks for the great suggestion. I will do some experiment with it.