2015-04-10 14:12:06

by Preeti U Murthy

[permalink] [raw]
Subject: [PATCH V2] cpuset: Add knob to make allowed masks hotplug invariant on legacy hierarchy

The cpus_allowed and mems_allowed masks of a cpuset get overwritten
after each hotplug operation on the legacy hierarchy of cgroups so as to
remain in sync with the online mask. But there are use cases which
expect user configured masks to remain unchanged.

For instance, when hotplugged out CPUs are brought back online, they
remain idle with none of the existing tasks allowed to run on them since
the cpus_allowed mask was overwritten to not include them when they were
offlined.

We cannot change the legacy hierarchy design now to keep the allowed
masks hotplug invariant since it is a user visible change. It was
suggested instead to add a knob in the root cpuset directory which
allows the user to specify if he wants the user configured masks to be
hotplug invariant [1]. This knob will enforce the choice throughout the
hierarchy. If the knob is set, the allowed maks will not be varied on
hotplug. It is also to to be noted that this knob will appear in the
root cgroup mounted on the legacy hierarchy alone since the default
hierarchy does not overwrite the allowed masks anyway.

Having said this, there are fair reasons to argue that the kernel is not
responsible for taking care of user configurations in the face of
hotplug. But one of the consequences of the current legacy hierarchy
design, is that CPUs are left out from being used at all on online
operations. The reason for this is not very obvious at first and several
users have raised the issue as a bug. Hence the patch was strongly
called for.

Moreover the default hierarchy keeps the allowed masks hotplug invariant
too. So the patch is not bringing about a fundamental change in the
design of cgroups.

[1] https://lkml.org/lkml/2015/4/6/196

Signed-off-by: Preeti U Murthy <[email protected]>
---
Changes from V1: https://lkml.org/lkml/2014/10/8/46
Add a knob to bring about the behavioral change

kernel/cpuset.c | 67 +++++++++++++++++++++++++++++++++++++++++--------------
1 file changed, 50 insertions(+), 17 deletions(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index fc7f474..b14d15f 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -63,6 +63,9 @@

struct static_key cpusets_enabled_key __read_mostly = STATIC_KEY_INIT_FALSE;

+/* Knob to request hotplug invariant allowed masks on legacy hierarcy */
+static int cpuset_hotplug_invariant __read_mostly;
+
/* See "Frequency meter" comments, below. */

struct fmeter {
@@ -887,8 +890,8 @@ static void update_cpumasks_hier(struct cpuset *cs, struct cpumask *new_cpus)
cpumask_copy(cp->effective_cpus, new_cpus);
spin_unlock_irq(&callback_lock);

- WARN_ON(!cgroup_on_dfl(cp->css.cgroup) &&
- !cpumask_equal(cp->cpus_allowed, cp->effective_cpus));
+ WARN_ON(!cgroup_on_dfl(cp->css.cgroup) && !cpuset_hotplug_invariant
+ && !cpumask_equal(cp->cpus_allowed, cp->effective_cpus));

update_tasks_cpumask(cp);

@@ -1143,8 +1146,8 @@ static void update_nodemasks_hier(struct cpuset *cs, nodemask_t *new_mems)
cp->effective_mems = *new_mems;
spin_unlock_irq(&callback_lock);

- WARN_ON(!cgroup_on_dfl(cp->css.cgroup) &&
- !nodes_equal(cp->mems_allowed, cp->effective_mems));
+ WARN_ON(!cgroup_on_dfl(cp->css.cgroup) && !cpuset_hotplug_invariant
+ && !nodes_equal(cp->mems_allowed, cp->effective_mems));

update_tasks_nodemask(cp);

@@ -1429,11 +1432,17 @@ static int cpuset_can_attach(struct cgroup_subsys_state *css,

mutex_lock(&cpuset_mutex);

- /* allow moving tasks into an empty cpuset if on default hierarchy */
+ /*
+ * Allow moving tasks into an empty cpuset if on default hierarchy
+ * Use effective_cpus instead of cpus_allowed to check for empty
+ * cpusets since these masks are equivalent on legacy hierarchy
+ * and more importantly will enable cases where cpuset_hotplug_invariant
+ * is set to fall in place.
+ */
ret = -ENOSPC;
if (!cgroup_on_dfl(css->cgroup) &&
- (cpumask_empty(cs->cpus_allowed) || nodes_empty(cs->mems_allowed)))
- goto out_unlock;
+ (cpumask_empty(cs->effective_cpus) || nodes_empty(cs->effective_mems)))
+ goto out_unlock;

cgroup_taskset_for_each(task, tset) {
ret = task_can_attach(task, cs->cpus_allowed);
@@ -1551,6 +1560,7 @@ typedef enum {
FILE_MEMORY_PRESSURE,
FILE_SPREAD_PAGE,
FILE_SPREAD_SLAB,
+ FILE_HOTPLUG_INVARIANT,
} cpuset_filetype_t;

static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
@@ -1594,6 +1604,9 @@ static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
case FILE_SPREAD_SLAB:
retval = update_flag(CS_SPREAD_SLAB, cs, val);
break;
+ case FILE_HOTPLUG_INVARIANT:
+ cpuset_hotplug_invariant = !!val;
+ break;
default:
retval = -EINVAL;
break;
@@ -1752,6 +1765,8 @@ static u64 cpuset_read_u64(struct cgroup_subsys_state *css, struct cftype *cft)
return is_spread_page(cs);
case FILE_SPREAD_SLAB:
return is_spread_slab(cs);
+ case FILE_HOTPLUG_INVARIANT:
+ return cpuset_hotplug_invariant;
default:
BUG();
}
@@ -1881,6 +1896,14 @@ static struct cftype files[] = {
.private = FILE_MEMORY_PRESSURE_ENABLED,
},

+ {
+ .name = "hotplug_invariant",
+ .flags = CFTYPE_ONLY_ON_ROOT | __CFTYPE_NOT_ON_DFL,
+ .read_u64 = cpuset_read_u64,
+ .write_u64 = cpuset_write_u64,
+ .private = FILE_HOTPLUG_INVARIANT,
+ },
+
{ } /* terminate */
};

@@ -2096,8 +2119,14 @@ static void remove_tasks_in_empty_cpuset(struct cpuset *cs)
* has online cpus, so can't be empty).
*/
parent = parent_cs(cs);
- while (cpumask_empty(parent->cpus_allowed) ||
- nodes_empty(parent->mems_allowed))
+ /*
+ * Use effective_cpus instead of cpus_allowed to check for empty
+ * cpusets since these masks are equivalent on legacy hierarchy
+ * and more importantly will enable cases where cpuset_hotplug_invariant
+ * is set to fall in place.
+ */
+ while (cpumask_empty(parent->effective_cpus) ||
+ nodes_empty(parent->effective_mems))
parent = parent_cs(parent);

if (cgroup_transfer_tasks(parent->css.cgroup, cs->css.cgroup)) {
@@ -2115,23 +2144,27 @@ hotplug_update_tasks_legacy(struct cpuset *cs,
bool is_empty;

spin_lock_irq(&callback_lock);
- cpumask_copy(cs->cpus_allowed, new_cpus);
+ if (!cpuset_hotplug_invariant) {
+ cpumask_copy(cs->cpus_allowed, new_cpus);
+ cs->mems_allowed = *new_mems;
+ }
cpumask_copy(cs->effective_cpus, new_cpus);
- cs->mems_allowed = *new_mems;
cs->effective_mems = *new_mems;
spin_unlock_irq(&callback_lock);

/*
* Don't call update_tasks_cpumask() if the cpuset becomes empty,
- * as the tasks will be migratecd to an ancestor.
+ * as the tasks will be migrated to an ancestor. Check
+ * effective_cpus so that cases where cpuset_hotplug_invariant
+ * is set falls in place.
*/
- if (cpus_updated && !cpumask_empty(cs->cpus_allowed))
+ if (cpus_updated && !cpumask_empty(cs->effective_cpus))
update_tasks_cpumask(cs);
- if (mems_updated && !nodes_empty(cs->mems_allowed))
+ if (mems_updated && !nodes_empty(cs->effective_mems))
update_tasks_nodemask(cs);

- is_empty = cpumask_empty(cs->cpus_allowed) ||
- nodes_empty(cs->mems_allowed);
+ is_empty = cpumask_empty(cs->effective_cpus) ||
+ nodes_empty(cs->effective_mems);

mutex_unlock(&cpuset_mutex);

@@ -2246,7 +2279,7 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
/* synchronize cpus_allowed to cpu_active_mask */
if (cpus_updated) {
spin_lock_irq(&callback_lock);
- if (!on_dfl)
+ if (!on_dfl && !cpuset_hotplug_invariant)
cpumask_copy(top_cpuset.cpus_allowed, &new_cpus);
cpumask_copy(top_cpuset.effective_cpus, &new_cpus);
spin_unlock_irq(&callback_lock);


2015-04-11 08:35:54

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH V2] cpuset: Add knob to make allowed masks hotplug invariant on legacy hierarchy

On Fri, Apr 10, 2015 at 07:41:52PM +0530, Preeti U Murthy wrote:
> The cpus_allowed and mems_allowed masks of a cpuset get overwritten
> after each hotplug operation on the legacy hierarchy of cgroups so as to
> remain in sync with the online mask. But there are use cases which
> expect user configured masks to remain unchanged.
>
> For instance, when hotplugged out CPUs are brought back online, they
> remain idle with none of the existing tasks allowed to run on them since
> the cpus_allowed mask was overwritten to not include them when they were
> offlined.
>
> We cannot change the legacy hierarchy design now to keep the allowed
> masks hotplug invariant since it is a user visible change. It was
> suggested instead to add a knob in the root cpuset directory which
> allows the user to specify if he wants the user configured masks to be
> hotplug invariant [1]. This knob will enforce the choice throughout the
> hierarchy. If the knob is set, the allowed maks will not be varied on
> hotplug. It is also to to be noted that this knob will appear in the
> root cgroup mounted on the legacy hierarchy alone since the default
> hierarchy does not overwrite the allowed masks anyway.
>
> Having said this, there are fair reasons to argue that the kernel is not
> responsible for taking care of user configurations in the face of
> hotplug. But one of the consequences of the current legacy hierarchy
> design, is that CPUs are left out from being used at all on online
> operations. The reason for this is not very obvious at first and several
> users have raised the issue as a bug. Hence the patch was strongly
> called for.
>
> Moreover the default hierarchy keeps the allowed masks hotplug invariant
> too. So the patch is not bringing about a fundamental change in the
> design of cgroups.

What you've not explained is why you can use this knob but not use the
other new mode?

2015-04-11 11:21:47

by Preeti U Murthy

[permalink] [raw]
Subject: Re: [PATCH V2] cpuset: Add knob to make allowed masks hotplug invariant on legacy hierarchy

On 04/11/2015 02:05 PM, Peter Zijlstra wrote:
> On Fri, Apr 10, 2015 at 07:41:52PM +0530, Preeti U Murthy wrote:
>> The cpus_allowed and mems_allowed masks of a cpuset get overwritten
>> after each hotplug operation on the legacy hierarchy of cgroups so as to
>> remain in sync with the online mask. But there are use cases which
>> expect user configured masks to remain unchanged.
>>
>> For instance, when hotplugged out CPUs are brought back online, they
>> remain idle with none of the existing tasks allowed to run on them since
>> the cpus_allowed mask was overwritten to not include them when they were
>> offlined.
>>
>> We cannot change the legacy hierarchy design now to keep the allowed
>> masks hotplug invariant since it is a user visible change. It was
>> suggested instead to add a knob in the root cpuset directory which
>> allows the user to specify if he wants the user configured masks to be
>> hotplug invariant [1]. This knob will enforce the choice throughout the
>> hierarchy. If the knob is set, the allowed maks will not be varied on
>> hotplug. It is also to to be noted that this knob will appear in the
>> root cgroup mounted on the legacy hierarchy alone since the default
>> hierarchy does not overwrite the allowed masks anyway.
>>
>> Having said this, there are fair reasons to argue that the kernel is not
>> responsible for taking care of user configurations in the face of
>> hotplug. But one of the consequences of the current legacy hierarchy
>> design, is that CPUs are left out from being used at all on online
>> operations. The reason for this is not very obvious at first and several
>> users have raised the issue as a bug. Hence the patch was strongly
>> called for.
>>
>> Moreover the default hierarchy keeps the allowed masks hotplug invariant
>> too. So the patch is not bringing about a fundamental change in the
>> design of cgroups.
>
> What you've not explained is why you can use this knob but not use the
> other new mode?

It is claimed that the existing software which work with the legacy
hierarchy break when cgroups are mounted on the default hierarchy due to
poor backward compatibility. I will include this point in the changelog.

Thanks

Regards
Preeti U Murthy

>

2015-04-11 13:42:11

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH V2] cpuset: Add knob to make allowed masks hotplug invariant on legacy hierarchy

On Sat, Apr 11, 2015 at 04:51:32PM +0530, Preeti U Murthy wrote:
> On 04/11/2015 02:05 PM, Peter Zijlstra wrote:
> > On Fri, Apr 10, 2015 at 07:41:52PM +0530, Preeti U Murthy wrote:
> > What you've not explained is why you can use this knob but not use the
> > other new mode?
>
> It is claimed that the existing software which work with the legacy
> hierarchy break when cgroups are mounted on the default hierarchy due to
> poor backward compatibility. I will include this point in the changelog.

But you don't have to mount the new thing in the unified hierarchy
nonsense. You can mount if separate but still with the new semantics.

Unless of course system disease disallows this, but that's not a kernel
problem.

2015-04-13 07:01:41

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH V2] cpuset: Add knob to make allowed masks hotplug invariant on legacy hierarchy

On Sat, Apr 11, 2015 at 10:35:37AM +0200, Peter Zijlstra wrote:
> On Fri, Apr 10, 2015 at 07:41:52PM +0530, Preeti U Murthy wrote:
> > The cpus_allowed and mems_allowed masks of a cpuset get overwritten
> > after each hotplug operation on the legacy hierarchy of cgroups so as to
> > remain in sync with the online mask. But there are use cases which
> > expect user configured masks to remain unchanged.
> >
> > For instance, when hotplugged out CPUs are brought back online, they
> > remain idle with none of the existing tasks allowed to run on them since
> > the cpus_allowed mask was overwritten to not include them when they were
> > offlined.
> >
> > We cannot change the legacy hierarchy design now to keep the allowed
> > masks hotplug invariant since it is a user visible change. It was
> > suggested instead to add a knob in the root cpuset directory which
> > allows the user to specify if he wants the user configured masks to be
> > hotplug invariant [1]. This knob will enforce the choice throughout the
> > hierarchy. If the knob is set, the allowed maks will not be varied on
> > hotplug. It is also to to be noted that this knob will appear in the
> > root cgroup mounted on the legacy hierarchy alone since the default
> > hierarchy does not overwrite the allowed masks anyway.
> >
> > Having said this, there are fair reasons to argue that the kernel is not
> > responsible for taking care of user configurations in the face of
> > hotplug. But one of the consequences of the current legacy hierarchy
> > design, is that CPUs are left out from being used at all on online
> > operations. The reason for this is not very obvious at first and several
> > users have raised the issue as a bug. Hence the patch was strongly
> > called for.
> >
> > Moreover the default hierarchy keeps the allowed masks hotplug invariant
> > too. So the patch is not bringing about a fundamental change in the
> > design of cgroups.
>
> What you've not explained is why you can use this knob but not use the
> other new mode?

Urgh, it looks like the new mode is only available through the default
hierarchy (whatever that is).

Would it not make sense to make that a mount option and limit the amount
of semantic variants of cpusets?

2015-04-13 08:25:56

by Preeti U Murthy

[permalink] [raw]
Subject: Re: [PATCH V2] cpuset: Add knob to make allowed masks hotplug invariant on legacy hierarchy

On 04/13/2015 12:31 PM, Peter Zijlstra wrote:
> On Sat, Apr 11, 2015 at 10:35:37AM +0200, Peter Zijlstra wrote:
>> On Fri, Apr 10, 2015 at 07:41:52PM +0530, Preeti U Murthy wrote:
>>> The cpus_allowed and mems_allowed masks of a cpuset get overwritten
>>> after each hotplug operation on the legacy hierarchy of cgroups so as to
>>> remain in sync with the online mask. But there are use cases which
>>> expect user configured masks to remain unchanged.
>>>
>>> For instance, when hotplugged out CPUs are brought back online, they
>>> remain idle with none of the existing tasks allowed to run on them since
>>> the cpus_allowed mask was overwritten to not include them when they were
>>> offlined.
>>>
>>> We cannot change the legacy hierarchy design now to keep the allowed
>>> masks hotplug invariant since it is a user visible change. It was
>>> suggested instead to add a knob in the root cpuset directory which
>>> allows the user to specify if he wants the user configured masks to be
>>> hotplug invariant [1]. This knob will enforce the choice throughout the
>>> hierarchy. If the knob is set, the allowed maks will not be varied on
>>> hotplug. It is also to to be noted that this knob will appear in the
>>> root cgroup mounted on the legacy hierarchy alone since the default
>>> hierarchy does not overwrite the allowed masks anyway.
>>>
>>> Having said this, there are fair reasons to argue that the kernel is not
>>> responsible for taking care of user configurations in the face of
>>> hotplug. But one of the consequences of the current legacy hierarchy
>>> design, is that CPUs are left out from being used at all on online
>>> operations. The reason for this is not very obvious at first and several
>>> users have raised the issue as a bug. Hence the patch was strongly
>>> called for.
>>>
>>> Moreover the default hierarchy keeps the allowed masks hotplug invariant
>>> too. So the patch is not bringing about a fundamental change in the
>>> design of cgroups.
>>
>> What you've not explained is why you can use this knob but not use the
>> other new mode?
>
> Urgh, it looks like the new mode is only available through the default
> hierarchy (whatever that is).
>
> Would it not make sense to make that a mount option and limit the amount
> of semantic variants of cpusets?

Makes sense. I will send out the next version that enforces the desired
behavior through a mount option.

Thank you

Regards
Preeti U Murthy
>

2015-04-13 12:16:53

by Preeti U Murthy

[permalink] [raw]
Subject: Re: [PATCH V2] cpuset: Add knob to make allowed masks hotplug invariant on legacy hierarchy

On 04/13/2015 12:31 PM, Peter Zijlstra wrote:
> On Sat, Apr 11, 2015 at 10:35:37AM +0200, Peter Zijlstra wrote:
>> On Fri, Apr 10, 2015 at 07:41:52PM +0530, Preeti U Murthy wrote:
>>> The cpus_allowed and mems_allowed masks of a cpuset get overwritten
>>> after each hotplug operation on the legacy hierarchy of cgroups so as to
>>> remain in sync with the online mask. But there are use cases which
>>> expect user configured masks to remain unchanged.
>>>
>>> For instance, when hotplugged out CPUs are brought back online, they
>>> remain idle with none of the existing tasks allowed to run on them since
>>> the cpus_allowed mask was overwritten to not include them when they were
>>> offlined.
>>>
>>> We cannot change the legacy hierarchy design now to keep the allowed
>>> masks hotplug invariant since it is a user visible change. It was
>>> suggested instead to add a knob in the root cpuset directory which
>>> allows the user to specify if he wants the user configured masks to be
>>> hotplug invariant [1]. This knob will enforce the choice throughout the
>>> hierarchy. If the knob is set, the allowed maks will not be varied on
>>> hotplug. It is also to to be noted that this knob will appear in the
>>> root cgroup mounted on the legacy hierarchy alone since the default
>>> hierarchy does not overwrite the allowed masks anyway.
>>>
>>> Having said this, there are fair reasons to argue that the kernel is not
>>> responsible for taking care of user configurations in the face of
>>> hotplug. But one of the consequences of the current legacy hierarchy
>>> design, is that CPUs are left out from being used at all on online
>>> operations. The reason for this is not very obvious at first and several
>>> users have raised the issue as a bug. Hence the patch was strongly
>>> called for.
>>>
>>> Moreover the default hierarchy keeps the allowed masks hotplug invariant
>>> too. So the patch is not bringing about a fundamental change in the
>>> design of cgroups.
>>
>> What you've not explained is why you can use this knob but not use the
>> other new mode?
>
> Urgh, it looks like the new mode is only available through the default
> hierarchy (whatever that is).
>
> Would it not make sense to make that a mount option and limit the amount
> of semantic variants of cpusets?
>

I spent some time analyzing if this would be a better option than the
sysfs knob and I think not for the following reasons:

1. Mount options tend to be generic across the controllers of a cgroup.
But use case addressed by this patch is specific to the cpuset controller.

2. The behavior that this patch is trying to bring about is not a
drastic one to call for a mount option equivalent to the __SANE_BEHAVIOR
one that existed earlier. This option was used to switch the legacy
design to the default one.

However this patch is not *wholly* mimicking the default hierarchy
behavior. The behavior when cpusets become empty is left untouched for
instance. The patch borrows one of the behaviors from the default
hierarchy only and hence just not justify the use of a mount flag.

3. cpuset controller already has flags that allow a runtime change in
the semantics of scheduling, memory allocation.
Eg: CS_SCHED_LOAD_BALANCE, CS_SPREAD_PAGE. This page intends to bring
about a runtime change in the semantics of maintaining the allowed cpus
and mem masks in the face of hotplug. So adding a flag to take care of
this looks better and the design falls in place.

4. The worry was around possible race conditions when the user tries to
change this sysfs knob during runtime after mounting a cgroup hierarchy.
But the sites are protected by the cpuset_lock. Moreover all of the
cpuset behavior enforced by its flags can be changed after mounting the
cgroup. So cpuset controller already expects twiddling of these knobs at
will by the user.

So it doesn't look like a good option to change the sysfs knob in this
patch to a mount flag.

Regards
Preeti U Murthy

2015-04-13 14:43:28

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH V2] cpuset: Add knob to make allowed masks hotplug invariant on legacy hierarchy

On Mon, Apr 13, 2015 at 05:46:37PM +0530, Preeti U Murthy wrote:
> On 04/13/2015 12:31 PM, Peter Zijlstra wrote:

> > Would it not make sense to make that a mount option and limit the amount
> > of semantic variants of cpusets?
>
> I spent some time analyzing if this would be a better option than the
> sysfs knob and I think not for the following reasons:
>
> 1. Mount options tend to be generic across the controllers of a cgroup.
> But use case addressed by this patch is specific to the cpuset controller.

Surely we can get around that somehow.

> 2. The behavior that this patch is trying to bring about is not a
> drastic one to call for a mount option equivalent to the __SANE_BEHAVIOR
> one that existed earlier. This option was used to switch the legacy
> design to the default one.
>
> However this patch is not *wholly* mimicking the default hierarchy
> behavior. The behavior when cpusets become empty is left untouched for
> instance. The patch borrows one of the behaviors from the default
> hierarchy only and hence just not justify the use of a mount flag.

So the 'problem' I have is that you introduce a 3rd semantic for the
cpuset thing.

You also do not answer if you can live with the default hierarchy
behaviour, only that your patch mimicks a subset of it.

Why not all of it?

> 3. cpuset controller already has flags that allow a runtime change in
> the semantics of scheduling, memory allocation.
> Eg: CS_SCHED_LOAD_BALANCE, CS_SPREAD_PAGE. This page intends to bring
> about a runtime change in the semantics of maintaining the allowed cpus
> and mem masks in the face of hotplug. So adding a flag to take care of
> this looks better and the design falls in place.

s/page/patch/

I disagree, you change global semantics, the ONLY_ON_ROOT is a hint
here. If such a file really is the only option, then at least take the
entire semantics from default hierarchy, not only part of it.

> 4. The worry was around possible race conditions when the user tries to
> change this sysfs knob during runtime after mounting a cgroup hierarchy.
> But the sites are protected by the cpuset_lock. Moreover all of the
> cpuset behavior enforced by its flags can be changed after mounting the
> cgroup. So cpuset controller already expects twiddling of these knobs at
> will by the user.

That just doesn't parse. cpuset_lock doesn't protect anything userspace.

You cannot atomically read this flag and do something else. It being
ONLY_ON_ROOT might mean you don't even have access to the flag.

2015-04-13 14:51:01

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH V2] cpuset: Add knob to make allowed masks hotplug invariant on legacy hierarchy

On Fri, Apr 10, 2015 at 07:41:52PM +0530, Preeti U Murthy wrote:
> Moreover the default hierarchy keeps the allowed masks hotplug invariant
> too. So the patch is not bringing about a fundamental change in the
> design of cgroups.

This is gonna be a new behavior anyway and I don't follow why you just
can't use the default hierarchy (well, of course except for that it
isn't ready yet but no matter what we do this is gonna take another
devel cycle so what's the difference?). It's not like the default
hierarchy requires invasive changes. I'm not convinced this calls for
yet another operation mode.

Thanks.

--
tejun

2015-04-13 15:05:28

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH V2] cpuset: Add knob to make allowed masks hotplug invariant on legacy hierarchy

On Mon, Apr 13, 2015 at 10:50:52AM -0400, Tejun Heo wrote:
> On Fri, Apr 10, 2015 at 07:41:52PM +0530, Preeti U Murthy wrote:
> > Moreover the default hierarchy keeps the allowed masks hotplug invariant
> > too. So the patch is not bringing about a fundamental change in the
> > design of cgroups.
>
> This is gonna be a new behavior anyway and I don't follow why you just
> can't use the default hierarchy (well, of course except for that it
> isn't ready yet but no matter what we do this is gonna take another
> devel cycle so what's the difference?). It's not like the default
> hierarchy requires invasive changes. I'm not convinced this calls for
> yet another operation mode.

So TJ and me don't often seem to agree on cgroup stuffs, but here we do
;-)

2015-04-15 11:12:53

by Preeti U Murthy

[permalink] [raw]
Subject: Re: [PATCH V2] cpuset: Add knob to make allowed masks hotplug invariant on legacy hierarchy

On 04/13/2015 08:20 PM, Tejun Heo wrote:
> On Fri, Apr 10, 2015 at 07:41:52PM +0530, Preeti U Murthy wrote:
>> Moreover the default hierarchy keeps the allowed masks hotplug invariant
>> too. So the patch is not bringing about a fundamental change in the
>> design of cgroups.
>
> This is gonna be a new behavior anyway and I don't follow why you just
> can't use the default hierarchy (well, of course except for that it
> isn't ready yet but no matter what we do this is gonna take another
> devel cycle so what's the difference?). It's not like the default

Is it possible today to mount a specific controller on the default
hierarchy ? Or will this capability be going in the next merge window ?
I see that there are no mount options except the __SANE__ one that the
default hierarchy takes.

We could mount the rest of the controllers in the legacy hierarchy while
mounting the cpuset controller alone in the default hierarchy and see if
this works fine.

Regards
Preeti U Murthy

> hierarchy requires invasive changes. I'm not convinced this calls for
> yet another operation mode.
>
> Thanks.
>

2015-04-15 11:41:13

by Preeti U Murthy

[permalink] [raw]
Subject: Re: [PATCH V2] cpuset: Add knob to make allowed masks hotplug invariant on legacy hierarchy

On 04/13/2015 08:13 PM, Peter Zijlstra wrote:
> On Mon, Apr 13, 2015 at 05:46:37PM +0530, Preeti U Murthy wrote:
>> On 04/13/2015 12:31 PM, Peter Zijlstra wrote:
>
>>> Would it not make sense to make that a mount option and limit the amount
>>> of semantic variants of cpusets?
>>
>> I spent some time analyzing if this would be a better option than the
>> sysfs knob and I think not for the following reasons:
>>
>> 1. Mount options tend to be generic across the controllers of a cgroup.
>> But use case addressed by this patch is specific to the cpuset controller.
>
> Surely we can get around that somehow.
>
>> 2. The behavior that this patch is trying to bring about is not a
>> drastic one to call for a mount option equivalent to the __SANE_BEHAVIOR
>> one that existed earlier. This option was used to switch the legacy
>> design to the default one.
>>
>> However this patch is not *wholly* mimicking the default hierarchy
>> behavior. The behavior when cpusets become empty is left untouched for
>> instance. The patch borrows one of the behaviors from the default
>> hierarchy only and hence just not justify the use of a mount flag.
>
> So the 'problem' I have is that you introduce a 3rd semantic for the
> cpuset thing.
>
> You also do not answer if you can live with the default hierarchy
> behaviour, only that your patch mimicks a subset of it.
>
> Why not all of it?

This was assuming that the existing software will break if we mimick the
entire design given that we were informed that it does not work well
with the default hierarchy. But I think now, that its worth finding out
why if so and switch over to the new design, atleast for cpusets.

Regards
Preeti U Murthy

2015-04-15 15:03:17

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH V2] cpuset: Add knob to make allowed masks hotplug invariant on legacy hierarchy

On Wed, Apr 15, 2015 at 05:10:49PM +0530, Preeti U Murthy wrote:
> On 04/13/2015 08:13 PM, Peter Zijlstra wrote:
> > On Mon, Apr 13, 2015 at 05:46:37PM +0530, Preeti U Murthy wrote:
> >> On 04/13/2015 12:31 PM, Peter Zijlstra wrote:
> >
> >>> Would it not make sense to make that a mount option and limit the amount
> >>> of semantic variants of cpusets?
> >>
> >> I spent some time analyzing if this would be a better option than the
> >> sysfs knob and I think not for the following reasons:
> >>
> >> 1. Mount options tend to be generic across the controllers of a cgroup.
> >> But use case addressed by this patch is specific to the cpuset controller.
> >
> > Surely we can get around that somehow.
> >
> >> 2. The behavior that this patch is trying to bring about is not a
> >> drastic one to call for a mount option equivalent to the __SANE_BEHAVIOR
> >> one that existed earlier. This option was used to switch the legacy
> >> design to the default one.
> >>
> >> However this patch is not *wholly* mimicking the default hierarchy
> >> behavior. The behavior when cpusets become empty is left untouched for
> >> instance. The patch borrows one of the behaviors from the default
> >> hierarchy only and hence just not justify the use of a mount flag.
> >
> > So the 'problem' I have is that you introduce a 3rd semantic for the
> > cpuset thing.
> >
> > You also do not answer if you can live with the default hierarchy
> > behaviour, only that your patch mimicks a subset of it.
> >
> > Why not all of it?
>
> This was assuming that the existing software will break if we mimick the
> entire design given that we were informed that it does not work well
> with the default hierarchy. But I think now, that its worth finding out
> why if so and switch over to the new design, atleast for cpusets.

Peter, is the question "why can't we just use the unified hierarchy for
cpusets"?

2015-04-15 15:19:52

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH V2] cpuset: Add knob to make allowed masks hotplug invariant on legacy hierarchy

On Wed, Apr 15, 2015 at 10:03:02AM -0500, Serge E. Hallyn wrote:

> Peter, is the question "why can't we just use the unified hierarchy for
> cpusets"?

No, the question is why can't you use the unified hierarchy cpuset
semantics -- without the actual unified hierarchy stuff.

2015-04-15 15:30:45

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH V2] cpuset: Add knob to make allowed masks hotplug invariant on legacy hierarchy

On Wed, Apr 15, 2015 at 05:19:26PM +0200, Peter Zijlstra wrote:
> On Wed, Apr 15, 2015 at 10:03:02AM -0500, Serge E. Hallyn wrote:
>
> > Peter, is the question "why can't we just use the unified hierarchy for
> > cpusets"?
>
> No, the question is why can't you use the unified hierarchy cpuset
> semantics -- without the actual unified hierarchy stuff.

Ok - that I don't know the answer to. Does sound like it would
simplify things, reduce # of sets of semantics.

2015-04-15 15:48:16

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH V2] cpuset: Add knob to make allowed masks hotplug invariant on legacy hierarchy

On Wed, Apr 15, 2015 at 10:30:35AM -0500, Serge E. Hallyn wrote:
> On Wed, Apr 15, 2015 at 05:19:26PM +0200, Peter Zijlstra wrote:
> > On Wed, Apr 15, 2015 at 10:03:02AM -0500, Serge E. Hallyn wrote:
> >
> > > Peter, is the question "why can't we just use the unified hierarchy for
> > > cpusets"?
> >
> > No, the question is why can't you use the unified hierarchy cpuset
> > semantics -- without the actual unified hierarchy stuff.
>
> Ok - that I don't know the answer to. Does sound like it would
> simplify things, reduce # of sets of semantics.

What'd be the reason for not using the unified hierarchy tho? The
adaptable required isn't that big and the patchset already proposes
significant amount of behavior change. Sure it'd be a bit more hassle
but does that really justify introducing yet another operation mode?
This is why unified hierarchy is being added in the first place. One
can argue "but *I* just need this specific part changed" but allowing
combinations of all those little variations is gonna lead us to a
hellish place.

Thanks.

--
tejun

2015-04-15 16:15:45

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH V2] cpuset: Add knob to make allowed masks hotplug invariant on legacy hierarchy

On Wed, Apr 15, 2015 at 11:48:02AM -0400, Tejun Heo wrote:
> On Wed, Apr 15, 2015 at 10:30:35AM -0500, Serge E. Hallyn wrote:
> > On Wed, Apr 15, 2015 at 05:19:26PM +0200, Peter Zijlstra wrote:
> > > On Wed, Apr 15, 2015 at 10:03:02AM -0500, Serge E. Hallyn wrote:
> > >
> > > > Peter, is the question "why can't we just use the unified hierarchy for
> > > > cpusets"?
> > >
> > > No, the question is why can't you use the unified hierarchy cpuset
> > > semantics -- without the actual unified hierarchy stuff.
> >
> > Ok - that I don't know the answer to. Does sound like it would
> > simplify things, reduce # of sets of semantics.
>
> What'd be the reason for not using the unified hierarchy tho? The

The reason would be because it breaks "legacy" software. So that
would only matter if Preeti needs to run such software.

A particularly pernicious example would be needing to run nested
containers, with an older (stable release) container wanting to run
containers inside itself. The host may have all-new software
capable of working with unifiied hierarchy, but the stable release
container will not. (Honestly I don't yet know what we're going to
do about that, but it's going to be great fun for the next few months)

> adaptable required isn't that big and the patchset already proposes
> significant amount of behavior change. Sure it'd be a bit more hassle
> but does that really justify introducing yet another operation mode?
> This is why unified hierarchy is being added in the first place. One
> can argue "but *I* just need this specific part changed" but allowing
> combinations of all those little variations is gonna lead us to a
> hellish place.

Agreed, these things quickly get out of hand. This patch looks
straightforward enough in itself, though and the question is, is this
a special enough case for an exception.

-serge

2015-04-15 16:18:27

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH V2] cpuset: Add knob to make allowed masks hotplug invariant on legacy hierarchy

On Wed, Apr 15, 2015 at 11:15:35AM -0500, Serge E. Hallyn wrote:
> The reason would be because it breaks "legacy" software. So that
> would only matter if Preeti needs to run such software.

Sure, I get that argument but this is changing how the contorller
behaves in a major way. There are specifics which may make this
particular case more justifiable but overall the combination of
arguments is pretty weird.

Thanks.

--
tejun

2015-04-15 16:36:34

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH V2] cpuset: Add knob to make allowed masks hotplug invariant on legacy hierarchy

On Wed, Apr 15, 2015 at 12:18:11PM -0400, Tejun Heo wrote:
> On Wed, Apr 15, 2015 at 11:15:35AM -0500, Serge E. Hallyn wrote:
> > The reason would be because it breaks "legacy" software. So that
> > would only matter if Preeti needs to run such software.
>
> Sure, I get that argument but this is changing how the contorller
> behaves in a major way.

It is. My main counter to that would be that it is how cpusets
should always have worked :)

> There are specifics which may make this
> particular case more justifiable but overall the combination of
> arguments is pretty weird.

And becomes harder to reason about and review/maintain. I agree
there.

>From userspace, I suppose one approach (though note it is racy) to
solving this would be to have udev rules which

. On cpu unplug, record all cgroups which were using that cpu
. on cpu plug, re-add the cpu to all recorded cgroups for that
cpu (if any), as well as to any cgroups marked (in some /etc
file) as using "all" or a percentage of all cpus.

-serge

2015-04-16 11:58:05

by Preeti U Murthy

[permalink] [raw]
Subject: Re: [PATCH V2] cpuset: Add knob to make allowed masks hotplug invariant on legacy hierarchy

On 04/15/2015 10:06 PM, Serge E. Hallyn wrote:
> On Wed, Apr 15, 2015 at 12:18:11PM -0400, Tejun Heo wrote:
>> On Wed, Apr 15, 2015 at 11:15:35AM -0500, Serge E. Hallyn wrote:
>>> The reason would be because it breaks "legacy" software. So that
>>> would only matter if Preeti needs to run such software.
>>
>> Sure, I get that argument but this is changing how the contorller
>> behaves in a major way.
>
> It is. My main counter to that would be that it is how cpusets
> should always have worked :)
>
>> There are specifics which may make this
>> particular case more justifiable but overall the combination of
>> arguments is pretty weird.
>
> And becomes harder to reason about and review/maintain. I agree
> there.
>
> From userspace, I suppose one approach (though note it is racy) to
> solving this would be to have udev rules which
>
> . On cpu unplug, record all cgroups which were using that cpu
> . on cpu plug, re-add the cpu to all recorded cgroups for that
> cpu (if any), as well as to any cgroups marked (in some /etc
> file) as using "all" or a percentage of all cpus.

Yes this is the best way to move forward. But the stacks in userspace
that manage cgroups need to each have such a daemon managing the cpusets
like is pointed above. It would have been fair to expect them to do this
except that this is one issue that userspace is expecting the kernel to
handle. The side effects of this behavior of the legacy hierarchy is
showing up in weird ways.

Another approach as Tejun pointed is to switch over to the unified
hierarchy. I can test this out (How do we mount specific controllers in
the unified hierarchy Tejun? It does not allow any mount options today),
but I don't think I can exhaustively test this out by myself across
distros. So I am not confident of this approach.

The last approach would be as Peter pointed, mimick the entire unified
hierarchy through a mount option, similar to the SANE_BEHAVIOR option
that was present earlier and was removed. Tejun ? Can we bring back that
mount option ?

Regards
Preeti U Murthy
>
> -serge
>

2015-04-20 03:29:51

by Zefan Li

[permalink] [raw]
Subject: Re: [PATCH V2] cpuset: Add knob to make allowed masks hotplug invariant on legacy hierarchy

On 2015/4/16 19:57, Preeti U Murthy wrote:
> On 04/15/2015 10:06 PM, Serge E. Hallyn wrote:
>> On Wed, Apr 15, 2015 at 12:18:11PM -0400, Tejun Heo wrote:
>>> On Wed, Apr 15, 2015 at 11:15:35AM -0500, Serge E. Hallyn wrote:
>>>> The reason would be because it breaks "legacy" software. So that
>>>> would only matter if Preeti needs to run such software.
>>>
>>> Sure, I get that argument but this is changing how the contorller
>>> behaves in a major way.
>>
>> It is. My main counter to that would be that it is how cpusets
>> should always have worked :)
>>
>>> There are specifics which may make this
>>> particular case more justifiable but overall the combination of
>>> arguments is pretty weird.
>>
>> And becomes harder to reason about and review/maintain. I agree
>> there.
>>
>> From userspace, I suppose one approach (though note it is racy) to
>> solving this would be to have udev rules which
>>
>> . On cpu unplug, record all cgroups which were using that cpu
>> . on cpu plug, re-add the cpu to all recorded cgroups for that
>> cpu (if any), as well as to any cgroups marked (in some /etc
>> file) as using "all" or a percentage of all cpus.
>
> Yes this is the best way to move forward. But the stacks in userspace
> that manage cgroups need to each have such a daemon managing the cpusets
> like is pointed above. It would have been fair to expect them to do this
> except that this is one issue that userspace is expecting the kernel to
> handle. The side effects of this behavior of the legacy hierarchy is
> showing up in weird ways.
>
> Another approach as Tejun pointed is to switch over to the unified
> hierarchy. I can test this out (How do we mount specific controllers in
> the unified hierarchy Tejun? It does not allow any mount options today),

You just can't. All controllers are bound to unified hierarchy, except those
already mounted in the old way. That said, in the unified hierarchy you're able
to use cpuset only, so other controllers can still be mounted seperated.

Here is an example:

# use blkio cgroup in legacy hierachy
mount -t cgroup -o blkio xxx old/

# now mount the unified hierarchy
mount -t cgroup -o __DEVEL__sane_behavior xxx unified/

# use cpuset only
mkdir unified/cpuset
echo +cpuset > cgroup.subtree_control

# use memory cgroup in another legacy hierarchy
mount -t cgroup -o memory xxx old2/

You may refer to Documentation/cgroups/unified-hierarchy.txt

> but I don't think I can exhaustively test this out by myself across
> distros. So I am not confident of this approach.
>
> The last approach would be as Peter pointed, mimick the entire unified
> hierarchy through a mount option, similar to the SANE_BEHAVIOR option
> that was present earlier and was removed. Tejun ? Can we bring back that
> mount option ?
>

I don't think it's an option. We want users move to unified hierarchy.