This patcheset introduces behavior changes, but only for default hierarchy
- We introduce new interfaces cpuset.effective_cpus and cpuset.effective_mems,
while cpuset.cpus and cpuset.mems will be configured masks.
- The configured masks can be changed by writing cpuset.cpus/mems only. They
won't be changed when hotplug happens.
- Users can config cpus and mems without restrictions from the parent cpuset.
effective masks will enforce the hierarchical behavior.
- Users can also config cpus and mems to have already offlined CPU/nodes.
- When a CPU/node is onlined, it will be brought back to the effective masks
if it's in the configured masks.
- We build sched domains based on effective cpumask but not configured cpumask.
v3:
- rebased against "cgroup: remove sane_behavior support on non-default hierarchies"
- addressed previous review comments
- adjusted some code, comment and changelog slightly
v2:
- fixed two bugs
- made changelogs more verbose
- added more comments
- changed cs->real_{mems,cpus}_allowed to cs->effective_{mems, cpus}
- splitted "cpuset: enable onlined cpu/node in effective masks" into 2 patches
- exported cpuset.effective_{cpus,mems} unconditionally
Li Zefan (12):
cpuset: add cs->effective_cpus and cs->effective_mems
cpuset: update cpuset->effective_{cpus,mems} at hotplug
cpuset: update cs->effective_{cpus,mems} when config changes
cpuset: inherit ancestor's masks if effective_{cpus,mems} becomes
empty
cpuset: use effective cpumask to build sched domains
cpuset: initialize top_cpuset's configured masks at mount
cpuset: apply cs->effective_{cpus,mems}
cpuset: make cs->{cpus,mems}_allowed as user-configured masks
cpuset: refactor cpuset_hotplug_update_tasks()
cpuset: enable onlined cpu/node in effective masks
cpuset: allow writing offlined masks to cpuset.cpus/mems
cpuset: export effective masks to userspace
kernel/cpuset.c | 493 ++++++++++++++++++++++++++++++++++----------------------
1 file changed, 304 insertions(+), 189 deletions(-)
--
1.8.0.2
We're going to have separate user-configured masks and effective ones.
Eventually configured masks can only be changed by writing cpuset.cpus
and cpuset.mems, and they won't be restricted by parent cpuset. While
effective masks reflect cpu/memory hotplug and hierachical restriction,
and these are the real masks that apply to the tasks in the cpuset.
We calculate effective mask this way:
- top cpuset's effective_mask == online_mask, otherwise
- cpuset's effective_mask == configured_mask & parent effective_mask,
if the result is empty, it inherits parent effective mask.
Those behavior changes are for default hierarchy only. For legacy
hierachy, effective_mask and configured_mask are the same, so we won't
break old interfaces.
This patch adds the effective masks to struct cpuset and initializes
them. The effective masks of the top cpuset is the same with configured
masks, and a child cpuset inherits its parent's effective masks.
This won't introduce behavior change.
v2:
- s/real_{mems,cpus}_allowed/effective_{mems,cpus}, suggested by Tejun.
- don't init effective masks in cpuset_css_online() if !cgroup_on_dfl.
Signed-off-by: Li Zefan <[email protected]>
---
kernel/cpuset.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++-----------
1 file changed, 48 insertions(+), 11 deletions(-)
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index f9d4807..ef0974c 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -76,8 +76,14 @@ struct cpuset {
struct cgroup_subsys_state css;
unsigned long flags; /* "unsigned long" so bitops work */
- cpumask_var_t cpus_allowed; /* CPUs allowed to tasks in cpuset */
- nodemask_t mems_allowed; /* Memory Nodes allowed to tasks */
+
+ /* user-configured CPUs and Memory Nodes allow to tasks */
+ cpumask_var_t cpus_allowed;
+ nodemask_t mems_allowed;
+
+ /* effective CPUs and Memory Nodes allow to tasks */
+ cpumask_var_t effective_cpus;
+ nodemask_t effective_mems;
/*
* This is old Memory Nodes tasks took on.
@@ -376,13 +382,20 @@ static struct cpuset *alloc_trial_cpuset(struct cpuset *cs)
if (!trial)
return NULL;
- if (!alloc_cpumask_var(&trial->cpus_allowed, GFP_KERNEL)) {
- kfree(trial);
- return NULL;
- }
- cpumask_copy(trial->cpus_allowed, cs->cpus_allowed);
+ if (!alloc_cpumask_var(&trial->cpus_allowed, GFP_KERNEL))
+ goto free_cs;
+ if (!alloc_cpumask_var(&trial->effective_cpus, GFP_KERNEL))
+ goto free_cpus;
+ cpumask_copy(trial->cpus_allowed, cs->cpus_allowed);
+ cpumask_copy(trial->effective_cpus, cs->effective_cpus);
return trial;
+
+free_cpus:
+ free_cpumask_var(trial->cpus_allowed);
+free_cs:
+ kfree(trial);
+ return NULL;
}
/**
@@ -391,6 +404,7 @@ static struct cpuset *alloc_trial_cpuset(struct cpuset *cs)
*/
static void free_trial_cpuset(struct cpuset *trial)
{
+ free_cpumask_var(trial->effective_cpus);
free_cpumask_var(trial->cpus_allowed);
kfree(trial);
}
@@ -1848,18 +1862,26 @@ cpuset_css_alloc(struct cgroup_subsys_state *parent_css)
cs = kzalloc(sizeof(*cs), GFP_KERNEL);
if (!cs)
return ERR_PTR(-ENOMEM);
- if (!alloc_cpumask_var(&cs->cpus_allowed, GFP_KERNEL)) {
- kfree(cs);
- return ERR_PTR(-ENOMEM);
- }
+ if (!alloc_cpumask_var(&cs->cpus_allowed, GFP_KERNEL))
+ goto free_cs;
+ if (!alloc_cpumask_var(&cs->effective_cpus, GFP_KERNEL))
+ goto free_cpus;
set_bit(CS_SCHED_LOAD_BALANCE, &cs->flags);
cpumask_clear(cs->cpus_allowed);
nodes_clear(cs->mems_allowed);
+ cpumask_clear(cs->effective_cpus);
+ nodes_clear(cs->effective_mems);
fmeter_init(&cs->fmeter);
cs->relax_domain_level = -1;
return &cs->css;
+
+free_cpus:
+ free_cpumask_var(cs->cpus_allowed);
+free_cs:
+ kfree(cs);
+ return ERR_PTR(-ENOMEM);
}
static int cpuset_css_online(struct cgroup_subsys_state *css)
@@ -1882,6 +1904,13 @@ static int cpuset_css_online(struct cgroup_subsys_state *css)
cpuset_inc();
+ mutex_lock(&callback_mutex);
+ if (cgroup_on_dfl(cs->css.cgroup)) {
+ cpumask_copy(cs->effective_cpus, parent->effective_cpus);
+ cs->effective_mems = parent->effective_mems;
+ }
+ mutex_unlock(&callback_mutex);
+
if (!test_bit(CGRP_CPUSET_CLONE_CHILDREN, &css->cgroup->flags))
goto out_unlock;
@@ -1941,6 +1970,7 @@ static void cpuset_css_free(struct cgroup_subsys_state *css)
{
struct cpuset *cs = css_cs(css);
+ free_cpumask_var(cs->effective_cpus);
free_cpumask_var(cs->cpus_allowed);
kfree(cs);
}
@@ -1969,9 +1999,13 @@ int __init cpuset_init(void)
if (!alloc_cpumask_var(&top_cpuset.cpus_allowed, GFP_KERNEL))
BUG();
+ if (!alloc_cpumask_var(&top_cpuset.effective_cpus, GFP_KERNEL))
+ BUG();
cpumask_setall(top_cpuset.cpus_allowed);
nodes_setall(top_cpuset.mems_allowed);
+ cpumask_setall(top_cpuset.effective_cpus);
+ nodes_setall(top_cpuset.effective_mems);
fmeter_init(&top_cpuset.fmeter);
set_bit(CS_SCHED_LOAD_BALANCE, &top_cpuset.flags);
@@ -2207,6 +2241,9 @@ void __init cpuset_init_smp(void)
top_cpuset.mems_allowed = node_states[N_MEMORY];
top_cpuset.old_mems_allowed = top_cpuset.mems_allowed;
+ cpumask_copy(top_cpuset.effective_cpus, cpu_active_mask);
+ top_cpuset.effective_mems = node_states[N_MEMORY];
+
register_hotmemory_notifier(&cpuset_track_online_nodes_nb);
}
--
1.8.0.2
We're going to have separate user-configured masks and effective ones.
Eventually configured masks can only be changed by writing cpuset.cpus
and cpuset.mems, and they won't be restricted by parent cpuset. While
effective masks reflect cpu/memory hotplug and hierachical restriction,
and these are the real masks that apply to the tasks in the cpuset.
We calculate effective mask this way:
- top cpuset's effective_mask == online_mask, otherwise
- cpuset's effective_mask == configured_mask & parent effective_mask,
if the result is empty, it inherits parent effective mask.
Those behavior changes are for default hierarchy only. For legacy
hierarchy, effective_mask and configured_mask are the same, so we won't
break old interfaces.
To make cs->effective_{cpus,mems} to be effective masks, we need to
- update the effective masks at hotplug
- update the effective masks at config change
- take on ancestor's mask when the effective mask is empty
The first item is done here.
This won't introduce behavior change.
Signed-off-by: Li Zefan <[email protected]>
---
kernel/cpuset.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index ef0974c..94f651d 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -2082,6 +2082,7 @@ retry:
mutex_lock(&callback_mutex);
cpumask_andnot(cs->cpus_allowed, cs->cpus_allowed, &off_cpus);
+ cpumask_andnot(cs->effective_cpus, cs->effective_cpus, &off_cpus);
mutex_unlock(&callback_mutex);
/*
@@ -2096,6 +2097,7 @@ retry:
mutex_lock(&callback_mutex);
nodes_andnot(cs->mems_allowed, cs->mems_allowed, off_mems);
+ nodes_andnot(cs->effective_mems, cs->effective_mems, off_mems);
mutex_unlock(&callback_mutex);
/*
@@ -2159,6 +2161,7 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
if (cpus_updated) {
mutex_lock(&callback_mutex);
cpumask_copy(top_cpuset.cpus_allowed, &new_cpus);
+ cpumask_copy(top_cpuset.effective_cpus, &new_cpus);
mutex_unlock(&callback_mutex);
/* we don't mess with cpumasks of tasks in top_cpuset */
}
@@ -2167,6 +2170,7 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
if (mems_updated) {
mutex_lock(&callback_mutex);
top_cpuset.mems_allowed = new_mems;
+ top_cpuset.effective_mems = new_mems;
mutex_unlock(&callback_mutex);
update_tasks_nodemask(&top_cpuset);
}
--
1.8.0.2
We're going to have separate user-configured masks and effective ones.
Eventually configured masks can only be changed by writing cpuset.cpus
and cpuset.mems, and they won't be restricted by parent cpuset. While
effective masks reflect cpu/memory hotplug and hierachical restriction,
and these are the real masks that apply to the tasks in the cpuset.
We calculate effective mask this way:
- top cpuset's effective_mask == online_mask, otherwise
- cpuset's effective_mask == configured_mask & parent effective_mask,
if the result is empty, it inherits parent effective mask.
Those behavior changes are for default hierarchy only. For legacy
hierarchy, effective_mask and configured_mask are the same, so we won't
break old interfaces.
To make cs->effective_{cpus,mems} to be effective masks, we need to
- update the effective masks at hotplug
- update the effective masks at config change
- take on ancestor's mask when the effective mask is empty
The second item is done here. We don't need to treat root_cs specially
in update_cpumasks_hier().
This won't introduce behavior change.
v3:
- add a WARN_ON() to check if effective masks are the same with configured
masks on legacy hierarchy.
- pass trialcs->cpus_allowed to update_cpumasks_hier() and add a comment for
it. Similar change for update_nodemasks_hier(). Suggested by Tejun.
v2:
- revise the comment in update_{cpu,node}masks_hier(), suggested by Tejun.
- fix to use @cp instead of @cs in these two functions.
Signed-off-by: Li Zefan <[email protected]>
---
kernel/cpuset.c | 88 +++++++++++++++++++++++++++++++++++----------------------
1 file changed, 54 insertions(+), 34 deletions(-)
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 94f651d..da766c3 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -855,36 +855,45 @@ static void update_tasks_cpumask(struct cpuset *cs)
}
/*
- * update_tasks_cpumask_hier - Update the cpumasks of tasks in the hierarchy.
- * @root_cs: the root cpuset of the hierarchy
- * @update_root: update root cpuset or not?
+ * update_cpumasks_hier - Update effective cpumasks and tasks in the subtree
+ * @cs: the cpuset to consider
+ * @new_cpus: temp variable for calculating new effective_cpus
+ *
+ * When congifured cpumask is changed, the effective cpumasks of this cpuset
+ * and all its descendants need to be updated.
*
- * This will update cpumasks of tasks in @root_cs and all other empty cpusets
- * which take on cpumask of @root_cs.
+ * On legacy hierachy, effective_cpus will be the same with cpu_allowed.
*
* Called with cpuset_mutex held
*/
-static void update_tasks_cpumask_hier(struct cpuset *root_cs, bool update_root)
+static void update_cpumasks_hier(struct cpuset *cs, struct cpumask *new_cpus)
{
struct cpuset *cp;
struct cgroup_subsys_state *pos_css;
rcu_read_lock();
- cpuset_for_each_descendant_pre(cp, pos_css, root_cs) {
- if (cp == root_cs) {
- if (!update_root)
- continue;
- } else {
- /* skip the whole subtree if @cp have some CPU */
- if (!cpumask_empty(cp->cpus_allowed)) {
- pos_css = css_rightmost_descendant(pos_css);
- continue;
- }
+ cpuset_for_each_descendant_pre(cp, pos_css, cs) {
+ struct cpuset *parent = parent_cs(cp);
+
+ cpumask_and(new_cpus, cp->cpus_allowed, parent->effective_cpus);
+
+ /* Skip the whole subtree if the cpumask remains the same. */
+ if (cpumask_equal(new_cpus, cp->effective_cpus)) {
+ pos_css = css_rightmost_descendant(pos_css);
+ continue;
}
+
if (!css_tryget_online(&cp->css))
continue;
rcu_read_unlock();
+ mutex_lock(&callback_mutex);
+ cpumask_copy(cp->effective_cpus, new_cpus);
+ mutex_unlock(&callback_mutex);
+
+ WARN_ON(!cgroup_on_dfl(cp->css.cgroup) &&
+ !cpumask_equal(cp->cpus_allowed, cp->effective_cpus));
+
update_tasks_cpumask(cp);
rcu_read_lock();
@@ -940,7 +949,8 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
cpumask_copy(cs->cpus_allowed, trialcs->cpus_allowed);
mutex_unlock(&callback_mutex);
- update_tasks_cpumask_hier(cs, true);
+ /* use trialcs->cpus_allowed as a temp variable */
+ update_cpumasks_hier(cs, trialcs->cpus_allowed);
if (is_load_balanced)
rebuild_sched_domains_locked();
@@ -1091,36 +1101,45 @@ static void update_tasks_nodemask(struct cpuset *cs)
}
/*
- * update_tasks_nodemask_hier - Update the nodemasks of tasks in the hierarchy.
- * @cs: the root cpuset of the hierarchy
- * @update_root: update the root cpuset or not?
+ * update_nodemasks_hier - Update effective nodemasks and tasks in the subtree
+ * @cs: the cpuset to consider
+ * @new_mems: a temp variable for calculating new effective_mems
+ *
+ * When configured nodemask is changed, the effective nodemasks of this cpuset
+ * and all its descendants need to be updated.
*
- * This will update nodemasks of tasks in @root_cs and all other empty cpusets
- * which take on nodemask of @root_cs.
+ * On legacy hiearchy, effective_mems will be the same with mems_allowed.
*
* Called with cpuset_mutex held
*/
-static void update_tasks_nodemask_hier(struct cpuset *root_cs, bool update_root)
+static void update_nodemasks_hier(struct cpuset *cs, nodemask_t *new_mems)
{
struct cpuset *cp;
struct cgroup_subsys_state *pos_css;
rcu_read_lock();
- cpuset_for_each_descendant_pre(cp, pos_css, root_cs) {
- if (cp == root_cs) {
- if (!update_root)
- continue;
- } else {
- /* skip the whole subtree if @cp have some CPU */
- if (!nodes_empty(cp->mems_allowed)) {
- pos_css = css_rightmost_descendant(pos_css);
- continue;
- }
+ cpuset_for_each_descendant_pre(cp, pos_css, cs) {
+ struct cpuset *parent = parent_cs(cp);
+
+ nodes_and(*new_mems, cp->mems_allowed, parent->effective_mems);
+
+ /* Skip the whole subtree if the nodemask remains the same. */
+ if (nodes_equal(*new_mems, cp->effective_mems)) {
+ pos_css = css_rightmost_descendant(pos_css);
+ continue;
}
+
if (!css_tryget_online(&cp->css))
continue;
rcu_read_unlock();
+ mutex_lock(&callback_mutex);
+ cp->effective_mems = *new_mems;
+ mutex_unlock(&callback_mutex);
+
+ WARN_ON(!cgroup_on_dfl(cp->css.cgroup) &&
+ nodes_equal(cp->mems_allowed, cp->effective_mems));
+
update_tasks_nodemask(cp);
rcu_read_lock();
@@ -1188,7 +1207,8 @@ static int update_nodemask(struct cpuset *cs, struct cpuset *trialcs,
cs->mems_allowed = trialcs->mems_allowed;
mutex_unlock(&callback_mutex);
- update_tasks_nodemask_hier(cs, true);
+ /* use trialcs->mems_allowed as a temp variable */
+ update_nodemasks_hier(cs, &cs->mems_allowed);
done:
return retval;
}
--
1.8.0.2
We're going to have separate user-configured masks and effective ones.
Eventually configured masks can only be changed by writing cpuset.cpus
and cpuset.mems, and they won't be restricted by parent cpuset. While
effective masks reflect cpu/memory hotplug and hierachical restriction,
and these are the real masks that apply to the tasks in the cpuset.
We calculate effective mask this way:
- top cpuset's effective_mask == online_mask, otherwise
- cpuset's effective_mask == configured_mask & parent effective_mask,
if the result is empty, it inherits parent effective mask.
Those behavior changes are for default hierarchy only. For legacy
hierarchy, effective_mask and configured_mask are the same, so we won't
break old interfaces.
To make cs->effective_{cpus,mems} to be effective masks, we need to
- update the effective masks at hotplug
- update the effective masks at config change
- take on ancestor's mask when the effective mask is empty
The last item is done here.
This won't introduce behavior change.
Signed-off-by: Li Zefan <[email protected]>
---
kernel/cpuset.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index da766c3..f834002 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -877,6 +877,13 @@ static void update_cpumasks_hier(struct cpuset *cs, struct cpumask *new_cpus)
cpumask_and(new_cpus, cp->cpus_allowed, parent->effective_cpus);
+ /*
+ * If it becomes empty, inherit the effective mask of the
+ * parent, which is guaranteed to have some CPUs.
+ */
+ if (cpumask_empty(new_cpus))
+ cpumask_copy(new_cpus, parent->effective_cpus);
+
/* Skip the whole subtree if the cpumask remains the same. */
if (cpumask_equal(new_cpus, cp->effective_cpus)) {
pos_css = css_rightmost_descendant(pos_css);
@@ -1123,6 +1130,13 @@ static void update_nodemasks_hier(struct cpuset *cs, nodemask_t *new_mems)
nodes_and(*new_mems, cp->mems_allowed, parent->effective_mems);
+ /*
+ * If it becomes empty, inherit the effective mask of the
+ * parent, which is guaranteed to have some MEMs.
+ */
+ if (nodes_empty(*new_mems))
+ *new_mems = parent->effective_mems;
+
/* Skip the whole subtree if the nodemask remains the same. */
if (nodes_equal(*new_mems, cp->effective_mems)) {
pos_css = css_rightmost_descendant(pos_css);
@@ -2102,7 +2116,11 @@ retry:
mutex_lock(&callback_mutex);
cpumask_andnot(cs->cpus_allowed, cs->cpus_allowed, &off_cpus);
+
+ /* Inherit the effective mask of the parent, if it becomes empty. */
cpumask_andnot(cs->effective_cpus, cs->effective_cpus, &off_cpus);
+ if (on_dfl && cpumask_empty(cs->effective_cpus))
+ cpumask_copy(cs->effective_cpus, parent_cs(cs)->effective_cpus);
mutex_unlock(&callback_mutex);
/*
@@ -2117,7 +2135,11 @@ retry:
mutex_lock(&callback_mutex);
nodes_andnot(cs->mems_allowed, cs->mems_allowed, off_mems);
+
+ /* Inherit the effective mask of the parent, if it becomes empty */
nodes_andnot(cs->effective_mems, cs->effective_mems, off_mems);
+ if (on_dfl && nodes_empty(cs->effective_mems))
+ cs->effective_mems = parent_cs(cs)->effective_mems;
mutex_unlock(&callback_mutex);
/*
--
1.8.0.2
We're going to have separate user-configured masks and effective ones.
Eventually configured masks can only be changed by writing cpuset.cpus
and cpuset.mems, and they won't be restricted by parent cpuset. While
effective masks reflect cpu/memory hotplug and hierachical restriction,
and these are the real masks that apply to the tasks in the cpuset.
We calculate effective mask this way:
- top cpuset's effective_mask == online_mask, otherwise
- cpuset's effective_mask == configured_mask & parent effective_mask,
if the result is empty, it inherits parent effective mask.
Those behavior changes are for default hierarchy only. For legacy
hierarchy, effective_mask and configured_mask are the same, so we won't
break old interfaces.
We should partition sched domains according to effective_cpus, which
is the real cpulist that takes effects on tasks in the cpuset.
This won't introduce behavior change.
v2:
- Add a comment for the call of rebuild_sched_domains(), suggested
by Tejun.
Signed-off-by: Li Zefan <[email protected]>
---
kernel/cpuset.c | 28 +++++++++++++++++-----------
1 file changed, 17 insertions(+), 11 deletions(-)
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index f834002..60577cc 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -494,11 +494,11 @@ out:
#ifdef CONFIG_SMP
/*
* Helper routine for generate_sched_domains().
- * Do cpusets a, b have overlapping cpus_allowed masks?
+ * Do cpusets a, b have overlapping effective cpus_allowed masks?
*/
static int cpusets_overlap(struct cpuset *a, struct cpuset *b)
{
- return cpumask_intersects(a->cpus_allowed, b->cpus_allowed);
+ return cpumask_intersects(a->effective_cpus, b->effective_cpus);
}
static void
@@ -615,7 +615,7 @@ static int generate_sched_domains(cpumask_var_t **domains,
*dattr = SD_ATTR_INIT;
update_domain_attr_tree(dattr, &top_cpuset);
}
- cpumask_copy(doms[0], top_cpuset.cpus_allowed);
+ cpumask_copy(doms[0], top_cpuset.effective_cpus);
goto done;
}
@@ -719,7 +719,7 @@ restart:
struct cpuset *b = csa[j];
if (apn == b->pn) {
- cpumask_or(dp, dp, b->cpus_allowed);
+ cpumask_or(dp, dp, b->effective_cpus);
if (dattr)
update_domain_attr_tree(dattr + nslot, b);
@@ -771,7 +771,7 @@ static void rebuild_sched_domains_locked(void)
* passing doms with offlined cpu to partition_sched_domains().
* Anyways, hotplug work item will rebuild sched domains.
*/
- if (!cpumask_equal(top_cpuset.cpus_allowed, cpu_active_mask))
+ if (!cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask))
goto out;
/* Generate domain masks and attrs */
@@ -870,6 +870,7 @@ static void update_cpumasks_hier(struct cpuset *cs, struct cpumask *new_cpus)
{
struct cpuset *cp;
struct cgroup_subsys_state *pos_css;
+ bool need_rebuild_sched_domains = false;
rcu_read_lock();
cpuset_for_each_descendant_pre(cp, pos_css, cs) {
@@ -903,10 +904,21 @@ static void update_cpumasks_hier(struct cpuset *cs, struct cpumask *new_cpus)
update_tasks_cpumask(cp);
+ /*
+ * If the effective cpumask of any non-empty cpuset is changed,
+ * we need to rebuild sched domains.
+ */
+ if (!cpumask_empty(cp->cpus_allowed) &&
+ is_sched_load_balance(cp))
+ need_rebuild_sched_domains = true;
+
rcu_read_lock();
css_put(&cp->css);
}
rcu_read_unlock();
+
+ if (need_rebuild_sched_domains)
+ rebuild_sched_domains_locked();
}
/**
@@ -919,7 +931,6 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
const char *buf)
{
int retval;
- int is_load_balanced;
/* top_cpuset.cpus_allowed tracks cpu_online_mask; it's read-only */
if (cs == &top_cpuset)
@@ -950,17 +961,12 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
if (retval < 0)
return retval;
- is_load_balanced = is_sched_load_balance(trialcs);
-
mutex_lock(&callback_mutex);
cpumask_copy(cs->cpus_allowed, trialcs->cpus_allowed);
mutex_unlock(&callback_mutex);
/* use trialcs->cpus_allowed as a temp variable */
update_cpumasks_hier(cs, trialcs->cpus_allowed);
-
- if (is_load_balanced)
- rebuild_sched_domains_locked();
return 0;
}
--
1.8.0.2
We now have to support different behaviors for default hierachy and
legacy hiearchy, top_cpuset's configured masks need to be initialized
accordingly.
Suppose we've offlined cpu1.
On default hierarchy:
# mount -t cgroup -o __DEVEL__sane_behavior xxx /cpuset
# cat /cpuset/cpuset.cpus
0-15
On legacy hierarchy:
# mount -t cgroup xxx /cpuset
# cat /cpuset/cpuset.cpus
0,2-15
Signed-off-by: Li Zefan <[email protected]>
---
kernel/cpuset.c | 37 ++++++++++++++++++++++++++++---------
1 file changed, 28 insertions(+), 9 deletions(-)
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 60577cc..e4c31e6 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -2015,16 +2015,35 @@ static void cpuset_css_free(struct cgroup_subsys_state *css)
kfree(cs);
}
+static void cpuset_bind(struct cgroup_subsys_state *root_css)
+{
+ mutex_lock(&cpuset_mutex);
+ mutex_lock(&callback_mutex);
+
+ if (cgroup_on_dfl(root_css->cgroup)) {
+ cpumask_copy(top_cpuset.cpus_allowed, cpu_possible_mask);
+ top_cpuset.mems_allowed = node_possible_map;
+ } else {
+ cpumask_copy(top_cpuset.cpus_allowed,
+ top_cpuset.effective_cpus);
+ top_cpuset.mems_allowed = top_cpuset.effective_mems;
+ }
+
+ mutex_unlock(&callback_mutex);
+ mutex_unlock(&cpuset_mutex);
+}
+
struct cgroup_subsys cpuset_cgrp_subsys = {
- .css_alloc = cpuset_css_alloc,
- .css_online = cpuset_css_online,
- .css_offline = cpuset_css_offline,
- .css_free = cpuset_css_free,
- .can_attach = cpuset_can_attach,
- .cancel_attach = cpuset_cancel_attach,
- .attach = cpuset_attach,
- .base_cftypes = files,
- .early_init = 1,
+ .css_alloc = cpuset_css_alloc,
+ .css_online = cpuset_css_online,
+ .css_offline = cpuset_css_offline,
+ .css_free = cpuset_css_free,
+ .can_attach = cpuset_can_attach,
+ .cancel_attach = cpuset_cancel_attach,
+ .attach = cpuset_attach,
+ .bind = cpuset_bind,
+ .base_cftypes = files,
+ .early_init = 1,
};
/**
--
1.8.0.2
Now we can use cs->effective_{cpus,mems} as effective masks. It's
used whenever:
- we update tasks' cpus_allowed/mems_allowed,
- we want to retrieve tasks_cs(tsk)'s cpus_allowed/mems_allowed.
They actually replace effective_{cpu,node}mask_cpuset().
effective_mask == configured_mask & parent effective_mask except when
the reault is empty, in which case it inherits parent effective_mask.
The result equals the mask computed from effective_{cpu,node}mask_cpuset().
This won't affect the original legacy hierarchy, because in this case we
make sure the effective masks are always the same with user-configured
masks.
Signed-off-by: Li Zefan <[email protected]>
---
kernel/cpuset.c | 83 ++++++++++-----------------------------------------------
1 file changed, 14 insertions(+), 69 deletions(-)
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index e4c31e6..820870a 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -313,9 +313,9 @@ static struct file_system_type cpuset_fs_type = {
*/
static void guarantee_online_cpus(struct cpuset *cs, struct cpumask *pmask)
{
- while (!cpumask_intersects(cs->cpus_allowed, cpu_online_mask))
+ while (!cpumask_intersects(cs->effective_cpus, cpu_online_mask))
cs = parent_cs(cs);
- cpumask_and(pmask, cs->cpus_allowed, cpu_online_mask);
+ cpumask_and(pmask, cs->effective_cpus, cpu_online_mask);
}
/*
@@ -331,9 +331,9 @@ static void guarantee_online_cpus(struct cpuset *cs, struct cpumask *pmask)
*/
static void guarantee_online_mems(struct cpuset *cs, nodemask_t *pmask)
{
- while (!nodes_intersects(cs->mems_allowed, node_states[N_MEMORY]))
+ while (!nodes_intersects(cs->effective_mems, node_states[N_MEMORY]))
cs = parent_cs(cs);
- nodes_and(*pmask, cs->mems_allowed, node_states[N_MEMORY]);
+ nodes_and(*pmask, cs->effective_mems, node_states[N_MEMORY]);
}
/*
@@ -795,45 +795,6 @@ void rebuild_sched_domains(void)
mutex_unlock(&cpuset_mutex);
}
-/*
- * effective_cpumask_cpuset - return nearest ancestor with non-empty cpus
- * @cs: the cpuset in interest
- *
- * A cpuset's effective cpumask is the cpumask of the nearest ancestor
- * with non-empty cpus. We use effective cpumask whenever:
- * - we update tasks' cpus_allowed. (they take on the ancestor's cpumask
- * if the cpuset they reside in has no cpus)
- * - we want to retrieve task_cs(tsk)'s cpus_allowed.
- *
- * Called with cpuset_mutex held. cpuset_cpus_allowed_fallback() is an
- * exception. See comments there.
- */
-static struct cpuset *effective_cpumask_cpuset(struct cpuset *cs)
-{
- while (cpumask_empty(cs->cpus_allowed))
- cs = parent_cs(cs);
- return cs;
-}
-
-/*
- * effective_nodemask_cpuset - return nearest ancestor with non-empty mems
- * @cs: the cpuset in interest
- *
- * A cpuset's effective nodemask is the nodemask of the nearest ancestor
- * with non-empty memss. We use effective nodemask whenever:
- * - we update tasks' mems_allowed. (they take on the ancestor's nodemask
- * if the cpuset they reside in has no mems)
- * - we want to retrieve task_cs(tsk)'s mems_allowed.
- *
- * Called with cpuset_mutex held.
- */
-static struct cpuset *effective_nodemask_cpuset(struct cpuset *cs)
-{
- while (nodes_empty(cs->mems_allowed))
- cs = parent_cs(cs);
- return cs;
-}
-
/**
* update_tasks_cpumask - Update the cpumasks of tasks in the cpuset.
* @cs: the cpuset in which each task's cpus_allowed mask needs to be changed
@@ -844,13 +805,12 @@ static struct cpuset *effective_nodemask_cpuset(struct cpuset *cs)
*/
static void update_tasks_cpumask(struct cpuset *cs)
{
- struct cpuset *cpus_cs = effective_cpumask_cpuset(cs);
struct css_task_iter it;
struct task_struct *task;
css_task_iter_start(&cs->css, &it);
while ((task = css_task_iter_next(&it)))
- set_cpus_allowed_ptr(task, cpus_cs->cpus_allowed);
+ set_cpus_allowed_ptr(task, cs->effective_cpus);
css_task_iter_end(&it);
}
@@ -988,15 +948,13 @@ static void cpuset_migrate_mm(struct mm_struct *mm, const nodemask_t *from,
const nodemask_t *to)
{
struct task_struct *tsk = current;
- struct cpuset *mems_cs;
tsk->mems_allowed = *to;
do_migrate_pages(mm, from, to, MPOL_MF_MOVE_ALL);
rcu_read_lock();
- mems_cs = effective_nodemask_cpuset(task_cs(tsk));
- guarantee_online_mems(mems_cs, &tsk->mems_allowed);
+ guarantee_online_mems(task_cs(tsk), &tsk->mems_allowed);
rcu_read_unlock();
}
@@ -1065,13 +1023,12 @@ static void *cpuset_being_rebound;
static void update_tasks_nodemask(struct cpuset *cs)
{
static nodemask_t newmems; /* protected by cpuset_mutex */
- struct cpuset *mems_cs = effective_nodemask_cpuset(cs);
struct css_task_iter it;
struct task_struct *task;
cpuset_being_rebound = cs; /* causes mpol_dup() rebind */
- guarantee_online_mems(mems_cs, &newmems);
+ guarantee_online_mems(cs, &newmems);
/*
* The mpol_rebind_mm() call takes mmap_sem, which we couldn't
@@ -1497,8 +1454,6 @@ static void cpuset_attach(struct cgroup_subsys_state *css,
struct task_struct *leader = cgroup_taskset_first(tset);
struct cpuset *cs = css_cs(css);
struct cpuset *oldcs = cpuset_attach_old_cs;
- struct cpuset *cpus_cs = effective_cpumask_cpuset(cs);
- struct cpuset *mems_cs = effective_nodemask_cpuset(cs);
mutex_lock(&cpuset_mutex);
@@ -1506,9 +1461,9 @@ static void cpuset_attach(struct cgroup_subsys_state *css,
if (cs == &top_cpuset)
cpumask_copy(cpus_attach, cpu_possible_mask);
else
- guarantee_online_cpus(cpus_cs, cpus_attach);
+ guarantee_online_cpus(cs, cpus_attach);
- guarantee_online_mems(mems_cs, &cpuset_attach_nodemask_to);
+ guarantee_online_mems(cs, &cpuset_attach_nodemask_to);
cgroup_taskset_for_each(task, tset) {
/*
@@ -1525,11 +1480,9 @@ static void cpuset_attach(struct cgroup_subsys_state *css,
* Change mm, possibly for multiple threads in a threadgroup. This is
* expensive and may sleep.
*/
- cpuset_attach_nodemask_to = cs->mems_allowed;
+ cpuset_attach_nodemask_to = cs->effective_mems;
mm = get_task_mm(leader);
if (mm) {
- struct cpuset *mems_oldcs = effective_nodemask_cpuset(oldcs);
-
mpol_rebind_mm(mm, &cpuset_attach_nodemask_to);
/*
@@ -1540,7 +1493,7 @@ static void cpuset_attach(struct cgroup_subsys_state *css,
* mm from.
*/
if (is_memory_migrate(cs)) {
- cpuset_migrate_mm(mm, &mems_oldcs->old_mems_allowed,
+ cpuset_migrate_mm(mm, &oldcs->old_mems_allowed,
&cpuset_attach_nodemask_to);
}
mmput(mm);
@@ -2331,23 +2284,17 @@ void __init cpuset_init_smp(void)
void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
{
- struct cpuset *cpus_cs;
-
mutex_lock(&callback_mutex);
rcu_read_lock();
- cpus_cs = effective_cpumask_cpuset(task_cs(tsk));
- guarantee_online_cpus(cpus_cs, pmask);
+ guarantee_online_cpus(task_cs(tsk), pmask);
rcu_read_unlock();
mutex_unlock(&callback_mutex);
}
void cpuset_cpus_allowed_fallback(struct task_struct *tsk)
{
- struct cpuset *cpus_cs;
-
rcu_read_lock();
- cpus_cs = effective_cpumask_cpuset(task_cs(tsk));
- do_set_cpus_allowed(tsk, cpus_cs->cpus_allowed);
+ do_set_cpus_allowed(tsk, task_cs(tsk)->effective_cpus);
rcu_read_unlock();
/*
@@ -2386,13 +2333,11 @@ void cpuset_init_current_mems_allowed(void)
nodemask_t cpuset_mems_allowed(struct task_struct *tsk)
{
- struct cpuset *mems_cs;
nodemask_t mask;
mutex_lock(&callback_mutex);
rcu_read_lock();
- mems_cs = effective_nodemask_cpuset(task_cs(tsk));
- guarantee_online_mems(mems_cs, &mask);
+ guarantee_online_mems(task_cs(tsk), &mask);
rcu_read_unlock();
mutex_unlock(&callback_mutex);
--
1.8.0.2
Now we've used effective cpumasks to enforce hierarchical manner,
we can use cs->{cpus,mems}_allowed as configured masks.
Configured masks can be changed by writing cpuset.cpus and cpuset.mems
only. The new behaviors are:
- They won't be changed by hotplug anymore.
- They won't be limited by its parent's masks.
This ia a behavior change, but won't take effect unless mount with
sane_behavior.
v2:
- Add comments to explain the differences between configured masks and
effective masks.
Signed-off-by: Li Zefan <[email protected]>
---
kernel/cpuset.c | 35 +++++++++++++++++++++++++++++------
1 file changed, 29 insertions(+), 6 deletions(-)
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 820870a..4b409d2 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -77,6 +77,26 @@ struct cpuset {
unsigned long flags; /* "unsigned long" so bitops work */
+ /*
+ * On default hierarchy:
+ *
+ * The user-configured masks can only be changed by writing to
+ * cpuset.cpus and cpuset.mems, and won't be limited by the
+ * parent masks.
+ *
+ * The effective masks is the real masks that apply to the tasks
+ * in the cpuset. They may be changed if the configured masks are
+ * changed or hotplug happens.
+ *
+ * effective_mask == configured_mask & parent's effective_mask,
+ * and if it ends up empty, it will inherit the parent's mask.
+ *
+ *
+ * On legacy hierachy:
+ *
+ * The user-configured masks are always the same with effective masks.
+ */
+
/* user-configured CPUs and Memory Nodes allow to tasks */
cpumask_var_t cpus_allowed;
nodemask_t mems_allowed;
@@ -450,9 +470,9 @@ static int validate_change(struct cpuset *cur, struct cpuset *trial)
par = parent_cs(cur);
- /* We must be a subset of our parent cpuset */
+ /* On legacy hiearchy, we must be a subset of our parent cpuset. */
ret = -EACCES;
- if (!is_cpuset_subset(trial, par))
+ if (!cgroup_on_dfl(cur->css.cgroup) && !is_cpuset_subset(trial, par))
goto out;
/*
@@ -2167,6 +2187,7 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
static cpumask_t new_cpus;
static nodemask_t new_mems;
bool cpus_updated, mems_updated;
+ bool on_dfl = cgroup_on_dfl(top_cpuset.css.cgroup);
mutex_lock(&cpuset_mutex);
@@ -2174,13 +2195,14 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
cpumask_copy(&new_cpus, cpu_active_mask);
new_mems = node_states[N_MEMORY];
- cpus_updated = !cpumask_equal(top_cpuset.cpus_allowed, &new_cpus);
- mems_updated = !nodes_equal(top_cpuset.mems_allowed, new_mems);
+ cpus_updated = !cpumask_equal(top_cpuset.effective_cpus, &new_cpus);
+ mems_updated = !nodes_equal(top_cpuset.effective_mems, new_mems);
/* synchronize cpus_allowed to cpu_active_mask */
if (cpus_updated) {
mutex_lock(&callback_mutex);
- cpumask_copy(top_cpuset.cpus_allowed, &new_cpus);
+ if (!on_dfl)
+ cpumask_copy(top_cpuset.cpus_allowed, &new_cpus);
cpumask_copy(top_cpuset.effective_cpus, &new_cpus);
mutex_unlock(&callback_mutex);
/* we don't mess with cpumasks of tasks in top_cpuset */
@@ -2189,7 +2211,8 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
/* synchronize mems_allowed to N_MEMORY */
if (mems_updated) {
mutex_lock(&callback_mutex);
- top_cpuset.mems_allowed = new_mems;
+ if (!on_dfl)
+ top_cpuset.mems_allowed = new_mems;
top_cpuset.effective_mems = new_mems;
mutex_unlock(&callback_mutex);
update_tasks_nodemask(&top_cpuset);
--
1.8.0.2
Firstly offline cpu1:
# echo 0-1 > cpuset.cpus
# echo 0 > /sys/devices/system/cpu/cpu1/online
# cat cpuset.cpus
0-1
# cat cpuset.effective_cpus
0
Then online it:
# echo 1 > /sys/devices/system/cpu/cpu1/online
# cat cpuset.cpus
0-1
# cat cpuset.effective_cpus
0-1
And cpuset will bring it back to the effective mask.
The implementation is quite straightforward. Instead of calculating the
offlined cpus/mems and do updates, we just set the new effective_mask
to online_mask & congifured_mask.
This is a behavior change for default hierarchy, so legacy hierarchy
won't be affected.
v2:
- make refactoring of cpuset_hotplug_update_tasks() as seperate patch,
suggested by Tejun.
- make hotplug_update_tasks_insane() use @new_cpus and @new_mems as
hotplug_update_tasks_sane() does.
Signed-off-by: Li Zefan <[email protected]>
---
kernel/cpuset.c | 65 ++++++++++++++++++++++++++++++++-------------------------
1 file changed, 36 insertions(+), 29 deletions(-)
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 41822e2..c47cb94 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -2080,26 +2080,27 @@ static void remove_tasks_in_empty_cpuset(struct cpuset *cs)
}
}
-static void hotplug_update_tasks_legacy(struct cpuset *cs,
- struct cpumask *off_cpus,
- nodemask_t *off_mems)
+static void
+hotplug_update_tasks_legacy(struct cpuset *cs,
+ struct cpumask *new_cpus, nodemask_t *new_mems,
+ bool cpus_updated, bool mems_updated)
{
bool is_empty;
mutex_lock(&callback_mutex);
- cpumask_andnot(cs->cpus_allowed, cs->cpus_allowed, off_cpus);
- cpumask_andnot(cs->effective_cpus, cs->effective_cpus, off_cpus);
- nodes_andnot(cs->mems_allowed, cs->mems_allowed, *off_mems);
- nodes_andnot(cs->effective_mems, cs->effective_mems, *off_mems);
+ cpumask_copy(cs->cpus_allowed, new_cpus);
+ cpumask_copy(cs->effective_cpus, new_cpus);
+ cs->mems_allowed = *new_mems;
+ cs->effective_mems = *new_mems;
mutex_unlock(&callback_mutex);
/*
* Don't call update_tasks_cpumask() if the cpuset becomes empty,
* as the tasks will be migratecd to an ancestor.
*/
- if (!cpumask_empty(off_cpus) && !cpumask_empty(cs->cpus_allowed))
+ if (cpus_updated && !cpumask_empty(cs->cpus_allowed))
update_tasks_cpumask(cs);
- if (!nodes_empty(*off_mems) && !nodes_empty(cs->mems_allowed))
+ if (mems_updated && !nodes_empty(cs->mems_allowed))
update_tasks_nodemask(cs);
is_empty = cpumask_empty(cs->cpus_allowed) ||
@@ -2118,24 +2119,24 @@ static void hotplug_update_tasks_legacy(struct cpuset *cs,
mutex_lock(&cpuset_mutex);
}
-static void hotplug_update_tasks(struct cpuset *cs,
- struct cpumask *off_cpus,
- nodemask_t *off_mems)
+static void
+hotplug_update_tasks(struct cpuset *cs,
+ struct cpumask *new_cpus, nodemask_t *new_mems,
+ bool cpus_updated, bool mems_updated)
{
+ if (cpumask_empty(new_cpus))
+ cpumask_copy(new_cpus, parent_cs(cs)->effective_cpus);
+ if (nodes_empty(*new_mems))
+ *new_mems = parent_cs(cs)->effective_mems;
+
mutex_lock(&callback_mutex);
- cpumask_andnot(cs->effective_cpus, cs->effective_cpus, off_cpus);
- if (cpumask_empty(cs->effective_cpus))
- cpumask_copy(cs->effective_cpus,
- parent_cs(cs)->effective_cpus);
-
- nodes_andnot(cs->effective_mems, cs->effective_mems, *off_mems);
- if (nodes_empty(cs->effective_mems))
- cs->effective_mems = parent_cs(cs)->effective_mems;
+ cpumask_copy(cs->effective_cpus, new_cpus);
+ cs->effective_mems = *new_mems;
mutex_unlock(&callback_mutex);
- if (!cpumask_empty(off_cpus))
+ if (cpus_updated)
update_tasks_cpumask(cs);
- if (!nodes_empty(*off_mems))
+ if (mems_updated)
update_tasks_nodemask(cs);
}
@@ -2149,8 +2150,10 @@ static void hotplug_update_tasks(struct cpuset *cs,
*/
static void cpuset_hotplug_update_tasks(struct cpuset *cs)
{
- static cpumask_t off_cpus;
- static nodemask_t off_mems;
+ static cpumask_t new_cpus;
+ static nodemask_t new_mems;
+ bool cpus_updated;
+ bool mems_updated;
retry:
wait_event(cpuset_attach_wq, cs->attach_in_progress == 0);
@@ -2165,14 +2168,18 @@ retry:
goto retry;
}
- cpumask_andnot(&off_cpus, cs->effective_cpus,
- top_cpuset.effective_cpus);
- nodes_andnot(off_mems, cs->effective_mems, top_cpuset.effective_mems);
+ cpumask_and(&new_cpus, cs->cpus_allowed, parent_cs(cs)->effective_cpus);
+ nodes_and(new_mems, cs->mems_allowed, parent_cs(cs)->effective_mems);
+
+ cpus_updated = !cpumask_equal(&new_cpus, cs->effective_cpus);
+ mems_updated = !nodes_equal(new_mems, cs->effective_mems);
if (cgroup_on_dfl(cs->css.cgroup))
- hotplug_update_tasks(cs, &off_cpus, &off_mems);
+ hotplug_update_tasks(cs, &new_cpus, &new_mems,
+ cpus_updated, mems_updated);
else
- hotplug_update_tasks_legacy(cs, &off_cpus, &off_mems);
+ hotplug_update_tasks_legacy(cs, &new_cpus, &new_mems,
+ cpus_updated, mems_updated);
mutex_unlock(&cpuset_mutex);
}
--
1.8.0.2
As the configured masks won't be limited by its parent, and the top
cpuset's masks won't change when hotplug happens, it's natural to
allow writing offlined masks to the configured masks.
If on default hierarchy:
# echo 0 > /sys/devices/system/cpu/cpu1/online
# mkdir /cpuset/sub
# echo 1 > /cpuset/sub/cpuset.cpus
# cat /cpuset/sub/cpuset.cpus
1
If on legacy hierarchy:
# echo 0 > /sys/devices/system/cpu/cpu1/online
# mkdir /cpuset/sub
# echo 1 > /cpuset/sub/cpuset.cpus
-bash: echo: write error: Invalid argument
Note the checks don't need to be gated by cgroup_on_dfl, because we've
initialized top_cpuset.{cpus,mems}_allowed accordingly in cpuset_bind().
Signed-off-by: Li Zefan <[email protected]>
---
kernel/cpuset.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index c47cb94..65878a7 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -929,7 +929,8 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
if (retval < 0)
return retval;
- if (!cpumask_subset(trialcs->cpus_allowed, cpu_active_mask))
+ if (!cpumask_subset(trialcs->cpus_allowed,
+ top_cpuset.cpus_allowed))
return -EINVAL;
}
@@ -1186,8 +1187,8 @@ static int update_nodemask(struct cpuset *cs, struct cpuset *trialcs,
goto done;
if (!nodes_subset(trialcs->mems_allowed,
- node_states[N_MEMORY])) {
- retval = -EINVAL;
+ top_cpuset.mems_allowed)) {
+ retval = -EINVAL;
goto done;
}
}
--
1.8.0.2
cpuset.cpus and cpuset.mems are the configured masks, and we need
to export effective masks to userspace, so users know the real
cpus_allowed and mems_allowed that apply to the tasks in a cpuset.
v2:
- export those masks unconditionally, suggested by Tejun.
Signed-off-by: Li Zefan <[email protected]>
---
kernel/cpuset.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 65878a7..53a9bbf 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1535,6 +1535,8 @@ typedef enum {
FILE_MEMORY_MIGRATE,
FILE_CPULIST,
FILE_MEMLIST,
+ FILE_EFFECTIVE_CPULIST,
+ FILE_EFFECTIVE_MEMLIST,
FILE_CPU_EXCLUSIVE,
FILE_MEM_EXCLUSIVE,
FILE_MEM_HARDWALL,
@@ -1701,6 +1703,12 @@ static int cpuset_common_seq_show(struct seq_file *sf, void *v)
case FILE_MEMLIST:
s += nodelist_scnprintf(s, count, cs->mems_allowed);
break;
+ case FILE_EFFECTIVE_CPULIST:
+ s += cpulist_scnprintf(s, count, cs->effective_cpus);
+ break;
+ case FILE_EFFECTIVE_MEMLIST:
+ s += nodelist_scnprintf(s, count, cs->effective_mems);
+ break;
default:
ret = -EINVAL;
goto out_unlock;
@@ -1786,6 +1794,18 @@ static struct cftype files[] = {
},
{
+ .name = "effective_cpus",
+ .seq_show = cpuset_common_seq_show,
+ .private = FILE_EFFECTIVE_CPULIST,
+ },
+
+ {
+ .name = "effective_mems",
+ .seq_show = cpuset_common_seq_show,
+ .private = FILE_EFFECTIVE_MEMLIST,
+ },
+
+ {
.name = "cpu_exclusive",
.read_u64 = cpuset_read_u64,
.write_u64 = cpuset_write_u64,
--
1.8.0.2
We mix the handling for both default hierarchy and legacy hierarchy in
the same function, and it's quite messy, so split into two functions.
Signed-off-by: Li Zefan <[email protected]>
---
kernel/cpuset.c | 121 ++++++++++++++++++++++++++++++--------------------------
1 file changed, 66 insertions(+), 55 deletions(-)
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 4b409d2..41822e2 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -2080,6 +2080,65 @@ static void remove_tasks_in_empty_cpuset(struct cpuset *cs)
}
}
+static void hotplug_update_tasks_legacy(struct cpuset *cs,
+ struct cpumask *off_cpus,
+ nodemask_t *off_mems)
+{
+ bool is_empty;
+
+ mutex_lock(&callback_mutex);
+ cpumask_andnot(cs->cpus_allowed, cs->cpus_allowed, off_cpus);
+ cpumask_andnot(cs->effective_cpus, cs->effective_cpus, off_cpus);
+ nodes_andnot(cs->mems_allowed, cs->mems_allowed, *off_mems);
+ nodes_andnot(cs->effective_mems, cs->effective_mems, *off_mems);
+ mutex_unlock(&callback_mutex);
+
+ /*
+ * Don't call update_tasks_cpumask() if the cpuset becomes empty,
+ * as the tasks will be migratecd to an ancestor.
+ */
+ if (!cpumask_empty(off_cpus) && !cpumask_empty(cs->cpus_allowed))
+ update_tasks_cpumask(cs);
+ if (!nodes_empty(*off_mems) && !nodes_empty(cs->mems_allowed))
+ update_tasks_nodemask(cs);
+
+ is_empty = cpumask_empty(cs->cpus_allowed) ||
+ nodes_empty(cs->mems_allowed);
+
+ mutex_unlock(&cpuset_mutex);
+
+ /*
+ * Move tasks to the nearest ancestor with execution resources,
+ * This is full cgroup operation which will also call back into
+ * cpuset. Should be done outside any lock.
+ */
+ if (is_empty)
+ remove_tasks_in_empty_cpuset(cs);
+
+ mutex_lock(&cpuset_mutex);
+}
+
+static void hotplug_update_tasks(struct cpuset *cs,
+ struct cpumask *off_cpus,
+ nodemask_t *off_mems)
+{
+ mutex_lock(&callback_mutex);
+ cpumask_andnot(cs->effective_cpus, cs->effective_cpus, off_cpus);
+ if (cpumask_empty(cs->effective_cpus))
+ cpumask_copy(cs->effective_cpus,
+ parent_cs(cs)->effective_cpus);
+
+ nodes_andnot(cs->effective_mems, cs->effective_mems, *off_mems);
+ if (nodes_empty(cs->effective_mems))
+ cs->effective_mems = parent_cs(cs)->effective_mems;
+ mutex_unlock(&callback_mutex);
+
+ if (!cpumask_empty(off_cpus))
+ update_tasks_cpumask(cs);
+ if (!nodes_empty(*off_mems))
+ update_tasks_nodemask(cs);
+}
+
/**
* cpuset_hotplug_update_tasks - update tasks in a cpuset for hotunplug
* @cs: cpuset in interest
@@ -2092,9 +2151,6 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs)
{
static cpumask_t off_cpus;
static nodemask_t off_mems;
- bool is_empty;
- bool on_dfl = cgroup_on_dfl(cs->css.cgroup);
-
retry:
wait_event(cpuset_attach_wq, cs->attach_in_progress == 0);
@@ -2109,61 +2165,16 @@ retry:
goto retry;
}
- cpumask_andnot(&off_cpus, cs->cpus_allowed, top_cpuset.cpus_allowed);
- nodes_andnot(off_mems, cs->mems_allowed, top_cpuset.mems_allowed);
-
- mutex_lock(&callback_mutex);
- cpumask_andnot(cs->cpus_allowed, cs->cpus_allowed, &off_cpus);
-
- /* Inherit the effective mask of the parent, if it becomes empty. */
- cpumask_andnot(cs->effective_cpus, cs->effective_cpus, &off_cpus);
- if (on_dfl && cpumask_empty(cs->effective_cpus))
- cpumask_copy(cs->effective_cpus, parent_cs(cs)->effective_cpus);
- mutex_unlock(&callback_mutex);
-
- /*
- * If on_dfl, we need to update tasks' cpumask for empty cpuset to
- * take on ancestor's cpumask. Otherwise, don't call
- * update_tasks_cpumask() if the cpuset becomes empty, as the tasks
- * in it will be migrated to an ancestor.
- */
- if ((on_dfl && cpumask_empty(cs->cpus_allowed)) ||
- (!cpumask_empty(&off_cpus) && !cpumask_empty(cs->cpus_allowed)))
- update_tasks_cpumask(cs);
-
- mutex_lock(&callback_mutex);
- nodes_andnot(cs->mems_allowed, cs->mems_allowed, off_mems);
+ cpumask_andnot(&off_cpus, cs->effective_cpus,
+ top_cpuset.effective_cpus);
+ nodes_andnot(off_mems, cs->effective_mems, top_cpuset.effective_mems);
- /* Inherit the effective mask of the parent, if it becomes empty */
- nodes_andnot(cs->effective_mems, cs->effective_mems, off_mems);
- if (on_dfl && nodes_empty(cs->effective_mems))
- cs->effective_mems = parent_cs(cs)->effective_mems;
- mutex_unlock(&callback_mutex);
-
- /*
- * If on_dfl, we need to update tasks' nodemask for empty cpuset to
- * take on ancestor's nodemask. Otherwise, don't call
- * update_tasks_nodemask() if the cpuset becomes empty, as the
- * tasks in it will be migratd to an ancestor.
- */
- if ((on_dfl && nodes_empty(cs->mems_allowed)) ||
- (!nodes_empty(off_mems) && !nodes_empty(cs->mems_allowed)))
- update_tasks_nodemask(cs);
-
- is_empty = cpumask_empty(cs->cpus_allowed) ||
- nodes_empty(cs->mems_allowed);
+ if (cgroup_on_dfl(cs->css.cgroup))
+ hotplug_update_tasks(cs, &off_cpus, &off_mems);
+ else
+ hotplug_update_tasks_legacy(cs, &off_cpus, &off_mems);
mutex_unlock(&cpuset_mutex);
-
- /*
- * If on_dfl, we'll keep tasks in empty cpusets.
- *
- * Otherwise move tasks to the nearest ancestor with execution
- * resources. This is full cgroup operation which will
- * also call back into cpuset. Should be done outside any lock.
- */
- if (!on_dfl && is_empty)
- remove_tasks_in_empty_cpuset(cs);
}
/**
--
1.8.0.2
On Wed, Jul 09, 2014 at 04:47:03PM +0800, Li Zefan wrote:
> + /* user-configured CPUs and Memory Nodes allow to tasks */
allowed
> + cpumask_var_t cpus_allowed;
> + nodemask_t mems_allowed;
> +
> + /* effective CPUs and Memory Nodes allow to tasks */
ditto.
Thanks.
--
tejun
On Wed, Jul 09, 2014 at 04:47:50PM +0800, Li Zefan wrote:
> + /*
> + * If the effective cpumask of any non-empty cpuset is changed,
> + * we need to rebuild sched domains.
> + */
> + if (!cpumask_empty(cp->cpus_allowed) &&
> + is_sched_load_balance(cp))
> + need_rebuild_sched_domains = true;
Hmmm... is this because if cpus_allowed is empty, the effective always
equals the parent? If so, let's update the comment so that it
explains why the code is like that. The current comment just explains
what it does which isn't very helpful.
Thanks.
--
tejun
On Wed, Jul 09, 2014 at 04:47:29PM +0800, Li Zefan wrote:
> +static void update_cpumasks_hier(struct cpuset *cs, struct cpumask *new_cpus)
Let's please use a variable name which clearly indicates that it's a
temp place. Something like @cpus_buf? Ditto with mems.
Thanks.
--
tejun
Applied to cgroup/for-3.17. For the review points, please follow up
with additional patches.
Thanks.
--
tejun
On Wed, Jul 09, 2014 at 04:49:25PM +0800, Li Zefan wrote:
> cpuset.cpus and cpuset.mems are the configured masks, and we need
> to export effective masks to userspace, so users know the real
> cpus_allowed and mems_allowed that apply to the tasks in a cpuset.
>
> v2:
> - export those masks unconditionally, suggested by Tejun.
>
> Signed-off-by: Li Zefan <[email protected]>
I applied this patch but there's a pending patchset to split legacy
and dfl cftype arrays, so maybe doing it separately makes more sense
now, I'm not sure. Anyways, we need to review cpuset interface for
the default hierarchy anyway. At least the memory pressure knobs
should go. It's measuring something which is completely
implementation dependent.
Thanks.
--
tejun
On Wed, Jul 09, 2014 at 04:15:49PM -0400, Tejun Heo wrote:
> On Wed, Jul 09, 2014 at 04:49:25PM +0800, Li Zefan wrote:
> > cpuset.cpus and cpuset.mems are the configured masks, and we need
> > to export effective masks to userspace, so users know the real
> > cpus_allowed and mems_allowed that apply to the tasks in a cpuset.
> >
> > v2:
> > - export those masks unconditionally, suggested by Tejun.
> >
> > Signed-off-by: Li Zefan <[email protected]>
>
> I applied this patch but there's a pending patchset to split legacy
> and dfl cftype arrays, so maybe doing it separately makes more sense
> now, I'm not sure. Anyways, we need to review cpuset interface for
> the default hierarchy anyway. At least the memory pressure knobs
> should go. It's measuring something which is completely
> implementation dependent.
The exclusive knobs too. These are purely to aid configuration.
memory_migrate is questionable too given that we're moving towards the
model where the controllers are set up before the cgroup gets
populated. Also, if necessary, this should be implementable from
userland with migrate_pages(2).
Thanks.
--
tejun