2015-07-24 18:43:59

by Tejun Heo

Subject: [PATCHSET block/for-4.3] blkcg: implement interface for the unified hierarchy

Hello,

The blkcg interface grew to be the biggest of all controllers and
unfortunately the most inconsistent too. The interface files are
inconsistent, with a number of close duplicates. Some files have
recursive variants while others don't. There's a distinction between
normal and leaf weights which isn't intuitive, and there are a lot of
stat knobs which don't make much sense outside of debugging and expose
too many implementation details to userland.

In the unified hierarchy, everything is always hierarchical and
internal nodes can't have tasks, rendering moot the two structural
issues twisting the current interface. The interface has to be updated
in a significant way anyway and this is a good chance to revamp it as a
whole. This patchset implements the blkcg interface for the unified
hierarchy.

* blkcg is identified by "io" instead of "blkio" on the unified
hierarchy. Given that the whole interface is updated anyway, the
rename shouldn't carry noticeable conversion overhead.

* The original interface, which consisted of 27 files, is replaced with
the following three files.

io.stat : per-blkcg stats
io.weight : per-cgroup and per-cgroup-queue weight settings
io.max : per-cgroup-queue bps and iops max limits

For more details, please refer to
Documentation/cgroups/unified-hierarchy.txt.

This patchset contains the following 10 patches.

0001-cgroup-don-t-print-subsystems-for-the-default-hierar.patch
0002-cgroup-introduce-cgroup_subsys-legacy_name.patch
0003-blkcg-remove-unnecessary-NULL-checks-from-__cfqg_set.patch
0004-blkcg-refine-error-codes-returned-during-blkcg-confi.patch
0005-blkcg-rename-subsystem-name-from-blkio-to-io.patch
0006-blkcg-mark-existing-cftypes-as-legacy.patch
0007-blkcg-move-body-parsing-from-blkg_conf_prep-to-its-c.patch
0008-blkcg-separate-out-tg_conf_updated-from-tg_set_conf.patch
0009-blkcg-misc-preparations-for-unified-hierarchy-interf.patch
0010-blkcg-implement-interface-for-the-unified-hierarchy.patch

0001-0002 are cgroup prep patches. 0003-0004 are misc prep patches.
0005 renames blkio to io on the unified hierarchy. 0006-0010
implement the new interface.

This patchset is also available in the following git branch.

git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-blkcg-unified-hier

and is on top of

block/for-linus f3f5da624e0a ("block: Do a full clone when splitting discard bios")
+ [1] [PATCHSET block/for-4.3] writeback: cgroup writeback updates
+ [2] [PATCHSET v2 block/for-4.3] block, cgroup: make cfq charge async IOs to the appropriate blkcgs
+ [3] [PATCHSET v3 block/for-4.3] blkcg: blkcg policy methods and data handling cleanup
+ [4] [PATCHSET v2 block/for-4.3] blkcg: blkcg stats cleanup

diffstat follows. Thanks.

Documentation/cgroups/unified-hierarchy.txt | 57 ++++++++-
block/bio.c | 2
block/blk-cgroup.c | 105 +++++++++++++---
block/blk-throttle.c | 176 +++++++++++++++++++++++-----
block/cfq-iosched.c | 91 +++++++++++---
include/linux/backing-dev.h | 2
include/linux/blk-cgroup.h | 12 +
include/linux/cgroup-defs.h | 3
include/linux/cgroup_subsys.h | 2
kernel/cgroup.c | 41 ++++--
mm/backing-dev.c | 4
11 files changed, 399 insertions(+), 96 deletions(-)

--
tejun

[L] http://lkml.kernel.org/g/[email protected]
[1] http://lkml.kernel.org/g/[email protected]
[2] http://lkml.kernel.org/g/[email protected]
[3] http://lkml.kernel.org/g/[email protected]
[4] http://lkml.kernel.org/g/[email protected]


2015-07-24 18:47:15

by Tejun Heo

Subject: [PATCH 01/10] cgroup: don't print subsystems for the default hierarchy

It doesn't make sense to print subsystems in mount options or in
/proc/PID/cgroup for the default hierarchy.

* cgroup.controllers file at the root of the default hierarchy lists
the currently attached controllers.

* The default hierarchy is the catch-all for unmounted subsystems.

* The default hierarchy doesn't accept any mount options.

Suppress subsystem printing on mount options and /proc/PID/cgroup for
the default hierarchy.
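
For illustration only, the effect on the /proc/PID/cgroup style output
can be modelled in user space as below. This is a sketch, not the
kernel code: struct toy_root, toy_subsys_names and toy_show_root() are
hypothetical names, and is_default stands in for the
root != &cgrp_dfl_root test in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subsystem table; bit i of subsys_mask selects entry i. */
static const char *const toy_subsys_names[] = { "cpu", "memory", "io" };
#define TOY_NR_SUBSYS \
	(sizeof(toy_subsys_names) / sizeof(toy_subsys_names[0]))

/* Simplified stand-in for struct cgroup_root. */
struct toy_root {
	int hierarchy_id;
	unsigned int subsys_mask;
	bool is_default;	/* plays the role of root == &cgrp_dfl_root */
};

/* Format "<id>:<subsys>,<subsys>..." but skip names on the default root. */
static void toy_show_root(const struct toy_root *root, char *buf, size_t len)
{
	size_t off, i;
	int count = 0;

	assert(root && buf && len > 0);

	off = (size_t)snprintf(buf, len, "%d:", root->hierarchy_id);

	/* the default hierarchy never lists its subsystems here */
	if (root->is_default)
		return;

	for (i = 0; i < TOY_NR_SUBSYS && off < len; i++) {
		if (!(root->subsys_mask & (1u << i)))
			continue;
		off += (size_t)snprintf(buf + off, len - off, "%s%s",
					count++ ? "," : "",
					toy_subsys_names[i]);
	}
}
```

With a legacy root of id 3 and mask 0x5 this yields "3:cpu,io", while
a default root of id 0 yields just "0:" regardless of its mask.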

Signed-off-by: Tejun Heo <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: [email protected]
---
kernel/cgroup.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index f89d929..6c85e6d 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1332,9 +1332,10 @@ static int cgroup_show_options(struct seq_file *seq,
struct cgroup_subsys *ss;
int ssid;

- for_each_subsys(ss, ssid)
- if (root->subsys_mask & (1 << ssid))
- seq_printf(seq, ",%s", ss->name);
+ if (root != &cgrp_dfl_root)
+ for_each_subsys(ss, ssid)
+ if (root->subsys_mask & (1 << ssid))
+ seq_printf(seq, ",%s", ss->name);
if (root->flags & CGRP_ROOT_NOPREFIX)
seq_puts(seq, ",noprefix");
if (root->flags & CGRP_ROOT_XATTR)
@@ -5136,9 +5137,11 @@ int proc_cgroup_show(struct seq_file *m, struct pid_namespace *ns,
continue;

seq_printf(m, "%d:", root->hierarchy_id);
- for_each_subsys(ss, ssid)
- if (root->subsys_mask & (1 << ssid))
- seq_printf(m, "%s%s", count++ ? "," : "", ss->name);
+ if (root != &cgrp_dfl_root)
+ for_each_subsys(ss, ssid)
+ if (root->subsys_mask & (1 << ssid))
+ seq_printf(m, "%s%s", count++ ? "," : "",
+ ss->name);
if (strlen(root->name))
seq_printf(m, "%sname=%s", count ? "," : "",
root->name);
--
2.4.3

2015-07-24 18:44:01

by Tejun Heo

Subject: [PATCH 02/10] cgroup: introduce cgroup_subsys->legacy_name

This allows cgroup subsystems to use a different name on the unified
hierarchy. cgroup_subsys->name is used on the unified hierarchy,
->legacy_name elsewhere. If ->legacy_name is not explicitly set, it's
automatically set to ->name and the userland visible behavior remains
unchanged.
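
As a rough user-space sketch of the naming rule described above
(struct subsys and the three helpers are hypothetical simplifications,
not the kernel definitions; on_dfl stands in for cgroup_on_dfl()):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Simplified, hypothetical stand-in for struct cgroup_subsys. */
struct subsys {
	const char *name;		/* used on the unified hierarchy */
	const char *legacy_name;	/* optional, used everywhere else */
};

/* Boot-time fallback: ->legacy_name defaults to ->name if unset. */
static void subsys_init_names(struct subsys *ss)
{
	assert(ss && ss->name);
	if (!ss->legacy_name)
		ss->legacy_name = ss->name;
}

/* Build "<subsys>.<file>" with the name matching the hierarchy type. */
static char *subsys_file_name(const struct subsys *ss, bool on_dfl,
			      const char *file, char *buf, size_t len)
{
	snprintf(buf, len, "%s.%s",
		 on_dfl ? ss->name : ss->legacy_name, file);
	return buf;
}

/* Option parsing accepts either name, as in parse_cgroupfs_options(). */
static bool subsys_matches(const struct subsys *ss, const char *token)
{
	return !strcmp(token, ss->name) || !strcmp(token, ss->legacy_name);
}
```

A subsystem declared with name "io" and legacy_name "blkio" would then
produce "io.weight" on the unified hierarchy and "blkio.weight"
elsewhere, and both "io" and "blkio" would match as option tokens.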

Signed-off-by: Tejun Heo <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: [email protected]
---
include/linux/cgroup-defs.h | 3 +++
kernel/cgroup.c | 30 +++++++++++++++++++-----------
2 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 8f5770a..7d0bb53 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -434,6 +434,9 @@ struct cgroup_subsys {
int id;
const char *name;

+ /* optional, initialized automatically during boot if not set */
+ const char *legacy_name;
+
/* link to parent, protected by cgroup_lock() */
struct cgroup_root *root;

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 6c85e6d..0da2efa 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1027,10 +1027,13 @@ static const struct file_operations proc_cgroupstats_operations;
static char *cgroup_file_name(struct cgroup *cgrp, const struct cftype *cft,
char *buf)
{
+ struct cgroup_subsys *ss = cft->ss;
+
if (cft->ss && !(cft->flags & CFTYPE_NO_PREFIX) &&
!(cgrp->root->flags & CGRP_ROOT_NOPREFIX))
snprintf(buf, CGROUP_FILE_NAME_MAX, "%s.%s",
- cft->ss->name, cft->name);
+ cgroup_on_dfl(cgrp) ? ss->name : ss->legacy_name,
+ cft->name);
else
strncpy(buf, cft->name, CGROUP_FILE_NAME_MAX);
return buf;
@@ -1335,7 +1338,7 @@ static int cgroup_show_options(struct seq_file *seq,
if (root != &cgrp_dfl_root)
for_each_subsys(ss, ssid)
if (root->subsys_mask & (1 << ssid))
- seq_printf(seq, ",%s", ss->name);
+ seq_printf(seq, ",%s", ss->legacy_name);
if (root->flags & CGRP_ROOT_NOPREFIX)
seq_puts(seq, ",noprefix");
if (root->flags & CGRP_ROOT_XATTR)
@@ -1448,7 +1451,8 @@ static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts)
}

for_each_subsys(ss, i) {
- if (strcmp(token, ss->name))
+ if (strcmp(token, ss->name) &&
+ strcmp(token, ss->legacy_name))
continue;
if (ss->disabled)
continue;
@@ -4994,6 +4998,8 @@ int __init cgroup_init_early(void)

ss->id = i;
ss->name = cgroup_subsys_name[i];
+ if (!ss->legacy_name)
+ ss->legacy_name = cgroup_subsys_name[i];

if (ss->early_init)
cgroup_init_subsys(ss, true);
@@ -5141,7 +5147,7 @@ int proc_cgroup_show(struct seq_file *m, struct pid_namespace *ns,
for_each_subsys(ss, ssid)
if (root->subsys_mask & (1 << ssid))
seq_printf(m, "%s%s", count++ ? "," : "",
- ss->name);
+ ss->legacy_name);
if (strlen(root->name))
seq_printf(m, "%sname=%s", count ? "," : "",
root->name);
@@ -5181,7 +5187,7 @@ static int proc_cgroupstats_show(struct seq_file *m, void *v)

for_each_subsys(ss, i)
seq_printf(m, "%s\t%d\t%d\t%d\n",
- ss->name, ss->root->hierarchy_id,
+ ss->legacy_name, ss->root->hierarchy_id,
atomic_read(&ss->root->nr_cgrps), !ss->disabled);

mutex_unlock(&cgroup_mutex);
@@ -5403,12 +5409,14 @@ static int __init cgroup_disable(char *str)
continue;

for_each_subsys(ss, i) {
- if (!strcmp(token, ss->name)) {
- ss->disabled = 1;
- printk(KERN_INFO "Disabling %s control group"
- " subsystem\n", ss->name);
- break;
- }
+ if (strcmp(token, ss->name) &&
+ strcmp(token, ss->legacy_name))
+ continue;
+
+ ss->disabled = 1;
+ printk(KERN_INFO "Disabling %s control group subsystem\n",
+ ss->name);
+ break;
}
}
return 1;
--
2.4.3

2015-07-24 18:46:32

by Tejun Heo

Subject: [PATCH 03/10] blkcg: remove unnecessary NULL checks from __cfqg_set_weight_device()

blkg_to_cfqg() and blkcg_to_cfqgd() on a valid blkg with the policy
enabled are guaranteed to return non-NULL and the counterpart in
blk-throttle doesn't have these checks either. Remove the spurious
NULL checks.

Signed-off-by: Tejun Heo <[email protected]>
---
block/cfq-iosched.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 395476a..bcf4026 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1752,12 +1752,10 @@ static ssize_t __cfqg_set_weight_device(struct kernfs_open_file *of,
if (ret)
return ret;

- ret = -EINVAL;
cfqg = blkg_to_cfqg(ctx.blkg);
cfqgd = blkcg_to_cfqgd(blkcg);
- if (!cfqg || !cfqgd)
- goto err;

+ ret = -EINVAL;
if (!ctx.v || (ctx.v >= CFQ_WEIGHT_MIN && ctx.v <= CFQ_WEIGHT_MAX)) {
if (!is_leaf_weight) {
cfqg->dev_weight = ctx.v;
@@ -1769,7 +1767,6 @@ static ssize_t __cfqg_set_weight_device(struct kernfs_open_file *of,
ret = 0;
}

-err:
blkg_conf_finish(&ctx);
return ret ?: nbytes;
}
--
2.4.3

2015-07-24 18:44:05

by Tejun Heo

Subject: [PATCH 04/10] blkcg: refine error codes returned during blkcg configuration

blkcg currently returns -EINVAL for most errors, which can be pretty
confusing given that the failure modes are quite varied. Update the
error returns so that

* -EINVAL is returned only for syntactic errors.
* -ERANGE is returned if the value is out of range.
* -ENODEV is returned if the target device can't be found.
* -EOPNOTSUPP is returned if the policy is not enabled on the target device.
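
The mapping above can be sketched with a toy user-space validator.
Everything here is an assumption made for illustration: struct toy_dev,
toy_set_weight() and the WEIGHT_MIN/WEIGHT_MAX bounds are hypothetical,
and the order of the checks is not meant to mirror the kernel paths.
As in the cfq code, a value of 0 is accepted and means "clear the
per-device setting".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical bounds standing in for CFQ_WEIGHT_MIN/CFQ_WEIGHT_MAX. */
#define WEIGHT_MIN 10ULL
#define WEIGHT_MAX 1000ULL

/* Hypothetical model of a target device. */
struct toy_dev {
	bool exists;		/* device lookup would succeed */
	bool policy_enabled;	/* the policy is active on its queue */
};

/* Return 0 on success or a negative errno describing the failure mode. */
static int toy_set_weight(const struct toy_dev *dev, const char *body)
{
	unsigned long long v;

	assert(body);

	if (sscanf(body, "%llu", &v) != 1)
		return -EINVAL;		/* syntactic error */
	if (!dev || !dev->exists)
		return -ENODEV;		/* target device not found */
	if (!dev->policy_enabled)
		return -EOPNOTSUPP;	/* policy not enabled on device */
	if (v && (v < WEIGHT_MIN || v > WEIGHT_MAX))
		return -ERANGE;		/* well-formed but out of range */
	return 0;
}
```

The point of the split is that userland can tell a malformed write
apart from a well-formed one that names a missing device or an
out-of-range value.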

Signed-off-by: Tejun Heo <[email protected]>
---
block/blk-cgroup.c | 12 ++++++------
block/cfq-iosched.c | 2 +-
2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 63c0914..a192f98 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -179,7 +179,7 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,

/* blkg holds a reference to blkcg */
if (!css_tryget_online(&blkcg->css)) {
- ret = -EINVAL;
+ ret = -ENODEV;
goto err_free_blkg;
}

@@ -205,7 +205,7 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
if (blkcg_parent(blkcg)) {
blkg->parent = __blkg_lookup(blkcg_parent(blkcg), q, false);
if (WARN_ON_ONCE(!blkg->parent)) {
- ret = -EINVAL;
+ ret = -ENODEV;
goto err_put_congested;
}
blkg_get(blkg->parent);
@@ -279,7 +279,7 @@ struct blkcg_gq *blkg_lookup_create(struct blkcg *blkcg,
* we shouldn't allow anything to go through for a bypassing queue.
*/
if (unlikely(blk_queue_bypass(q)))
- return ERR_PTR(blk_queue_dying(q) ? -EINVAL : -EBUSY);
+ return ERR_PTR(blk_queue_dying(q) ? -ENODEV : -EBUSY);

blkg = __blkg_lookup(blkcg, q, true);
if (blkg)
@@ -792,10 +792,10 @@ int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol,

disk = get_gendisk(MKDEV(major, minor), &part);
if (!disk)
- return -EINVAL;
+ return -ENODEV;
if (part) {
put_disk(disk);
- return -EINVAL;
+ return -ENODEV;
}

rcu_read_lock();
@@ -804,7 +804,7 @@ int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol,
if (blkcg_policy_enabled(disk->queue, pol))
blkg = blkg_lookup_create(blkcg, disk->queue);
else
- blkg = ERR_PTR(-EINVAL);
+ blkg = ERR_PTR(-EOPNOTSUPP);

if (IS_ERR(blkg)) {
ret = PTR_ERR(blkg);
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index bcf4026..38277e3 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1755,7 +1755,7 @@ static ssize_t __cfqg_set_weight_device(struct kernfs_open_file *of,
cfqg = blkg_to_cfqg(ctx.blkg);
cfqgd = blkcg_to_cfqgd(blkcg);

- ret = -EINVAL;
+ ret = -ERANGE;
if (!ctx.v || (ctx.v >= CFQ_WEIGHT_MIN && ctx.v <= CFQ_WEIGHT_MAX)) {
if (!is_leaf_weight) {
cfqg->dev_weight = ctx.v;
--
2.4.3

2015-07-24 18:45:50

by Tejun Heo

Subject: [PATCH 05/10] blkcg: rename subsystem name from blkio to io

blkio interface has become messy over time and is currently the
largest. In addition to the inconsistent naming scheme, it has
multiple stat files which report more or less the same thing, a number
of debug stat files which expose internal details which shouldn't have
been part of the public interface in the first place, recursive and
non-recursive stats and leaf and non-leaf knobs.

Both recursive vs. non-recursive and leaf vs. non-leaf distinctions
don't make any sense on the unified hierarchy as only leaf cgroups can
contain processes. cgroups is going through a major interface
revision with the unified hierarchy involving significant fundamental
usage changes and given that a significant portion of the interface
doesn't make sense anymore, it's a good time to reorganize the
interface.

As the first step, this patch renames the externally visible subsystem
name from "blkio" to "io". This is more concise, matches the other
two major subsystem names, "cpu" and "memory", and is better suited as
blkcg will be involved in anything writeback related too, whether an
actual block device is involved or not.

As the subsystem legacy_name is set to "blkio", the only userland
visible change outside the unified hierarchy is that blkcg is reported
as "io" instead of "blkio" in the subsystem initialized message during
boot. On the unified hierarchy, blkcg now appears as "io".

Signed-off-by: Tejun Heo <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: [email protected]
---
block/bio.c | 2 +-
block/blk-cgroup.c | 7 ++++---
include/linux/backing-dev.h | 2 +-
include/linux/blk-cgroup.h | 4 ++--
include/linux/cgroup_subsys.h | 2 +-
mm/backing-dev.c | 4 ++--
6 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index d6e5ba3..c52222c 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -2046,7 +2046,7 @@ int bio_associate_current(struct bio *bio)

get_io_context_active(ioc);
bio->bi_ioc = ioc;
- bio->bi_css = task_get_css(current, blkio_cgrp_id);
+ bio->bi_css = task_get_css(current, io_cgrp_id);
return 0;
}
EXPORT_SYMBOL_GPL(bio_associate_current);
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index a192f98..c2fb867 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1088,12 +1088,13 @@ static int blkcg_can_attach(struct cgroup_subsys_state *css,
return ret;
}

-struct cgroup_subsys blkio_cgrp_subsys = {
+struct cgroup_subsys io_cgrp_subsys = {
.css_alloc = blkcg_css_alloc,
.css_offline = blkcg_css_offline,
.css_free = blkcg_css_free,
.can_attach = blkcg_can_attach,
.legacy_cftypes = blkcg_files,
+ .legacy_name = "blkio",
#ifdef CONFIG_MEMCG
/*
* This ensures that, if available, memcg is automatically enabled
@@ -1103,7 +1104,7 @@ struct cgroup_subsys blkio_cgrp_subsys = {
.depends_on = 1 << memory_cgrp_id,
#endif
};
-EXPORT_SYMBOL_GPL(blkio_cgrp_subsys);
+EXPORT_SYMBOL_GPL(io_cgrp_subsys);

/**
* blkcg_activate_policy - activate a blkcg policy on a request_queue
@@ -1264,7 +1265,7 @@ int blkcg_policy_register(struct blkcg_policy *pol)

/* everything is in place, add intf files for the new policy */
if (pol->cftypes)
- WARN_ON(cgroup_add_legacy_cftypes(&blkio_cgrp_subsys,
+ WARN_ON(cgroup_add_legacy_cftypes(&io_cgrp_subsys,
pol->cftypes));
mutex_unlock(&blkcg_pol_register_mutex);
return 0;
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 23ebb94..5a5d79e 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -286,7 +286,7 @@ static inline struct bdi_writeback *wb_find_current(struct backing_dev_info *bdi
* %current's blkcg equals the effective blkcg of its memcg. No
* need to use the relatively expensive cgroup_get_e_css().
*/
- if (likely(wb && wb->blkcg_css == task_css(current, blkio_cgrp_id)))
+ if (likely(wb && wb->blkcg_css == task_css(current, io_cgrp_id)))
return wb;
return NULL;
}
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index 286e1bd..db89acd 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -221,7 +221,7 @@ static inline struct blkcg *css_to_blkcg(struct cgroup_subsys_state *css)

static inline struct blkcg *task_blkcg(struct task_struct *tsk)
{
- return css_to_blkcg(task_css(tsk, blkio_cgrp_id));
+ return css_to_blkcg(task_css(tsk, io_cgrp_id));
}

static inline struct blkcg *bio_blkcg(struct bio *bio)
@@ -234,7 +234,7 @@ static inline struct blkcg *bio_blkcg(struct bio *bio)
static inline struct cgroup_subsys_state *
task_get_blkcg_css(struct task_struct *task)
{
- return task_get_css(task, blkio_cgrp_id);
+ return task_get_css(task, io_cgrp_id);
}

/**
diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index e4a96fb..86b5056 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -16,7 +16,7 @@ SUBSYS(cpuacct)
#endif

#if IS_ENABLED(CONFIG_BLK_CGROUP)
-SUBSYS(blkio)
+SUBSYS(io)
#endif

#if IS_ENABLED(CONFIG_MEMCG)
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index dac5bf5..d0ee90e 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -523,7 +523,7 @@ static int cgwb_create(struct backing_dev_info *bdi,
int ret = 0;

memcg = mem_cgroup_from_css(memcg_css);
- blkcg_css = cgroup_get_e_css(memcg_css->cgroup, &blkio_cgrp_subsys);
+ blkcg_css = cgroup_get_e_css(memcg_css->cgroup, &io_cgrp_subsys);
blkcg = css_to_blkcg(blkcg_css);
memcg_cgwb_list = mem_cgroup_cgwb_list(memcg);
blkcg_cgwb_list = &blkcg->cgwb_list;
@@ -645,7 +645,7 @@ struct bdi_writeback *wb_get_create(struct backing_dev_info *bdi,

/* see whether the blkcg association has changed */
blkcg_css = cgroup_get_e_css(memcg_css->cgroup,
- &blkio_cgrp_subsys);
+ &io_cgrp_subsys);
if (unlikely(wb->blkcg_css != blkcg_css ||
!wb_tryget(wb)))
wb = NULL;
--
2.4.3

2015-07-24 18:44:10

by Tejun Heo

Subject: [PATCH 06/10] blkcg: mark existing cftypes as legacy

blkcg is about to grow an interface for the unified hierarchy. Add
"legacy" to the names of the existing cftypes.

* blkcg_policy->cftypes -> blkcg_policy->legacy_cftypes
* blk-cgroup.c:blkcg_files -> blkcg_legacy_files
* cfq-iosched.c:cfq_blkcg_files -> cfq_blkcg_legacy_files
* blk-throttle.c:throtl_files -> throtl_legacy_files

Pure renames. No functional change.

Signed-off-by: Tejun Heo <[email protected]>
---
block/blk-cgroup.c | 12 ++++++------
block/blk-throttle.c | 4 ++--
block/cfq-iosched.c | 4 ++--
include/linux/blk-cgroup.h | 2 +-
4 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index c2fb867..db7b3cc 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -847,7 +847,7 @@ void blkg_conf_finish(struct blkg_conf_ctx *ctx)
}
EXPORT_SYMBOL_GPL(blkg_conf_finish);

-struct cftype blkcg_files[] = {
+struct cftype blkcg_legacy_files[] = {
{
.name = "reset_stats",
.write_u64 = blkcg_reset_stats,
@@ -1093,7 +1093,7 @@ struct cgroup_subsys io_cgrp_subsys = {
.css_offline = blkcg_css_offline,
.css_free = blkcg_css_free,
.can_attach = blkcg_can_attach,
- .legacy_cftypes = blkcg_files,
+ .legacy_cftypes = blkcg_legacy_files,
.legacy_name = "blkio",
#ifdef CONFIG_MEMCG
/*
@@ -1264,9 +1264,9 @@ int blkcg_policy_register(struct blkcg_policy *pol)
mutex_unlock(&blkcg_pol_mutex);

/* everything is in place, add intf files for the new policy */
- if (pol->cftypes)
+ if (pol->legacy_cftypes)
WARN_ON(cgroup_add_legacy_cftypes(&io_cgrp_subsys,
- pol->cftypes));
+ pol->legacy_cftypes));
mutex_unlock(&blkcg_pol_register_mutex);
return 0;

@@ -1303,8 +1303,8 @@ void blkcg_policy_unregister(struct blkcg_policy *pol)
goto out_unlock;

/* kill the intf files first */
- if (pol->cftypes)
- cgroup_rm_cftypes(pol->cftypes);
+ if (pol->legacy_cftypes)
+ cgroup_rm_cftypes(pol->legacy_cftypes);

/* remove cpds and unregister */
mutex_lock(&blkcg_pol_mutex);
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index bd3e4b2..8b4f6b8 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -1217,7 +1217,7 @@ static ssize_t tg_set_conf_uint(struct kernfs_open_file *of,
return tg_set_conf(of, buf, nbytes, off, false);
}

-static struct cftype throtl_files[] = {
+static struct cftype throtl_legacy_files[] = {
{
.name = "throttle.read_bps_device",
.private = offsetof(struct throtl_grp, bps[READ]),
@@ -1263,7 +1263,7 @@ static void throtl_shutdown_wq(struct request_queue *q)
}

static struct blkcg_policy blkcg_policy_throtl = {
- .cftypes = throtl_files,
+ .legacy_cftypes = throtl_legacy_files,

.pd_alloc_fn = throtl_pd_alloc,
.pd_init_fn = throtl_pd_init,
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 38277e3..baa8459 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1944,7 +1944,7 @@ static int cfqg_print_avg_queue_size(struct seq_file *sf, void *v)
}
#endif /* CONFIG_DEBUG_BLK_CGROUP */

-static struct cftype cfq_blkcg_files[] = {
+static struct cftype cfq_blkcg_legacy_files[] = {
/* on root, weight is mapped to leaf_weight */
{
.name = "weight_device",
@@ -4654,7 +4654,7 @@ static struct elevator_type iosched_cfq = {

#ifdef CONFIG_CFQ_GROUP_IOSCHED
static struct blkcg_policy blkcg_policy_cfq = {
- .cftypes = cfq_blkcg_files,
+ .legacy_cftypes = cfq_blkcg_legacy_files,

.cpd_alloc_fn = cfq_cpd_alloc,
.cpd_init_fn = cfq_cpd_init,
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index db89acd..6e016e6 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -148,7 +148,7 @@ typedef void (blkcg_pol_reset_pd_stats_fn)(struct blkg_policy_data *pd);
struct blkcg_policy {
int plid;
/* cgroup files for the policy */
- struct cftype *cftypes;
+ struct cftype *legacy_cftypes;

/* operations */
blkcg_pol_alloc_cpd_fn *cpd_alloc_fn;
--
2.4.3

2015-07-24 18:45:17

by Tejun Heo

Subject: [PATCH 07/10] blkcg: move body parsing from blkg_conf_prep() to its callers

Currently, blkg_conf_prep() expects input to be of the following form

MAJ:MIN NUM

and reads the NUM part into blkg_conf_ctx->v. This is quite
restrictive and gets in the way of implementing the blkcg interface for
the unified hierarchy. This patch updates blkg_conf_prep() so that it
expects

MAJ:MIN BODY_STR

where BODY_STR is an arbitrary string. blkg_conf_ctx->v is replaced
with ->body which is a char pointer pointing to the start of BODY_STR.
Parsing of the body is moved to blkg_conf_prep()'s callers.

To allow using, for example, strsep() on blkg_conf_ctx->body, it is a
non-const pointer, and to accommodate that, const is dropped from @input
too.

This doesn't cause any behavior changes.
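
The key/body split can be tried out in user space with a sketch along
these lines. struct conf_key, parse_conf_key() and parse_u64_body()
are hypothetical names, and the while loop replaces the kernel's
skip_spaces(), which isn't available outside the kernel.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical user-space stand-in for the parsed part of blkg_conf_ctx. */
struct conf_key {
	unsigned int major;
	unsigned int minor;
	char *body;	/* points into the caller's buffer, after MAJ:MIN */
};

/*
 * Split "MAJ:MIN BODY_STR" into the device number and a pointer to the
 * body. Returns 0 on success, -EINVAL on a syntax error.
 */
static int parse_conf_key(char *input, struct conf_key *key)
{
	int key_len = 0;
	char *body;

	assert(input && key);

	/* %n records how many characters "MAJ:MIN" consumed */
	if (sscanf(input, "%u:%u%n", &key->major, &key->minor, &key_len) != 2)
		return -EINVAL;

	/* at least one whitespace character must separate key and body */
	body = input + key_len;
	if (!isspace((unsigned char)*body))
		return -EINVAL;

	/* user-space replacement for skip_spaces() */
	while (isspace((unsigned char)*body))
		body++;

	key->body = body;
	return 0;
}

/* Caller-side body parsing: a single unsigned number, as before. */
static int parse_u64_body(const char *body, unsigned long long *v)
{
	return sscanf(body, "%llu", v) == 1 ? 0 : -EINVAL;
}
```

Feeding "8:16 1024" through parse_conf_key() and then parse_u64_body()
gives the same result as the old single sscanf(), while a caller with
a richer body format is free to tokenize key->body however it likes.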

Signed-off-by: Tejun Heo <[email protected]>
---
block/blk-cgroup.c | 22 ++++++++++++++--------
block/blk-throttle.c | 18 ++++++++++++------
block/cfq-iosched.c | 17 +++++++++++------
include/linux/blk-cgroup.h | 4 ++--
4 files changed, 39 insertions(+), 22 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index db7b3cc..2ea3a2a 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -24,6 +24,7 @@
#include <linux/genhd.h>
#include <linux/delay.h>
#include <linux/atomic.h>
+#include <linux/ctype.h>
#include <linux/blk-cgroup.h>
#include "blk.h"

@@ -773,23 +774,28 @@ EXPORT_SYMBOL_GPL(blkg_rwstat_recursive_sum);
* @ctx: blkg_conf_ctx to be filled
*
* Parse per-blkg config update from @input and initialize @ctx with the
- * result. @ctx->blkg points to the blkg to be updated and @ctx->v the new
- * value. This function returns with RCU read lock and queue lock held and
- * must be paired with blkg_conf_finish().
+ * result. @ctx->blkg points to the blkg to be updated and @ctx->body the
+ * part of @input following MAJ:MIN. This function returns with RCU read
+ * lock and queue lock held and must be paired with blkg_conf_finish().
*/
int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol,
- const char *input, struct blkg_conf_ctx *ctx)
+ char *input, struct blkg_conf_ctx *ctx)
__acquires(rcu) __acquires(disk->queue->queue_lock)
{
struct gendisk *disk;
struct blkcg_gq *blkg;
unsigned int major, minor;
- unsigned long long v;
- int part, ret;
+ int key_len, part, ret;
+ char *body;

- if (sscanf(input, "%u:%u %llu", &major, &minor, &v) != 3)
+ if (sscanf(input, "%u:%u%n", &major, &minor, &key_len) != 2)
return -EINVAL;

+ body = input + key_len;
+ if (!isspace(*body))
+ return -EINVAL;
+ body = skip_spaces(body);
+
disk = get_gendisk(MKDEV(major, minor), &part);
if (!disk)
return -ENODEV;
@@ -826,7 +832,7 @@ int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol,

ctx->disk = disk;
ctx->blkg = blkg;
- ctx->v = v;
+ ctx->body = body;
return 0;
}
EXPORT_SYMBOL_GPL(blkg_conf_prep);
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 8b4f6b8..0e17c8f 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -1154,21 +1154,25 @@ static ssize_t tg_set_conf(struct kernfs_open_file *of,
struct blkcg_gq *blkg;
struct cgroup_subsys_state *pos_css;
int ret;
+ u64 v;

ret = blkg_conf_prep(blkcg, &blkcg_policy_throtl, buf, &ctx);
if (ret)
return ret;

+ ret = -EINVAL;
+ if (sscanf(ctx.body, "%llu", &v) != 1)
+ goto out_finish;
+ if (!v)
+ v = -1;
+
tg = blkg_to_tg(ctx.blkg);
sq = &tg->service_queue;

- if (!ctx.v)
- ctx.v = -1;
-
if (is_u64)
- *(u64 *)((void *)tg + of_cft(of)->private) = ctx.v;
+ *(u64 *)((void *)tg + of_cft(of)->private) = v;
else
- *(unsigned int *)((void *)tg + of_cft(of)->private) = ctx.v;
+ *(unsigned int *)((void *)tg + of_cft(of)->private) = v;

throtl_log(&tg->service_queue,
"limit change rbps=%llu wbps=%llu riops=%u wiops=%u",
@@ -1201,8 +1205,10 @@ static ssize_t tg_set_conf(struct kernfs_open_file *of,
throtl_schedule_next_dispatch(sq->parent_sq, true);
}

+ ret = 0;
+out_finish:
blkg_conf_finish(&ctx);
- return nbytes;
+ return ret ?: nbytes;
}

static ssize_t tg_set_conf_u64(struct kernfs_open_file *of,
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index baa8459..ea88d89 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1747,26 +1747,31 @@ static ssize_t __cfqg_set_weight_device(struct kernfs_open_file *of,
struct cfq_group *cfqg;
struct cfq_group_data *cfqgd;
int ret;
+ u64 v;

ret = blkg_conf_prep(blkcg, &blkcg_policy_cfq, buf, &ctx);
if (ret)
return ret;

+ ret = -EINVAL;
+ if (sscanf(ctx.body, "%llu", &v) != 1)
+ goto out_finish;
+
cfqg = blkg_to_cfqg(ctx.blkg);
cfqgd = blkcg_to_cfqgd(blkcg);

ret = -ERANGE;
- if (!ctx.v || (ctx.v >= CFQ_WEIGHT_MIN && ctx.v <= CFQ_WEIGHT_MAX)) {
+ if (!v || (v >= CFQ_WEIGHT_MIN && v <= CFQ_WEIGHT_MAX)) {
if (!is_leaf_weight) {
- cfqg->dev_weight = ctx.v;
- cfqg->new_weight = ctx.v ?: cfqgd->weight;
+ cfqg->dev_weight = v;
+ cfqg->new_weight = v ?: cfqgd->weight;
} else {
- cfqg->dev_leaf_weight = ctx.v;
- cfqg->new_leaf_weight = ctx.v ?: cfqgd->leaf_weight;
+ cfqg->dev_leaf_weight = v;
+ cfqg->new_leaf_weight = v ?: cfqgd->leaf_weight;
}
ret = 0;
}
-
+out_finish:
blkg_conf_finish(&ctx);
return ret ?: nbytes;
}
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index 6e016e6..85a4d98 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -206,11 +206,11 @@ struct blkg_rwstat blkg_rwstat_recursive_sum(struct blkcg_gq *blkg,
struct blkg_conf_ctx {
struct gendisk *disk;
struct blkcg_gq *blkg;
- u64 v;
+ char *body;
};

int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol,
- const char *input, struct blkg_conf_ctx *ctx);
+ char *input, struct blkg_conf_ctx *ctx);
void blkg_conf_finish(struct blkg_conf_ctx *ctx);


--
2.4.3

2015-07-24 18:44:55

by Tejun Heo

Subject: [PATCH 08/10] blkcg: separate out tg_conf_updated() from tg_set_conf()

tg_set_conf() largely consists of two parts: parsing and setting the
new config, and the follow-up application and propagation. This patch
separates out the latter part into tg_conf_updated(). This will be
used to implement the interface for the unified hierarchy.

Signed-off-by: Tejun Heo <[email protected]>
---
block/blk-throttle.c | 60 ++++++++++++++++++++++++++++------------------------
1 file changed, 32 insertions(+), 28 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 0e17c8f..a8bb2fd 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -1144,35 +1144,11 @@ static int tg_print_conf_uint(struct seq_file *sf, void *v)
return 0;
}

-static ssize_t tg_set_conf(struct kernfs_open_file *of,
- char *buf, size_t nbytes, loff_t off, bool is_u64)
+static void tg_conf_updated(struct throtl_grp *tg)
{
- struct blkcg *blkcg = css_to_blkcg(of_css(of));
- struct blkg_conf_ctx ctx;
- struct throtl_grp *tg;
- struct throtl_service_queue *sq;
- struct blkcg_gq *blkg;
+ struct throtl_service_queue *sq = &tg->service_queue;
struct cgroup_subsys_state *pos_css;
- int ret;
- u64 v;
-
- ret = blkg_conf_prep(blkcg, &blkcg_policy_throtl, buf, &ctx);
- if (ret)
- return ret;
-
- ret = -EINVAL;
- if (sscanf(ctx.body, "%llu", &v) != 1)
- goto out_finish;
- if (!v)
- v = -1;
-
- tg = blkg_to_tg(ctx.blkg);
- sq = &tg->service_queue;
-
- if (is_u64)
- *(u64 *)((void *)tg + of_cft(of)->private) = v;
- else
- *(unsigned int *)((void *)tg + of_cft(of)->private) = v;
+ struct blkcg_gq *blkg;

throtl_log(&tg->service_queue,
"limit change rbps=%llu wbps=%llu riops=%u wiops=%u",
@@ -1186,7 +1162,7 @@ static ssize_t tg_set_conf(struct kernfs_open_file *of,
* restrictions in the whole hierarchy and allows them to bypass
* blk-throttle.
*/
- blkg_for_each_descendant_pre(blkg, pos_css, ctx.blkg)
+ blkg_for_each_descendant_pre(blkg, pos_css, tg_to_blkg(tg))
tg_update_has_rules(blkg_to_tg(blkg));

/*
@@ -1204,7 +1180,35 @@ static ssize_t tg_set_conf(struct kernfs_open_file *of,
tg_update_disptime(tg);
throtl_schedule_next_dispatch(sq->parent_sq, true);
}
+}
+
+static ssize_t tg_set_conf(struct kernfs_open_file *of,
+ char *buf, size_t nbytes, loff_t off, bool is_u64)
+{
+ struct blkcg *blkcg = css_to_blkcg(of_css(of));
+ struct blkg_conf_ctx ctx;
+ struct throtl_grp *tg;
+ int ret;
+ u64 v;
+
+ ret = blkg_conf_prep(blkcg, &blkcg_policy_throtl, buf, &ctx);
+ if (ret)
+ return ret;
+
+ ret = -EINVAL;
+ if (sscanf(ctx.body, "%llu", &v) != 1)
+ goto out_finish;
+ if (!v)
+ v = -1;
+
+ tg = blkg_to_tg(ctx.blkg);
+
+ if (is_u64)
+ *(u64 *)((void *)tg + of_cft(of)->private) = v;
+ else
+ *(unsigned int *)((void *)tg + of_cft(of)->private) = v;

+ tg_conf_updated(tg);
ret = 0;
out_finish:
blkg_conf_finish(&ctx);
--
2.4.3

2015-07-24 18:44:34

by Tejun Heo

Subject: [PATCH 09/10] blkcg: misc preparations for unified hierarchy interface

* Export blkg_dev_name()

* Drop unnecessary @cft from __cfq_set_weight().

Signed-off-by: Tejun Heo <[email protected]>
---
block/blk-cgroup.c | 3 ++-
block/cfq-iosched.c | 8 ++++----
include/linux/blk-cgroup.h | 1 +
3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 2ea3a2a..b9d511b 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -462,13 +462,14 @@ static int blkcg_reset_stats(struct cgroup_subsys_state *css,
return 0;
}

-static const char *blkg_dev_name(struct blkcg_gq *blkg)
+const char *blkg_dev_name(struct blkcg_gq *blkg)
{
/* some drivers (floppy) instantiate a queue w/o disk registered */
if (blkg->q->backing_dev_info.dev)
return dev_name(blkg->q->backing_dev_info.dev);
return NULL;
}
+EXPORT_SYMBOL_GPL(blkg_dev_name);

/**
* blkcg_print_blkgs - helper for printing per-blkg data
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index ea88d89..7a72301 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1788,8 +1788,8 @@ static ssize_t cfqg_set_leaf_weight_device(struct kernfs_open_file *of,
return __cfqg_set_weight_device(of, buf, nbytes, off, true);
}

-static int __cfq_set_weight(struct cgroup_subsys_state *css, struct cftype *cft,
- u64 val, bool is_leaf_weight)
+static int __cfq_set_weight(struct cgroup_subsys_state *css, u64 val,
+ bool is_leaf_weight)
{
struct blkcg *blkcg = css_to_blkcg(css);
struct blkcg_gq *blkg;
@@ -1834,13 +1834,13 @@ static int __cfq_set_weight(struct cgroup_subsys_state *css, struct cftype *cft,
static int cfq_set_weight(struct cgroup_subsys_state *css, struct cftype *cft,
u64 val)
{
- return __cfq_set_weight(css, cft, val, false);
+ return __cfq_set_weight(css, val, false);
}

static int cfq_set_leaf_weight(struct cgroup_subsys_state *css,
struct cftype *cft, u64 val)
{
- return __cfq_set_weight(css, cft, val, true);
+ return __cfq_set_weight(css, val, true);
}

static int cfqg_print_stat(struct seq_file *sf, void *v)
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index 85a4d98..b270aef 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -182,6 +182,7 @@ int blkcg_activate_policy(struct request_queue *q,
void blkcg_deactivate_policy(struct request_queue *q,
const struct blkcg_policy *pol);

+const char *blkg_dev_name(struct blkcg_gq *blkg);
void blkcg_print_blkgs(struct seq_file *sf, struct blkcg *blkcg,
u64 (*prfill)(struct seq_file *,
struct blkg_policy_data *, int),
--
2.4.3

2015-07-24 18:44:13

by Tejun Heo

Subject: [PATCH 10/10] blkcg: implement interface for the unified hierarchy

blkcg interface grew to be the biggest of all controllers and
unfortunately the most inconsistent too. The interface files are
inconsistent, with a number of close duplicates. Some files have
recursive variants while others don't. There's a distinction between
normal and leaf weights which isn't intuitive, and there are a lot of
stat knobs which don't make much sense outside of debugging and expose
too many implementation details to userland.

In the unified hierarchy, everything is always hierarchical and
internal nodes can't have tasks, rendering moot the two structural
issues twisting the current interface. The interface has to be updated
in a significant way anyway and this is a good chance to revamp it as a
whole. This patch implements the blkcg interface for the unified hierarchy.

* (from a previous patch) blkcg is identified by "io" instead of
"blkio" on the unified hierarchy. Given that the whole interface is
updated anyway, the rename shouldn't carry noticeable conversion
overhead.

* The original interface, which consisted of 27 files, is replaced with
the following three files.

io.stat : per-blkcg stats
io.weight : per-cgroup and per-cgroup-queue weight settings
io.max : per-cgroup-queue bps and iops max limits

Documentation/cgroups/unified-hierarchy.txt updated accordingly.

Signed-off-by: Tejun Heo <[email protected]>
---
Documentation/cgroups/unified-hierarchy.txt | 57 +++++++++++++-
block/blk-cgroup.c | 51 +++++++++++++
block/blk-throttle.c | 112 ++++++++++++++++++++++++++++
block/cfq-iosched.c | 61 +++++++++++++--
include/linux/blk-cgroup.h | 1 +
5 files changed, 275 insertions(+), 7 deletions(-)

diff --git a/Documentation/cgroups/unified-hierarchy.txt b/Documentation/cgroups/unified-hierarchy.txt
index 86847a7..4e23d4c 100644
--- a/Documentation/cgroups/unified-hierarchy.txt
+++ b/Documentation/cgroups/unified-hierarchy.txt
@@ -374,9 +374,62 @@ supported and the interface files "release_agent" and

5-3. Per-Controller Changes

-5-3-1. blkio
+5-3-1. io

-- blk-throttle becomes properly hierarchical.
+- blkio is renamed to io. The interface is overhauled anyway. The
+ new name is more in line with the other two major controllers, cpu
+ and memory, and better suited given that it may be used for cgroup
+ writeback without involving block layer.
+
+- Everything including stat is always hierarchical making separate
+ recursive stat files pointless and, as no internal node can have
+ tasks, leaf weights are meaningless. The operation model is
+ simplified and the interface is overhauled accordingly.
+
+ io.stat
+
+ The stat file. The reported stats are from the point where
+ bio's are issued to request_queue. The stats are counted
+ independent of which policies are enabled. Each line in the
+ file follows the following format. More fields may later be
+ added at the end.
+
+ $MAJ:$MIN rbytes=$RBYTES wbytes=$WBYTES rios=$RIOS wios=$WIOS
+
+ io.weight
+
+ The weight setting, currently only available and effective if
+ cfq-iosched is in use for the target device. The weight is
+ between 10 and 1000 and defaults to 500. The first line
+ always contains the default weight in the following format to
+ use when per-device setting is missing.
+
+ default $WEIGHT
+
+ Subsequent lines list per-device weights of the following
+ format.
+
+ $MAJ:$MIN $WEIGHT
+
+ Writing "$WEIGHT" or "default $WEIGHT" changes the default
+ setting. Writing "$MAJ:$MIN $WEIGHT" sets per-device weight
+ while "$MAJ:$MIN default" clears it.
+
+ This file is available only on non-root cgroups.
+
+ io.max
+
+ The maximum bandwidth and/or iops setting, only available if
+ blk-throttle is enabled. The file is of the following format.
+
+ $MAJ:$MIN rbps=$RBPS wbps=$WBPS riops=$RIOPS wiops=$WIOPS
+
+ ${R|W}BPS are read/write bytes per second and ${R|W}IOPS are
+ read/write IOs per second. "max" indicates no limit. Writing
+ to the file follows the same format but the individual
+ settings may be omitted or specified in any order.
+
+ This file is available only on non-root cgroups.


5-3-2. cpuset
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index b9d511b..b97a075 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -854,6 +854,53 @@ void blkg_conf_finish(struct blkg_conf_ctx *ctx)
}
EXPORT_SYMBOL_GPL(blkg_conf_finish);

+static int blkcg_print_stat(struct seq_file *sf, void *v)
+{
+ struct blkcg *blkcg = css_to_blkcg(seq_css(sf));
+ struct blkcg_gq *blkg;
+
+ rcu_read_lock();
+
+ hlist_for_each_entry_rcu(blkg, &blkcg->blkg_list, blkcg_node) {
+ const char *dname;
+ struct blkg_rwstat rwstat;
+ u64 rbytes, wbytes, rios, wios;
+
+ dname = blkg_dev_name(blkg);
+ if (!dname)
+ continue;
+
+ spin_lock_irq(blkg->q->queue_lock);
+
+ rwstat = blkg_rwstat_recursive_sum(blkg, NULL,
+ offsetof(struct blkcg_gq, stat_bytes));
+ rbytes = atomic64_read(&rwstat.aux_cnt[BLKG_RWSTAT_READ]);
+ wbytes = atomic64_read(&rwstat.aux_cnt[BLKG_RWSTAT_WRITE]);
+
+ rwstat = blkg_rwstat_recursive_sum(blkg, NULL,
+ offsetof(struct blkcg_gq, stat_ios));
+ rios = atomic64_read(&rwstat.aux_cnt[BLKG_RWSTAT_READ]);
+ wios = atomic64_read(&rwstat.aux_cnt[BLKG_RWSTAT_WRITE]);
+
+ spin_unlock_irq(blkg->q->queue_lock);
+
+ if (rbytes || wbytes || rios || wios)
+ seq_printf(sf, "%s rbytes=%llu wbytes=%llu rios=%llu wios=%llu\n",
+ dname, rbytes, wbytes, rios, wios);
+ }
+
+ rcu_read_unlock();
+ return 0;
+}
+
+struct cftype blkcg_files[] = {
+ {
+ .name = "stat",
+ .seq_show = blkcg_print_stat,
+ },
+ { } /* terminate */
+};
+
struct cftype blkcg_legacy_files[] = {
{
.name = "reset_stats",
@@ -1100,6 +1147,7 @@ struct cgroup_subsys io_cgrp_subsys = {
.css_offline = blkcg_css_offline,
.css_free = blkcg_css_free,
.can_attach = blkcg_can_attach,
+ .dfl_cftypes = blkcg_files,
.legacy_cftypes = blkcg_legacy_files,
.legacy_name = "blkio",
#ifdef CONFIG_MEMCG
@@ -1271,6 +1319,9 @@ int blkcg_policy_register(struct blkcg_policy *pol)
mutex_unlock(&blkcg_pol_mutex);

/* everything is in place, add intf files for the new policy */
+ if (pol->dfl_cftypes)
+ WARN_ON(cgroup_add_dfl_cftypes(&io_cgrp_subsys,
+ pol->dfl_cftypes));
if (pol->legacy_cftypes)
WARN_ON(cgroup_add_legacy_cftypes(&io_cgrp_subsys,
pol->legacy_cftypes));
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index a8bb2fd..c75a263 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -1265,6 +1265,117 @@ static struct cftype throtl_legacy_files[] = {
{ } /* terminate */
};

+static u64 tg_prfill_max(struct seq_file *sf, struct blkg_policy_data *pd,
+ int off)
+{
+ struct throtl_grp *tg = pd_to_tg(pd);
+ const char *dname = blkg_dev_name(pd->blkg);
+ char bufs[4][21] = { "max", "max", "max", "max" };
+
+ if (!dname)
+ return 0;
+ if (tg->bps[READ] == -1 && tg->bps[WRITE] == -1 &&
+ tg->iops[READ] == -1 && tg->iops[WRITE] == -1)
+ return 0;
+
+ if (tg->bps[READ] != -1)
+ snprintf(bufs[0], sizeof(bufs[0]), "%llu", tg->bps[READ]);
+ if (tg->bps[WRITE] != -1)
+ snprintf(bufs[1], sizeof(bufs[1]), "%llu", tg->bps[WRITE]);
+ if (tg->iops[READ] != -1)
+ snprintf(bufs[2], sizeof(bufs[2]), "%u", tg->iops[READ]);
+ if (tg->iops[WRITE] != -1)
+ snprintf(bufs[3], sizeof(bufs[3]), "%u", tg->iops[WRITE]);
+
+ seq_printf(sf, "%s rbps=%s wbps=%s riops=%s wiops=%s\n",
+ dname, bufs[0], bufs[1], bufs[2], bufs[3]);
+ return 0;
+}
+
+static int tg_print_max(struct seq_file *sf, void *v)
+{
+ blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)), tg_prfill_max,
+ &blkcg_policy_throtl, seq_cft(sf)->private, false);
+ return 0;
+}
+
+static ssize_t tg_set_max(struct kernfs_open_file *of,
+ char *buf, size_t nbytes, loff_t off)
+{
+ struct blkcg *blkcg = css_to_blkcg(of_css(of));
+ struct blkg_conf_ctx ctx;
+ struct throtl_grp *tg;
+ u64 v[4];
+ int ret;
+
+ ret = blkg_conf_prep(blkcg, &blkcg_policy_throtl, buf, &ctx);
+ if (ret)
+ return ret;
+
+ tg = blkg_to_tg(ctx.blkg);
+
+ v[0] = tg->bps[READ];
+ v[1] = tg->bps[WRITE];
+ v[2] = tg->iops[READ];
+ v[3] = tg->iops[WRITE];
+
+ while (true) {
+ char tok[27]; /* wiops=18446744073709551616 */
+ char *p;
+ u64 val = -1;
+ int len;
+
+ if (sscanf(ctx.body, "%26s%n", tok, &len) != 1)
+ break;
+ if (tok[0] == '\0')
+ break;
+ ctx.body += len;
+
+ ret = -EINVAL;
+ p = tok;
+ strsep(&p, "=");
+ if (!p || (sscanf(p, "%llu", &val) != 1 && strcmp(p, "max")))
+ goto out_finish;
+
+ ret = -ERANGE;
+ if (!val)
+ goto out_finish;
+
+ ret = -EINVAL;
+ if (!strcmp(tok, "rbps"))
+ v[0] = val;
+ else if (!strcmp(tok, "wbps"))
+ v[1] = val;
+ else if (!strcmp(tok, "riops"))
+ v[2] = min_t(u64, val, UINT_MAX);
+ else if (!strcmp(tok, "wiops"))
+ v[3] = min_t(u64, val, UINT_MAX);
+ else
+ goto out_finish;
+ }
+
+ tg->bps[READ] = v[0];
+ tg->bps[WRITE] = v[1];
+ tg->iops[READ] = v[2];
+ tg->iops[WRITE] = v[3];
+
+ tg_conf_updated(tg);
+ ret = 0;
+out_finish:
+ blkg_conf_finish(&ctx);
+ return ret ?: nbytes;
+}
+
+static struct cftype throtl_files[] = {
+ {
+ .name = "max",
+ .flags = CFTYPE_NOT_ON_ROOT,
+ .seq_show = tg_print_max,
+ .write = tg_set_max,
+ },
+ { } /* terminate */
+};
+
static void throtl_shutdown_wq(struct request_queue *q)
{
struct throtl_data *td = q->td;
@@ -1273,6 +1384,7 @@ static void throtl_shutdown_wq(struct request_queue *q)
}

static struct blkcg_policy blkcg_policy_throtl = {
+ .dfl_cftypes = throtl_files,
.legacy_cftypes = throtl_legacy_files,

.pd_alloc_fn = throtl_pd_alloc,
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 7a72301..97da571 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1740,7 +1740,7 @@ static int cfq_print_leaf_weight(struct seq_file *sf, void *v)

static ssize_t __cfqg_set_weight_device(struct kernfs_open_file *of,
char *buf, size_t nbytes, loff_t off,
- bool is_leaf_weight)
+ bool on_dfl, bool is_leaf_weight)
{
struct blkcg *blkcg = css_to_blkcg(of_css(of));
struct blkg_conf_ctx ctx;
@@ -1753,9 +1753,17 @@ static ssize_t __cfqg_set_weight_device(struct kernfs_open_file *of,
if (ret)
return ret;

- ret = -EINVAL;
- if (sscanf(ctx.body, "%llu", &v) != 1)
+ if (sscanf(ctx.body, "%llu", &v) == 1) {
+ /* require "default" on dfl */
+ ret = -ERANGE;
+ if (!v && on_dfl)
+ goto out_finish;
+ } else if (!strcmp(strim(ctx.body), "default")) {
+ v = 0;
+ } else {
+ ret = -EINVAL;
goto out_finish;
+ }

cfqg = blkg_to_cfqg(ctx.blkg);
cfqgd = blkcg_to_cfqgd(blkcg);
@@ -1779,13 +1787,13 @@ static ssize_t __cfqg_set_weight_device(struct kernfs_open_file *of,
static ssize_t cfqg_set_weight_device(struct kernfs_open_file *of,
char *buf, size_t nbytes, loff_t off)
{
- return __cfqg_set_weight_device(of, buf, nbytes, off, false);
+ return __cfqg_set_weight_device(of, buf, nbytes, off, false, false);
}

static ssize_t cfqg_set_leaf_weight_device(struct kernfs_open_file *of,
char *buf, size_t nbytes, loff_t off)
{
- return __cfqg_set_weight_device(of, buf, nbytes, off, true);
+ return __cfqg_set_weight_device(of, buf, nbytes, off, false, true);
}

static int __cfq_set_weight(struct cgroup_subsys_state *css, u64 val,
@@ -2103,6 +2111,48 @@ static struct cftype cfq_blkcg_legacy_files[] = {
#endif /* CONFIG_DEBUG_BLK_CGROUP */
{ } /* terminate */
};
+
+static int cfq_print_weight_on_dfl(struct seq_file *sf, void *v)
+{
+ struct blkcg *blkcg = css_to_blkcg(seq_css(sf));
+ struct cfq_group_data *cgd = blkcg_to_cfqgd(blkcg);
+
+ seq_printf(sf, "default %u\n", cgd->weight);
+ blkcg_print_blkgs(sf, blkcg, cfqg_prfill_weight_device,
+ &blkcg_policy_cfq, 0, false);
+ return 0;
+}
+
+static ssize_t cfq_set_weight_on_dfl(struct kernfs_open_file *of,
+ char *buf, size_t nbytes, loff_t off)
+{
+ char *endp;
+ int ret;
+ u64 v;
+
+ buf = strim(buf);
+
+ /* "WEIGHT" or "default WEIGHT" sets the default weight */
+ v = simple_strtoull(buf, &endp, 0);
+ if (*endp == '\0' || sscanf(buf, "default %llu", &v) == 1) {
+ ret = __cfq_set_weight(of_css(of), v, false);
+ return ret ?: nbytes;
+ }
+
+ /* "MAJ:MIN WEIGHT" */
+ return __cfqg_set_weight_device(of, buf, nbytes, off, true, false);
+}
+
+static struct cftype cfq_blkcg_files[] = {
+ {
+ .name = "weight",
+ .flags = CFTYPE_NOT_ON_ROOT,
+ .seq_show = cfq_print_weight_on_dfl,
+ .write = cfq_set_weight_on_dfl,
+ },
+ { } /* terminate */
+};
+
#else /* GROUP_IOSCHED */
static struct cfq_group *cfq_lookup_cfqg(struct cfq_data *cfqd,
struct blkcg *blkcg)
@@ -4659,6 +4709,7 @@ static struct elevator_type iosched_cfq = {

#ifdef CONFIG_CFQ_GROUP_IOSCHED
static struct blkcg_policy blkcg_policy_cfq = {
+ .dfl_cftypes = cfq_blkcg_files,
.legacy_cftypes = cfq_blkcg_legacy_files,

.cpd_alloc_fn = cfq_cpd_alloc,
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index b270aef..9a7c4bd 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -148,6 +148,7 @@ typedef void (blkcg_pol_reset_pd_stats_fn)(struct blkg_policy_data *pd);
struct blkcg_policy {
int plid;
/* cgroup files for the policy */
+ struct cftype *dfl_cftypes;
struct cftype *legacy_cftypes;

/* operations */
--
2.4.3
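
The key=value loop in tg_set_max() can be approximated in userspace to see how a write to the max file is interpreted. The following is a minimal sketch, not the kernel code: struct demo_limits and demo_parse_max() are hypothetical names, the "$MAJ:$MIN" prefix is assumed to have been stripped already (the job blkg_conf_prep() does in the patch), and libc sscanf() may accept slightly different input than the kernel's.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the four limits; all bits set means "max". */
struct demo_limits {
	uint64_t rbps, wbps;
	unsigned int riops, wiops;
};

/*
 * Parse the part of a line that follows "$MAJ:$MIN", for example
 * "rbps=1048576 wiops=max". Keys that are not mentioned keep their
 * current value. Returns 0 on success or a negative errno, in which
 * case @lim is left untouched.
 */
static int demo_parse_max(const char *body, struct demo_limits *lim)
{
	struct demo_limits tmp = *lim;

	for (;;) {
		char tok[27];	/* "wiops=" plus a 20-digit number and NUL */
		char *val;
		unsigned long long v = ULLONG_MAX;	/* value used for "max" */
		int len;

		if (sscanf(body, "%26s%n", tok, &len) != 1)
			break;
		body += len;

		val = strchr(tok, '=');
		if (!val)
			return -EINVAL;
		*val++ = '\0';

		if (strcmp(val, "max") != 0 && sscanf(val, "%llu", &v) != 1)
			return -EINVAL;
		if (v == 0)
			return -ERANGE;	/* "no limit" must be spelled "max" */

		if (!strcmp(tok, "rbps"))
			tmp.rbps = v;
		else if (!strcmp(tok, "wbps"))
			tmp.wbps = v;
		else if (!strcmp(tok, "riops"))
			tmp.riops = v > UINT_MAX ? UINT_MAX : (unsigned int)v;
		else if (!strcmp(tok, "wiops"))
			tmp.wiops = v > UINT_MAX ? UINT_MAX : (unsigned int)v;
		else
			return -EINVAL;
	}

	*lim = tmp;
	return 0;
}
```

Working on a copy and committing it only at the end mirrors the v[4] array in the patch: a malformed token anywhere in the line leaves the existing limits unchanged.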

2015-07-27 16:36:07

by Vivek Goyal

Subject: Re: [PATCHSET block/for-4.3] blkcg: implement interface for the unified hierarchy

On Fri, Jul 24, 2015 at 02:43:44PM -0400, Tejun Heo wrote:
> Hello,
>
> blkcg interface grew to be the biggest of all controllers and
> unfortunately most inconsistent too. The interface files are
> inconsistent with a number of cloes duplicates. Some files have
> recursive variants while others don't. There's distinction between
> normal and leaf weights which isn't intuitive and there are a lot of
> stat knobs which don't make much sense outside of debugging and expose
> too much implementation details to userland.
>
> In the unified hierarchy, everything is always hierarchical and
> internal nodes can't have tasks rendering the two structural issues
> twisting the current interface. The interface has to be updated in a
> significant anyway and this is a good chance to revamp it as a whole.
> This patchset implements blkcg interface for the unified hierarchy.
>
> * blkcg is identified by "io" instead of "blkio" on the unified
> hierarchy. Given that the whole interface is updated anyway, the
> rename shouldn't carry noticeable conversion overhead.
>
> * The original interface consisted of 27 files is replaced with the
> following three files.
>
> blkio.stat : per-blkcg stats
> blkio.weight : per-cgroup and per-cgroup-queue weight settings
> blkio.max : per-cgroup-queue bps and iops max limits
>

Hi Tejun,

I browsed through the details of the above knobs and it sounds great. It
is clean, with far fewer knobs and files. You got rid of all the
debug CFQ knobs, which is good. I was not happy with these either. Glad
to see that all the magic about leaf weight is gone. That was really
mind bending. The knob for reset stats is gone, and instead of multiple
files for configuration we now use a single file for R/W BPS/IOPS
configuration.

I will do some basic testing and see if something pops up.

Userspace will need to understand these new files, but the
understanding anyway is that the unified hierarchy is different and
needs to be handled differently.

Thanks
Vivek
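
As a rough idea of what the userspace side might look like, here is a minimal sketch of parsing one line of the new stat file. It assumes the line layout produced by the seq_printf() call in blkcg_print_stat() from patch 10; struct demo_io_stat and demo_parse_stat_line() are hypothetical names, not an existing library API.

```c
#include <stdio.h>

/* Hypothetical holder for one per-device line of the stat file. */
struct demo_io_stat {
	unsigned int major, minor;
	unsigned long long rbytes, wbytes, rios, wios;
};

/*
 * Parse "MAJ:MIN rbytes=N wbytes=N rios=N wios=N". Anything after the
 * last known field is ignored, so fields appended later do not break
 * the parser. Returns 0 on success and -1 if the line does not match.
 */
static int demo_parse_stat_line(const char *line, struct demo_io_stat *st)
{
	int n = sscanf(line,
		       "%u:%u rbytes=%llu wbytes=%llu rios=%llu wios=%llu",
		       &st->major, &st->minor,
		       &st->rbytes, &st->wbytes, &st->rios, &st->wios);

	return n == 6 ? 0 : -1;
}
```

A tool would typically read the file line by line with fgets() and feed each line to such a helper, skipping lines that fail to parse.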

2015-07-27 18:13:24

by Tejun Heo

Subject: Re: [PATCHSET block/for-4.3] blkcg: implement interface for the unified hierarchy

Hello,

On Mon, Jul 27, 2015 at 12:12:09PM -0400, Vivek Goyal wrote:
> I browsed through the details of above knobs and it sounds great. It is
> clean and much less number of knobs and files. You got rid of all the
> debug CFQ knobs which is good. I was not happy with these either. Glad

Yeah, especially given that CFQ may be replaced by BFQ.

> to see that all the magic about leaf weight is gone. That was really
> mind bending. Knob for reset stats is gone and instead of multiple files
> for configuration now we are using single file for R/W BPS/IOPS
> configuration.
>
> I will do some basic testing and see if something pops up.
>
> Userspace will need to understand these new files but that's the
> understanding anyway that unified hierarchy is different and needs
> to be handled differently.

For blkcg (and memory too), the usage model changes so drastically
that I don't think blkcg interface changes would matter much.
Everything is changed anyway.

Thanks.

--
tejun

2015-07-28 06:40:26

by Zefan Li

Subject: Re: [PATCH 02/10] cgroup: introduce cgroup_subsys->legacy_name

> @@ -1448,7 +1451,8 @@ static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts)
> }
>
> for_each_subsys(ss, i) {
> - if (strcmp(token, ss->name))
> + if (strcmp(token, ss->name) &&
> + strcmp(token, ss->legacy_name))
> continue;

As mounting with specified subsystems is only allowed on legacy
hierarchies, I think we should only allow the legacy name here?

> if (ss->disabled)
> continue;

2015-07-28 15:23:17

by Tejun Heo

Subject: Re: [PATCH 02/10] cgroup: introduce cgroup_subsys->legacy_name

On Tue, Jul 28, 2015 at 02:39:36PM +0800, Zefan Li wrote:
> > @@ -1448,7 +1451,8 @@ static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts)
> > }
> >
> > for_each_subsys(ss, i) {
> > - if (strcmp(token, ss->name))
> > + if (strcmp(token, ss->name) &&
> > + strcmp(token, ss->legacy_name))
> > continue;
>
> As mounting with specified subsystems is only allowed in legacy hierarchy,
> I think we should allow using legacy name only?

Yeah, good point. Will update the patch.

Thanks.

--
tejun

2015-07-28 17:56:53

by Tejun Heo

Subject: [PATCH v2 02/10] cgroup: introduce cgroup_subsys->legacy_name

This allows cgroup subsystems to use a different name on the unified
hierarchy. cgroup_subsys->name is used on the unified hierarchy,
->legacy_name elsewhere. If ->legacy_name is not explicitly set, it's
automatically set to ->name and the userland visible behavior remains
unchanged.

v2: Make parse_cgroupfs_options() only consider ->legacy_name as mount
options are used only on legacy hierarchies. Suggested by Li
Zefan.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: [email protected]
---
git branch updated accordingly. Thanks.

include/linux/cgroup-defs.h | 3 +++
kernel/cgroup.c | 29 ++++++++++++++++++-----------
2 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 8f5770a..7d0bb53 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -434,6 +434,9 @@ struct cgroup_subsys {
int id;
const char *name;

+ /* optional, initialized automatically during boot if not set */
+ const char *legacy_name;
+
/* link to parent, protected by cgroup_lock() */
struct cgroup_root *root;

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 6c85e6d..1276569 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1027,10 +1027,13 @@ static const struct file_operations proc_cgroupstats_operations;
static char *cgroup_file_name(struct cgroup *cgrp, const struct cftype *cft,
char *buf)
{
+ struct cgroup_subsys *ss = cft->ss;
+
if (cft->ss && !(cft->flags & CFTYPE_NO_PREFIX) &&
!(cgrp->root->flags & CGRP_ROOT_NOPREFIX))
snprintf(buf, CGROUP_FILE_NAME_MAX, "%s.%s",
- cft->ss->name, cft->name);
+ cgroup_on_dfl(cgrp) ? ss->name : ss->legacy_name,
+ cft->name);
else
strncpy(buf, cft->name, CGROUP_FILE_NAME_MAX);
return buf;
@@ -1335,7 +1338,7 @@ static int cgroup_show_options(struct seq_file *seq,
if (root != &cgrp_dfl_root)
for_each_subsys(ss, ssid)
if (root->subsys_mask & (1 << ssid))
- seq_printf(seq, ",%s", ss->name);
+ seq_printf(seq, ",%s", ss->legacy_name);
if (root->flags & CGRP_ROOT_NOPREFIX)
seq_puts(seq, ",noprefix");
if (root->flags & CGRP_ROOT_XATTR)
@@ -1448,7 +1451,7 @@ static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts)
}

for_each_subsys(ss, i) {
- if (strcmp(token, ss->name))
+ if (strcmp(token, ss->legacy_name))
continue;
if (ss->disabled)
continue;
@@ -4994,6 +4997,8 @@ int __init cgroup_init_early(void)

ss->id = i;
ss->name = cgroup_subsys_name[i];
+ if (!ss->legacy_name)
+ ss->legacy_name = cgroup_subsys_name[i];

if (ss->early_init)
cgroup_init_subsys(ss, true);
@@ -5141,7 +5146,7 @@ int proc_cgroup_show(struct seq_file *m, struct pid_namespace *ns,
for_each_subsys(ss, ssid)
if (root->subsys_mask & (1 << ssid))
seq_printf(m, "%s%s", count++ ? "," : "",
- ss->name);
+ ss->legacy_name);
if (strlen(root->name))
seq_printf(m, "%sname=%s", count ? "," : "",
root->name);
@@ -5181,7 +5186,7 @@ static int proc_cgroupstats_show(struct seq_file *m, void *v)

for_each_subsys(ss, i)
seq_printf(m, "%s\t%d\t%d\t%d\n",
- ss->name, ss->root->hierarchy_id,
+ ss->legacy_name, ss->root->hierarchy_id,
atomic_read(&ss->root->nr_cgrps), !ss->disabled);

mutex_unlock(&cgroup_mutex);
@@ -5403,12 +5408,14 @@ static int __init cgroup_disable(char *str)
continue;

for_each_subsys(ss, i) {
- if (!strcmp(token, ss->name)) {
- ss->disabled = 1;
- printk(KERN_INFO "Disabling %s control group"
- " subsystem\n", ss->name);
- break;
- }
+ if (strcmp(token, ss->name) &&
+ strcmp(token, ss->legacy_name))
+ continue;
+
+ ss->disabled = 1;
+ printk(KERN_INFO "Disabling %s control group subsystem\n",
+ ss->name);
+ break;
}
}
return 1;
--
2.4.3
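
The naming rules this version settles on can be summarised in a small userspace sketch. It is an illustration only: struct demo_subsys and the demo_*() helpers are hypothetical, and the on_dfl flag stands in for cgroup_on_dfl(cgrp).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the two name fields of struct cgroup_subsys. */
struct demo_subsys {
	const char *name;		/* used on the unified hierarchy */
	const char *legacy_name;	/* optional, used everywhere else */
};

/* Mirror of the boot-time default: fall back to ->name if unset. */
static void demo_init_names(struct demo_subsys *ss)
{
	if (!ss->legacy_name)
		ss->legacy_name = ss->name;
}

/* Build "<prefix>.<file>", picking the prefix by hierarchy type. */
static const char *demo_file_name(const struct demo_subsys *ss, bool on_dfl,
				  const char *file, char *buf, size_t len)
{
	snprintf(buf, len, "%s.%s",
		 on_dfl ? ss->name : ss->legacy_name, file);
	return buf;
}

/* Mount options exist only on legacy hierarchies: match ->legacy_name. */
static bool demo_mount_token_matches(const struct demo_subsys *ss,
				     const char *token)
{
	return strcmp(token, ss->legacy_name) == 0;
}
```

With name "io" and legacy_name "blkio", the same weight file would show up as io.weight on the unified hierarchy and blkio.weight on a legacy one, while a legacy mount option of "io" would not match.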

2015-07-29 01:18:30

by Zefan Li

Subject: Re: [PATCH 01/10] cgroup: don't print subsystems for the default hierarchy

On 2015/7/25 2:43, Tejun Heo wrote:
> It doesn't make sense to print subsystems on mount option or
> /proc/PID/cgroup for the default hierarchy.
>
> * cgroup.controllers file at the root of the default hierarchy lists
> the currently attached controllers.
>
> * The default hierarchy is catch-all for unmounted subsystems.
>
> * The default hierarchy doesn't accept any mount options.
>
> Suppress subsystem printing on mount options and /proc/PID/cgroup for
> the default hierarchy.
>
> Signed-off-by: Tejun Heo <[email protected]>
> Cc: Li Zefan <[email protected]>
> Cc: Johannes Weiner <[email protected]>
> Cc: [email protected]

Acked

2015-07-29 01:18:43

by Zefan Li

Subject: Re: [PATCH 02/10] cgroup: introduce cgroup_subsys->legacy_name

On 2015/7/25 2:43, Tejun Heo wrote:
> This allows cgroup subsystems to use a different name on the unified
> hierarchy. cgroup_subsys->name is used on the unified hierarchy,
> ->legacy_name elsewhere. If ->legacy_name is not explicitly set, it's
> automatically set to ->name and the userland visible behavior remains
> unchanged.
>
> Signed-off-by: Tejun Heo <[email protected]>
> Cc: Li Zefan <[email protected]>
> Cc: Johannes Weiner <[email protected]>
> Cc: [email protected]

Acked

2015-07-30 22:58:29

by Tejun Heo

Subject: [PATCH v2 10/10] blkcg: implement interface for the unified hierarchy

From 1618deedf2fb0788ae11ac544f45671ceb6e43ec Mon Sep 17 00:00:00 2001
From: Tejun Heo <[email protected]>
Date: Thu, 30 Jul 2015 18:51:53 -0400

blkcg interface grew to be the biggest of all controllers and
unfortunately the most inconsistent too. The interface files are
inconsistent, with a number of close duplicates. Some files have
recursive variants while others don't. There's a distinction between
normal and leaf weights which isn't intuitive, and there are a lot of
stat knobs which don't make much sense outside of debugging and expose
too many implementation details to userland.

In the unified hierarchy, everything is always hierarchical and
internal nodes can't have tasks, rendering moot the two structural
issues twisting the current interface. The interface has to be updated
in a significant way anyway and this is a good chance to revamp it as a
whole. This patch implements the blkcg interface for the unified hierarchy.

* (from a previous patch) blkcg is identified by "io" instead of
"blkio" on the unified hierarchy. Given that the whole interface is
updated anyway, the rename shouldn't carry noticeable conversion
overhead.

* The original interface, which consisted of 27 files, is replaced with
the following three files.

io.stat : per-blkcg stats
io.weight : per-cgroup and per-cgroup-queue weight settings
io.max : per-cgroup-queue bps and iops max limits

Documentation/cgroups/unified-hierarchy.txt updated accordingly.

v2: blkcg_policy->dfl_cftypes wasn't removed on
blkcg_policy_unregister() corrupting the cftypes list. Fixed.

Signed-off-by: Tejun Heo <[email protected]>
---
git branch updated accordingly.

Thanks.

Documentation/cgroups/unified-hierarchy.txt | 57 +++++++++++++-
block/blk-cgroup.c | 53 +++++++++++++
block/blk-throttle.c | 112 ++++++++++++++++++++++++++++
block/cfq-iosched.c | 61 +++++++++++++--
include/linux/blk-cgroup.h | 1 +
5 files changed, 277 insertions(+), 7 deletions(-)

diff --git a/Documentation/cgroups/unified-hierarchy.txt b/Documentation/cgroups/unified-hierarchy.txt
index 86847a7..4e23d4c 100644
--- a/Documentation/cgroups/unified-hierarchy.txt
+++ b/Documentation/cgroups/unified-hierarchy.txt
@@ -374,9 +374,62 @@ supported and the interface files "release_agent" and

5-3. Per-Controller Changes

-5-3-1. blkio
+5-3-1. io

-- blk-throttle becomes properly hierarchical.
+- blkio is renamed to io. The interface is overhauled anyway. The
+ new name is more in line with the other two major controllers, cpu
+ and memory, and better suited given that it may be used for cgroup
+ writeback without involving block layer.
+
+- Everything including stat is always hierarchical making separate
+ recursive stat files pointless and, as no internal node can have
+ tasks, leaf weights are meaningless. The operation model is
+ simplified and the interface is overhauled accordingly.
+
+ io.stat
+
+ The stat file. The reported stats are from the point where
+ bio's are issued to request_queue. The stats are counted
+ independent of which policies are enabled. Each line in the
+ file follows the following format. More fields may later be
+ added at the end.
+
+ $MAJ:$MIN rbytes=$RBYTES wbytes=$WBYTES rios=$RIOS wios=$WIOS
+
+ io.weight
+
+ The weight setting, currently only available and effective if
+ cfq-iosched is in use for the target device. The weight is
+ between 10 and 1000 and defaults to 500. The first line
+ always contains the default weight in the following format to
+ use when per-device setting is missing.
+
+ default $WEIGHT
+
+ Subsequent lines list per-device weights of the following
+ format.
+
+ $MAJ:$MIN $WEIGHT
+
+ Writing "$WEIGHT" or "default $WEIGHT" changes the default
+ setting. Writing "$MAJ:$MIN $WEIGHT" sets per-device weight
+ while "$MAJ:$MIN default" clears it.
+
+ This file is available only on non-root cgroups.
+
+ io.max
+
+ The maximum bandwidth and/or iops setting, only available if
+ blk-throttle is enabled. The file is of the following format.
+
+ $MAJ:$MIN rbps=$RBPS wbps=$WBPS riops=$RIOPS wiops=$WIOPS
+
+ ${R|W}BPS are read/write bytes per second and ${R|W}IOPS are
+ read/write IOs per second. "max" indicates no limit. Writing
+ to the file follows the same format but the individual
+ settings may be omitted or specified in any order.
+
+ This file is available only on non-root cgroups.


5-3-2. cpuset
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index b5e72d7..88bdb73 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -854,6 +854,53 @@ void blkg_conf_finish(struct blkg_conf_ctx *ctx)
}
EXPORT_SYMBOL_GPL(blkg_conf_finish);

+static int blkcg_print_stat(struct seq_file *sf, void *v)
+{
+ struct blkcg *blkcg = css_to_blkcg(seq_css(sf));
+ struct blkcg_gq *blkg;
+
+ rcu_read_lock();
+
+ hlist_for_each_entry_rcu(blkg, &blkcg->blkg_list, blkcg_node) {
+ const char *dname;
+ struct blkg_rwstat rwstat;
+ u64 rbytes, wbytes, rios, wios;
+
+ dname = blkg_dev_name(blkg);
+ if (!dname)
+ continue;
+
+ spin_lock_irq(blkg->q->queue_lock);
+
+ rwstat = blkg_rwstat_recursive_sum(blkg, NULL,
+ offsetof(struct blkcg_gq, stat_bytes));
+ rbytes = atomic64_read(&rwstat.aux_cnt[BLKG_RWSTAT_READ]);
+ wbytes = atomic64_read(&rwstat.aux_cnt[BLKG_RWSTAT_WRITE]);
+
+ rwstat = blkg_rwstat_recursive_sum(blkg, NULL,
+ offsetof(struct blkcg_gq, stat_ios));
+ rios = atomic64_read(&rwstat.aux_cnt[BLKG_RWSTAT_READ]);
+ wios = atomic64_read(&rwstat.aux_cnt[BLKG_RWSTAT_WRITE]);
+
+ spin_unlock_irq(blkg->q->queue_lock);
+
+ if (rbytes || wbytes || rios || wios)
+ seq_printf(sf, "%s rbytes=%llu wbytes=%llu rios=%llu wios=%llu\n",
+ dname, rbytes, wbytes, rios, wios);
+ }
+
+ rcu_read_unlock();
+ return 0;
+}
+
+struct cftype blkcg_files[] = {
+ {
+ .name = "stat",
+ .seq_show = blkcg_print_stat,
+ },
+ { } /* terminate */
+};
+
struct cftype blkcg_legacy_files[] = {
{
.name = "reset_stats",
@@ -1101,6 +1148,7 @@ struct cgroup_subsys io_cgrp_subsys = {
.css_offline = blkcg_css_offline,
.css_free = blkcg_css_free,
.can_attach = blkcg_can_attach,
+ .dfl_cftypes = blkcg_files,
.legacy_cftypes = blkcg_legacy_files,
.legacy_name = "blkio",
#ifdef CONFIG_MEMCG
@@ -1273,6 +1321,9 @@ int blkcg_policy_register(struct blkcg_policy *pol)
mutex_unlock(&blkcg_pol_mutex);

/* everything is in place, add intf files for the new policy */
+ if (pol->dfl_cftypes)
+ WARN_ON(cgroup_add_dfl_cftypes(&io_cgrp_subsys,
+ pol->dfl_cftypes));
if (pol->legacy_cftypes)
WARN_ON(cgroup_add_legacy_cftypes(&io_cgrp_subsys,
pol->legacy_cftypes));
@@ -1312,6 +1363,8 @@ void blkcg_policy_unregister(struct blkcg_policy *pol)
goto out_unlock;

/* kill the intf files first */
+ if (pol->dfl_cftypes)
+ cgroup_rm_cftypes(pol->dfl_cftypes);
if (pol->legacy_cftypes)
cgroup_rm_cftypes(pol->legacy_cftypes);

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index a8bb2fd..c75a263 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -1265,6 +1265,117 @@ static struct cftype throtl_legacy_files[] = {
{ } /* terminate */
};

+static u64 tg_prfill_max(struct seq_file *sf, struct blkg_policy_data *pd,
+ int off)
+{
+ struct throtl_grp *tg = pd_to_tg(pd);
+ const char *dname = blkg_dev_name(pd->blkg);
+ char bufs[4][21] = { "max", "max", "max", "max" };
+
+ if (!dname)
+ return 0;
+ if (tg->bps[READ] == -1 && tg->bps[WRITE] == -1 &&
+ tg->iops[READ] == -1 && tg->iops[WRITE] == -1)
+ return 0;
+
+ if (tg->bps[READ] != -1)
+ snprintf(bufs[0], sizeof(bufs[0]), "%llu", tg->bps[READ]);
+ if (tg->bps[WRITE] != -1)
+ snprintf(bufs[1], sizeof(bufs[1]), "%llu", tg->bps[WRITE]);
+ if (tg->iops[READ] != -1)
+ snprintf(bufs[2], sizeof(bufs[2]), "%u", tg->iops[READ]);
+ if (tg->iops[WRITE] != -1)
+ snprintf(bufs[3], sizeof(bufs[3]), "%u", tg->iops[WRITE]);
+
+ seq_printf(sf, "%s rbps=%s wbps=%s riops=%s wiops=%s\n",
+ dname, bufs[0], bufs[1], bufs[2], bufs[3]);
+ return 0;
+}
+
+static int tg_print_max(struct seq_file *sf, void *v)
+{
+ blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)), tg_prfill_max,
+ &blkcg_policy_throtl, seq_cft(sf)->private, false);
+ return 0;
+}
+
+static ssize_t tg_set_max(struct kernfs_open_file *of,
+ char *buf, size_t nbytes, loff_t off)
+{
+ struct blkcg *blkcg = css_to_blkcg(of_css(of));
+ struct blkg_conf_ctx ctx;
+ struct throtl_grp *tg;
+ u64 v[4];
+ int ret;
+
+ ret = blkg_conf_prep(blkcg, &blkcg_policy_throtl, buf, &ctx);
+ if (ret)
+ return ret;
+
+ tg = blkg_to_tg(ctx.blkg);
+
+ v[0] = tg->bps[READ];
+ v[1] = tg->bps[WRITE];
+ v[2] = tg->iops[READ];
+ v[3] = tg->iops[WRITE];
+
+ while (true) {
+ char tok[27]; /* wiops=18446744073709551616 */
+ char *p;
+ u64 val = -1;
+ int len;
+
+ if (sscanf(ctx.body, "%26s%n", tok, &len) != 1)
+ break;
+ if (tok[0] == '\0')
+ break;
+ ctx.body += len;
+
+ ret = -EINVAL;
+ p = tok;
+ strsep(&p, "=");
+ if (!p || (sscanf(p, "%llu", &val) != 1 && strcmp(p, "max")))
+ goto out_finish;
+
+ ret = -ERANGE;
+ if (!val)
+ goto out_finish;
+
+ ret = -EINVAL;
+ if (!strcmp(tok, "rbps"))
+ v[0] = val;
+ else if (!strcmp(tok, "wbps"))
+ v[1] = val;
+ else if (!strcmp(tok, "riops"))
+ v[2] = min_t(u64, val, UINT_MAX);
+ else if (!strcmp(tok, "wiops"))
+ v[3] = min_t(u64, val, UINT_MAX);
+ else
+ goto out_finish;
+ }
+
+ tg->bps[READ] = v[0];
+ tg->bps[WRITE] = v[1];
+ tg->iops[READ] = v[2];
+ tg->iops[WRITE] = v[3];
+
+ tg_conf_updated(tg);
+ ret = 0;
+out_finish:
+ blkg_conf_finish(&ctx);
+ return ret ?: nbytes;
+}
+
+static struct cftype throtl_files[] = {
+ {
+ .name = "max",
+ .flags = CFTYPE_NOT_ON_ROOT,
+ .seq_show = tg_print_max,
+ .write = tg_set_max,
+ },
+ { } /* terminate */
+};
+
static void throtl_shutdown_wq(struct request_queue *q)
{
struct throtl_data *td = q->td;
@@ -1273,6 +1384,7 @@ static void throtl_shutdown_wq(struct request_queue *q)
}

static struct blkcg_policy blkcg_policy_throtl = {
+ .dfl_cftypes = throtl_files,
.legacy_cftypes = throtl_legacy_files,

.pd_alloc_fn = throtl_pd_alloc,
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 7a72301..97da571 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1740,7 +1740,7 @@ static int cfq_print_leaf_weight(struct seq_file *sf, void *v)

static ssize_t __cfqg_set_weight_device(struct kernfs_open_file *of,
char *buf, size_t nbytes, loff_t off,
- bool is_leaf_weight)
+ bool on_dfl, bool is_leaf_weight)
{
struct blkcg *blkcg = css_to_blkcg(of_css(of));
struct blkg_conf_ctx ctx;
@@ -1753,9 +1753,17 @@ static ssize_t __cfqg_set_weight_device(struct kernfs_open_file *of,
if (ret)
return ret;

- ret = -EINVAL;
- if (sscanf(ctx.body, "%llu", &v) != 1)
+ if (sscanf(ctx.body, "%llu", &v) == 1) {
+ /* require "default" on dfl */
+ ret = -ERANGE;
+ if (!v && on_dfl)
+ goto out_finish;
+ } else if (!strcmp(strim(ctx.body), "default")) {
+ v = 0;
+ } else {
+ ret = -EINVAL;
goto out_finish;
+ }

cfqg = blkg_to_cfqg(ctx.blkg);
cfqgd = blkcg_to_cfqgd(blkcg);
@@ -1779,13 +1787,13 @@ static ssize_t __cfqg_set_weight_device(struct kernfs_open_file *of,
static ssize_t cfqg_set_weight_device(struct kernfs_open_file *of,
char *buf, size_t nbytes, loff_t off)
{
- return __cfqg_set_weight_device(of, buf, nbytes, off, false);
+ return __cfqg_set_weight_device(of, buf, nbytes, off, false, false);
}

static ssize_t cfqg_set_leaf_weight_device(struct kernfs_open_file *of,
char *buf, size_t nbytes, loff_t off)
{
- return __cfqg_set_weight_device(of, buf, nbytes, off, true);
+ return __cfqg_set_weight_device(of, buf, nbytes, off, false, true);
}

static int __cfq_set_weight(struct cgroup_subsys_state *css, u64 val,
@@ -2103,6 +2111,48 @@ static struct cftype cfq_blkcg_legacy_files[] = {
#endif /* CONFIG_DEBUG_BLK_CGROUP */
{ } /* terminate */
};
+
+static int cfq_print_weight_on_dfl(struct seq_file *sf, void *v)
+{
+ struct blkcg *blkcg = css_to_blkcg(seq_css(sf));
+ struct cfq_group_data *cgd = blkcg_to_cfqgd(blkcg);
+
+ seq_printf(sf, "default %u\n", cgd->weight);
+ blkcg_print_blkgs(sf, blkcg, cfqg_prfill_weight_device,
+ &blkcg_policy_cfq, 0, false);
+ return 0;
+}
+
+static ssize_t cfq_set_weight_on_dfl(struct kernfs_open_file *of,
+ char *buf, size_t nbytes, loff_t off)
+{
+ char *endp;
+ int ret;
+ u64 v;
+
+ buf = strim(buf);
+
+ /* "WEIGHT" or "default WEIGHT" sets the default weight */
+ v = simple_strtoull(buf, &endp, 0);
+ if (*endp == '\0' || sscanf(buf, "default %llu", &v) == 1) {
+ ret = __cfq_set_weight(of_css(of), v, false);
+ return ret ?: nbytes;
+ }
+
+ /* "MAJ:MIN WEIGHT" */
+ return __cfqg_set_weight_device(of, buf, nbytes, off, true, false);
+}
+
+static struct cftype cfq_blkcg_files[] = {
+ {
+ .name = "weight",
+ .flags = CFTYPE_NOT_ON_ROOT,
+ .seq_show = cfq_print_weight_on_dfl,
+ .write = cfq_set_weight_on_dfl,
+ },
+ { } /* terminate */
+};
+
#else /* GROUP_IOSCHED */
static struct cfq_group *cfq_lookup_cfqg(struct cfq_data *cfqd,
struct blkcg *blkcg)
@@ -4659,6 +4709,7 @@ static struct elevator_type iosched_cfq = {

#ifdef CONFIG_CFQ_GROUP_IOSCHED
static struct blkcg_policy blkcg_policy_cfq = {
+ .dfl_cftypes = cfq_blkcg_files,
.legacy_cftypes = cfq_blkcg_legacy_files,

.cpd_alloc_fn = cfq_cpd_alloc,
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index b270aef..9a7c4bd 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -148,6 +148,7 @@ typedef void (blkcg_pol_reset_pd_stats_fn)(struct blkg_policy_data *pd);
struct blkcg_policy {
int plid;
/* cgroup files for the policy */
+ struct cftype *dfl_cftypes;
struct cftype *legacy_cftypes;

/* operations */
--
2.4.3
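
For readers who want to try the "max" file syntax outside the kernel, the
key=value parsing done by tg_set_max() above can be approximated in userspace.
This is only a sketch under stated assumptions: struct io_limits,
parse_io_limits() and the -1/-2 return codes are hypothetical names invented
for illustration (the patch itself returns -EINVAL and -ERANGE), and the
"MAJ:MIN" prefix is assumed to have been stripped already, as blkg_conf_prep()
does for ctx.body.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace mirror of the four limits; all-ones means "max". */
struct io_limits {
	uint64_t rbps, wbps;
	unsigned int riops, wiops;
};

/*
 * Parse a body such as "rbps=1048576 wiops=max".
 * Returns 0 on success, -1 for a malformed token or unknown key,
 * -2 for a zero value. Keys that are absent keep their old value,
 * and *out is only updated when the whole body parsed cleanly.
 */
static int parse_io_limits(const char *body, struct io_limits *out)
{
	struct io_limits tmp;

	assert(body != NULL && out != NULL);
	tmp = *out;

	for (;;) {
		char tok[27];	/* "wiops=" plus a 20-digit number plus NUL */
		char *val_str;
		unsigned long long val = ULLONG_MAX;	/* value used for "max" */
		int len = 0;

		/* %n records how many bytes were consumed, whitespace included */
		if (sscanf(body, "%26s%n", tok, &len) != 1)
			break;
		body += len;

		/* split "key=value" in place */
		val_str = strchr(tok, '=');
		if (val_str == NULL)
			return -1;
		*val_str++ = '\0';

		if (strcmp(val_str, "max") != 0 &&
		    sscanf(val_str, "%llu", &val) != 1)
			return -1;
		if (val == 0)
			return -2;

		if (strcmp(tok, "rbps") == 0)
			tmp.rbps = val;
		else if (strcmp(tok, "wbps") == 0)
			tmp.wbps = val;
		else if (strcmp(tok, "riops") == 0)
			tmp.riops = val < UINT_MAX ? (unsigned int)val : UINT_MAX;
		else if (strcmp(tok, "wiops") == 0)
			tmp.wiops = val < UINT_MAX ? (unsigned int)val : UINT_MAX;
		else
			return -1;
	}

	*out = tmp;
	return 0;
}
```

As in the patch, the limits are staged in a temporary copy and committed only
after every token has been accepted, so a bad token late in the line does not
leave the limits half updated.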

2015-08-18 21:01:29

by Tejun Heo

Subject: Re: [PATCH 01/10] cgroup: don't print subsystems for the default hierarchy

On Fri, Jul 24, 2015 at 02:43:45PM -0400, Tejun Heo wrote:
> It doesn't make sense to print subsystems on mount option or
> /proc/PID/cgroup for the default hierarchy.
>
> * cgroup.controllers file at the root of the default hierarchy lists
> the currently attached controllers.
>
> * The default hierarchy is catch-all for unmounted subsystems.
>
> * The default hierarchy doesn't accept any mount options.
>
> Suppress subsystem printing on mount options and /proc/PID/cgroup for
> the default hierarchy.
>
> Signed-off-by: Tejun Heo <[email protected]>
> Cc: Li Zefan <[email protected]>
> Cc: Johannes Weiner <[email protected]>
> Cc: [email protected]

Applied to cgroup/for-4.3-unified-base.

Thanks.

--
tejun

2015-08-18 21:01:51

by Tejun Heo

Subject: Re: [PATCH v2 02/10] cgroup: introduce cgroup_subsys->legacy_name

On Tue, Jul 28, 2015 at 01:56:47PM -0400, Tejun Heo wrote:
> This allows cgroup subsystems to use a different name on the unified
> hierarchy. cgroup_subsys->name is used on the unified hierarchy,
> ->legacy_name elsewhere. If ->legacy_name is not explicitly set, it's
> automatically set to ->name and the userland visible behavior remains
> unchanged.
>
> v2: Make parse_cgroupfs_options() only consider ->legacy_name as mount
> options are used only on legacy hierarchies. Suggested by Li
> Zefan.
>
> Signed-off-by: Tejun Heo <[email protected]>
> Cc: Li Zefan <[email protected]>
> Cc: Johannes Weiner <[email protected]>
> Cc: [email protected]

Applied to cgroup/for-4.3-unified-base.

Thanks.

--
tejun
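
The ->legacy_name fallback described in the quoted changelog can be pictured
with a small standalone sketch. Everything here is an assumption made for
illustration: struct subsys_names, init_legacy_name() and visible_name() are
hypothetical and are not the kernel's actual definitions; only the "io" and
"blkio" names come from the blkcg patch in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two name fields of struct cgroup_subsys. */
struct subsys_names {
	const char *name;		/* shown on the unified hierarchy */
	const char *legacy_name;	/* shown elsewhere; may start as NULL */
};

/* If no legacy name was set explicitly, fall back to ->name. */
static void init_legacy_name(struct subsys_names *ss)
{
	assert(ss != NULL && ss->name != NULL);
	if (ss->legacy_name == NULL)
		ss->legacy_name = ss->name;
}

/* Pick the userland-visible name for the kind of hierarchy in use. */
static const char *visible_name(const struct subsys_names *ss, bool on_unified)
{
	return on_unified ? ss->name : ss->legacy_name;
}
```

With { "io", "blkio" } the controller shows up as "io" on the unified
hierarchy and as "blkio" on legacy ones; a controller that never sets a
legacy name reports the same string in both places, so existing userland
sees no change.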