2018-11-12 09:59:35

by Paolo Valente

[permalink] [raw]
Subject: [PATCH 00/12] unify the interface of the proportional-share policy in blkio/io

Hi Jens, Tejun, all,
about nine months ago, we agreed on a solution for unifying the
interface of the proportional-share policy in blkio/io [1]. Angelo
and I finally completed it. Let me briefly recall the problem and the
solution.

The current implementation of cgroups doesn't allow two or more
entities, e.g., I/O schedulers, to share the same files. So, if CFQ
creates its files for the proportional-share policy, such as, e.g,
weight files for blkio/io groups, BFQ cannot attach somehow to them.
Thus, to enable people to set group weights with BFQ, I resorted to
making BFQ create its own version of these common files, by prepending
a bfq prefix.

Actually, no legacy code uses these different names, or is likely to
do so. Having these two sets of names is simply a source of
confusion, as pointed out also, e.g., by Lennart Poettering (CCed
here), and acknowledged by Tejun [2].

In [1] we agreed on a solution that solves this problem, by actually
making it possible to share cgroups files. Both writing to and
reading from a shared file trigger the appropriate operation for each
of the entities that share the file. In particular, in case of
reading,
- if all entities produce the same output, the this common output is
shown only once;
- if the outputs differ, then every per-entity output is shown,
preceded by the name of the entity that produced that output.

With this solution, legacy code that, e.g., sets group weights, just
works, regardless of the I/O scheduler actually implementing
proportional share.

But note that this extension is not restricted to only blkio/io. The
general group interface now enables files to be shared among multiple
entities of any kind.

(I have also added a patch to fix some clerical errors in bfq doc,
which I found while making the latter consistent with the new
interface.)

Thanks,
Paolo

[1] https://lkml.org/lkml/2018/1/4/667
[2] https://github.com/systemd/systemd/issues/7057

Angelo Ruocco (7):
kernfs: add function to find kernfs_node without increasing ref
counter
cgroup: link cftypes of the same subsystem with the same name
cgroup: add owner name to cftypes
block, bfq: align min and default weights with cfq
cgroup: make all functions of all cftypes be invoked
block, cfq: allow cgroup files to be shared
block, throttle: allow sharing cgroup statistic files

Paolo Valente (5):
cgroup: add hook seq_show_cft with also the owning cftype as parameter
block, cgroup: pass cftype to functions that need to use it
block, bfq: use standard file names for the proportional-share policy
doc, bfq-iosched: fix a few clerical errors
doc, bfq-iosched: make it consistent with the new cgroup interface

Documentation/block/bfq-iosched.txt | 31 +++--
block/bfq-cgroup.c | 148 +++++++++++++-------
block/bfq-iosched.h | 4 +-
block/blk-cgroup.c | 22 +--
block/blk-throttle.c | 24 ++--
block/cfq-iosched.c | 105 +++++++++++----
fs/kernfs/dir.c | 13 ++
include/linux/blk-cgroup.h | 10 +-
include/linux/cgroup-defs.h | 14 +-
include/linux/cgroup.h | 13 ++
include/linux/kernfs.h | 7 +
kernel/cgroup/cgroup.c | 262 +++++++++++++++++++++++++++++-------
12 files changed, 483 insertions(+), 170 deletions(-)

--
2.16.1


2018-11-12 09:58:18

by Paolo Valente

[permalink] [raw]
Subject: [PATCH 11/12] doc, bfq-iosched: fix a few clerical errors

This commit fixes a few clerical errors in
Documentation/block/bfq-iosched.txt.

Signed-off-by: Paolo Valente <[email protected]>
---
Documentation/block/bfq-iosched.txt | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/block/bfq-iosched.txt b/Documentation/block/bfq-iosched.txt
index 8d8d8f06cab2..6d7dd5ab8554 100644
--- a/Documentation/block/bfq-iosched.txt
+++ b/Documentation/block/bfq-iosched.txt
@@ -42,7 +42,7 @@ sustainable throughputs, on the same systems as above:

BFQ works for multi-queue devices too.

-The table of contents follow. Impatients can just jump to Section 3.
+The table of contents follows. Impatients can just jump to Section 3.

CONTENTS

@@ -51,7 +51,7 @@ CONTENTS
1-2 Server systems
2. How does BFQ work?
3. What are BFQ's tunables and how to properly configure BFQ?
-4. BFQ group scheduling
+4. Group scheduling with BFQ
4-1 Service guarantees provided
4-2 Interface

@@ -294,7 +294,7 @@ too.
per-process ioprio and weight
-----------------------------

-Unless the cgroups interface is used (see "4. BFQ group scheduling"),
+Unless the cgroups interface is used (see "4. Group scheduling with BFQ"),
weights can be assigned to processes only indirectly, through I/O
priorities, and according to the relation:
weight = (IOPRIO_BE_NR - ioprio) * 10.
--
2.16.1


2018-11-12 09:58:20

by Paolo Valente

[permalink] [raw]
Subject: [PATCH 12/12] doc, bfq-iosched: make it consistent with the new cgroup interface

BFQ now shares interface files with CFQ, for the proportional-share
policy. Make documentation consistent with that.

Signed-off-by: Paolo Valente <[email protected]>
---
Documentation/block/bfq-iosched.txt | 25 ++++++++++++-------------
1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/Documentation/block/bfq-iosched.txt b/Documentation/block/bfq-iosched.txt
index 6d7dd5ab8554..b1faf091266e 100644
--- a/Documentation/block/bfq-iosched.txt
+++ b/Documentation/block/bfq-iosched.txt
@@ -508,12 +508,11 @@ process.
To get proportional sharing of bandwidth with BFQ for a given device,
BFQ must of course be the active scheduler for that device.

-Within each group directory, the names of the files associated with
-BFQ-specific cgroup parameters and stats begin with the "bfq."
-prefix. So, with cgroups-v1 or cgroups-v2, the full prefix for
-BFQ-specific files is "blkio.bfq." or "io.bfq." For example, the group
-parameter to set the weight of a group with BFQ is blkio.bfq.weight
-or io.bfq.weight.
+BFQ shares interface files with CFQ. So, if one reads/writes a file
+provided by the proportional share policy for a group, then the
+associated operation is performed by/on BFQ for each device where BFQ
+is the active scheduler, and by/on CFQ for each device where CFQ is
+the active scheduler.

As for cgroups-v1 (blkio controller), the exact set of stat files
created, and kept up-to-date by bfq, depends on whether
@@ -521,13 +520,13 @@ CONFIG_DEBUG_BLK_CGROUP is set. If it is set, then bfq creates all
the stat files documented in
Documentation/cgroup-v1/blkio-controller.txt. If, instead,
CONFIG_DEBUG_BLK_CGROUP is not set, then bfq creates only the files
-blkio.bfq.io_service_bytes
-blkio.bfq.io_service_bytes_recursive
-blkio.bfq.io_serviced
-blkio.bfq.io_serviced_recursive
+blkio.io_service_bytes
+blkio.io_service_bytes_recursive
+blkio.io_serviced
+blkio.io_serviced_recursive

The value of CONFIG_DEBUG_BLK_CGROUP greatly influences the maximum
-throughput sustainable with bfq, because updating the blkio.bfq.*
+throughput sustainable with BFQ, because updating the blkio.*
stats is rather costly, especially for some of the stats enabled by
CONFIG_DEBUG_BLK_CGROUP.

@@ -536,8 +535,8 @@ Parameters to set

For each group, there is only the following parameter to set.

-weight (namely blkio.bfq.weight or io.bfq-weight): the weight of the
-group inside its parent. Available values: 1..10000 (default 100). The
+weight (namely blkio.weight or io.weight): the weight of the group
+inside its parent. Available values: 1..10000 (default 100). The
linear mapping between ioprio and weights, described at the beginning
of the tunable section, is still valid, but all weights higher than
IOPRIO_BE_NR*10 are mapped to ioprio 0.
--
2.16.1


2018-11-12 09:58:44

by Paolo Valente

[permalink] [raw]
Subject: [PATCH 08/12] block, cfq: allow cgroup files to be shared

From: Angelo Ruocco <[email protected]>

Some of the files exposed in a cgroup by cfq have the same meaning as
the files exposed by bfq and throttle.

This commit allows these files to be shared.

Signed-off-by: Angelo Ruocco <[email protected]>
Signed-off-by: Paolo Valente <[email protected]>
---
block/cfq-iosched.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 47 insertions(+), 2 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index af0c59c2dcde..f046039bdfc6 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -2016,7 +2016,8 @@ static struct cftype cfq_blkcg_legacy_files[] = {
},
{
.name = "weight",
- .flags = CFTYPE_NOT_ON_ROOT,
+ .owner_name = "cfq",
+ .flags = CFTYPE_NOT_ON_ROOT | CFTYPE_SHARES_FILE,
.seq_show = cfq_print_weight,
.write_u64 = cfq_set_weight,
},
@@ -2035,40 +2036,56 @@ static struct cftype cfq_blkcg_legacy_files[] = {
/* statistics, covers only the tasks in the cfqg */
{
.name = "time",
+ .owner_name = "cfq",
+ .flags = CFTYPE_SHARES_FILE,
.private = offsetof(struct cfq_group, stats.time),
.seq_show_cft = cfqg_print_stat,
},
{
.name = "sectors",
+ .owner_name = "cfq",
+ .flags = CFTYPE_SHARES_FILE,
.seq_show = cfqg_print_stat_sectors,
},
{
.name = "io_service_bytes",
+ .owner_name = "cfq",
+ .flags = CFTYPE_SHARES_FILE,
.private = (unsigned long)&blkcg_policy_cfq,
.seq_show_cft = blkg_print_stat_bytes,
},
{
.name = "io_serviced",
+ .owner_name = "cfq",
+ .flags = CFTYPE_SHARES_FILE,
.private = (unsigned long)&blkcg_policy_cfq,
.seq_show_cft = blkg_print_stat_ios,
},
{
.name = "io_service_time",
+ .owner_name = "cfq",
+ .flags = CFTYPE_SHARES_FILE,
.private = offsetof(struct cfq_group, stats.service_time),
.seq_show_cft = cfqg_print_rwstat,
},
{
.name = "io_wait_time",
+ .owner_name = "cfq",
+ .flags = CFTYPE_SHARES_FILE,
.private = offsetof(struct cfq_group, stats.wait_time),
.seq_show_cft = cfqg_print_rwstat,
},
{
.name = "io_merged",
+ .owner_name = "cfq",
+ .flags = CFTYPE_SHARES_FILE,
.private = offsetof(struct cfq_group, stats.merged),
.seq_show_cft = cfqg_print_rwstat,
},
{
.name = "io_queued",
+ .owner_name = "cfq",
+ .flags = CFTYPE_SHARES_FILE,
.private = offsetof(struct cfq_group, stats.queued),
.seq_show_cft = cfqg_print_rwstat,
},
@@ -2076,70 +2093,97 @@ static struct cftype cfq_blkcg_legacy_files[] = {
/* the same statictics which cover the cfqg and its descendants */
{
.name = "time_recursive",
+ .owner_name = "cfq",
+ .flags = CFTYPE_SHARES_FILE,
.private = offsetof(struct cfq_group, stats.time),
.seq_show_cft = cfqg_print_stat_recursive,
},
{
.name = "sectors_recursive",
+ .owner_name = "cfq",
+ .flags = CFTYPE_SHARES_FILE,
.seq_show = cfqg_print_stat_sectors_recursive,
},
{
.name = "io_service_bytes_recursive",
+ .owner_name = "cfq",
+ .flags = CFTYPE_SHARES_FILE,
.private = (unsigned long)&blkcg_policy_cfq,
.seq_show_cft = blkg_print_stat_bytes_recursive,
},
{
.name = "io_serviced_recursive",
+ .owner_name = "cfq",
+ .flags = CFTYPE_SHARES_FILE,
.private = (unsigned long)&blkcg_policy_cfq,
.seq_show_cft = blkg_print_stat_ios_recursive,
},
{
.name = "io_service_time_recursive",
+ .owner_name = "cfq",
+ .flags = CFTYPE_SHARES_FILE,
.private = offsetof(struct cfq_group, stats.service_time),
.seq_show_cft = cfqg_print_rwstat_recursive,
},
{
.name = "io_wait_time_recursive",
+ .owner_name = "cfq",
+ .flags = CFTYPE_SHARES_FILE,
.private = offsetof(struct cfq_group, stats.wait_time),
.seq_show_cft = cfqg_print_rwstat_recursive,
},
{
.name = "io_merged_recursive",
+ .owner_name = "cfq",
+ .flags = CFTYPE_SHARES_FILE,
.private = offsetof(struct cfq_group, stats.merged),
.seq_show_cft = cfqg_print_rwstat_recursive,
},
{
.name = "io_queued_recursive",
+ .owner_name = "cfq",
+ .flags = CFTYPE_SHARES_FILE,
.private = offsetof(struct cfq_group, stats.queued),
.seq_show_cft = cfqg_print_rwstat_recursive,
},
#ifdef CONFIG_DEBUG_BLK_CGROUP
{
.name = "avg_queue_size",
+ .owner_name = "cfq",
+ .flags = CFTYPE_SHARES_FILE,
.seq_show = cfqg_print_avg_queue_size,
},
{
.name = "group_wait_time",
+ .owner_name = "cfq",
+ .flags = CFTYPE_SHARES_FILE,
.private = offsetof(struct cfq_group, stats.group_wait_time),
.seq_show_cft = cfqg_print_stat,
},
{
.name = "idle_time",
+ .owner_name = "cfq",
+ .flags = CFTYPE_SHARES_FILE,
.private = offsetof(struct cfq_group, stats.idle_time),
.seq_show_cft = cfqg_print_stat,
},
{
.name = "empty_time",
+ .owner_name = "cfq",
+ .flags = CFTYPE_SHARES_FILE,
.private = offsetof(struct cfq_group, stats.empty_time),
.seq_show_cft = cfqg_print_stat,
},
{
.name = "dequeue",
+ .owner_name = "cfq",
+ .flags = CFTYPE_SHARES_FILE,
.private = offsetof(struct cfq_group, stats.dequeue),
.seq_show_cft = cfqg_print_stat,
},
{
.name = "unaccounted_time",
+ .owner_name = "cfq",
.private = offsetof(struct cfq_group, stats.unaccounted_time),
.seq_show_cft = cfqg_print_stat,
},
@@ -2181,7 +2225,8 @@ static ssize_t cfq_set_weight_on_dfl(struct kernfs_open_file *of,
static struct cftype cfq_blkcg_files[] = {
{
.name = "weight",
- .flags = CFTYPE_NOT_ON_ROOT,
+ .owner_name = "cfq",
+ .flags = CFTYPE_NOT_ON_ROOT | CFTYPE_SHARES_FILE,
.seq_show = cfq_print_weight_on_dfl,
.write = cfq_set_weight_on_dfl,
},
--
2.16.1


2018-11-12 09:58:48

by Paolo Valente

[permalink] [raw]
Subject: [PATCH 07/12] cgroup: make all functions of all cftypes be invoked

From: Angelo Ruocco <[email protected]>

When two or more policies share a file their respective cftypes are
linked together. The allowed operations on those files are: open,
release, write and show, mapped to the functions defined in the
cftypes.

This commit makes the cgroup core invoke, whenever one of those
operations is requested, the respective function of all the cftypes
linked together.

Signed-off-by: Angelo Ruocco <[email protected]>
Signed-off-by: Paolo Valente <[email protected]>
---
kernel/cgroup/cgroup.c | 181 ++++++++++++++++++++++++++++++++++++-------------
1 file changed, 132 insertions(+), 49 deletions(-)

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index e3cc437669a8..6d4cfd6395ec 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -3481,66 +3481,107 @@ static int cgroup_cpu_pressure_show(struct seq_file *seq, void *v)
static int cgroup_file_open(struct kernfs_open_file *of)
{
struct cftype *cft = of->kn->priv;
+ struct cftype *n;
+ int ret = 0;

+ list_for_each_cft(cft, n) {
+ if (cft->open)
+ ret = cft->open(of);
+ /*
+ * If there has been an error with the open function of one of
+ * the cft associated with the file, we call the release
+ * function of all the cftype associated to cft whose open
+ * function succeded.
+ */
+ if (ret) {
+ struct cftype *c = of->kn->priv;
+ struct cftype *n;
+
+ list_for_each_cft(c, n) {
+ if (cft == c)
+ break;
+ if (c->release)
+ c->release(of);
+ }
+ break;
+ }
+ }

- if (cft->open)
- return cft->open(of);
- return 0;
+ return ret;
}

static void cgroup_file_release(struct kernfs_open_file *of)
{
struct cftype *cft = of->kn->priv;
+ struct cftype *n;

- if (cft->release)
- cft->release(of);
+ list_for_each_cft(cft, n)
+ if (cft->release)
+ cft->release(of);
}

+/*
+ * Call all the write functions of the cftypes associated with the file.
+ *
+ * When a write fails, don't keep trying to write into the file via the write
+ * functions of the other cftypes associated with it.
+ */
static ssize_t cgroup_file_write(struct kernfs_open_file *of, char *buf,
size_t nbytes, loff_t off)
{
struct cgroup_namespace *ns = current->nsproxy->cgroup_ns;
struct cgroup *cgrp = of->kn->parent->priv;
struct cftype *cft = of->kn->priv;
+ struct cftype *n;
struct cgroup_subsys_state *css;
- int ret;
+ int ret = 0;

- /*
- * If namespaces are delegation boundaries, disallow writes to
- * files in an non-init namespace root from inside the namespace
- * except for the files explicitly marked delegatable -
- * cgroup.procs and cgroup.subtree_control.
- */
- if ((cgrp->root->flags & CGRP_ROOT_NS_DELEGATE) &&
- !(cft->flags & CFTYPE_NS_DELEGATABLE) &&
- ns != &init_cgroup_ns && ns->root_cset->dfl_cgrp == cgrp)
- return -EPERM;
+ list_for_each_cft(cft, n) {
+ /*
+ * If namespaces are delegation boundaries, disallow writes to
+ * files in an non-init namespace root from inside the
+ * namespace except for the files explicitly marked
+ * delegatable - cgroup.procs and cgroup.subtree_control.
+ */
+ if ((cgrp->root->flags & CGRP_ROOT_NS_DELEGATE) &&
+ !(cft->flags & CFTYPE_NS_DELEGATABLE) &&
+ ns != &init_cgroup_ns && ns->root_cset->dfl_cgrp == cgrp)
+ return -EPERM;

- if (cft->write)
- return cft->write(of, buf, nbytes, off);
+ if (cft->write) {
+ ret = cft->write(of, buf, nbytes, off);

- /*
- * kernfs guarantees that a file isn't deleted with operations in
- * flight, which means that the matching css is and stays alive and
- * doesn't need to be pinned. The RCU locking is not necessary
- * either. It's just for the convenience of using cgroup_css().
- */
- rcu_read_lock();
- css = cgroup_css(cgrp, cft->ss);
- rcu_read_unlock();
+ if (ret)
+ break;
+ continue;
+ }

- if (cft->write_u64) {
- unsigned long long v;
- ret = kstrtoull(buf, 0, &v);
- if (!ret)
- ret = cft->write_u64(css, cft, v);
- } else if (cft->write_s64) {
- long long v;
- ret = kstrtoll(buf, 0, &v);
- if (!ret)
- ret = cft->write_s64(css, cft, v);
- } else {
- ret = -EINVAL;
+ /*
+ * kernfs guarantees that a file isn't deleted with operations
+ * in flight, which means that the matching css is and stays
+ * alive and doesn't need to be pinned. The RCU locking is not
+ * necessary either. It's just for the convenience of using
+ * cgroup_css().
+ */
+ rcu_read_lock();
+ css = cgroup_css(cgrp, cft->ss);
+ rcu_read_unlock();
+
+ if (cft->write_u64) {
+ unsigned long long v;
+
+ ret = kstrtoull(buf, 0, &v);
+ if (!ret)
+ ret = cft->write_u64(css, cft, v);
+ } else if (cft->write_s64) {
+ long long v;
+
+ ret = kstrtoll(buf, 0, &v);
+ if (!ret)
+ ret = cft->write_s64(css, cft, v);
+ } else {
+ return -EINVAL;
+ }
}

return ret ?: nbytes;
@@ -3562,22 +3603,64 @@ static void cgroup_seqfile_stop(struct seq_file *seq, void *v)
seq_cft(seq)->seq_stop(seq, v);
}

+/*
+ * A file shared by more cftypes may be showing different values. In that case
+ * call all the show functions and print the name of the owner that defined
+ * them.
+ */
static int cgroup_seqfile_show(struct seq_file *m, void *arg)
{
struct cftype *cft = seq_cft(m);
+ struct cftype *n;
struct cgroup_subsys_state *css = seq_css(m);
+ char *first_seqshow_str = NULL;
+ size_t first_str_size = 0;
+ size_t current_str_size = 0;
int ret = 0;

- if (cft->seq_show)
- ret = cft->seq_show(m, arg);
- else if (cft->seq_show_cft)
- ret = cft->seq_show_cft(m, cft, arg);
- else if (cft->read_u64)
- seq_printf(m, "%llu\n", cft->read_u64(css, cft));
- else if (cft->read_s64)
- seq_printf(m, "%lld\n", cft->read_s64(css, cft));
- else
- ret = -EINVAL;
+ list_for_each_cft(cft, n) {
+ if (cft->seq_show) {
+ ret = cft->seq_show(m, arg);
+ if (ret)
+ break;
+ } else if (cft->seq_show_cft) {
+ ret = cft->seq_show_cft(m, cft, arg);
+ if (ret)
+ break;
+ } else if (cft->read_u64) {
+ seq_printf(m, "%llu\n", cft->read_u64(css, cft));
+ } else if (cft->read_s64) {
+ seq_printf(m, "%lld\n", cft->read_s64(css, cft));
+ } else {
+ ret = -EINVAL;
+ break;
+ }
+ current_str_size = m->count - current_str_size;
+
+ if (first_seqshow_str == NULL) {
+ first_seqshow_str = kmalloc(m->size, GFP_KERNEL);
+ first_str_size = m->count;
+ strcpy(first_seqshow_str, m->buf);
+ first_str_size = m->count;
+ } else if (strcmp(first_seqshow_str,
+ m->buf + m->count - current_str_size)) {
+ first_str_size = -1;
+ }
+
+ if (current_str_size) {
+ seq_printf(m, " - %s\n", cft->owner_name);
+ current_str_size = m->count;
+ }
+ }
+
+ /*
+ * If all the cft->seqfile_show/read are equal, truncate the
+ * output of the seqfile to the length of the first string.
+ */
+ if (first_str_size != -1)
+ m->count = first_str_size;
+
+ kfree(first_seqshow_str);
return ret;
}

--
2.16.1


2018-11-12 09:58:54

by Paolo Valente

[permalink] [raw]
Subject: [PATCH 06/12] block, bfq: align min and default weights with cfq

From: Angelo Ruocco <[email protected]>

The I/O schedulers bfq and cfq expose a cgroup attribute with the same
meaning: weight.

This commit changes bfq default and min weights to match the ones set
by cfq.

Signed-off-by: Angelo Ruocco <[email protected]>
Signed-off-by: Paolo Valente <[email protected]>
---
block/bfq-iosched.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index 77651d817ecd..249d8128d3ee 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -22,13 +22,13 @@
#define BFQ_IOPRIO_CLASSES 3
#define BFQ_CL_IDLE_TIMEOUT (HZ/5)

-#define BFQ_MIN_WEIGHT 1
+#define BFQ_MIN_WEIGHT 10
#define BFQ_MAX_WEIGHT 1000
#define BFQ_WEIGHT_CONVERSION_COEFF 10

#define BFQ_DEFAULT_QUEUE_IOPRIO 4

-#define BFQ_WEIGHT_LEGACY_DFL 100
+#define BFQ_WEIGHT_LEGACY_DFL 500
#define BFQ_DEFAULT_GRP_IOPRIO 0
#define BFQ_DEFAULT_GRP_CLASS IOPRIO_CLASS_BE

--
2.16.1


2018-11-12 09:58:57

by Paolo Valente

[permalink] [raw]
Subject: [PATCH 03/12] block, cgroup: pass cftype to functions that need to use it

Some seq_show functions need to access the cftype they belong to, for
retrieving the data to show. These functions get their cftype by using
the seq_cft accessor for the seq_file. This solution is no longer
viable in case a seq_file is shared among more than one cftype,
because the accessor always returns (only) the first of the cftypes
sharing the seq_file.

This commit enables these seq_show functions to be passed their
cftype, by replacing their prototype with that of the newly defined
seq_show_cft hook, and by invoking these functions through this new
hook.

Signed-off-by: Angelo Ruocco <[email protected]>
Signed-off-by: Paolo Valente <[email protected]>
---
block/bfq-cgroup.c | 54 +++++++++++++++++++++++---------------------
block/blk-cgroup.c | 22 +++++++++++-------
block/blk-throttle.c | 8 +++----
block/cfq-iosched.c | 56 ++++++++++++++++++++++++----------------------
include/linux/blk-cgroup.h | 10 +++++----
5 files changed, 81 insertions(+), 69 deletions(-)

diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
index d9a7916ff0ab..50b2d7ba6b9d 100644
--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -918,17 +918,17 @@ static ssize_t bfq_io_set_weight(struct kernfs_open_file *of,
}

#ifdef CONFIG_DEBUG_BLK_CGROUP
-static int bfqg_print_stat(struct seq_file *sf, void *v)
+static int bfqg_print_stat(struct seq_file *sf, struct cftype *cft, void *v)
{
blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)), blkg_prfill_stat,
- &blkcg_policy_bfq, seq_cft(sf)->private, false);
+ &blkcg_policy_bfq, cft->private, false);
return 0;
}

-static int bfqg_print_rwstat(struct seq_file *sf, void *v)
+static int bfqg_print_rwstat(struct seq_file *sf, struct cftype *cft, void *v)
{
blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)), blkg_prfill_rwstat,
- &blkcg_policy_bfq, seq_cft(sf)->private, true);
+ &blkcg_policy_bfq, cft->private, true);
return 0;
}

@@ -949,19 +949,21 @@ static u64 bfqg_prfill_rwstat_recursive(struct seq_file *sf,
return __blkg_prfill_rwstat(sf, pd, &sum);
}

-static int bfqg_print_stat_recursive(struct seq_file *sf, void *v)
+static int bfqg_print_stat_recursive(struct seq_file *sf, struct cftype *cft,
+ void *v)
{
blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)),
bfqg_prfill_stat_recursive, &blkcg_policy_bfq,
- seq_cft(sf)->private, false);
+ cft->private, false);
return 0;
}

-static int bfqg_print_rwstat_recursive(struct seq_file *sf, void *v)
+static int bfqg_print_rwstat_recursive(struct seq_file *sf, struct cftype *cft,
+ void *v)
{
blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)),
bfqg_prfill_rwstat_recursive, &blkcg_policy_bfq,
- seq_cft(sf)->private, true);
+ cft->private, true);
return 0;
}

@@ -1063,18 +1065,18 @@ struct cftype bfq_blkcg_legacy_files[] = {
{
.name = "bfq.io_service_bytes",
.private = (unsigned long)&blkcg_policy_bfq,
- .seq_show = blkg_print_stat_bytes,
+ .seq_show_cft = blkg_print_stat_bytes,
},
{
.name = "bfq.io_serviced",
.private = (unsigned long)&blkcg_policy_bfq,
- .seq_show = blkg_print_stat_ios,
+ .seq_show_cft = blkg_print_stat_ios,
},
#ifdef CONFIG_DEBUG_BLK_CGROUP
{
.name = "bfq.time",
.private = offsetof(struct bfq_group, stats.time),
- .seq_show = bfqg_print_stat,
+ .seq_show_cft = bfqg_print_stat,
},
{
.name = "bfq.sectors",
@@ -1083,22 +1085,22 @@ struct cftype bfq_blkcg_legacy_files[] = {
{
.name = "bfq.io_service_time",
.private = offsetof(struct bfq_group, stats.service_time),
- .seq_show = bfqg_print_rwstat,
+ .seq_show_cft = bfqg_print_rwstat,
},
{
.name = "bfq.io_wait_time",
.private = offsetof(struct bfq_group, stats.wait_time),
- .seq_show = bfqg_print_rwstat,
+ .seq_show_cft = bfqg_print_rwstat,
},
{
.name = "bfq.io_merged",
.private = offsetof(struct bfq_group, stats.merged),
- .seq_show = bfqg_print_rwstat,
+ .seq_show_cft = bfqg_print_rwstat,
},
{
.name = "bfq.io_queued",
.private = offsetof(struct bfq_group, stats.queued),
- .seq_show = bfqg_print_rwstat,
+ .seq_show_cft = bfqg_print_rwstat,
},
#endif /* CONFIG_DEBUG_BLK_CGROUP */

@@ -1106,18 +1108,18 @@ struct cftype bfq_blkcg_legacy_files[] = {
{
.name = "bfq.io_service_bytes_recursive",
.private = (unsigned long)&blkcg_policy_bfq,
- .seq_show = blkg_print_stat_bytes_recursive,
+ .seq_show_cft = blkg_print_stat_bytes_recursive,
},
{
.name = "bfq.io_serviced_recursive",
.private = (unsigned long)&blkcg_policy_bfq,
- .seq_show = blkg_print_stat_ios_recursive,
+ .seq_show_cft = blkg_print_stat_ios_recursive,
},
#ifdef CONFIG_DEBUG_BLK_CGROUP
{
.name = "bfq.time_recursive",
.private = offsetof(struct bfq_group, stats.time),
- .seq_show = bfqg_print_stat_recursive,
+ .seq_show_cft = bfqg_print_stat_recursive,
},
{
.name = "bfq.sectors_recursive",
@@ -1126,22 +1128,22 @@ struct cftype bfq_blkcg_legacy_files[] = {
{
.name = "bfq.io_service_time_recursive",
.private = offsetof(struct bfq_group, stats.service_time),
- .seq_show = bfqg_print_rwstat_recursive,
+ .seq_show_cft = bfqg_print_rwstat_recursive,
},
{
.name = "bfq.io_wait_time_recursive",
.private = offsetof(struct bfq_group, stats.wait_time),
- .seq_show = bfqg_print_rwstat_recursive,
+ .seq_show_cft = bfqg_print_rwstat_recursive,
},
{
.name = "bfq.io_merged_recursive",
.private = offsetof(struct bfq_group, stats.merged),
- .seq_show = bfqg_print_rwstat_recursive,
+ .seq_show_cft = bfqg_print_rwstat_recursive,
},
{
.name = "bfq.io_queued_recursive",
.private = offsetof(struct bfq_group, stats.queued),
- .seq_show = bfqg_print_rwstat_recursive,
+ .seq_show_cft = bfqg_print_rwstat_recursive,
},
{
.name = "bfq.avg_queue_size",
@@ -1150,22 +1152,22 @@ struct cftype bfq_blkcg_legacy_files[] = {
{
.name = "bfq.group_wait_time",
.private = offsetof(struct bfq_group, stats.group_wait_time),
- .seq_show = bfqg_print_stat,
+ .seq_show_cft = bfqg_print_stat,
},
{
.name = "bfq.idle_time",
.private = offsetof(struct bfq_group, stats.idle_time),
- .seq_show = bfqg_print_stat,
+ .seq_show_cft = bfqg_print_stat,
},
{
.name = "bfq.empty_time",
.private = offsetof(struct bfq_group, stats.empty_time),
- .seq_show = bfqg_print_stat,
+ .seq_show_cft = bfqg_print_stat,
},
{
.name = "bfq.dequeue",
.private = offsetof(struct bfq_group, stats.dequeue),
- .seq_show = bfqg_print_stat,
+ .seq_show_cft = bfqg_print_stat,
},
#endif /* CONFIG_DEBUG_BLK_CGROUP */
{ } /* terminate */
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 992da5592c6e..f173e7437203 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -667,15 +667,16 @@ static u64 blkg_prfill_rwstat_field(struct seq_file *sf,
/**
* blkg_print_stat_bytes - seq_show callback for blkg->stat_bytes
* @sf: seq_file to print to
+ * @cft: cftype holding the data
* @v: unused
*
* To be used as cftype->seq_show to print blkg->stat_bytes.
* cftype->private must be set to the blkcg_policy.
*/
-int blkg_print_stat_bytes(struct seq_file *sf, void *v)
+int blkg_print_stat_bytes(struct seq_file *sf, struct cftype *cft, void *v)
{
blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)),
- blkg_prfill_rwstat_field, (void *)seq_cft(sf)->private,
+ blkg_prfill_rwstat_field, (void *)cft->private,
offsetof(struct blkcg_gq, stat_bytes), true);
return 0;
}
@@ -684,15 +685,16 @@ EXPORT_SYMBOL_GPL(blkg_print_stat_bytes);
/**
* blkg_print_stat_bytes - seq_show callback for blkg->stat_ios
* @sf: seq_file to print to
+ * @cft: cftype holding the data
* @v: unused
*
* To be used as cftype->seq_show to print blkg->stat_ios. cftype->private
* must be set to the blkcg_policy.
*/
-int blkg_print_stat_ios(struct seq_file *sf, void *v)
+int blkg_print_stat_ios(struct seq_file *sf, struct cftype *cft, void *v)
{
blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)),
- blkg_prfill_rwstat_field, (void *)seq_cft(sf)->private,
+ blkg_prfill_rwstat_field, (void *)cft->private,
offsetof(struct blkcg_gq, stat_ios), true);
return 0;
}
@@ -710,13 +712,15 @@ static u64 blkg_prfill_rwstat_field_recursive(struct seq_file *sf,
/**
* blkg_print_stat_bytes_recursive - recursive version of blkg_print_stat_bytes
* @sf: seq_file to print to
+ * @cft: cftype holding the data
* @v: unused
*/
-int blkg_print_stat_bytes_recursive(struct seq_file *sf, void *v)
+int blkg_print_stat_bytes_recursive(struct seq_file *sf, struct cftype *cft,
+ void *v)
{
blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)),
blkg_prfill_rwstat_field_recursive,
- (void *)seq_cft(sf)->private,
+ (void *)cft->private,
offsetof(struct blkcg_gq, stat_bytes), true);
return 0;
}
@@ -725,13 +729,15 @@ EXPORT_SYMBOL_GPL(blkg_print_stat_bytes_recursive);
/**
* blkg_print_stat_ios_recursive - recursive version of blkg_print_stat_ios
* @sf: seq_file to print to
+ * @cft: cftype holding the data
* @v: unused
*/
-int blkg_print_stat_ios_recursive(struct seq_file *sf, void *v)
+int blkg_print_stat_ios_recursive(struct seq_file *sf, struct cftype *cft,
+ void *v)
{
blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)),
blkg_prfill_rwstat_field_recursive,
- (void *)seq_cft(sf)->private,
+ (void *)cft->private,
offsetof(struct blkcg_gq, stat_ios), true);
return 0;
}
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 4bda70e8db48..5c43821dc528 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -1493,22 +1493,22 @@ static struct cftype throtl_legacy_files[] = {
{
.name = "throttle.io_service_bytes",
.private = (unsigned long)&blkcg_policy_throtl,
- .seq_show = blkg_print_stat_bytes,
+ .seq_show_cft = blkg_print_stat_bytes,
},
{
.name = "throttle.io_service_bytes_recursive",
.private = (unsigned long)&blkcg_policy_throtl,
- .seq_show = blkg_print_stat_bytes_recursive,
+ .seq_show_cft = blkg_print_stat_bytes_recursive,
},
{
.name = "throttle.io_serviced",
.private = (unsigned long)&blkcg_policy_throtl,
- .seq_show = blkg_print_stat_ios,
+ .seq_show_cft = blkg_print_stat_ios,
},
{
.name = "throttle.io_serviced_recursive",
.private = (unsigned long)&blkcg_policy_throtl,
- .seq_show = blkg_print_stat_ios_recursive,
+ .seq_show_cft = blkg_print_stat_ios_recursive,
},
{ } /* terminate */
};
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 6a3d87dd3c1a..af0c59c2dcde 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1884,17 +1884,17 @@ static int cfq_set_leaf_weight(struct cgroup_subsys_state *css,
return __cfq_set_weight(css, val, false, false, true);
}

-static int cfqg_print_stat(struct seq_file *sf, void *v)
+static int cfqg_print_stat(struct seq_file *sf, struct cftype *cft, void *v)
{
blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)), blkg_prfill_stat,
- &blkcg_policy_cfq, seq_cft(sf)->private, false);
+ &blkcg_policy_cfq, cft->private, false);
return 0;
}

-static int cfqg_print_rwstat(struct seq_file *sf, void *v)
+static int cfqg_print_rwstat(struct seq_file *sf, struct cftype *cft, void *v)
{
blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)), blkg_prfill_rwstat,
- &blkcg_policy_cfq, seq_cft(sf)->private, true);
+ &blkcg_policy_cfq, cft->private, true);
return 0;
}

@@ -1914,19 +1914,21 @@ static u64 cfqg_prfill_rwstat_recursive(struct seq_file *sf,
return __blkg_prfill_rwstat(sf, pd, &sum);
}

-static int cfqg_print_stat_recursive(struct seq_file *sf, void *v)
+static int cfqg_print_stat_recursive(struct seq_file *sf, struct cftype *cft,
+ void *v)
{
blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)),
cfqg_prfill_stat_recursive, &blkcg_policy_cfq,
- seq_cft(sf)->private, false);
+ cft->private, false);
return 0;
}

-static int cfqg_print_rwstat_recursive(struct seq_file *sf, void *v)
+static int cfqg_print_rwstat_recursive(struct seq_file *sf, struct cftype *cft,
+ void *v)
{
blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)),
cfqg_prfill_rwstat_recursive, &blkcg_policy_cfq,
- seq_cft(sf)->private, true);
+ cft->private, true);
return 0;
}

@@ -2034,7 +2036,7 @@ static struct cftype cfq_blkcg_legacy_files[] = {
{
.name = "time",
.private = offsetof(struct cfq_group, stats.time),
- .seq_show = cfqg_print_stat,
+ .seq_show_cft = cfqg_print_stat,
},
{
.name = "sectors",
@@ -2043,39 +2045,39 @@ static struct cftype cfq_blkcg_legacy_files[] = {
{
.name = "io_service_bytes",
.private = (unsigned long)&blkcg_policy_cfq,
- .seq_show = blkg_print_stat_bytes,
+ .seq_show_cft = blkg_print_stat_bytes,
},
{
.name = "io_serviced",
.private = (unsigned long)&blkcg_policy_cfq,
- .seq_show = blkg_print_stat_ios,
+ .seq_show_cft = blkg_print_stat_ios,
},
{
.name = "io_service_time",
.private = offsetof(struct cfq_group, stats.service_time),
- .seq_show = cfqg_print_rwstat,
+ .seq_show_cft = cfqg_print_rwstat,
},
{
.name = "io_wait_time",
.private = offsetof(struct cfq_group, stats.wait_time),
- .seq_show = cfqg_print_rwstat,
+ .seq_show_cft = cfqg_print_rwstat,
},
{
.name = "io_merged",
.private = offsetof(struct cfq_group, stats.merged),
- .seq_show = cfqg_print_rwstat,
+ .seq_show_cft = cfqg_print_rwstat,
},
{
.name = "io_queued",
.private = offsetof(struct cfq_group, stats.queued),
- .seq_show = cfqg_print_rwstat,
+ .seq_show_cft = cfqg_print_rwstat,
},

/* the same statictics which cover the cfqg and its descendants */
{
.name = "time_recursive",
.private = offsetof(struct cfq_group, stats.time),
- .seq_show = cfqg_print_stat_recursive,
+ .seq_show_cft = cfqg_print_stat_recursive,
},
{
.name = "sectors_recursive",
@@ -2084,32 +2086,32 @@ static struct cftype cfq_blkcg_legacy_files[] = {
{
.name = "io_service_bytes_recursive",
.private = (unsigned long)&blkcg_policy_cfq,
- .seq_show = blkg_print_stat_bytes_recursive,
+ .seq_show_cft = blkg_print_stat_bytes_recursive,
},
{
.name = "io_serviced_recursive",
.private = (unsigned long)&blkcg_policy_cfq,
- .seq_show = blkg_print_stat_ios_recursive,
+ .seq_show_cft = blkg_print_stat_ios_recursive,
},
{
.name = "io_service_time_recursive",
.private = offsetof(struct cfq_group, stats.service_time),
- .seq_show = cfqg_print_rwstat_recursive,
+ .seq_show_cft = cfqg_print_rwstat_recursive,
},
{
.name = "io_wait_time_recursive",
.private = offsetof(struct cfq_group, stats.wait_time),
- .seq_show = cfqg_print_rwstat_recursive,
+ .seq_show_cft = cfqg_print_rwstat_recursive,
},
{
.name = "io_merged_recursive",
.private = offsetof(struct cfq_group, stats.merged),
- .seq_show = cfqg_print_rwstat_recursive,
+ .seq_show_cft = cfqg_print_rwstat_recursive,
},
{
.name = "io_queued_recursive",
.private = offsetof(struct cfq_group, stats.queued),
- .seq_show = cfqg_print_rwstat_recursive,
+ .seq_show_cft = cfqg_print_rwstat_recursive,
},
#ifdef CONFIG_DEBUG_BLK_CGROUP
{
@@ -2119,27 +2121,27 @@ static struct cftype cfq_blkcg_legacy_files[] = {
{
.name = "group_wait_time",
.private = offsetof(struct cfq_group, stats.group_wait_time),
- .seq_show = cfqg_print_stat,
+ .seq_show_cft = cfqg_print_stat,
},
{
.name = "idle_time",
.private = offsetof(struct cfq_group, stats.idle_time),
- .seq_show = cfqg_print_stat,
+ .seq_show_cft = cfqg_print_stat,
},
{
.name = "empty_time",
.private = offsetof(struct cfq_group, stats.empty_time),
- .seq_show = cfqg_print_stat,
+ .seq_show_cft = cfqg_print_stat,
},
{
.name = "dequeue",
.private = offsetof(struct cfq_group, stats.dequeue),
- .seq_show = cfqg_print_stat,
+ .seq_show_cft = cfqg_print_stat,
},
{
.name = "unaccounted_time",
.private = offsetof(struct cfq_group, stats.unaccounted_time),
- .seq_show = cfqg_print_stat,
+ .seq_show_cft = cfqg_print_stat,
},
#endif /* CONFIG_DEBUG_BLK_CGROUP */
{ } /* terminate */
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index 1e76ceebeb5d..7e016338724e 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -212,10 +212,12 @@ u64 __blkg_prfill_rwstat(struct seq_file *sf, struct blkg_policy_data *pd,
u64 blkg_prfill_stat(struct seq_file *sf, struct blkg_policy_data *pd, int off);
u64 blkg_prfill_rwstat(struct seq_file *sf, struct blkg_policy_data *pd,
int off);
-int blkg_print_stat_bytes(struct seq_file *sf, void *v);
-int blkg_print_stat_ios(struct seq_file *sf, void *v);
-int blkg_print_stat_bytes_recursive(struct seq_file *sf, void *v);
-int blkg_print_stat_ios_recursive(struct seq_file *sf, void *v);
+int blkg_print_stat_bytes(struct seq_file *sf, struct cftype *cft, void *v);
+int blkg_print_stat_ios(struct seq_file *sf, struct cftype *cft, void *v);
+int blkg_print_stat_bytes_recursive(struct seq_file *sf, struct cftype *cft,
+ void *v);
+int blkg_print_stat_ios_recursive(struct seq_file *sf, struct cftype *cft,
+ void *v);

u64 blkg_stat_recursive_sum(struct blkcg_gq *blkg,
struct blkcg_policy *pol, int off);
--
2.16.1


2018-11-12 09:59:10

by Paolo Valente

[permalink] [raw]
Subject: [PATCH 02/12] cgroup: add hook seq_show_cft with also the owning cftype as parameter

The current implementation of the seq_show hook in the cftype struct
has only, as parameters, the seq_file to write to and the arguments
passed by the command line. Thus, the only way to retrieve the cftype
that owns an instance of such hook function is by using the accessor
in the seq_file itself.

But in a future scenario where the same file may be shared by multiple
cftypes, this accessor will point only to the first of the cftypes
linked to the seq_file. It will then be impossible to access the
cftype owning the seq_show function within the seq_show itself, unless
such cftype is the first one.

This commit adds an additional seq_show_cft hook that has as a formal
parameter also the cftype that owns the function.

Signed-off-by: Angelo Ruocco <[email protected]>
Signed-off-by: Paolo Valente <[email protected]>
---
include/linux/cgroup-defs.h | 3 ++-
kernel/cgroup/cgroup.c | 15 +++++++++------
2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 5e1694fe035b..7841db6e7fb3 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -543,8 +543,9 @@ struct cftype {
*/
s64 (*read_s64)(struct cgroup_subsys_state *css, struct cftype *cft);

- /* generic seq_file read interface */
+ /* generic seq_file read interfaces*/
int (*seq_show)(struct seq_file *sf, void *v);
+ int (*seq_show_cft)(struct seq_file *sf, struct cftype *cft, void *v);

/* optional ops, implement all or none */
void *(*seq_start)(struct seq_file *sf, loff_t *ppos);
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 8b79318810ad..74012b61fe19 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1448,7 +1448,8 @@ static umode_t cgroup_file_mode(const struct cftype *cft)
{
umode_t mode = 0;

- if (cft->read_u64 || cft->read_s64 || cft->seq_show)
+ if (cft->read_u64 || cft->read_s64 || cft->seq_show ||
+ cft->seq_show_cft)
mode |= S_IRUGO;

if (cft->write_u64 || cft->write_s64 || cft->write) {
@@ -3549,17 +3550,19 @@ static int cgroup_seqfile_show(struct seq_file *m, void *arg)
{
struct cftype *cft = seq_cft(m);
struct cgroup_subsys_state *css = seq_css(m);
+ int ret = 0;

if (cft->seq_show)
- return cft->seq_show(m, arg);
-
- if (cft->read_u64)
+ ret = cft->seq_show(m, arg);
+ else if (cft->seq_show_cft)
+ ret = cft->seq_show_cft(m, cft, arg);
+ else if (cft->read_u64)
seq_printf(m, "%llu\n", cft->read_u64(css, cft));
else if (cft->read_s64)
seq_printf(m, "%lld\n", cft->read_s64(css, cft));
else
- return -EINVAL;
- return 0;
+ ret = -EINVAL;
+ return ret;
}

static struct kernfs_ops cgroup_kf_single_ops = {
--
2.16.1


2018-11-12 09:59:11

by Paolo Valente

[permalink] [raw]
Subject: [PATCH 05/12] cgroup: add owner name to cftypes

From: Angelo Ruocco <[email protected]>

The piece of information "who created a certain cftype" is not stored
anywhere, thus a cftype is not able to know who is its owner.

This commit addresses this problem by adding a new field in the cftype
structure that enables the name of its owner to be explicitly set.

Signed-off-by: Angelo Ruocco <[email protected]>
Signed-off-by: Paolo Valente <[email protected]>
---
include/linux/cgroup-defs.h | 2 ++
include/linux/cgroup.h | 13 +++++++++++++
2 files changed, 15 insertions(+)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index d659763c7221..6e31f478c6e1 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -36,6 +36,7 @@ struct seq_file;
#define MAX_CGROUP_TYPE_NAMELEN 32
#define MAX_CGROUP_ROOT_NAMELEN 64
#define MAX_CFTYPE_NAME 64
+#define MAX_OWNER_NAME 64

/* define the enumeration of all cgroup subsystems */
#define SUBSYS(_x) _x ## _cgrp_id,
@@ -505,6 +506,7 @@ struct cftype {
* end of cftype array.
*/
char name[MAX_CFTYPE_NAME];
+ char owner_name[MAX_OWNER_NAME];
unsigned long private;

/*
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 9968332cceed..bf17a9843f45 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -293,6 +293,19 @@ void css_task_iter_end(struct css_task_iter *it);
; \
else

+/**
+ * list_for_each_cft - walk circular list of cftypes linked together
+ * @cft: cftype from where to start
+ * @n: cftype used as a temporary storage
+ *
+ * A cftype pointed by a file may be part of a circular list of cftypes, this
+ * macro walks the circular list starting from any given cftype. Unlike the
+ * "list_for_each_entry" macro, the first element is included in the iteration.
+ */
+#define list_for_each_cft(cft, n) \
+ for (n = NULL; cft != n; n = (n == NULL) ? cft : n, \
+ cft = list_next_entry(cft, share_node))
+
/*
* Inline functions.
*/
--
2.16.1


2018-11-12 09:59:18

by Paolo Valente

[permalink] [raw]
Subject: [PATCH 04/12] cgroup: link cftypes of the same subsystem with the same name

From: Angelo Ruocco <[email protected]>

When a cgroup policy is activated, it creates its files in its
subsystem directory. Two policies are not able to create a file with
the same name: if a policy tries to create a file that has the same
name as a file created by another policy, the cgroup core stops it,
warns the user about the error, and then proceeds to delete all the
files created by the last policy.

However, in some specific situations, it may be useful for two or more
policies to use a common file, e.g., the I/O schedulers bfq and cfq
have the same "weight" attribute, that changes the behavior of the two
schedulers in a similar way.

This commit prepares the interface that allows two policies of the same
subsystem to share files. It adds a flag CFTYPE_SHARE_FILE for cftypes,
flag that allows cftypes to be linked together if they are part of the
same subsystem and have the same name.

There is a limitation for a cftype that wants to share a file: it can't
have the hooks seq_start/next/stop. The reason is that there is no
consistent way to show portions of a file once multiple cftypes are
attached to it, and thus more than one seq_show() is invoked: there are
neither an univocal start point, nor univocal "next" and "stop"
operations.

Signed-off-by: Angelo Ruocco <[email protected]>
Signed-off-by: Paolo Valente <[email protected]>
---
include/linux/cgroup-defs.h | 9 ++++++
kernel/cgroup/cgroup.c | 78 +++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 85 insertions(+), 2 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 7841db6e7fb3..d659763c7221 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -93,6 +93,8 @@ enum {
CFTYPE_NO_PREFIX = (1 << 3), /* (DON'T USE FOR NEW FILES) no subsys prefix */
CFTYPE_WORLD_WRITABLE = (1 << 4), /* (DON'T USE FOR NEW FILES) S_IWUGO */

+ CFTYPE_SHARES_FILE = (1 << 5), /* shares file w/ other cfts */
+
/* internal flags, do not use outside cgroup core proper */
__CFTYPE_ONLY_ON_DFL = (1 << 16), /* only on default hierarchy */
__CFTYPE_NOT_ON_DFL = (1 << 17), /* not on default hierarchy */
@@ -528,6 +530,13 @@ struct cftype {
*/
struct cgroup_subsys *ss; /* NULL for cgroup core files */
struct list_head node; /* anchored at ss->cfts */
+
+ /*
+ * List of cftypes that are sharing the same file. It allows the hook
+ * functions of the cftypes in the list to be called together.
+ */
+ struct list_head share_node;
+
struct kernfs_ops *kf_ops;

int (*open)(struct kernfs_open_file *of);
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 74012b61fe19..e3cc437669a8 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1579,9 +1579,12 @@ struct cgroup *cgroup_kn_lock_live(struct kernfs_node *kn, bool drain_offline)
return NULL;
}

-static void cgroup_rm_file(struct cgroup *cgrp, const struct cftype *cft)
+static void cgroup_rm_file(struct cgroup *cgrp, struct cftype *cft)
{
char name[CGROUP_FILE_NAME_MAX];
+ struct kernfs_node *kn = kernfs_find(cgrp->kn,
+ cgroup_file_name(cgrp, cft, name));
+ struct cftype *cfts = kn->priv;

lockdep_assert_held(&cgroup_mutex);

@@ -1596,7 +1599,19 @@ static void cgroup_rm_file(struct cgroup *cgrp, const struct cftype *cft)
del_timer_sync(&cfile->notify_timer);
}

- kernfs_remove_by_name(cgrp->kn, cgroup_file_name(cgrp, cft, name));
+ /* Delete the file only if it's used by one cftype */
+ if (list_empty(&cft->share_node) || atomic_read(&kn->count) == 1) {
+ kernfs_remove(kn);
+ } else {
+ /*
+ * Update the "priv" pointer of the kernfs_node if the cftype
+ * that first created the file is removed.
+ */
+ if (cft == cfts)
+ kn->priv = list_next_entry(cft, share_node);
+
+ kernfs_put(kn);
+ }
}

/**
@@ -3467,6 +3482,7 @@ static int cgroup_file_open(struct kernfs_open_file *of)
{
struct cftype *cft = of->kn->priv;

+
if (cft->open)
return cft->open(of);
return 0;
@@ -3615,6 +3631,23 @@ static int cgroup_add_file(struct cgroup_subsys_state *css, struct cgroup *cgrp,
#ifdef CONFIG_DEBUG_LOCK_ALLOC
key = &cft->lockdep_key;
#endif
+
+ if (cft->flags & CFTYPE_SHARES_FILE) {
+ kn = kernfs_find(cgrp->kn, cgroup_file_name(cgrp, cft, name));
+ if (kn) {
+ struct cftype *cfts = kn->priv;
+
+ if (cfts->flags & CFTYPE_SHARES_FILE) {
+ /*
+ * kn->count keeps track of how many cftypes
+ * share kn
+ */
+ kernfs_get(kn);
+ goto out_set_cfile;
+ }
+ }
+ }
+
kn = __kernfs_create_file(cgrp->kn, cgroup_file_name(cgrp, cft, name),
cgroup_file_mode(cft),
GLOBAL_ROOT_UID, GLOBAL_ROOT_GID,
@@ -3629,6 +3662,7 @@ static int cgroup_add_file(struct cgroup_subsys_state *css, struct cgroup *cgrp,
return ret;
}

+out_set_cfile:
if (cft->file_offset) {
struct cgroup_file *cfile = (void *)css + cft->file_offset;

@@ -3726,11 +3760,46 @@ static void cgroup_exit_cftypes(struct cftype *cfts)
cft->kf_ops = NULL;
cft->ss = NULL;

+ list_del(&cft->share_node);
+
/* revert flags set by cgroup core while adding @cfts */
cft->flags &= ~(__CFTYPE_ONLY_ON_DFL | __CFTYPE_NOT_ON_DFL);
}
}

+/*
+ * Link a cftype that wants to share a file to the list of cftypes that are
+ * using it.
+ *
+ * The conditions for a cftype to be put in an existing list of cftypes and
+ * thus start to share a file are:
+ * - to have the flag CFTYPE_SHARES_FILE set;
+ * - to have all flags coincide with the flags of the other cftypes in the
+ * list;
+ * - to not have a seq_start hook: there is no consistent way to show
+ * portions of a file once multiple cftypes are attached to it, and thus
+ * more than one seq_show() is invoked.
+ *
+ * Once two or more cftypes are linked together, the file only points
+ * to the first of them.
+ */
+static void cgroup_link_cftype(struct cgroup_subsys *ss, struct cftype *cft)
+{
+ struct cftype *cfts;
+
+ list_for_each_entry(cfts, &ss->cfts, node) {
+ struct cftype *c;
+
+ for (c = cfts; c->name[0] != '\0'; c++)
+ if (c != cft && !(c->flags ^ cft->flags) &&
+ !(c->seq_start || cft->seq_start) &&
+ !strcmp(c->name, cft->name)) {
+ list_add(&cft->share_node, &c->share_node);
+ return;
+ }
+ }
+}
+
static int cgroup_init_cftypes(struct cgroup_subsys *ss, struct cftype *cfts)
{
struct cftype *cft;
@@ -3760,6 +3829,11 @@ static int cgroup_init_cftypes(struct cgroup_subsys *ss, struct cftype *cfts)

cft->kf_ops = kf_ops;
cft->ss = ss;
+
+ INIT_LIST_HEAD(&cft->share_node);
+
+ if (cft->flags & CFTYPE_SHARES_FILE)
+ cgroup_link_cftype(ss, cft);
}

return 0;
--
2.16.1


2018-11-12 09:59:43

by Paolo Valente

[permalink] [raw]
Subject: [PATCH 10/12] block, throttle: allow sharing cgroup statistic files

From: Angelo Ruocco <[email protected]>

Some of the cgroup files defined in the throttle policy have the same
meaning as those defined in the proportional share policy.

This commit uses the new file sharing interface in cgroup to share
these files.

Signed-off-by: Angelo Ruocco <[email protected]>
Signed-off-by: Paolo Valente <[email protected]>
---
block/blk-throttle.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 5c43821dc528..239957c12d34 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -1491,22 +1491,30 @@ static struct cftype throtl_legacy_files[] = {
.write = tg_set_conf_uint,
},
{
- .name = "throttle.io_service_bytes",
+ .name = "io_service_bytes",
+ .owner_name = "throttle",
+ .flags = CFTYPE_SHARES_FILE,
.private = (unsigned long)&blkcg_policy_throtl,
.seq_show_cft = blkg_print_stat_bytes,
},
{
- .name = "throttle.io_service_bytes_recursive",
+ .name = "io_service_bytes_recursive",
+ .owner_name = "throttle",
+ .flags = CFTYPE_SHARES_FILE,
.private = (unsigned long)&blkcg_policy_throtl,
.seq_show_cft = blkg_print_stat_bytes_recursive,
},
{
- .name = "throttle.io_serviced",
+ .name = "io_serviced",
+ .owner_name = "throttle",
+ .flags = CFTYPE_SHARES_FILE,
.private = (unsigned long)&blkcg_policy_throtl,
.seq_show_cft = blkg_print_stat_ios,
},
{
- .name = "throttle.io_serviced_recursive",
+ .name = "io_serviced_recursive",
+ .owner_name = "throttle",
+ .flags = CFTYPE_SHARES_FILE,
.private = (unsigned long)&blkcg_policy_throtl,
.seq_show_cft = blkg_print_stat_ios_recursive,
},
--
2.16.1


2018-11-12 09:59:48

by Paolo Valente

[permalink] [raw]
Subject: [PATCH 09/12] block, bfq: use standard file names for the proportional-share policy

Some of the files exposed in a cgroup by bfq, for the proportional
share policy, have the same meaning as the files owned by cfq.

The old implementation of the cgroup interface didn't allow different
entities to create cgroup files with the same name (in the same
subsystem). So, for bfq, we had to add the prefix "bfq" to the names
of its cgroup files.

This commit renames the cgroup files of the bfq scheduler as those
exposed by cfq, and makes bfq share these files with any other policy.

Signed-off-by: Angelo Ruocco <[email protected]>
Signed-off-by: Paolo Valente <[email protected]>
---
block/bfq-cgroup.c | 94 +++++++++++++++++++++++++++++++++++++++---------------
1 file changed, 69 insertions(+), 25 deletions(-)

diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
index 50b2d7ba6b9d..9148fc38c737 100644
--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -1055,50 +1055,67 @@ struct blkcg_policy blkcg_policy_bfq = {

struct cftype bfq_blkcg_legacy_files[] = {
{
- .name = "bfq.weight",
- .flags = CFTYPE_NOT_ON_ROOT,
+ .name = "weight",
+ .owner_name = "bfq",
+ .flags = CFTYPE_NOT_ON_ROOT | CFTYPE_SHARES_FILE,
.seq_show = bfq_io_show_weight,
.write_u64 = bfq_io_set_weight_legacy,
},

/* statistics, covers only the tasks in the bfqg */
{
- .name = "bfq.io_service_bytes",
+ .name = "io_service_bytes",
+ .owner_name = "bfq",
+ .flags = CFTYPE_SHARES_FILE,
.private = (unsigned long)&blkcg_policy_bfq,
.seq_show_cft = blkg_print_stat_bytes,
},
{
- .name = "bfq.io_serviced",
+ .name = "io_serviced",
+ .owner_name = "bfq",
+ .flags = CFTYPE_SHARES_FILE,
.private = (unsigned long)&blkcg_policy_bfq,
.seq_show_cft = blkg_print_stat_ios,
},
#ifdef CONFIG_DEBUG_BLK_CGROUP
{
- .name = "bfq.time",
+ .name = "time",
+ .owner_name = "bfq",
+ .flags = CFTYPE_SHARES_FILE,
.private = offsetof(struct bfq_group, stats.time),
.seq_show_cft = bfqg_print_stat,
},
{
- .name = "bfq.sectors",
+ .name = "sectors",
+ .owner_name = "bfq",
+ .flags = CFTYPE_SHARES_FILE,
.seq_show = bfqg_print_stat_sectors,
},
{
- .name = "bfq.io_service_time",
+ .name = "io_service_time",
+ .owner_name = "bfq",
+ .flags = CFTYPE_SHARES_FILE,
.private = offsetof(struct bfq_group, stats.service_time),
.seq_show_cft = bfqg_print_rwstat,
},
{
- .name = "bfq.io_wait_time",
+ .name = "io_wait_time",
+ .owner_name = "bfq",
+ .flags = CFTYPE_SHARES_FILE,
.private = offsetof(struct bfq_group, stats.wait_time),
.seq_show_cft = bfqg_print_rwstat,
},
{
- .name = "bfq.io_merged",
+ .name = "io_merged",
+ .owner_name = "bfq",
+ .flags = CFTYPE_SHARES_FILE,
.private = offsetof(struct bfq_group, stats.merged),
.seq_show_cft = bfqg_print_rwstat,
},
{
- .name = "bfq.io_queued",
+ .name = "io_queued",
+ .owner_name = "bfq",
+ .flags = CFTYPE_SHARES_FILE,
.private = offsetof(struct bfq_group, stats.queued),
.seq_show_cft = bfqg_print_rwstat,
},
@@ -1106,66 +1123,92 @@ struct cftype bfq_blkcg_legacy_files[] = {

/* the same statictics which cover the bfqg and its descendants */
{
- .name = "bfq.io_service_bytes_recursive",
+ .name = "io_service_bytes_recursive",
+ .owner_name = "bfq",
+ .flags = CFTYPE_SHARES_FILE,
.private = (unsigned long)&blkcg_policy_bfq,
.seq_show_cft = blkg_print_stat_bytes_recursive,
},
{
- .name = "bfq.io_serviced_recursive",
+ .name = "io_serviced_recursive",
+ .owner_name = "bfq",
+ .flags = CFTYPE_SHARES_FILE,
.private = (unsigned long)&blkcg_policy_bfq,
.seq_show_cft = blkg_print_stat_ios_recursive,
},
#ifdef CONFIG_DEBUG_BLK_CGROUP
{
- .name = "bfq.time_recursive",
+ .name = "time_recursive",
+ .owner_name = "bfq",
+ .flags = CFTYPE_SHARES_FILE,
.private = offsetof(struct bfq_group, stats.time),
.seq_show_cft = bfqg_print_stat_recursive,
},
{
- .name = "bfq.sectors_recursive",
+ .name = "sectors_recursive",
+ .owner_name = "bfq",
+ .flags = CFTYPE_SHARES_FILE,
.seq_show = bfqg_print_stat_sectors_recursive,
},
{
- .name = "bfq.io_service_time_recursive",
+ .name = "io_service_time_recursive",
+ .owner_name = "bfq",
+ .flags = CFTYPE_SHARES_FILE,
.private = offsetof(struct bfq_group, stats.service_time),
.seq_show_cft = bfqg_print_rwstat_recursive,
},
{
- .name = "bfq.io_wait_time_recursive",
+ .name = "io_wait_time_recursive",
+ .owner_name = "bfq",
+ .flags = CFTYPE_SHARES_FILE,
.private = offsetof(struct bfq_group, stats.wait_time),
.seq_show_cft = bfqg_print_rwstat_recursive,
},
{
- .name = "bfq.io_merged_recursive",
+ .name = "io_merged_recursive",
+ .owner_name = "bfq",
+ .flags = CFTYPE_SHARES_FILE,
.private = offsetof(struct bfq_group, stats.merged),
.seq_show_cft = bfqg_print_rwstat_recursive,
},
{
- .name = "bfq.io_queued_recursive",
+ .name = "io_queued_recursive",
+ .owner_name = "bfq",
+ .flags = CFTYPE_SHARES_FILE,
.private = offsetof(struct bfq_group, stats.queued),
.seq_show_cft = bfqg_print_rwstat_recursive,
},
{
- .name = "bfq.avg_queue_size",
+ .name = "avg_queue_size",
+ .owner_name = "bfq",
+ .flags = CFTYPE_SHARES_FILE,
.seq_show = bfqg_print_avg_queue_size,
},
{
- .name = "bfq.group_wait_time",
+ .name = "group_wait_time",
+ .owner_name = "bfq",
+ .flags = CFTYPE_SHARES_FILE,
.private = offsetof(struct bfq_group, stats.group_wait_time),
.seq_show_cft = bfqg_print_stat,
},
{
- .name = "bfq.idle_time",
+ .name = "idle_time",
+ .owner_name = "bfq",
+ .flags = CFTYPE_SHARES_FILE,
.private = offsetof(struct bfq_group, stats.idle_time),
.seq_show_cft = bfqg_print_stat,
},
{
- .name = "bfq.empty_time",
+ .name = "empty_time",
+ .owner_name = "bfq",
+ .flags = CFTYPE_SHARES_FILE,
.private = offsetof(struct bfq_group, stats.empty_time),
.seq_show_cft = bfqg_print_stat,
},
{
- .name = "bfq.dequeue",
+ .name = "dequeue",
+ .owner_name = "bfq",
+ .flags = CFTYPE_SHARES_FILE,
.private = offsetof(struct bfq_group, stats.dequeue),
.seq_show_cft = bfqg_print_stat,
},
@@ -1175,8 +1218,9 @@ struct cftype bfq_blkcg_legacy_files[] = {

struct cftype bfq_blkg_files[] = {
{
- .name = "bfq.weight",
- .flags = CFTYPE_NOT_ON_ROOT,
+ .name = "weight",
+ .owner_name = "bfq",
+ .flags = CFTYPE_NOT_ON_ROOT | CFTYPE_SHARES_FILE,
.seq_show = bfq_io_show_weight,
.write = bfq_io_set_weight,
},
--
2.16.1


2018-11-12 10:00:44

by Paolo Valente

[permalink] [raw]
Subject: [PATCH 01/12] kernfs: add function to find kernfs_node without increasing ref counter

From: Angelo Ruocco <[email protected]>

The kernfs pseudo file system doesn't export any function to only find
a node by name, without also getting a reference on it.
But in some cases it is useful to just locate a kernfs node, while
using it or not depends on some other condition.

This commit adds a function to just look for a node, without getting
a reference on it.

Signed-off-by: Angelo Ruocco <[email protected]>
Signed-off-by: Paolo Valente <[email protected]>
---
fs/kernfs/dir.c | 13 +++++++++++++
include/linux/kernfs.h | 7 +++++++
2 files changed, 20 insertions(+)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 4ca0b5c18192..0003c665869d 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -892,6 +892,19 @@ static struct kernfs_node *kernfs_walk_ns(struct kernfs_node *parent,
return parent;
}

+struct kernfs_node *kernfs_find(struct kernfs_node *parent,
+ const unsigned char *name)
+{
+ struct kernfs_node *kn;
+
+ mutex_lock(&kernfs_mutex);
+ kn = kernfs_find_ns(parent, name, NULL);
+ mutex_unlock(&kernfs_mutex);
+
+ return kn;
+}
+EXPORT_SYMBOL_GPL(kernfs_find);
+
/**
* kernfs_find_and_get_ns - find and get kernfs_node with the given name
* @parent: kernfs_node to search under
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 5b36b1287a5a..26e69d285964 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -309,6 +309,8 @@ void pr_cont_kernfs_path(struct kernfs_node *kn);
struct kernfs_node *kernfs_get_parent(struct kernfs_node *kn);
struct kernfs_node *kernfs_find_and_get_ns(struct kernfs_node *parent,
const char *name, const void *ns);
+struct kernfs_node *kernfs_find(struct kernfs_node *parent,
+ const unsigned char *name);
struct kernfs_node *kernfs_walk_and_get_ns(struct kernfs_node *parent,
const char *path, const void *ns);
void kernfs_get(struct kernfs_node *kn);
@@ -391,6 +393,11 @@ static inline struct kernfs_node *
kernfs_find_and_get_ns(struct kernfs_node *parent, const char *name,
const void *ns)
{ return NULL; }
+
+static inline struct kernfs_node *
+kernfs_find(struct kernfs_node *parent, const char *name)
+{ return NULL; }
+
static inline struct kernfs_node *
kernfs_walk_and_get_ns(struct kernfs_node *parent, const char *path,
const void *ns)
--
2.16.1


2018-11-12 10:01:06

by Paolo Valente

[permalink] [raw]
Subject: Re: [PATCH 00/12] unify the interface of the proportional-share policy in blkio/io

Forgot to CC Lennart, sorry.

> Il giorno 12 nov 2018, alle ore 10:56, Paolo Valente <[email protected]> ha scritto:
>
> Hi Jens, Tejun, all,
> about nine months ago, we agreed on a solution for unifying the
> interface of the proportional-share policy in blkio/io [1]. Angelo
> and I finally completed it. Let me briefly recall the problem and the
> solution.
>
> The current implementation of cgroups doesn't allow two or more
> entities, e.g., I/O schedulers, to share the same files. So, if CFQ
> creates its files for the proportional-share policy, such as, e.g,
> weight files for blkio/io groups, BFQ cannot attach somehow to them.
> Thus, to enable people to set group weights with BFQ, I resorted to
> making BFQ create its own version of these common files, by prepending
> a bfq prefix.
>
> Actually, no legacy code uses these different names, or is likely to
> do so. Having these two sets of names is simply a source of
> confusion, as pointed out also, e.g., by Lennart Poettering (CCed
> here), and acknowledged by Tejun [2].
>
> In [1] we agreed on a solution that solves this problem, by actually
> making it possible to share cgroups files. Both writing to and
> reading from a shared file trigger the appropriate operation for each
> of the entities that share the file. In particular, in case of
> reading,
> - if all entities produce the same output, the this common output is
> shown only once;
> - if the outputs differ, then every per-entity output is shown,
> preceded by the name of the entity that produced that output.
>
> With this solution, legacy code that, e.g., sets group weights, just
> works, regardless of the I/O scheduler actually implementing
> proportional share.
>
> But note that this extension is not restricted to only blkio/io. The
> general group interface now enables files to be shared among multiple
> entities of any kind.
>
> (I have also added a patch to fix some clerical errors in bfq doc,
> which I found while making the latter consistent with the new
> interface.)
>
> Thanks,
> Paolo
>
> [1] https://lkml.org/lkml/2018/1/4/667
> [2] https://github.com/systemd/systemd/issues/7057
>
> Angelo Ruocco (7):
> kernfs: add function to find kernfs_node without increasing ref
> counter
> cgroup: link cftypes of the same subsystem with the same name
> cgroup: add owner name to cftypes
> block, bfq: align min and default weights with cfq
> cgroup: make all functions of all cftypes be invoked
> block, cfq: allow cgroup files to be shared
> block, throttle: allow sharing cgroup statistic files
>
> Paolo Valente (5):
> cgroup: add hook seq_show_cft with also the owning cftype as parameter
> block, cgroup: pass cftype to functions that need to use it
> block, bfq: use standard file names for the proportional-share policy
> doc, bfq-iosched: fix a few clerical errors
> doc, bfq-iosched: make it consistent with the new cgroup interface
>
> Documentation/block/bfq-iosched.txt | 31 +++--
> block/bfq-cgroup.c | 148 +++++++++++++-------
> block/bfq-iosched.h | 4 +-
> block/blk-cgroup.c | 22 +--
> block/blk-throttle.c | 24 ++--
> block/cfq-iosched.c | 105 +++++++++++----
> fs/kernfs/dir.c | 13 ++
> include/linux/blk-cgroup.h | 10 +-
> include/linux/cgroup-defs.h | 14 +-
> include/linux/cgroup.h | 13 ++
> include/linux/kernfs.h | 7 +
> kernel/cgroup/cgroup.c | 262 +++++++++++++++++++++++++++++-------
> 12 files changed, 483 insertions(+), 170 deletions(-)
>
> --
> 2.16.1


2018-11-12 10:18:18

by Paolo Valente

[permalink] [raw]
Subject: Re: [PATCH 00/12] unify the interface of the proportional-share policy in blkio/io



> Il giorno 12 nov 2018, alle ore 11:00, Oleksandr Natalenko <[email protected]> ha scritto:
>
> On 12.11.2018 10:56, Paolo Valente wrote:
>> Hi Jens, Tejun, all,
>> about nine months ago, we agreed on a solution for unifying the
>> interface of the proportional-share policy in blkio/io [1]. Angelo
>> and I finally completed it. Let me briefly recall the problem and the
>> solution.
>> The current implementation of cgroups doesn't allow two or more
>> entities, e.g., I/O schedulers, to share the same files. So, if CFQ
>> creates its files for the proportional-share policy, such as, e.g,
>> weight files for blkio/io groups, BFQ cannot attach somehow to them.
>> Thus, to enable people to set group weights with BFQ, I resorted to
>> making BFQ create its own version of these common files, by prepending
>> a bfq prefix.
>> Actually, no legacy code uses these different names, or is likely to
>> do so. Having these two sets of names is simply a source of
>> confusion, as pointed out also, e.g., by Lennart Poettering (CCed
>> here), and acknowledged by Tejun [2].
>> In [1] we agreed on a solution that solves this problem, by actually
>> making it possible to share cgroups files. Both writing to and
>> reading from a shared file trigger the appropriate operation for each
>> of the entities that share the file. In particular, in case of
>> reading,
>> - if all entities produce the same output, the this common output is
>> shown only once;
>> - if the outputs differ, then every per-entity output is shown,
>> preceded by the name of the entity that produced that output.
>> With this solution, legacy code that, e.g., sets group weights, just
>> works, regardless of the I/O scheduler actually implementing
>> proportional share.
>> But note that this extension is not restricted to only blkio/io. The
>> general group interface now enables files to be shared among multiple
>> entities of any kind.
>> (I have also added a patch to fix some clerical errors in bfq doc,
>> which I found while making the latter consistent with the new
>> interface.)
>> Thanks,
>> Paolo
>> [1] https://lkml.org/lkml/2018/1/4/667
>> [2] https://github.com/systemd/systemd/issues/7057
>> Angelo Ruocco (7):
>> kernfs: add function to find kernfs_node without increasing ref
>> counter
>> cgroup: link cftypes of the same subsystem with the same name
>> cgroup: add owner name to cftypes
>> block, bfq: align min and default weights with cfq
>> cgroup: make all functions of all cftypes be invoked
>> block, cfq: allow cgroup files to be shared
>> block, throttle: allow sharing cgroup statistic files
>> Paolo Valente (5):
>> cgroup: add hook seq_show_cft with also the owning cftype as parameter
>> block, cgroup: pass cftype to functions that need to use it
>> block, bfq: use standard file names for the proportional-share policy
>> doc, bfq-iosched: fix a few clerical errors
>> doc, bfq-iosched: make it consistent with the new cgroup interface
>> Documentation/block/bfq-iosched.txt | 31 +++--
>> block/bfq-cgroup.c | 148 +++++++++++++-------
>> block/bfq-iosched.h | 4 +-
>> block/blk-cgroup.c | 22 +--
>> block/blk-throttle.c | 24 ++--
>> block/cfq-iosched.c | 105 +++++++++++----
>> fs/kernfs/dir.c | 13 ++
>> include/linux/blk-cgroup.h | 10 +-
>> include/linux/cgroup-defs.h | 14 +-
>> include/linux/cgroup.h | 13 ++
>> include/linux/kernfs.h | 7 +
>> kernel/cgroup/cgroup.c | 262 +++++++++++++++++++++++++++++-------
>> 12 files changed, 483 insertions(+), 170 deletions(-)
>> --
>> 2.16.1
>
> I thought all the legacy stuff including CFS et al. is going to be removed in v4.21 completely…
>

Thanks for pointing this out.

People with a lower kernel version than the future 4.21 just cannot
and will not be able to use the proportional share policy on blk-mq
(with legacy code), because of the name issue highlighted in this
email. If this patch series gets accepted, a backport will solve the
problem. In this respect, such a backport might even happen
'automatically', as most bfq commit seem to get backported to older,
stable kernels.

In addition, this extension
- extends the whole cgroups interface, in a seamless and
backward-compatible way, to prevent future issues like these;
- solves a similar issue with throttle (which AFAIK won't go away
with 4.21).

Thanks,
Paolo

> --
> Oleksandr Natalenko (post-factum)


2018-11-12 12:29:39

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 01/12] kernfs: add function to find kernfs_node without increasing ref counter

On Mon, Nov 12, 2018 at 10:56:21AM +0100, Paolo Valente wrote:
> From: Angelo Ruocco <[email protected]>
>
> The kernfs pseudo file system doesn't export any function to only find
> a node by name, without also getting a reference on it.
> But in some cases it is useful to just locate a kernfs node, while
> using it or not depends on some other condition.
>
> This commit adds a function to just look for a node, without getting
> a reference on it.

Eeek, that sounds really bad. So you save off a pointer to something,
and have no idea if that pointer now really is valid or not? It can
instantly disappear right afterwards.

This feels wrong, what is the problem of having a properly reference
counted object passed back to you that you have to create a dangerous
function like this?

greg k-h

2018-11-12 15:36:32

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH 00/12] unify the interface of the proportional-share policy in blkio/io

On 11/12/18 3:17 AM, Paolo Valente wrote:
>
>
>> Il giorno 12 nov 2018, alle ore 11:00, Oleksandr Natalenko <[email protected]> ha scritto:
>>
>> On 12.11.2018 10:56, Paolo Valente wrote:
>>> Hi Jens, Tejun, all,
>>> about nine months ago, we agreed on a solution for unifying the
>>> interface of the proportional-share policy in blkio/io [1]. Angelo
>>> and I finally completed it. Let me briefly recall the problem and the
>>> solution.
>>> The current implementation of cgroups doesn't allow two or more
>>> entities, e.g., I/O schedulers, to share the same files. So, if CFQ
>>> creates its files for the proportional-share policy, such as, e.g,
>>> weight files for blkio/io groups, BFQ cannot attach somehow to them.
>>> Thus, to enable people to set group weights with BFQ, I resorted to
>>> making BFQ create its own version of these common files, by prepending
>>> a bfq prefix.
>>> Actually, no legacy code uses these different names, or is likely to
>>> do so. Having these two sets of names is simply a source of
>>> confusion, as pointed out also, e.g., by Lennart Poettering (CCed
>>> here), and acknowledged by Tejun [2].
>>> In [1] we agreed on a solution that solves this problem, by actually
>>> making it possible to share cgroups files. Both writing to and
>>> reading from a shared file trigger the appropriate operation for each
>>> of the entities that share the file. In particular, in case of
>>> reading,
>>> - if all entities produce the same output, the this common output is
>>> shown only once;
>>> - if the outputs differ, then every per-entity output is shown,
>>> preceded by the name of the entity that produced that output.
>>> With this solution, legacy code that, e.g., sets group weights, just
>>> works, regardless of the I/O scheduler actually implementing
>>> proportional share.
>>> But note that this extension is not restricted to only blkio/io. The
>>> general group interface now enables files to be shared among multiple
>>> entities of any kind.
>>> (I have also added a patch to fix some clerical errors in bfq doc,
>>> which I found while making the latter consistent with the new
>>> interface.)
>>> Thanks,
>>> Paolo
>>> [1] https://lkml.org/lkml/2018/1/4/667
>>> [2] https://github.com/systemd/systemd/issues/7057
>>> Angelo Ruocco (7):
>>> kernfs: add function to find kernfs_node without increasing ref
>>> counter
>>> cgroup: link cftypes of the same subsystem with the same name
>>> cgroup: add owner name to cftypes
>>> block, bfq: align min and default weights with cfq
>>> cgroup: make all functions of all cftypes be invoked
>>> block, cfq: allow cgroup files to be shared
>>> block, throttle: allow sharing cgroup statistic files
>>> Paolo Valente (5):
>>> cgroup: add hook seq_show_cft with also the owning cftype as parameter
>>> block, cgroup: pass cftype to functions that need to use it
>>> block, bfq: use standard file names for the proportional-share policy
>>> doc, bfq-iosched: fix a few clerical errors
>>> doc, bfq-iosched: make it consistent with the new cgroup interface
>>> Documentation/block/bfq-iosched.txt | 31 +++--
>>> block/bfq-cgroup.c | 148 +++++++++++++-------
>>> block/bfq-iosched.h | 4 +-
>>> block/blk-cgroup.c | 22 +--
>>> block/blk-throttle.c | 24 ++--
>>> block/cfq-iosched.c | 105 +++++++++++----
>>> fs/kernfs/dir.c | 13 ++
>>> include/linux/blk-cgroup.h | 10 +-
>>> include/linux/cgroup-defs.h | 14 +-
>>> include/linux/cgroup.h | 13 ++
>>> include/linux/kernfs.h | 7 +
>>> kernel/cgroup/cgroup.c | 262 +++++++++++++++++++++++++++++-------
>>> 12 files changed, 483 insertions(+), 170 deletions(-)
>>> --
>>> 2.16.1
>>
>> I thought all the legacy stuff including CFS et al. is going to be removed in v4.21 completely…
>>
>
> Thanks for pointing this out.
>
> People with a lower kernel version than the future 4.21 just cannot
> and will not be able to use the proportional share policy on blk-mq
> (with legacy code), because of the name issue highlighted in this
> email. If this patch series gets accepted, a backport will solve the
> problem. In this respect, such a backport might even happen
> 'automatically', as most bfq commit seem to get backported to older,
> stable kernels.
>
> In addition, this extension
> - extends the whole cgroups interface, in a seamless and
> backward-compatible way, to prevent future issues like these;
> - solves a similar issue with throttle (which AFAIK won't go away
> with 4.21).

There's no way this series can get accepted, since you've made the
mistake of basing it on something that won't apply to the block
tree for 4.21. I've outlined these rules before, but here they are
again:

1) Patches destined for the CURRENT kernel version should be
against my for-linus branch. That means that right now, any
patches that should to into 4.20 should be against that.

2) Patches destined for the NEXT kernel version should be against
my for-x.y/block branch, where x.y is the next version. As of
right now, patches for 4.21 should be against my for-4.21/bloc
branch.

I'd encourage you to respin against that, particularly in this case
since we've both got a lot of churn, and also removal of various
items that you are patching here.

--
Jens Axboe


2018-11-12 15:46:46

by Paolo Valente

[permalink] [raw]
Subject: Re: [PATCH 00/12] unify the interface of the proportional-share policy in blkio/io



> Il giorno 12 nov 2018, alle ore 16:35, Jens Axboe <[email protected]> ha scritto:
>
> On 11/12/18 3:17 AM, Paolo Valente wrote:
>>
>>
>>> Il giorno 12 nov 2018, alle ore 11:00, Oleksandr Natalenko <[email protected]> ha scritto:
>>>
>>> On 12.11.2018 10:56, Paolo Valente wrote:
>>>> Hi Jens, Tejun, all,
>>>> about nine months ago, we agreed on a solution for unifying the
>>>> interface of the proportional-share policy in blkio/io [1]. Angelo
>>>> and I finally completed it. Let me briefly recall the problem and the
>>>> solution.
>>>> The current implementation of cgroups doesn't allow two or more
>>>> entities, e.g., I/O schedulers, to share the same files. So, if CFQ
>>>> creates its files for the proportional-share policy, such as, e.g,
>>>> weight files for blkio/io groups, BFQ cannot attach somehow to them.
>>>> Thus, to enable people to set group weights with BFQ, I resorted to
>>>> making BFQ create its own version of these common files, by prepending
>>>> a bfq prefix.
>>>> Actually, no legacy code uses these different names, or is likely to
>>>> do so. Having these two sets of names is simply a source of
>>>> confusion, as pointed out also, e.g., by Lennart Poettering (CCed
>>>> here), and acknowledged by Tejun [2].
>>>> In [1] we agreed on a solution that solves this problem, by actually
>>>> making it possible to share cgroups files. Both writing to and
>>>> reading from a shared file trigger the appropriate operation for each
>>>> of the entities that share the file. In particular, in case of
>>>> reading,
>>>> - if all entities produce the same output, the this common output is
>>>> shown only once;
>>>> - if the outputs differ, then every per-entity output is shown,
>>>> preceded by the name of the entity that produced that output.
>>>> With this solution, legacy code that, e.g., sets group weights, just
>>>> works, regardless of the I/O scheduler actually implementing
>>>> proportional share.
>>>> But note that this extension is not restricted to only blkio/io. The
>>>> general group interface now enables files to be shared among multiple
>>>> entities of any kind.
>>>> (I have also added a patch to fix some clerical errors in bfq doc,
>>>> which I found while making the latter consistent with the new
>>>> interface.)
>>>> Thanks,
>>>> Paolo
>>>> [1] https://lkml.org/lkml/2018/1/4/667
>>>> [2] https://github.com/systemd/systemd/issues/7057
>>>> Angelo Ruocco (7):
>>>> kernfs: add function to find kernfs_node without increasing ref
>>>> counter
>>>> cgroup: link cftypes of the same subsystem with the same name
>>>> cgroup: add owner name to cftypes
>>>> block, bfq: align min and default weights with cfq
>>>> cgroup: make all functions of all cftypes be invoked
>>>> block, cfq: allow cgroup files to be shared
>>>> block, throttle: allow sharing cgroup statistic files
>>>> Paolo Valente (5):
>>>> cgroup: add hook seq_show_cft with also the owning cftype as parameter
>>>> block, cgroup: pass cftype to functions that need to use it
>>>> block, bfq: use standard file names for the proportional-share policy
>>>> doc, bfq-iosched: fix a few clerical errors
>>>> doc, bfq-iosched: make it consistent with the new cgroup interface
>>>> Documentation/block/bfq-iosched.txt | 31 +++--
>>>> block/bfq-cgroup.c | 148 +++++++++++++-------
>>>> block/bfq-iosched.h | 4 +-
>>>> block/blk-cgroup.c | 22 +--
>>>> block/blk-throttle.c | 24 ++--
>>>> block/cfq-iosched.c | 105 +++++++++++----
>>>> fs/kernfs/dir.c | 13 ++
>>>> include/linux/blk-cgroup.h | 10 +-
>>>> include/linux/cgroup-defs.h | 14 +-
>>>> include/linux/cgroup.h | 13 ++
>>>> include/linux/kernfs.h | 7 +
>>>> kernel/cgroup/cgroup.c | 262 +++++++++++++++++++++++++++++-------
>>>> 12 files changed, 483 insertions(+), 170 deletions(-)
>>>> --
>>>> 2.16.1
>>>
>>> I thought all the legacy stuff including CFS et al. is going to be removed in v4.21 completely…
>>>
>>
>> Thanks for pointing this out.
>>
>> People with a lower kernel version than the future 4.21 just cannot
>> and will not be able to use the proportional share policy on blk-mq
>> (with legacy code), because of the name issue highlighted in this
>> email. If this patch series gets accepted, a backport will solve the
>> problem. In this respect, such a backport might even happen
>> 'automatically', as most bfq commit seem to get backported to older,
>> stable kernels.
>>
>> In addition, this extension
>> - extends the whole cgroups interface, in a seamless and
>> backward-compatible way, to prevent future issues like these;
>> - solves a similar issue with throttle (which AFAIK won't go away
>> with 4.21).
>
> There's no way this series can get accepted, since you've made the
> mistake of basing it on something that won't apply to the block
> tree for 4.21.

Of course, sorry :(

We'll rebase V2.

BTW, since this patch series is probably even more useful for older
than for future kernels, might it make sense to also propose it for
stable/longterm kernels (provided that such a possibility exists)?

Thanks,
Paolo

> I've outlined these rules before, but here they are
> again:
>
> 1) Patches destined for the CURRENT kernel version should be
> against my for-linus branch. That means that right now, any
> patches that should to into 4.20 should be against that.
>
> 2) Patches destined for the NEXT kernel version should be against
> my for-x.y/block branch, where x.y is the next version. As of
> right now, patches for 4.21 should be against my for-4.21/bloc
> branch.
>
> I'd encourage you to respin against that, particularly in this case
> since we've both got a lot of churn, and also removal of various
> items that you are patching here.
>
> --
> Jens Axboe


2018-11-12 15:49:51

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH 00/12] unify the interface of the proportional-share policy in blkio/io

On 11/12/18 8:45 AM, Paolo Valente wrote:
> BTW, since this patch series is probably even more useful for older
> than for future kernels, might it make sense to also propose it for
> stable/longterm kernels (provided that such a possibility exists)?

That just not how things work, we don't put different things in
older/stable kernels, it's strictly backports of what we have in
current/newer kernels. Hence it appears to be a dead end right now.

--
Jens Axboe


2018-11-12 15:54:47

by Josef Bacik

[permalink] [raw]
Subject: Re: [PATCH 00/12] unify the interface of the proportional-share policy in blkio/io

On Mon, Nov 12, 2018 at 08:48:35AM -0700, Jens Axboe wrote:
> On 11/12/18 8:45 AM, Paolo Valente wrote:
> > BTW, since this patch series is probably even more useful for older
> > than for future kernels, might it make sense to also propose it for
> > stable/longterm kernels (provided that such a possibility exists)?
>
> That just not how things work, we don't put different things in
> older/stable kernels, it's strictly backports of what we have in
> current/newer kernels. Hence it appears to be a dead end right now.
>

It may not be useful currently, but my plans are to do a scheduler agnostic
proportional io controller next, so having these interfaces unified would be
nice so I don't have to do a rqos.io.weight or something similar. Thanks,

Josef

2018-11-12 16:06:59

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH 00/12] unify the interface of the proportional-share policy in blkio/io

On 11/12/18 8:54 AM, Josef Bacik wrote:
> On Mon, Nov 12, 2018 at 08:48:35AM -0700, Jens Axboe wrote:
>> On 11/12/18 8:45 AM, Paolo Valente wrote:
>>> BTW, since this patch series is probably even more useful for older
>>> than for future kernels, might it make sense to also propose it for
>>> stable/longterm kernels (provided that such a possibility exists)?
>>
>> That just not how things work, we don't put different things in
>> older/stable kernels, it's strictly backports of what we have in
>> current/newer kernels. Hence it appears to be a dead end right now.
>>
>
> It may not be useful currently, but my plans are to do a scheduler agnostic
> proportional io controller next, so having these interfaces unified would be
> nice so I don't have to do a rqos.io.weight or something similar. Thanks,

I'm not saying the work isn't useful, I'm saying that we can't go adding
different interfaces to stable kernels than what we currently have in
tip. I'm all for unified interfaces for this kind of thing, it's much
better than having something that's specific to any given
implementation.

--
Jens Axboe


2018-11-13 01:58:37

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH 01/12] kernfs: add function to find kernfs_node without increasing ref counter

On Mon, Nov 12, 2018 at 04:28:40AM -0800, Greg Kroah-Hartman wrote:
> On Mon, Nov 12, 2018 at 10:56:21AM +0100, Paolo Valente wrote:
> > From: Angelo Ruocco <[email protected]>
> >
> > The kernfs pseudo file system doesn't export any function to only find
> > a node by name, without also getting a reference on it.
> > But in some cases it is useful to just locate a kernfs node, while
> > using it or not depends on some other condition.
> >
> > This commit adds a function to just look for a node, without getting
> > a reference on it.
>
> Eeek, that sounds really bad. So you save off a pointer to something,
> and have no idea if that pointer now really is valid or not? It can
> instantly disappear right afterwards.
>
> This feels wrong, what is the problem of having a properly reference
> counted object passed back to you that you have to create a dangerous
> function like this?

I agree with Greg, this function is dangerous. What's wrong with using
find_get and then doing a put once you successfully take it over, or
fail to take it over?

2018-11-13 17:54:54

by Paolo Valente

[permalink] [raw]
Subject: Re: [PATCH 01/12] kernfs: add function to find kernfs_node without increasing ref counter



> Il giorno 12 nov 2018, alle ore 13:28, Greg Kroah-Hartman <[email protected]> ha scritto:
>
> On Mon, Nov 12, 2018 at 10:56:21AM +0100, Paolo Valente wrote:
>> From: Angelo Ruocco <[email protected]>
>>
>> The kernfs pseudo file system doesn't export any function to only find
>> a node by name, without also getting a reference on it.
>> But in some cases it is useful to just locate a kernfs node, while
>> using it or not depends on some other condition.
>>
>> This commit adds a function to just look for a node, without getting
>> a reference on it.
>
> Eeek, that sounds really bad. So you save off a pointer to something,
> and have no idea if that pointer now really is valid or not? It can
> instantly disappear right afterwards.
>

Hi Greg,
that function is invoked only in functions executed with cgroup_mutex
held. This guarantees that nothing disappears or becomes
inconsistent. That's why we decided to go for this optimization,
instead of doing useless gets&puts pairs. Still, I'm not expert
enough to state whether it is impossible that, once we have defined
that function, it may then get used in some unsafe way.

So, I seem to see two options:
1) Add a comment on the function, saying that cgroup_mutex must be
held while invoking it (I guess you won't like this one).
2) Do not define such a new function, and, in the other patches, use
the already-available find_and_get.

Looking forward to your feedback (or of other knowledgeable people on
this issue) before proceeding to a rebased V2,
Paolo


> This feels wrong, what is the problem of having a properly reference
> counted object passed back to you that you have to create a dangerous
> function like this?
>
> greg k-h


2018-11-13 19:36:24

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 01/12] kernfs: add function to find kernfs_node without increasing ref counter

On Tue, Nov 13, 2018 at 06:53:59PM +0100, Paolo Valente wrote:
>
>
> > Il giorno 12 nov 2018, alle ore 13:28, Greg Kroah-Hartman <[email protected]> ha scritto:
> >
> > On Mon, Nov 12, 2018 at 10:56:21AM +0100, Paolo Valente wrote:
> >> From: Angelo Ruocco <[email protected]>
> >>
> >> The kernfs pseudo file system doesn't export any function to only find
> >> a node by name, without also getting a reference on it.
> >> But in some cases it is useful to just locate a kernfs node, while
> >> using it or not depends on some other condition.
> >>
> >> This commit adds a function to just look for a node, without getting
> >> a reference on it.
> >
> > Eeek, that sounds really bad. So you save off a pointer to something,
> > and have no idea if that pointer now really is valid or not? It can
> > instantly disappear right afterwards.
> >
>
> Hi Greg,
> that function is invoked only in functions executed with cgroup_mutex
> held. This guarantees that nothing disappears or becomes
> inconsistent. That's why we decided to go for this optimization,
> instead of doing useless gets&puts pairs. Still, I'm not expert
> enough to state whether it is impossible that, once we have defined
> that function, it may then get used in some unsafe way.

I can guarantee once you define that function, it will be used in an
unsafe way :(

So just don't create it, use the put calls, it's fast and should never
be a performance issue.

> So, I seem to see two options:
> 1) Add a comment on the function, saying that cgroup_mutex must be
> held while invoking it (I guess you won't like this one).

Nope, do not create it.

> 2) Do not define such a new function, and, in the other patches, use
> the already-available find_and_get.

Yes, please do that.

thanks,

greg k-h

2018-11-15 11:55:38

by Angelo Ruocco

[permalink] [raw]
Subject: Re: [PATCH 00/12] unify the interface of the proportional-share policy in blkio/io

Hi Jens,

I have rebased the patchset against the for-4.21/block branch, but I
can't test them properly because the compiling process has an error on
a different file. In particular:

include/net/xfrm.h:1465:3 error: unknown type 'spruct'
include/net/xfrm.h:1465:30 error: expected ':', ',', ';', '}' or
'__attribute__' before 'auth'

To be clear, and so that you can check I haven't made some trivial
mistakes: I have added/fetched the remote [1] and then simply rebased
against the for-4.21/block branch.

[1] git://git.kernel.dk/linux-block

Angelo

2018-11-12 16:35 GMT+01:00, Jens Axboe <[email protected]>:
> On 11/12/18 3:17 AM, Paolo Valente wrote:
>>
>>
>>> Il giorno 12 nov 2018, alle ore 11:00, Oleksandr Natalenko
>>> <[email protected]> ha scritto:
>>>
>>> On 12.11.2018 10:56, Paolo Valente wrote:
>>>> Hi Jens, Tejun, all,
>>>> about nine months ago, we agreed on a solution for unifying the
>>>> interface of the proportional-share policy in blkio/io [1]. Angelo
>>>> and I finally completed it. Let me briefly recall the problem and the
>>>> solution.
>>>> The current implementation of cgroups doesn't allow two or more
>>>> entities, e.g., I/O schedulers, to share the same files. So, if CFQ
>>>> creates its files for the proportional-share policy, such as, e.g,
>>>> weight files for blkio/io groups, BFQ cannot attach somehow to them.
>>>> Thus, to enable people to set group weights with BFQ, I resorted to
>>>> making BFQ create its own version of these common files, by prepending
>>>> a bfq prefix.
>>>> Actually, no legacy code uses these different names, or is likely to
>>>> do so. Having these two sets of names is simply a source of
>>>> confusion, as pointed out also, e.g., by Lennart Poettering (CCed
>>>> here), and acknowledged by Tejun [2].
>>>> In [1] we agreed on a solution that solves this problem, by actually
>>>> making it possible to share cgroups files. Both writing to and
>>>> reading from a shared file trigger the appropriate operation for each
>>>> of the entities that share the file. In particular, in case of
>>>> reading,
>>>> - if all entities produce the same output, the this common output is
>>>> shown only once;
>>>> - if the outputs differ, then every per-entity output is shown,
>>>> preceded by the name of the entity that produced that output.
>>>> With this solution, legacy code that, e.g., sets group weights, just
>>>> works, regardless of the I/O scheduler actually implementing
>>>> proportional share.
>>>> But note that this extension is not restricted to only blkio/io. The
>>>> general group interface now enables files to be shared among multiple
>>>> entities of any kind.
>>>> (I have also added a patch to fix some clerical errors in bfq doc,
>>>> which I found while making the latter consistent with the new
>>>> interface.)
>>>> Thanks,
>>>> Paolo
>>>> [1] https://lkml.org/lkml/2018/1/4/667
>>>> [2] https://github.com/systemd/systemd/issues/7057
>>>> Angelo Ruocco (7):
>>>> kernfs: add function to find kernfs_node without increasing ref
>>>> counter
>>>> cgroup: link cftypes of the same subsystem with the same name
>>>> cgroup: add owner name to cftypes
>>>> block, bfq: align min and default weights with cfq
>>>> cgroup: make all functions of all cftypes be invoked
>>>> block, cfq: allow cgroup files to be shared
>>>> block, throttle: allow sharing cgroup statistic files
>>>> Paolo Valente (5):
>>>> cgroup: add hook seq_show_cft with also the owning cftype as parameter
>>>> block, cgroup: pass cftype to functions that need to use it
>>>> block, bfq: use standard file names for the proportional-share policy
>>>> doc, bfq-iosched: fix a few clerical errors
>>>> doc, bfq-iosched: make it consistent with the new cgroup interface
>>>> Documentation/block/bfq-iosched.txt | 31 +++--
>>>> block/bfq-cgroup.c | 148 +++++++++++++-------
>>>> block/bfq-iosched.h | 4 +-
>>>> block/blk-cgroup.c | 22 +--
>>>> block/blk-throttle.c | 24 ++--
>>>> block/cfq-iosched.c | 105 +++++++++++----
>>>> fs/kernfs/dir.c | 13 ++
>>>> include/linux/blk-cgroup.h | 10 +-
>>>> include/linux/cgroup-defs.h | 14 +-
>>>> include/linux/cgroup.h | 13 ++
>>>> include/linux/kernfs.h | 7 +
>>>> kernel/cgroup/cgroup.c | 262
>>>> +++++++++++++++++++++++++++++-------
>>>> 12 files changed, 483 insertions(+), 170 deletions(-)
>>>> --
>>>> 2.16.1
>>>
>>> I thought all the legacy stuff including CFS et al. is going to be
>>> removed in v4.21 completely…
>>>
>>
>> Thanks for pointing this out.
>>
>> People with a lower kernel version than the future 4.21 just cannot
>> and will not be able to use the proportional share policy on blk-mq
>> (with legacy code), because of the name issue highlighted in this
>> email. If this patch series gets accepted, a backport will solve the
>> problem. In this respect, such a backport might even happen
>> 'automatically', as most bfq commit seem to get backported to older,
>> stable kernels.
>>
>> In addition, this extension
>> - extends the whole cgroups interface, in a seamless and
>> backward-compatible way, to prevent future issues like these;
>> - solves a similar issue with throttle (which AFAIK won't go away
>> with 4.21).
>
> There's no way this series can get accepted, since you've made the
> mistake of basing it on something that won't apply to the block
> tree for 4.21. I've outlined these rules before, but here they are
> again:
>
> 1) Patches destined for the CURRENT kernel version should be
> against my for-linus branch. That means that right now, any
> patches that should to into 4.20 should be against that.
>
> 2) Patches destined for the NEXT kernel version should be against
> my for-x.y/block branch, where x.y is the next version. As of
> right now, patches for 4.21 should be against my for-4.21/bloc
> branch.
>
> I'd encourage you to respin against that, particularly in this case
> since we've both got a lot of churn, and also removal of various
> items that you are patching here.
>
> --
> Jens Axboe
>
>

2018-11-15 16:32:31

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH 00/12] unify the interface of the proportional-share policy in blkio/io

On 11/15/18 4:54 AM, Angelo Ruocco wrote:
> Hi Jens,
>
> I have rebased the patchset against the for-4.21/block branch, but I
> can't test them properly because the compiling process has an error on
> a different file. In particular:
>
> include/net/xfrm.h:1465:3 error: unknown type 'spruct'
> include/net/xfrm.h:1465:30 error: expected ':', ',', ';', '}' or
> '__attribute__' before 'auth'
>
> To be clear, and so that you can check I haven't made some trivial
> mistakes: I have added/fetched the remote [1] and then simply rebased
> against the for-4.21/block branch.

I think you have memory issues, a 'p' is just one bit away from a 't'.

--
Jens Axboe


2018-11-19 09:47:19

by Angelo Ruocco

[permalink] [raw]
Subject: Re: [PATCH 00/12] unify the interface of the proportional-share policy in blkio/io

2018-11-15 17:30 GMT+01:00, Jens Axboe <[email protected]>:
> On 11/15/18 4:54 AM, Angelo Ruocco wrote:
>> Hi Jens,
>>
>> I have rebased the patchset against the for-4.21/block branch, but I
>> can't test them properly because the compiling process has an error on
>> a different file. In particular:
>>
>> include/net/xfrm.h:1465:3 error: unknown type 'spruct'
>> include/net/xfrm.h:1465:30 error: expected ':', ',', ';', '}' or
>> '__attribute__' before 'auth'
>>
>> To be clear, and so that you can check I haven't made some trivial
>> mistakes: I have added/fetched the remote [1] and then simply rebased
>> against the for-4.21/block branch.
>
> I think you have memory issues, a 'p' is just one bit away from a 't'.

That was indeed the problem.

Thank you,
Angelo

>
> --
> Jens Axboe
>