2019-06-13 22:31:08

by Tejun Heo

[permalink] [raw]
Subject: [PATCHSET block/for-linus] Assorted blkcg fixes

Hello,

This patchset contains the following blkcg fixes.

0001-blk-iolatency-clear-use_delay-when-io.latency-is-set.patch
0002-blkcg-update-blkcg_print_stat-to-handle-larger-outpu.patch
0003-blkcg-perpcu_ref-init-exit-should-be-done-from-blkg_.patch
0004-blkcg-blkcg_activate_policy-should-initialize-ancest.patch
0005-blkcg-writeback-dead-memcgs-shouldn-t-contribute-to-.patch

Please refer to each patch's description for details. Patchset is
also available in the following git branch.

git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-blkcg-fixes

Thanks. diffstat follows.

block/blk-cgroup.c | 24 ++++++++++++------------
block/blk-iolatency.c | 4 +++-
fs/fs-writeback.c | 8 +++++++-
3 files changed, 22 insertions(+), 14 deletions(-)

--
tejun


2019-06-13 22:31:14

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 1/5] blk-iolatency: clear use_delay when io.latency is set to zero

If use_delay was non-zero when the latency target of a cgroup was set
to zero, it will stay stuck until io.latency is enabled on the cgroup
again. This keeps readahead disabled for the cgroup impacting
performance negatively.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Josef Bacik <[email protected]>
Fixes: d70675121546 ("block: introduce blk-iolatency io controller")
Cc: [email protected] # v4.19+
---
block/blk-iolatency.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
index d22e61bced86..17896bb3aaf2 100644
--- a/block/blk-iolatency.c
+++ b/block/blk-iolatency.c
@@ -778,8 +778,10 @@ static int iolatency_set_min_lat_nsec(struct blkcg_gq *blkg, u64 val)

if (!oldval && val)
return 1;
- if (oldval && !val)
+ if (oldval && !val) {
+ blkcg_clear_delay(blkg);
return -1;
+ }
return 0;
}

--
2.17.1

2019-06-13 22:31:25

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 3/5] blkcg: perpcu_ref init/exit should be done from blkg_alloc/free()

blkg alloc is performed as a separate step from the rest of blkg
creation so that GFP_KERNEL allocations can be used when creating
blkgs from configuration file writes because otherwise user actions
may fail due to failures of opportunistic GFP_NOWAIT allocations.

While making blkgs use percpu_ref, 7fcf2b033b84 ("blkcg: change blkg
reference counting to use percpu_ref") incorrectly added unconditional
opportunistic percpu_ref_init() to blkg_create() breaking this
guarantee.

This patch moves percpu_ref_init() to blkg_alloc() so makes it use
@gfp_mask that blkg_alloc() is called with. Also, percpu_ref_exit()
is moved to blkg_free() for consistency.

Signed-off-by: Tejun Heo <[email protected]>
Fixes: 7fcf2b033b84 ("blkcg: change blkg reference counting to use percpu_ref")
Cc: Dennis Zhou <[email protected]>
---
block/blk-cgroup.c | 13 ++++---------
1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index e4715b35d42c..04d286934c5e 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -79,6 +79,7 @@ static void blkg_free(struct blkcg_gq *blkg)

blkg_rwstat_exit(&blkg->stat_ios);
blkg_rwstat_exit(&blkg->stat_bytes);
+ percpu_ref_exit(&blkg->refcnt);
kfree(blkg);
}

@@ -86,8 +87,6 @@ static void __blkg_release(struct rcu_head *rcu)
{
struct blkcg_gq *blkg = container_of(rcu, struct blkcg_gq, rcu_head);

- percpu_ref_exit(&blkg->refcnt);
-
/* release the blkcg and parent blkg refs this blkg has been holding */
css_put(&blkg->blkcg->css);
if (blkg->parent)
@@ -132,6 +131,9 @@ static struct blkcg_gq *blkg_alloc(struct blkcg *blkcg, struct request_queue *q,
if (!blkg)
return NULL;

+ if (percpu_ref_init(&blkg->refcnt, blkg_release, 0, gfp_mask))
+ goto err_free;
+
if (blkg_rwstat_init(&blkg->stat_bytes, gfp_mask) ||
blkg_rwstat_init(&blkg->stat_ios, gfp_mask))
goto err_free;
@@ -244,11 +246,6 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
blkg_get(blkg->parent);
}

- ret = percpu_ref_init(&blkg->refcnt, blkg_release, 0,
- GFP_NOWAIT | __GFP_NOWARN);
- if (ret)
- goto err_cancel_ref;
-
/* invoke per-policy init */
for (i = 0; i < BLKCG_MAX_POLS; i++) {
struct blkcg_policy *pol = blkcg_policy[i];
@@ -281,8 +278,6 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
blkg_put(blkg);
return ERR_PTR(ret);

-err_cancel_ref:
- percpu_ref_exit(&blkg->refcnt);
err_put_congested:
wb_congested_put(wb_congested);
err_put_css:
--
2.17.1

2019-06-13 22:31:38

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 4/5] blkcg: blkcg_activate_policy() should initialize ancestors first

When blkcg_activate_policy() is creating blkg_policy_data for existing
blkgs, it did in the wrong order - descendants first. Fix it. None
of the existing controllers seem affected by this.

Signed-off-by: Tejun Heo <[email protected]>
---
block/blk-cgroup.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 04d286934c5e..440797293235 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1390,7 +1390,8 @@ int blkcg_activate_policy(struct request_queue *q,

spin_lock_irq(&q->queue_lock);

- list_for_each_entry(blkg, &q->blkg_list, q_node) {
+ /* blkg_list is pushed at the head, reverse walk to init parents first */
+ list_for_each_entry_reverse(blkg, &q->blkg_list, q_node) {
struct blkg_policy_data *pd;

if (blkg->pd[pol->plid])
--
2.17.1

2019-06-13 22:32:02

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 5/5] blkcg, writeback: dead memcgs shouldn't contribute to writeback ownership arbitration

wbc_account_io() collects information on cgroup ownership of writeback
pages to determine which cgroup should own the inode. Pages can stay
associated with dead memcgs but we want to avoid attributing IOs to
dead blkcgs as much as possible as the association is likely to be
stale. However, currently, pages associated with dead memcgs
contribute to the accounting delaying and/or confusing the
arbitration.

Fix it by ignoring pages associated with dead memcgs.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Jan Kara <[email protected]>
---
fs/fs-writeback.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index e41cbe8e81b9..9ebfb1b28430 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -715,6 +715,7 @@ void wbc_detach_inode(struct writeback_control *wbc)
void wbc_account_io(struct writeback_control *wbc, struct page *page,
size_t bytes)
{
+ struct cgroup_subsys_state *css;
int id;

/*
@@ -726,7 +727,12 @@ void wbc_account_io(struct writeback_control *wbc, struct page *page,
if (!wbc->wb)
return;

- id = mem_cgroup_css_from_page(page)->id;
+ css = mem_cgroup_css_from_page(page);
+ /* dead cgroups shouldn't contribute to inode ownership arbitration */
+ if (!(css->flags & CSS_ONLINE))
+ return;
+
+ id = css->id;

if (id == wbc->wb_id) {
wbc->wb_bytes += bytes;
--
2.17.1

2019-06-13 22:32:17

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 2/5] blkcg: update blkcg_print_stat() to handle larger outputs

Depending on the number of devices, blkcg stats can go over the
default seqfile buf size. seqfile normally retries with a larger
buffer but since the ->pd_stat() addition, blkcg_print_stat() doesn't
tell seqfile that overflow has happened and the output gets printed
truncated. Fix it by calling seq_commit() w/ -1 on possible
overflows.

Signed-off-by: Tejun Heo <[email protected]>
Fixes: 903d23f0a354 ("blk-cgroup: allow controllers to output their own stats")
Cc: [email protected] # v4.19+
Cc: Josef Bacik <[email protected]>
---
block/blk-cgroup.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 1f7127b03490..e4715b35d42c 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1006,8 +1006,12 @@ static int blkcg_print_stat(struct seq_file *sf, void *v)
}
next:
if (has_stats) {
- off += scnprintf(buf+off, size-off, "\n");
- seq_commit(sf, off);
+ if (off < size - 1) {
+ off += scnprintf(buf+off, size-off, "\n");
+ seq_commit(sf, off);
+ } else {
+ seq_commit(sf, -1);
+ }
}
}

--
2.17.1

2019-06-15 07:41:27

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCHSET block/for-linus] Assorted blkcg fixes

On 6/13/19 4:30 PM, Tejun Heo wrote:
> Hello,
>
> This patchset contains the following blkcg fixes.
>
> 0001-blk-iolatency-clear-use_delay-when-io.latency-is-set.patch
> 0002-blkcg-update-blkcg_print_stat-to-handle-larger-outpu.patch
> 0003-blkcg-perpcu_ref-init-exit-should-be-done-from-blkg_.patch
> 0004-blkcg-blkcg_activate_policy-should-initialize-ancest.patch
> 0005-blkcg-writeback-dead-memcgs-shouldn-t-contribute-to-.patch
>
> Please refer to each patch's description for details. Patchset is
> also available in the following git branch.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-blkcg-fixes
>
> Thanks. diffstat follows.

Are you fine with these hitting 5.3?

--
Jens Axboe

2019-06-15 15:52:47

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCHSET block/for-linus] Assorted blkcg fixes

Hello,

On Sat, Jun 15, 2019 at 01:40:50AM -0600, Jens Axboe wrote:
> > Please refer to each patch's description for details. Patchset is
> > also available in the following git branch.
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-blkcg-fixes
> >
> > Thanks. diffstat follows.
>
> Are you fine with these hitting 5.3?

Yeah, none of them are very urgent. 5.3 should be fine.

Thanks.

--
tejun

2019-06-15 16:40:51

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCHSET block/for-linus] Assorted blkcg fixes

On 6/15/19 9:50 AM, Tejun Heo wrote:
> Hello,
>
> On Sat, Jun 15, 2019 at 01:40:50AM -0600, Jens Axboe wrote:
>>> Please refer to each patch's description for details. Patchset is
>>> also available in the following git branch.
>>>
>>> git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-blkcg-fixes
>>>
>>> Thanks. diffstat follows.
>>
>> Are you fine with these hitting 5.3?
>
> Yeah, none of them are very urgent. 5.3 should be fine.

OK great, added for 5.3, thanks.

--
Jens Axboe

2019-06-19 11:21:49

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 5/5] blkcg, writeback: dead memcgs shouldn't contribute to writeback ownership arbitration

On Thu 13-06-19 15:30:41, Tejun Heo wrote:
> wbc_account_io() collects information on cgroup ownership of writeback
> pages to determine which cgroup should own the inode. Pages can stay
> associated with dead memcgs but we want to avoid attributing IOs to
> dead blkcgs as much as possible as the association is likely to be
> stale. However, currently, pages associated with dead memcgs
> contribute to the accounting delaying and/or confusing the
> arbitration.
>
> Fix it by ignoring pages associated with dead memcgs.
>
> Signed-off-by: Tejun Heo <[email protected]>
> Cc: Jan Kara <[email protected]>

I see Jens has already pulled the changes so this is mostly informative but
the patch looks good to me.

Honza
> ---
> fs/fs-writeback.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index e41cbe8e81b9..9ebfb1b28430 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -715,6 +715,7 @@ void wbc_detach_inode(struct writeback_control *wbc)
> void wbc_account_io(struct writeback_control *wbc, struct page *page,
> size_t bytes)
> {
> + struct cgroup_subsys_state *css;
> int id;
>
> /*
> @@ -726,7 +727,12 @@ void wbc_account_io(struct writeback_control *wbc, struct page *page,
> if (!wbc->wb)
> return;
>
> - id = mem_cgroup_css_from_page(page)->id;
> + css = mem_cgroup_css_from_page(page);
> + /* dead cgroups shouldn't contribute to inode ownership arbitration */
> + if (!(css->flags & CSS_ONLINE))
> + return;
> +
> + id = css->id;
>
> if (id == wbc->wb_id) {
> wbc->wb_bytes += bytes;
> --
> 2.17.1
>
--
Jan Kara <[email protected]>
SUSE Labs, CR