2018-08-31 01:55:37

by Dennis Zhou

Subject: [PATCH 00/15] blkcg ref count refactor/cleanup + blkcg avg_lat

Hi everyone,

This is a fairly lengthy patchset that aims to clean up reference
counting for blkcgs and blkgs. There are 4 problems this patchset
tries to address:
1. fix blkcg destruction
2. always associate a bio with a blkg
3. remove the extra css ref held by bios and utilize the blkg ref
4. add average latency tracking to the blkcg core, exposed in io.stat.

First, there is a regression in blkcg destruction where references
weren't properly put, causing blkcgs to never be destroyed. Previously,
blkgs were destroyed during offlining of the blkcg. This put back the
blkcg reference each blkg holds, allowing the blkcg's ref count to reach
zero; blkcg_css_free() is then called as part of the final cleanup.

To address the first problem, 0001 reverts the broken commit, 0002
delays blkg destruction until writeback has finished, and 0003 closes
a race window between blkg association and a css that is migrating or
dying. This should fix the issue where blkg_get() was called on a blkcg
that had already begun exiting; a bio that finds itself in this window
now simply falls back to the root blkg. Oddly enough, at one point
blk-throttle was reading policy data from one blkg while associating
with a potentially different one, which is how this was exposed.

0004 also addresses a similar problem with task_css(current, ...),
where association tries to get the css of a task that is migrating
while its cgroup is dying.

Second, both blk-throttle and blk-iolatency rely on blkg association
to enable their policies. Rather than have each policy (and future
policies) implement this logic independently, this consolidates it
such that all bios are tagged with a blkg.

Third, now that every bio holds a blkg reference, the blkcg can be
referenced through it rather than maintaining an additional pointer
and reference. So let's clean this up.
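
As an illustration, after 0010 the accessor reduces to roughly the
following (this is the shape 0010 gives bio_blkcg; see that patch's
hunk in include/linux/blk-cgroup.h below):

	static inline struct blkcg *bio_blkcg(struct bio *bio)
	{
		if (bio && bio->bi_blkg)
			return bio->bi_blkg->blkcg;
		return NULL;
	}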

Finally, it seems rather useful to know on average how well IOs are
doing per cgroup. This adds an average latency statistic to the blkcg
core that encompasses IOs from all descendants.

This patchset contains the following 15 patches:

0001-Revert-blk-throttle-fix-race-between-blkcg_bio_issue.patch
0002-blkcg-delay-blkg-destruction-until-after-writeback-h.patch
0003-blkcg-use-tryget-logic-when-associating-a-blkg-with-.patch
0004-blkcg-fix-ref-count-issue-with-bio_blkcg-using-task_.patch
0005-blkcg-update-blkg_lookup_create-to-do-locking.patch
0006-blkcg-always-associate-a-bio-with-a-blkg.patch
0007-blkcg-consolidate-bio_issue_init-and-blkg-associatio.patch
0008-blkcg-associate-a-blkg-for-pages-being-evicted-by-sw.patch
0009-blkcg-associate-writeback-bios-with-a-blkg.patch
0010-blkcg-remove-bio-bi_css-and-instead-use-bio-bi_blkg.patch
0011-blkcg-remove-additional-reference-to-the-css.patch
0012-blkcg-cleanup-and-make-blk_get_rl-use-blkg_lookup_cr.patch
0013-blkcg-change-blkg-reference-counting-to-use-percpu_r.patch
0014-blkcg-rename-blkg_try_get-to-blkg_tryget.patch
0015-blkcg-add-average-latency-tracking-to-blk-cgroup.patch

0001-0003 address the regression in the blkcg cleanup path.
0004 fixes a small window race condition with task_css(current, ...).
0005 is a preparatory patch that cleans up blkg lookup create.
0006-0009 associate all bios with a blkg: regular IO, swap, writeback.
0010 removes the extra css pointer in bios.
0011 removes the implicit reference left behind in 0010.
0012 cleans up blk_get_rl, making use of the new blkg ref.
0013 changes blkg ref counting from atomic to percpu.
0014 renames and makes blkg_try_get consistent with css_tryget.
0015 adds average latency tracking as part of blk-cgroup.

This patchset is on top of axboe#for-4.19/block #b86d865cb1ca.

diffstats below:

Dennis Zhou (Facebook) (15):
Revert "blk-throttle: fix race between blkcg_bio_issue_check() and
cgroup_rmdir()"
blkcg: delay blkg destruction until after writeback has finished
blkcg: use tryget logic when associating a blkg with a bio
blkcg: fix ref count issue with bio_blkcg using task_css
blkcg: update blkg_lookup_create to do locking
blkcg: always associate a bio with a blkg
blkcg: consolidate bio_issue_init and blkg association
blkcg: associate a blkg for pages being evicted by swap
blkcg: associate writeback bios with a blkg
blkcg: remove bio->bi_css and instead use bio->bi_blkg
blkcg: remove additional reference to the css
blkcg: cleanup and make blk_get_rl use blkg_lookup_create
blkcg: change blkg reference counting to use percpu_ref
blkcg: rename blkg_try_get to blkg_tryget
blkcg: add average latency tracking to blk-cgroup

Documentation/admin-guide/cgroup-v2.rst | 14 +-
block/bio.c | 187 +++++++++++----
block/blk-cgroup.c | 298 +++++++++++++++++-------
block/blk-iolatency.c | 26 +--
block/blk-throttle.c | 12 +-
block/bounce.c | 4 +-
drivers/block/loop.c | 5 +-
drivers/md/raid0.c | 2 +-
fs/buffer.c | 10 +-
fs/ext4/page-io.c | 2 +-
include/linux/bio.h | 23 +-
include/linux/blk-cgroup.h | 167 ++++++++-----
include/linux/blk_types.h | 1 -
include/linux/cgroup.h | 2 +
include/linux/writeback.h | 5 +-
kernel/cgroup/cgroup.c | 4 +-
kernel/trace/blktrace.c | 4 +-
mm/backing-dev.c | 5 +
mm/page_io.c | 2 +-
19 files changed, 520 insertions(+), 253 deletions(-)

Thanks,
Dennis


2018-08-31 01:55:52

by Dennis Zhou

Subject: [PATCH 15/15] blkcg: add average latency tracking to blk-cgroup

From: "Dennis Zhou (Facebook)" <[email protected]>

Latency is an important metric for understanding whether or not you're
receiving adequate service from your block devices. blk-iolatency
demonstrates the utility of such information.

This patch introduces a moving average for latency tracking to
blk-cgroup. The value can be found in all non-root cgroups in io.stat.
A bio's latency is counted and propagated up to, but excluding, the
root cgroup. It uses a minimum window of 1s, and windows only elapse
with active bios. A single value is contributed to the moving average
from each window. The percpu stats are long-running, so each interval
requires calculating the delta between the previous and current reads.
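
For reference, the update uses the scheduler's fixed-point load-average
macro; a minimal sketch of what one window contributes (FIXED_1 = 2048
and FSHIFT = 11 come from linux/sched/loadavg.h):

	/* CALC_LOAD(lat_avg, BLKG_EXP_12s, mean) expands to roughly: */
	lat_avg = (lat_avg * BLKG_EXP_12s +
		   mean * (FIXED_1 - BLKG_EXP_12s)) >> FSHIFT;

With BLKG_EXP_12s = 1884, each window keeps 1884/2048 ~= 92% of the
old average, and (1884/2048)^12 ~= 1/e, matching the 1/exp decay every
12 windows described in the documentation hunk below.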

Signed-off-by: Dennis Zhou <[email protected]>
---
Documentation/admin-guide/cgroup-v2.rst | 6 +-
block/bio.c | 3 +
block/blk-cgroup.c | 117 +++++++++++++++++++++++-
include/linux/blk-cgroup.h | 9 ++
4 files changed, 127 insertions(+), 8 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 2dc8f95077aa..1cdc0e4279c5 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1521,9 +1521,9 @@ IO Latency Interface Files

avg_lat
This is an exponential moving average with a decay rate of 1/exp
- bound by the sampling interval. The decay rate interval can be
- calculated by multiplying the win value in io.stat by the
- corresponding number of samples based on the win value.
+ every 12 samples, with a sampling rate of 1s. Only IO activity
+ can elapse a window and idle periods extend the most recent
+ window.

win
The sampling window size in milliseconds. This is the minimum
diff --git a/block/bio.c b/block/bio.c
index a0b816811e7d..2739e6f5acb7 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1720,6 +1720,9 @@ void bio_endio(struct bio *bio)
if (!bio_integrity_endio(bio))
return;

+ if (bio->bi_blkg && bio->bi_blkg->parent)
+ blkg_record_latency(bio);
+
if (bio->bi_disk)
rq_qos_done_bio(bio->bi_disk->queue, bio);

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 1eaf097e38b0..b720ca629eea 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -17,6 +17,7 @@
#include <linux/ioprio.h>
#include <linux/kdev_t.h>
#include <linux/module.h>
+#include <linux/sched/loadavg.h>
#include <linux/sched/signal.h>
#include <linux/err.h>
#include <linux/blkdev.h>
@@ -32,6 +33,18 @@

#define MAX_KEY_LEN 100

+/*
+ * This constant is used to fake the fixed-point moving average calculation
+ * just like load average for blkg->lat_avg. The call to CALC_LOAD folds
+ * (FIXED_1 (2048) - exp_factor) * new_sample into lat_avg. The sampling
+ * window size is fixed to 1s, so BLKG_EXP_12s is the corresponding value
+ * to create a 1/exp decay rate every 12s when windows elapse immediately.
+ * Note, windows only elapse with IO activity and idle periods extend the
+ * most recent window.
+ */
+#define BLKG_EXP_12s 1884
+#define BLKG_STAT_WIN_SIZE NSEC_PER_SEC
+
/*
* blkcg_pol_mutex protects blkcg_policy[] and policy [de]activation.
* blkcg_pol_register_mutex nests outside of it and synchronizes entire
@@ -72,6 +85,9 @@ static void blkg_free(struct blkcg_gq *blkg)
if (!blkg)
return;

+ if (blkg->rq_stat)
+ free_percpu(blkg->rq_stat);
+
for (i = 0; i < BLKCG_MAX_POLS; i++)
if (blkg->pd[i])
blkcg_policy[i]->pd_free_fn(blkg->pd[i]);
@@ -120,7 +136,7 @@ static struct blkcg_gq *blkg_alloc(struct blkcg *blkcg, struct request_queue *q,
gfp_t gfp_mask)
{
struct blkcg_gq *blkg;
- int i;
+ int i, cpu;

/* alloc and init base part */
blkg = kzalloc_node(sizeof(*blkg), gfp_mask, q->node);
@@ -159,6 +175,20 @@ static struct blkcg_gq *blkg_alloc(struct blkcg *blkcg, struct request_queue *q,
pd->plid = i;
}

+ /* init rq_stats */
+ blkg->rq_stat = __alloc_percpu_gfp(sizeof(struct blk_rq_stat),
+ __alignof__(struct blk_rq_stat),
+ gfp_mask);
+ if (!blkg->rq_stat)
+ goto err_free;
+ for_each_possible_cpu(cpu) {
+ struct blk_rq_stat *s;
+ s = per_cpu_ptr(blkg->rq_stat, cpu);
+ blk_rq_stat_init(s);
+ }
+ blk_rq_stat_init(&blkg->last_rq_stat);
+ atomic64_set(&blkg->win_start, ktime_to_ns(ktime_get()));
+
return blkg;

err_free:
@@ -981,7 +1011,7 @@ static int blkcg_print_stat(struct seq_file *sf, void *v)
const char *dname;
char *buf;
struct blkg_rwstat rwstat;
- u64 rbytes, wbytes, rios, wios, dbytes, dios;
+ u64 rbytes, wbytes, rios, wios, dbytes, dios, avg_lat;
size_t size = seq_get_buf(sf, &buf), off = 0;
int i;
bool has_stats = false;
@@ -1012,14 +1042,16 @@ static int blkcg_print_stat(struct seq_file *sf, void *v)
wios = atomic64_read(&rwstat.aux_cnt[BLKG_RWSTAT_WRITE]);
dios = atomic64_read(&rwstat.aux_cnt[BLKG_RWSTAT_DISCARD]);

+ avg_lat = div64_u64(blkg->lat_avg, NSEC_PER_USEC);
+
spin_unlock_irq(blkg->q->queue_lock);

if (rbytes || wbytes || rios || wios) {
has_stats = true;
off += scnprintf(buf+off, size-off,
- "rbytes=%llu wbytes=%llu rios=%llu wios=%llu dbytes=%llu dios=%llu",
- rbytes, wbytes, rios, wios,
- dbytes, dios);
+ "rbytes=%llu wbytes=%llu rios=%llu wios=%llu dbytes=%llu dios=%llu avg_lat=%llu",
+ rbytes, wbytes, rios, wios, dbytes, dios,
+ avg_lat);
}

if (!blkcg_debug_stats)
@@ -1638,6 +1670,81 @@ void blkcg_policy_unregister(struct blkcg_policy *pol)
}
EXPORT_SYMBOL_GPL(blkcg_policy_unregister);

+/*
+ * This aggregates the latency of all bios under this cgroup and then
+ * advances the moving average window. A window contributes a single
+ * value to the moving average regardless of how many IOs occurred.
+ */
+static void blkg_aggregate_latency(struct blkcg_gq *blkg)
+{
+ struct blk_rq_stat rq_stat;
+ struct blk_rq_stat *last_rq_stat;
+ u64 mean;
+ int cpu;
+
+ blk_rq_stat_init(&rq_stat);
+ preempt_disable();
+ for_each_online_cpu(cpu) {
+ struct blk_rq_stat *s;
+ s = per_cpu_ptr(blkg->rq_stat, cpu);
+ blk_rq_stat_sum(&rq_stat, s);
+ }
+ preempt_enable();
+
+ last_rq_stat = &blkg->last_rq_stat;
+
+ mean = div64_u64(rq_stat.nr_samples * rq_stat.mean -
+ last_rq_stat->nr_samples * last_rq_stat->mean,
+ rq_stat.nr_samples - last_rq_stat->nr_samples);
+ CALC_LOAD(blkg->lat_avg, BLKG_EXP_12s, mean);
+ blkg->last_rq_stat = rq_stat;
+}
+
+/**
+ * blkg_record_latency - records the latency of a bio
+ * @bio: bio of interest
+ *
+ * This records the latency of a bio in all nodes up to root, excluding root.
+ */
+void blkg_record_latency(struct bio *bio)
+{
+ u64 now = ktime_to_ns(ktime_get());
+ u64 start = bio_issue_time(&bio->bi_issue);
+ u64 win_start, req_time;
+ struct blkcg_gq *blkg;
+ struct blk_rq_stat *rq_stat;
+ bool issue_as_root = bio_issue_as_root_blkg(bio);
+
+ blkg = bio->bi_blkg;
+ if (!blkg)
+ return;
+
+ /*
+ * Truncate @now to the same granularity that bio_issue_time() applies
+ * to the issue timestamp, so the two values are comparable.
+ */
+ now = __bio_issue_time(now);
+
+ if (now <= start || issue_as_root)
+ return;
+
+ req_time = now - start;
+
+ while (blkg && blkg->parent) {
+ rq_stat = get_cpu_ptr(blkg->rq_stat);
+ blk_rq_stat_add(rq_stat, req_time);
+ put_cpu_ptr(rq_stat);
+
+ win_start = atomic64_read(&blkg->win_start);
+ if (now > win_start && (now - win_start) >= BLKG_STAT_WIN_SIZE)
+ if (atomic64_cmpxchg(&blkg->win_start,
+ win_start, now) == win_start)
+ blkg_aggregate_latency(blkg);
+
+ blkg = blkg->parent;
+ }
+}
+
/*
* Scale the accumulated delay based on how long it has been since we updated
* the delay. We only call this when we are adding delay, in case it's been a
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index 0134cdd270b8..215af051f876 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -136,6 +136,11 @@ struct blkcg_gq {

struct blkg_policy_data *pd[BLKCG_MAX_POLS];

+ struct blk_rq_stat __percpu *rq_stat;
+ struct blk_rq_stat last_rq_stat;
+ atomic64_t win_start;
+ u64 lat_avg;
+
struct rcu_head rcu_head;

atomic_t use_delay;
@@ -895,6 +900,8 @@ static inline void blkcg_clear_delay(struct blkcg_gq *blkg)
}
}

+void blkg_record_latency(struct bio *bio);
+
void blkcg_add_delay(struct blkcg_gq *blkg, u64 now, u64 delta);
void blkcg_schedule_throttle(struct request_queue *q, bool use_memdelay);
void blkcg_maybe_throttle_current(void);
@@ -917,6 +924,8 @@ struct blkcg_policy {

#define blkcg_root_css ((struct cgroup_subsys_state *)ERR_PTR(-EINVAL))

+static inline void blkg_record_latency(struct bio *bio) {}
+
static inline void blkcg_maybe_throttle_current(void) { }
static inline bool blk_cgroup_congested(void) { return false; }

--
2.17.1


2018-08-31 01:56:02

by Dennis Zhou

Subject: [PATCH 12/15] blkcg: cleanup and make blk_get_rl use blkg_lookup_create

From: "Dennis Zhou (Facebook)" <[email protected]>

blk_get_rl is responsible for identifying which request_list a request
should be allocated to. Tryget logic was added earlier, but the
semantics were left unchanged.

This patch makes better use of the bio already holding a reference to
the blkg in the hot path. The cold path uses the better fallback of
blkg_lookup_create rather than just blkg_lookup before falling back to
q->root_rl. If lookup_create fails with anything but -ENODEV, it falls
back to q->root_rl; -ENODEV causes the lookup to be retried.

A clarifying comment is added to explain why q->root_rl is used rather
than the root blkg's rl.
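
For context, a minimal caller-side sketch of how the two ends pair up
(hypothetical usage; the real caller is the legacy request allocation
path):

	struct request_list *rl;

	rl = blk_get_rl(q, bio);	/* takes a blkg ref, except for root */
	/* ... allocate the request from rl ... */
	blk_put_rl(rl);			/* drops that blkg ref */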

Signed-off-by: Dennis Zhou <[email protected]>
---
include/linux/blk-cgroup.h | 53 ++++++++++++++++++++++++++------------
1 file changed, 37 insertions(+), 16 deletions(-)

diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index 3eed491e4daa..97cb82029b18 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -537,28 +537,49 @@ static inline struct request_list *blk_get_rl(struct request_queue *q,

rcu_read_lock();

- blkcg = bio_blkcg(bio);
- if (!blkcg)
+ blkg = bio->bi_blkg;
+ if (blkg) {
+ blkcg = bio->bi_blkg->blkcg;
+ if (blkcg == &blkcg_root)
+ goto rl_use_root;
+
+ blkg_get(blkg);
+ return &blkg->rl;
+ }
+
+ while (true) {
blkcg = css_to_blkcg(blkcg_css());
+ if (blkcg == &blkcg_root)
+ goto rl_use_root;

- /* bypass blkg lookup and use @q->root_rl directly for root */
- if (blkcg == &blkcg_root)
- goto root_rl;
+ blkg = blkg_lookup(blkcg, q);

- /*
- * Try to use blkg->rl. blkg lookup may fail under memory pressure
- * or if either the blkcg or queue is going away. Fall back to
- * root_rl in such cases.
- */
- blkg = blkg_lookup(blkcg, q);
- if (unlikely(!blkg))
- goto root_rl;
+ if (unlikely(!blkg))
+ blkg = __blkg_lookup_create(blkcg, q);
+
+ if (IS_ERR(blkg)) {
+ if (PTR_ERR(blkg) == -ENODEV) {
+ cpu_relax();
+ continue;
+ } else {
+ goto rl_use_root;
+ }
+ }
+
+ if (blkg_try_get(blkg))
+ break;
+ cpu_relax();
+ }

- if (!blkg_try_get(blkg))
- goto root_rl;
rcu_read_unlock();
return &blkg->rl;
-root_rl:
+
+ /*
+ * Each blkg has its own request_list, however, the root blkcg
+ * uses the request_queue's root_rl. This is to avoid most
+ * overhead for the root blkcg.
+ */
+rl_use_root:
rcu_read_unlock();
return &q->root_rl;
}
--
2.17.1


2018-08-31 01:56:24

by Dennis Zhou

Subject: [PATCH 11/15] blkcg: remove additional reference to the css

From: "Dennis Zhou (Facebook)" <[email protected]>

The previous patch in this series removed carrying around a pointer to
the css in the bio. However, the blkg association logic still relied on
taking a reference on the css to ensure we wouldn't fail in getting a
reference for the blkg.

Here we remove the implicit dependency on the css and utilize tryget and
retry logic during association. This streamlines the three ways that
association can happen: generic, swap, writeback. They now share common
association logic with separate retry mechanisms for obtaining a copy of
the css.
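
Condensed from the hunks below, the common shape all three paths now
share is a tryget-with-retry loop:

	rcu_read_lock();
	while (true) {
		css = blkcg_css();	/* cgroup_e_css() in the swap path */
		ret = __bio_associate_blkg_from_css(bio, css);
		if (ret != -ENODEV)	/* -ENODEV: css went away, retry */
			break;
		cpu_relax();
	}
	rcu_read_unlock();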

Signed-off-by: Dennis Zhou <[email protected]>
---
block/bio.c | 89 +++++++++++++++++++++++++++-----------
include/linux/blk-cgroup.h | 35 ++++-----------
include/linux/cgroup.h | 2 +
kernel/cgroup/cgroup.c | 4 +-
4 files changed, 77 insertions(+), 53 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index ec55ee810503..b792bffecce1 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1949,18 +1949,42 @@ int bio_associate_blkg(struct bio *bio, struct blkcg_gq *blkg)
return 0;
}

+/**
+ * __bio_associate_blkg_from_css - internal blkg association function
+ *
+ * This in the core association function that all association paths rely on.
+ * This handles -ENOMEM, but propagates -ENODEV to allow for separate retry
+ * scenarios. This takes a reference on the blkg, which is released upon
+ * freeing of the bio.
+ */
static int __bio_associate_blkg_from_css(struct bio *bio,
struct cgroup_subsys_state *css)
{
+ struct request_queue *q = bio->bi_disk->queue;
struct blkcg_gq *blkg;
+ int ret;

rcu_read_lock();

- blkg = blkg_lookup_create(css_to_blkcg(css), bio->bi_disk->queue);
+ if (!css || !css->parent) {
+ blkg = q->root_blkg;
+ } else {
+ blkg = blkg_lookup_create(css_to_blkcg(css), q);
+
+ if (IS_ERR(blkg)) {
+ ret = PTR_ERR(blkg);
+ if (ret != -ENOMEM)
+ blkg = q->root_blkg;
+ else
+ goto afc_out;
+ }
+ }

- rcu_read_unlock();
+ ret = bio_associate_blkg(bio, blkg);

- return bio_associate_blkg(bio, blkg);
+afc_out:
+ rcu_read_unlock();
+ return ret;
}

/**
@@ -1969,14 +1993,18 @@ static int __bio_associate_blkg_from_css(struct bio *bio,
* @css: target css
*
* Associate @bio with the blkg found by combining the css's blkg and the
- * request_queue of the @bio. This takes a reference on the css that will
- * be put upon freeing of @bio.
+ * request_queue of the @bio. This falls back to the queue's root_blkg if
+ * the association fails with the css.
*/
int bio_associate_blkg_from_css(struct bio *bio,
struct cgroup_subsys_state *css)
{
- css_get(css);
- return __bio_associate_blkg_from_css(bio, css);
+ if (unlikely(bio->bi_blkg))
+ return -EBUSY;
+ /* there is no retry to get another css so fallback to the root_blkg */
+ if (__bio_associate_blkg_from_css(bio, css))
+ bio_associate_blkg(bio, bio->bi_disk->queue->root_blkg);
+ return 0;
}
EXPORT_SYMBOL_GPL(bio_associate_blkg_from_css);

@@ -1987,22 +2015,35 @@ EXPORT_SYMBOL_GPL(bio_associate_blkg_from_css);
* @page: the page to lookup the blkcg from
*
* Associate @bio with the blkg from @page's owning memcg and the respective
- * request_queue. This works like every other associate function wrt
- * references.
+ * request_queue. If cgroup_e_css returns NULL, fall back to the queue's
+ * root_blkg.
*
* Note: this must be called after bio has an associated device.
*/
int bio_associate_blkg_from_page(struct bio *bio, struct page *page)
{
struct cgroup_subsys_state *css;
+ int ret;

if (unlikely(bio->bi_blkg))
return -EBUSY;
if (!page->mem_cgroup)
return 0;
- css = cgroup_get_e_css(page->mem_cgroup->css.cgroup, &io_cgrp_subsys);

- return __bio_associate_blkg_from_css(bio, css);
+ rcu_read_lock();
+
+ while (true) {
+ css = cgroup_e_css(page->mem_cgroup->css.cgroup,
+ &io_cgrp_subsys);
+
+ ret = __bio_associate_blkg_from_css(bio, css);
+ if (ret != -ENODEV)
+ break;
+ cpu_relax();
+ }
+
+ rcu_read_unlock();
+ return ret;
}
#endif /* CONFIG_MEMCG */

@@ -2012,12 +2053,12 @@ int bio_associate_blkg_from_page(struct bio *bio, struct page *page)
* @bio: target bio
*
* Associate @bio with the blkg found from the bio's css and the request_queue.
- * If one is not found, bio_lookup_blkg creates the blkg.
+ * If one is not found, bio_lookup_blkg creates the blkg. This falls back to
+ * the queue's root_blkg if association fails.
*/
int bio_associate_create_blkg(struct request_queue *q, struct bio *bio)
{
- struct blkcg *blkcg;
- struct blkcg_gq *blkg;
+ struct cgroup_subsys_state *css;
int ret = 0;

/* someone has already associated this bio with a blkg */
@@ -2026,19 +2067,19 @@ int bio_associate_create_blkg(struct request_queue *q, struct bio *bio)

rcu_read_lock();

- blkcg = css_to_blkcg(blkcg_get_css());
+ while (true) {
+ css = blkcg_css();

- if (!blkcg->css.parent) {
- ret = bio_associate_blkg(bio, q->root_blkg);
- goto assoc_out;
+ ret = __bio_associate_blkg_from_css(bio, css);
+ if (ret != -ENODEV)
+ break;
+ cpu_relax();
}

- blkg = blkg_lookup_create(blkcg, q);
- if (IS_ERR(blkg))
- blkg = q->root_blkg;
+ /* explicitly fall back to root */
+ if (unlikely(!bio->bi_blkg))
+ bio_associate_blkg(bio, q->root_blkg);

- ret = bio_associate_blkg(bio, blkg);
-assoc_out:
rcu_read_unlock();
return ret;
}
@@ -2054,8 +2095,6 @@ void bio_disassociate_task(struct bio *bio)
bio->bi_ioc = NULL;
}
if (bio->bi_blkg) {
- /* a ref is always taken on css */
- css_put(&bio_blkcg(bio)->css);
blkg_put(bio->bi_blkg);
bio->bi_blkg = NULL;
}
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index 3c66154709ed..3eed491e4daa 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -233,31 +233,18 @@ int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol,
void blkg_conf_finish(struct blkg_conf_ctx *ctx);

/**
- * blkcg_get_css - find and get a reference to the css
+ * blkcg_css - find the current css
*
* Find the css associated with either the kthread or the current task.
*/
-static inline struct cgroup_subsys_state *blkcg_get_css(void)
+static inline struct cgroup_subsys_state *blkcg_css(void)
{
struct cgroup_subsys_state *css;

- rcu_read_lock();
-
css = kthread_blkcg();
- if (css) {
- css_get(css);
- } else {
- while (true) {
- css = task_css(current, io_cgrp_id);
- if (likely(css_tryget(css)))
- break;
- cpu_relax();
- }
- }
-
- rcu_read_unlock();
-
- return css;
+ if (css)
+ return css;
+ return task_css(current, io_cgrp_id);
}

static inline struct blkcg *css_to_blkcg(struct cgroup_subsys_state *css)
@@ -551,11 +538,8 @@ static inline struct request_list *blk_get_rl(struct request_queue *q,
rcu_read_lock();

blkcg = bio_blkcg(bio);
- if (blkcg) {
- css_get(&blkcg->css);
- } else {
- blkcg = css_to_blkcg(blkcg_get_css());
- }
+ if (!blkcg)
+ blkcg = css_to_blkcg(blkcg_css());

/* bypass blkg lookup and use @q->root_rl directly for root */
if (blkcg == &blkcg_root)
@@ -570,7 +554,8 @@ static inline struct request_list *blk_get_rl(struct request_queue *q,
if (unlikely(!blkg))
goto root_rl;

- blkg_get(blkg);
+ if (!blkg_try_get(blkg))
+ goto root_rl;
rcu_read_unlock();
return &blkg->rl;
root_rl:
@@ -587,8 +572,6 @@ static inline struct request_list *blk_get_rl(struct request_queue *q,
*/
static inline void blk_put_rl(struct request_list *rl)
{
- /* an additional ref is always taken for rl */
- css_put(&rl->blkg->blkcg->css);
if (rl->blkg->blkcg != &blkcg_root)
blkg_put(rl->blkg);
}
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index c9fdf6f57913..0c4d56acfdca 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -93,6 +93,8 @@ extern struct css_set init_css_set;

bool css_has_online_children(struct cgroup_subsys_state *css);
struct cgroup_subsys_state *css_from_id(int id, struct cgroup_subsys *ss);
+struct cgroup_subsys_state *cgroup_e_css(struct cgroup *cgroup,
+ struct cgroup_subsys *ss);
struct cgroup_subsys_state *cgroup_get_e_css(struct cgroup *cgroup,
struct cgroup_subsys *ss);
struct cgroup_subsys_state *css_tryget_online_from_dir(struct dentry *dentry,
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 077370bf8964..d3fa4bdd7407 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -498,8 +498,8 @@ static struct cgroup_subsys_state *cgroup_tryget_css(struct cgroup *cgrp,
* enabled. If @ss is associated with the hierarchy @cgrp is on, this
* function is guaranteed to return non-NULL css.
*/
-static struct cgroup_subsys_state *cgroup_e_css(struct cgroup *cgrp,
- struct cgroup_subsys *ss)
+struct cgroup_subsys_state *cgroup_e_css(struct cgroup *cgrp,
+ struct cgroup_subsys *ss)
{
lockdep_assert_held(&cgroup_mutex);

--
2.17.1


2018-08-31 01:56:38

by Dennis Zhou

Subject: [PATCH 14/15] blkcg: rename blkg_try_get to blkg_tryget

From: "Dennis Zhou (Facebook)" <[email protected]>

blkg reference counting now uses percpu_ref rather than atomic_t. Let's
make this consistent with css_tryget. This renames blkg_try_get to
blkg_tryget, which now returns a bool rather than the blkg or NULL.
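
Callers now follow the css_tryget convention, e.g. (a sketch):

	if (blkg_tryget(blkg)) {
		/* a ref is held; safe to use blkg here */
		blkg_put(blkg);
	}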

Signed-off-by: Dennis Zhou <[email protected]>
---
block/bio.c | 2 +-
block/blk-cgroup.c | 3 +--
block/blk-iolatency.c | 2 +-
include/linux/blk-cgroup.h | 10 ++++------
4 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index b792bffecce1..a0b816811e7d 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1943,7 +1943,7 @@ int bio_associate_blkg(struct bio *bio, struct blkcg_gq *blkg)
{
if (unlikely(bio->bi_blkg))
return -EBUSY;
- if (!blkg_try_get(blkg))
+ if (!blkg_tryget(blkg))
return -ENODEV;
bio->bi_blkg = blkg;
return 0;
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index bbea4b44bd8f..1eaf097e38b0 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1777,8 +1777,7 @@ void blkcg_maybe_throttle_current(void)
blkg = blkg_lookup(blkcg, q);
if (!blkg)
goto out;
- blkg = blkg_try_get(blkg);
- if (!blkg)
+ if (!blkg_tryget(blkg))
goto out;
rcu_read_unlock();

diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
index 9d7052bad6f7..5a4cec54c998 100644
--- a/block/blk-iolatency.c
+++ b/block/blk-iolatency.c
@@ -628,7 +628,7 @@ static void blkiolatency_timer_fn(struct timer_list *t)
* We could be exiting, don't access the pd unless we have a
* ref on the blkg.
*/
- if (!blkg_try_get(blkg))
+ if (!blkg_tryget(blkg))
continue;

iolat = blkg_to_lat(blkg);
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index b60d063fb0d7..0134cdd270b8 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -459,17 +459,15 @@ static inline void blkg_get(struct blkcg_gq *blkg)
}

/**
- * blkg_try_get - try and get a blkg reference
+ * blkg_tryget - try and get a blkg reference
* @blkg: blkg to get
*
* This is for use when doing an RCU lookup of the blkg. We may be in the midst
* of freeing this blkg, so we can only use it if the refcnt is not zero.
*/
-static inline struct blkcg_gq *blkg_try_get(struct blkcg_gq *blkg)
+static inline bool blkg_tryget(struct blkcg_gq *blkg)
{
- if (percpu_ref_tryget(&blkg->refcnt))
- return blkg;
- return NULL;
+ return percpu_ref_tryget(&blkg->refcnt);
}

/**
@@ -560,7 +558,7 @@ static inline struct request_list *blk_get_rl(struct request_queue *q,
}
}

- if (blkg_try_get(blkg))
+ if (blkg_tryget(blkg))
break;
cpu_relax();
}
--
2.17.1


2018-08-31 01:56:41

by Dennis Zhou

Subject: [PATCH 13/15] blkcg: change blkg reference counting to use percpu_ref

From: "Dennis Zhou (Facebook)" <[email protected]>

Now that every bio is associated with a blkg, blkg_get, blkg_try_get,
and blkg_put are on the hot path. This switches the refcnt in blkg
over to percpu_ref.
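
Condensed from the hunks below, the blkg lifecycle maps onto the
percpu_ref API as:

	percpu_ref_init(&blkg->refcnt, __blkg_release, 0, GFP_KERNEL);
						/* in blkg_create() */
	percpu_ref_tryget(&blkg->refcnt);	/* blkg_try_get() */
	percpu_ref_put(&blkg->refcnt);		/* blkg_put() */
	percpu_ref_kill(&blkg->refcnt);		/* blkg_destroy() drops
						   the initial ref */
	percpu_ref_exit(&blkg->refcnt);		/* in __blkg_release() */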

Signed-off-by: Dennis Zhou <[email protected]>
---
block/blk-cgroup.c | 56 +++++++++++++++++++++-----------------
include/linux/blk-cgroup.h | 14 +++-------
2 files changed, 35 insertions(+), 35 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index f678cd555814..bbea4b44bd8f 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -84,6 +84,30 @@ static void blkg_free(struct blkcg_gq *blkg)
kfree(blkg);
}

+/*
+ * A group is RCU protected, but having an rcu lock does not mean that one
+ * can access all the fields of blkg and assume these are valid. For
+ * example, don't try to follow throtl_data and request queue links.
+ *
+ * Having a reference to blkg under an rcu allows accesses to only values
+ * local to groups like group stats and group rate limits.
+ */
+static void __blkg_release(struct percpu_ref *ref)
+{
+ struct blkcg_gq *blkg = container_of(ref, struct blkcg_gq, refcnt);
+
+ percpu_ref_exit(&blkg->refcnt);
+
+ /* release the blkcg and parent blkg refs this blkg has been holding */
+ css_put(&blkg->blkcg->css);
+ if (blkg->parent)
+ blkg_put(blkg->parent);
+
+ wb_congested_put(blkg->wb_congested);
+
+ blkg_free(blkg);
+}
+
/**
* blkg_alloc - allocate a blkg
* @blkcg: block cgroup the new blkg is associated with
@@ -110,7 +134,6 @@ static struct blkcg_gq *blkg_alloc(struct blkcg *blkcg, struct request_queue *q,
blkg->q = q;
INIT_LIST_HEAD(&blkg->q_node);
blkg->blkcg = blkcg;
- atomic_set(&blkg->refcnt, 1);

/* root blkg uses @q->root_rl, init rl only for !root blkgs */
if (blkcg != &blkcg_root) {
@@ -217,6 +240,10 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
blkg_get(blkg->parent);
}

+ ret = percpu_ref_init(&blkg->refcnt, __blkg_release, 0, GFP_KERNEL);
+ if (ret)
+ goto err_cancel_ref;
+
/* invoke per-policy init */
for (i = 0; i < BLKCG_MAX_POLS; i++) {
struct blkcg_policy *pol = blkcg_policy[i];
@@ -249,6 +276,8 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
blkg_put(blkg);
return ERR_PTR(ret);

+err_cancel_ref:
+ percpu_ref_exit(&blkg->refcnt);
err_put_congested:
wb_congested_put(wb_congested);
err_put_css:
@@ -378,7 +407,7 @@ static void blkg_destroy(struct blkcg_gq *blkg)
* Put the reference taken at the time of creation so that when all
* queues are gone, group can be destroyed.
*/
- blkg_put(blkg);
+ percpu_ref_kill(&blkg->refcnt);
}

/**
@@ -405,29 +434,6 @@ static void blkg_destroy_all(struct request_queue *q)
q->root_rl.blkg = NULL;
}

-/*
- * A group is RCU protected, but having an rcu lock does not mean that one
- * can access all the fields of blkg and assume these are valid. For
- * example, don't try to follow throtl_data and request queue links.
- *
- * Having a reference to blkg under an rcu allows accesses to only values
- * local to groups like group stats and group rate limits.
- */
-void __blkg_release_rcu(struct rcu_head *rcu_head)
-{
- struct blkcg_gq *blkg = container_of(rcu_head, struct blkcg_gq, rcu_head);
-
- /* release the blkcg and parent blkg refs this blkg has been holding */
- css_put(&blkg->blkcg->css);
- if (blkg->parent)
- blkg_put(blkg->parent);
-
- wb_congested_put(blkg->wb_congested);
-
- blkg_free(blkg);
-}
-EXPORT_SYMBOL_GPL(__blkg_release_rcu);
-
/*
* The next function used by blk_queue_for_each_rl(). It's a bit tricky
* because the root blkg uses @q->root_rl instead of its own rl.
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index 97cb82029b18..b60d063fb0d7 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -126,7 +126,7 @@ struct blkcg_gq {
struct request_list rl;

/* reference count */
- atomic_t refcnt;
+ struct percpu_ref refcnt;

/* is this blkg online? protected by both blkcg and q locks */
bool online;
@@ -455,8 +455,7 @@ static inline int blkg_path(struct blkcg_gq *blkg, char *buf, int buflen)
*/
static inline void blkg_get(struct blkcg_gq *blkg)
{
- WARN_ON_ONCE(atomic_read(&blkg->refcnt) <= 0);
- atomic_inc(&blkg->refcnt);
+ percpu_ref_get(&blkg->refcnt);
}

/**
@@ -468,23 +467,18 @@ static inline void blkg_get(struct blkcg_gq *blkg)
*/
static inline struct blkcg_gq *blkg_try_get(struct blkcg_gq *blkg)
{
- if (atomic_inc_not_zero(&blkg->refcnt))
+ if (percpu_ref_tryget(&blkg->refcnt))
return blkg;
return NULL;
}

-
-void __blkg_release_rcu(struct rcu_head *rcu);
-
/**
* blkg_put - put a blkg reference
* @blkg: blkg to put
*/
static inline void blkg_put(struct blkcg_gq *blkg)
{
- WARN_ON_ONCE(atomic_read(&blkg->refcnt) <= 0);
- if (atomic_dec_and_test(&blkg->refcnt))
- call_rcu(&blkg->rcu_head, __blkg_release_rcu);
+ percpu_ref_put(&blkg->refcnt);
}

/**
--
2.17.1


2018-08-31 01:56:42

by Dennis Zhou

Subject: [PATCH 10/15] blkcg: remove bio->bi_css and instead use bio->bi_blkg

From: "Dennis Zhou (Facebook)" <[email protected]>

Prior patches ensured that all bios are now associated with some blkg.
This makes bio->bi_css unnecessary, as the blkg already maintains a
reference to the blkcg.

This patch removes the bi_css field and converts its uses to go
through bi_blkg.

Signed-off-by: Dennis Zhou <[email protected]>
---
block/bio.c | 56 ++++++++------------------------------
block/bounce.c | 2 +-
drivers/block/loop.c | 5 ++--
drivers/md/raid0.c | 2 +-
include/linux/bio.h | 7 ++---
include/linux/blk-cgroup.h | 4 +--
include/linux/blk_types.h | 1 -
kernel/trace/blktrace.c | 4 +--
8 files changed, 22 insertions(+), 59 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 97ef994a08b6..ec55ee810503 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -609,7 +609,7 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src)
bio->bi_iter = bio_src->bi_iter;
bio->bi_io_vec = bio_src->bi_io_vec;

- bio_clone_blkcg_association(bio, bio_src);
+ bio_clone_blkg_association(bio, bio_src);

bio_issue_init(&bio->bi_issue, bio_sectors(bio));
}
@@ -1930,34 +1930,6 @@ EXPORT_SYMBOL(bioset_init_from_src);

#ifdef CONFIG_BLK_CGROUP

-/**
- * bio_associate_blkcg - associate a bio with the specified blkcg
- * @bio: target bio
- * @blkcg_css: css of the blkcg to associate
- *
- * Associate @bio with the blkcg specified by @blkcg_css. Block layer will
- * treat @bio as if it were issued by a task which belongs to the blkcg.
- *
- * This function takes an extra reference of @blkcg_css which will be put
- * when @bio is released. The caller must own @bio and is responsible for
- * synchronizing calls to this function. If @blkcg_css is NULL, a call to
- * blkcg_get_css finds the current css from the kthread or task.
- */
-int bio_associate_blkcg(struct bio *bio, struct cgroup_subsys_state *blkcg_css)
-{
- if (unlikely(bio->bi_css))
- return -EBUSY;
-
- if (blkcg_css)
- css_get(blkcg_css);
- else
- blkcg_css = blkcg_get_css();
-
- bio->bi_css = blkcg_css;
- return 0;
-}
-EXPORT_SYMBOL_GPL(bio_associate_blkcg);
-
/**
* bio_associate_blkg - associate a bio with the specified blkg
* @bio: target bio
@@ -2004,7 +1976,6 @@ int bio_associate_blkg_from_css(struct bio *bio,
struct cgroup_subsys_state *css)
{
css_get(css);
- bio->bi_css = css;
return __bio_associate_blkg_from_css(bio, css);
}
EXPORT_SYMBOL_GPL(bio_associate_blkg_from_css);
@@ -2025,12 +1996,11 @@ int bio_associate_blkg_from_page(struct bio *bio, struct page *page)
{
struct cgroup_subsys_state *css;

- if (unlikely(bio->bi_css))
+ if (unlikely(bio->bi_blkg))
return -EBUSY;
if (!page->mem_cgroup)
return 0;
css = cgroup_get_e_css(page->mem_cgroup->css.cgroup, &io_cgrp_subsys);
- bio->bi_css = css;

return __bio_associate_blkg_from_css(bio, css);
}
@@ -2056,8 +2026,7 @@ int bio_associate_create_blkg(struct request_queue *q, struct bio *bio)

rcu_read_lock();

- bio_associate_blkcg(bio, NULL);
- blkcg = bio_blkcg(bio);
+ blkcg = css_to_blkcg(blkcg_get_css());

if (!blkcg->css.parent) {
ret = bio_associate_blkg(bio, q->root_blkg);
@@ -2084,30 +2053,27 @@ void bio_disassociate_task(struct bio *bio)
put_io_context(bio->bi_ioc);
bio->bi_ioc = NULL;
}
- if (bio->bi_css) {
- css_put(bio->bi_css);
- bio->bi_css = NULL;
- }
if (bio->bi_blkg) {
+ /* a ref is always taken on css */
+ css_put(&bio_blkcg(bio)->css);
blkg_put(bio->bi_blkg);
bio->bi_blkg = NULL;
}
}

/**
- * bio_clone_blkcg_association - clone blkcg association from src to dst bio
+ * bio_clone_blkg_association - clone blkg association from src to dst bio
* @dst: destination bio
* @src: source bio
*/
-void bio_clone_blkcg_association(struct bio *dst, struct bio *src)
+void bio_clone_blkg_association(struct bio *dst, struct bio *src)
{
- if (src->bi_css)
- WARN_ON(bio_associate_blkcg(dst, src->bi_css));
-
- if (src->bi_blkg)
+ if (src->bi_blkg) {
+ css_get(&bio_blkcg(src)->css);
bio_associate_blkg(dst, src->bi_blkg);
+ }
}
-EXPORT_SYMBOL_GPL(bio_clone_blkcg_association);
+EXPORT_SYMBOL_GPL(bio_clone_blkg_association);
#endif /* CONFIG_BLK_CGROUP */

static void __init biovec_init_slabs(void)
diff --git a/block/bounce.c b/block/bounce.c
index bea3b0cbe4a7..9f7071e176a4 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -257,7 +257,7 @@ static struct bio *bounce_clone_bio(struct bio *bio_src, gfp_t gfp_mask,
}
}

- bio_clone_blkcg_association(bio, bio_src);
+ bio_clone_blkg_association(bio, bio_src);

bio_issue_init(&bio->bi_issue, bio_sectors(bio));

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index ea9debf59b22..abad6d15f956 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -77,6 +77,7 @@
#include <linux/falloc.h>
#include <linux/uio.h>
#include <linux/ioprio.h>
+#include <linux/blk-cgroup.h>

#include "loop.h"

@@ -1760,8 +1761,8 @@ static blk_status_t loop_queue_rq(struct blk_mq_hw_ctx *hctx,

/* always use the first bio's css */
#ifdef CONFIG_BLK_CGROUP
- if (cmd->use_aio && rq->bio && rq->bio->bi_css) {
- cmd->css = rq->bio->bi_css;
+ if (cmd->use_aio && rq->bio && rq->bio->bi_blkg) {
+ cmd->css = &bio_blkcg(rq->bio)->css;
css_get(cmd->css);
} else
#endif
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index ac1cffd2a09b..f3fb5bb8c82a 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -542,7 +542,7 @@ static void raid0_handle_discard(struct mddev *mddev, struct bio *bio)
!discard_bio)
continue;
bio_chain(discard_bio, bio);
- bio_clone_blkcg_association(discard_bio, bio);
+ bio_clone_blkg_association(discard_bio, bio);
if (mddev->gendisk)
trace_block_bio_remap(bdev_get_queue(rdev->bdev),
discard_bio, disk_devt(mddev->gendisk),
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 81530d11d34d..175e5e349d30 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -554,16 +554,13 @@ static inline int bio_associate_blkg_from_page(struct bio *bio,
#endif

#ifdef CONFIG_BLK_CGROUP
-int bio_associate_blkcg(struct bio *bio, struct cgroup_subsys_state *blkcg_css);
int bio_associate_blkg(struct bio *bio, struct blkcg_gq *blkg);
int bio_associate_blkg_from_css(struct bio *bio,
struct cgroup_subsys_state *css);
int bio_associate_create_blkg(struct request_queue *q, struct bio *bio);
void bio_disassociate_task(struct bio *bio);
-void bio_clone_blkcg_association(struct bio *dst, struct bio *src);
+void bio_clone_blkg_association(struct bio *dst, struct bio *src);
#else /* CONFIG_BLK_CGROUP */
-static inline int bio_associate_blkcg(struct bio *bio,
- struct cgroup_subsys_state *blkcg_css) { return 0; }
static inline int bio_associate_blkg(struct bio *bio,
struct blkcg_gq *blkg) { return 0; }
static inline int bio_associate_blkg_from_css(struct bio *bio,
@@ -572,7 +569,7 @@ static inline int bio_associate_blkg_from_css(struct bio *bio,
static inline int bio_associate_create_blkg(struct request_queue *q,
struct bio *bio) { return 0; }
static inline void bio_disassociate_task(struct bio *bio) { }
-static inline void bio_clone_blkcg_association(struct bio *dst,
+static inline void bio_clone_blkg_association(struct bio *dst,
struct bio *src) { }
#endif /* CONFIG_BLK_CGROUP */

diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index 55c348d66372..3c66154709ed 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -275,8 +275,8 @@ static inline struct blkcg *css_to_blkcg(struct cgroup_subsys_state *css)
*/
static inline struct blkcg *bio_blkcg(struct bio *bio)
{
- if (bio && bio->bi_css)
- return css_to_blkcg(bio->bi_css);
+ if (bio && bio->bi_blkg)
+ return bio->bi_blkg->blkcg;
return NULL;
}

diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index f6dfb30737d8..9578c7ab1eb6 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -178,7 +178,6 @@ struct bio {
* release. Read comment on top of bio_associate_current().
*/
struct io_context *bi_ioc;
- struct cgroup_subsys_state *bi_css;
struct blkcg_gq *bi_blkg;
struct bio_issue bi_issue;
#endif
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index b951aa1fac61..cd90f653c804 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -776,9 +776,9 @@ blk_trace_bio_get_cgid(struct request_queue *q, struct bio *bio)
if (!bt || !(blk_tracer_flags.val & TRACE_BLK_OPT_CGROUP))
return NULL;

- if (!bio->bi_css)
+ if (!bio->bi_blkg)
return NULL;
- return cgroup_get_kernfs_id(bio->bi_css->cgroup);
+ return cgroup_get_kernfs_id(bio_blkcg(bio)->css.cgroup);
}
#else
static union kernfs_node_id *
--
2.17.1


2018-08-31 01:56:46

by Dennis Zhou

Subject: [PATCH 09/15] blkcg: associate writeback bios with a blkg

From: "Dennis Zhou (Facebook)" <[email protected]>

One of the goals of this series is to remove the bio's separate
reference to the css, which can and should be accessed via bio_blkcg.
In this patch, the wbc_init_bio call is changed such that it must be
called after a queue has been associated with the bio.
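
The required ordering, condensed from the fs/buffer.c hunk below:

	bio = bio_alloc(GFP_NOIO, 1);
	bio_set_dev(bio, bh->b_bdev);	/* associate the device first */
	/* ... fill in sector, end_io, op flags ... */
	wbc_init_bio(wbc, bio);		/* then the writeback cgroup */
	submit_bio(bio);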

Signed-off-by: Dennis Zhou <[email protected]>
---
Documentation/admin-guide/cgroup-v2.rst | 8 +++++---
fs/buffer.c | 10 +++++-----
fs/ext4/page-io.c | 2 +-
include/linux/writeback.h | 5 +++--
4 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 1746131bc9cb..2dc8f95077aa 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1839,8 +1839,10 @@ following two functions.

wbc_init_bio(@wbc, @bio)
Should be called for each bio carrying writeback data and
- associates the bio with the inode's owner cgroup. Can be
- called anytime between bio allocation and submission.
+ associates the bio with the inode's owner cgroup and the
+ corresponding request queue. This must be called after
+ a queue (device) has been associated with the bio and
+ before submission.

wbc_account_io(@wbc, @page, @bytes)
Should be called for each data segment being written out.
@@ -1859,7 +1861,7 @@ the configuration, the bio may be executed at a lower priority and if
the writeback session is holding shared resources, e.g. a journal
entry, may lead to priority inversion. There is no one easy solution
for the problem. Filesystems can try to work around specific problem
-cases by skipping wbc_init_bio() or using bio_associate_blkcg()
+cases by skipping wbc_init_bio() or using bio_associate_create_blkg()
directly.


diff --git a/fs/buffer.c b/fs/buffer.c
index cabc045f483d..7fb8adb44583 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -3049,11 +3049,6 @@ static int submit_bh_wbc(int op, int op_flags, struct buffer_head *bh,
*/
bio = bio_alloc(GFP_NOIO, 1);

- if (wbc) {
- wbc_init_bio(wbc, bio);
- wbc_account_io(wbc, bh->b_page, bh->b_size);
- }
-
bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
bio_set_dev(bio, bh->b_bdev);
bio->bi_write_hint = write_hint;
@@ -3073,6 +3068,11 @@ static int submit_bh_wbc(int op, int op_flags, struct buffer_head *bh,
op_flags |= REQ_PRIO;
bio_set_op_attrs(bio, op, op_flags);

+ if (wbc) {
+ wbc_init_bio(wbc, bio);
+ wbc_account_io(wbc, bh->b_page, bh->b_size);
+ }
+
submit_bio(bio);
return 0;
}
diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index db7590178dfc..2aa62d58d8dd 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -374,13 +374,13 @@ static int io_submit_init_bio(struct ext4_io_submit *io,
bio = bio_alloc(GFP_NOIO, BIO_MAX_PAGES);
if (!bio)
return -ENOMEM;
- wbc_init_bio(io->io_wbc, bio);
bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
bio_set_dev(bio, bh->b_bdev);
bio->bi_end_io = ext4_end_bio;
bio->bi_private = ext4_get_io_end(io->io_end);
io->io_bio = bio;
io->io_next_block = bh->b_blocknr;
+ wbc_init_bio(io->io_wbc, bio);
return 0;
}

diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index fdfd04e348f6..738a0c24874f 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -246,7 +246,8 @@ static inline void wbc_attach_fdatawrite_inode(struct writeback_control *wbc,
*
* @bio is a part of the writeback in progress controlled by @wbc. Perform
* writeback specific initialization. This is used to apply the cgroup
- * writeback context.
+ * writeback context. Must be called after the bio has been associated with
+ * a device.
*/
static inline void wbc_init_bio(struct writeback_control *wbc, struct bio *bio)
{
@@ -257,7 +258,7 @@ static inline void wbc_init_bio(struct writeback_control *wbc, struct bio *bio)
* regular writeback instead of writing things out itself.
*/
if (wbc->wb)
- bio_associate_blkcg(bio, wbc->wb->blkcg_css);
+ bio_associate_blkg_from_css(bio, wbc->wb->blkcg_css);
}

#else /* CONFIG_CGROUP_WRITEBACK */
--
2.17.1


2018-08-31 01:56:52

by Dennis Zhou

Subject: [PATCH 08/15] blkcg: associate a blkg for pages being evicted by swap

From: "Dennis Zhou (Facebook)" <[email protected]>

A prior patch in this series added blkg association to bios issued by
cgroups. There are two other paths whose work we want to attribute
back to the appropriate cgroup: swap and writeback. Here we modify the
way swap tags bios so they include the blkg. Writeback will be tackled
in the next patch.

Signed-off-by: Dennis Zhou <[email protected]>
---
block/bio.c | 83 ++++++++++++++++++++++++++++++++-------------
include/linux/bio.h | 11 ++++--
mm/page_io.c | 2 +-
3 files changed, 68 insertions(+), 28 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index ab41f5b7eb1f..97ef994a08b6 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1930,30 +1930,6 @@ EXPORT_SYMBOL(bioset_init_from_src);

#ifdef CONFIG_BLK_CGROUP

-#ifdef CONFIG_MEMCG
-/**
- * bio_associate_blkcg_from_page - associate a bio with the page's blkcg
- * @bio: target bio
- * @page: the page to lookup the blkcg from
- *
- * Associate @bio with the blkcg from @page's owning memcg. This works like
- * every other associate function wrt references.
- */
-int bio_associate_blkcg_from_page(struct bio *bio, struct page *page)
-{
- struct cgroup_subsys_state *blkcg_css;
-
- if (unlikely(bio->bi_css))
- return -EBUSY;
- if (!page->mem_cgroup)
- return 0;
- blkcg_css = cgroup_get_e_css(page->mem_cgroup->css.cgroup,
- &io_cgrp_subsys);
- bio->bi_css = blkcg_css;
- return 0;
-}
-#endif /* CONFIG_MEMCG */
-
/**
* bio_associate_blkcg - associate a bio with the specified blkcg
* @bio: target bio
@@ -2001,6 +1977,65 @@ int bio_associate_blkg(struct bio *bio, struct blkcg_gq *blkg)
return 0;
}

+static int __bio_associate_blkg_from_css(struct bio *bio,
+ struct cgroup_subsys_state *css)
+{
+ struct blkcg_gq *blkg;
+
+ rcu_read_lock();
+
+ blkg = blkg_lookup_create(css_to_blkcg(css), bio->bi_disk->queue);
+
+ rcu_read_unlock();
+
+ return bio_associate_blkg(bio, blkg);
+}
+
+/**
+ * bio_associate_blkg_from_css - associate a bio with a specified css
+ * @bio: target bio
+ * @css: target css
+ *
+ * Associate @bio with the blkg found by combining the css's blkg and the
+ * request_queue of the @bio. This takes a reference on the css that will
+ * be put upon freeing of @bio.
+ */
+int bio_associate_blkg_from_css(struct bio *bio,
+ struct cgroup_subsys_state *css)
+{
+ css_get(css);
+ bio->bi_css = css;
+ return __bio_associate_blkg_from_css(bio, css);
+}
+EXPORT_SYMBOL_GPL(bio_associate_blkg_from_css);
+
+#ifdef CONFIG_MEMCG
+/**
+ * bio_associate_blkg_from_page - associate a bio with the page's blkg
+ * @bio: target bio
+ * @page: the page to lookup the blkcg from
+ *
+ * Associate @bio with the blkg from @page's owning memcg and the respective
+ * request_queue. This works like every other associate function wrt
+ * references.
+ *
+ * Note: this must be called after bio has an associated device.
+ */
+int bio_associate_blkg_from_page(struct bio *bio, struct page *page)
+{
+ struct cgroup_subsys_state *css;
+
+ if (unlikely(bio->bi_css))
+ return -EBUSY;
+ if (!page->mem_cgroup)
+ return 0;
+ css = cgroup_get_e_css(page->mem_cgroup->css.cgroup, &io_cgrp_subsys);
+ bio->bi_css = css;
+
+ return __bio_associate_blkg_from_css(bio, css);
+}
+#endif /* CONFIG_MEMCG */
+
/**
* bio_associate_create_blkg - associate a bio with a blkg from q
* @q: request_queue where bio is going
diff --git a/include/linux/bio.h b/include/linux/bio.h
index d4626108b9d7..81530d11d34d 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -547,15 +547,17 @@ do { \
disk_devt((bio)->bi_disk)

#if defined(CONFIG_MEMCG) && defined(CONFIG_BLK_CGROUP)
-int bio_associate_blkcg_from_page(struct bio *bio, struct page *page);
+int bio_associate_blkg_from_page(struct bio *bio, struct page *page);
#else
-static inline int bio_associate_blkcg_from_page(struct bio *bio,
- struct page *page) { return 0; }
+static inline int bio_associate_blkg_from_page(struct bio *bio,
+ struct page *page) { return 0; }
#endif

#ifdef CONFIG_BLK_CGROUP
int bio_associate_blkcg(struct bio *bio, struct cgroup_subsys_state *blkcg_css);
int bio_associate_blkg(struct bio *bio, struct blkcg_gq *blkg);
+int bio_associate_blkg_from_css(struct bio *bio,
+ struct cgroup_subsys_state *css);
int bio_associate_create_blkg(struct request_queue *q, struct bio *bio);
void bio_disassociate_task(struct bio *bio);
void bio_clone_blkcg_association(struct bio *dst, struct bio *src);
@@ -564,6 +566,9 @@ static inline int bio_associate_blkcg(struct bio *bio,
struct cgroup_subsys_state *blkcg_css) { return 0; }
static inline int bio_associate_blkg(struct bio *bio,
struct blkcg_gq *blkg) { return 0; }
+static inline int bio_associate_blkg_from_css(struct bio *bio,
+ struct cgroup_subsys_state *css)
+{ return 0; }
static inline int bio_associate_create_blkg(struct request_queue *q,
struct bio *bio) { return 0; }
static inline void bio_disassociate_task(struct bio *bio) { }
diff --git a/mm/page_io.c b/mm/page_io.c
index aafd19ec1db4..573d3663d846 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -339,7 +339,7 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,
goto out;
}
bio->bi_opf = REQ_OP_WRITE | REQ_SWAP | wbc_to_write_flags(wbc);
- bio_associate_blkcg_from_page(bio, page);
+ bio_associate_blkg_from_page(bio, page);
count_swpout_vm_event(page);
set_page_writeback(page);
unlock_page(page);
--
2.17.1


2018-08-31 01:56:55

by Dennis Zhou

Subject: [PATCH 06/15] blkcg: always associate a bio with a blkg

From: "Dennis Zhou (Facebook)" <[email protected]>

Previously, blkgs were only assigned as needed by blk-iolatency and
blk-throttle. bio->bi_css was also always being associated while the
blkg was being looked up and then thrown away in blkcg_bio_issue_check.

This patch begins the cleanup of bio->bi_css and bio->bi_blkg by always
associating a blkg in blkcg_bio_issue_check. It tries to create the
blkg, but if that is not possible, it falls back to using the
request_queue's root_blkg. Therefore, a bio will always be associated
with a blkg.

A missing definition for bio_associate_blkg is also added for parity in
bio.h.
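
Condensed from the hunk below, the association logic with its root
fallback looks like:

	if (!bio->bi_blkg) {		/* not already associated */
		blkcg = bio_blkcg(bio);
		if (!blkcg->css.parent) {
			bio_associate_blkg(bio, q->root_blkg);
		} else {
			blkg = blkg_lookup_create(blkcg, q);
			if (IS_ERR(blkg))
				blkg = q->root_blkg;	/* fallback */
			bio_associate_blkg(bio, blkg);
		}
	}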

Signed-off-by: Dennis Zhou <[email protected]>
---
block/bio.c | 41 ++++++++++++++++++++++++++++++++++++++
include/linux/bio.h | 5 +++++
include/linux/blk-cgroup.h | 19 ++----------------
3 files changed, 48 insertions(+), 17 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 09a31e4d46bb..e937f9681188 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1999,6 +1999,44 @@ int bio_associate_blkg(struct bio *bio, struct blkcg_gq *blkg)
return 0;
}

+/**
+ * bio_associate_create_blkg - associate a bio with a blkg from q
+ * @q: request_queue where bio is going
+ * @bio: target bio
+ *
+ * Associate @bio with the blkg found from the bio's css and the request_queue.
+ * If one is not found, bio_lookup_blkg creates the blkg.
+ */
+int bio_associate_create_blkg(struct request_queue *q, struct bio *bio)
+{
+ struct blkcg *blkcg;
+ struct blkcg_gq *blkg;
+ int ret = 0;
+
+ /* someone has already associated this bio with a blkg */
+ if (bio->bi_blkg)
+ return ret;
+
+ rcu_read_lock();
+
+ bio_associate_blkcg(bio, NULL);
+ blkcg = bio_blkcg(bio);
+
+ if (!blkcg->css.parent) {
+ ret = bio_associate_blkg(bio, q->root_blkg);
+ goto assoc_out;
+ }
+
+ blkg = blkg_lookup_create(blkcg, q);
+ if (IS_ERR(blkg))
+ blkg = q->root_blkg;
+
+ ret = bio_associate_blkg(bio, blkg);
+assoc_out:
+ rcu_read_unlock();
+ return ret;
+}
+
/**
* bio_disassociate_task - undo bio_associate_current()
* @bio: target bio
@@ -2028,6 +2066,9 @@ void bio_clone_blkcg_association(struct bio *dst, struct bio *src)
{
if (src->bi_css)
WARN_ON(bio_associate_blkcg(dst, src->bi_css));
+
+ if (src->bi_blkg)
+ bio_associate_blkg(dst, src->bi_blkg);
}
EXPORT_SYMBOL_GPL(bio_clone_blkcg_association);
#endif /* CONFIG_BLK_CGROUP */
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 51371740d2a8..d4626108b9d7 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -556,11 +556,16 @@ static inline int bio_associate_blkcg_from_page(struct bio *bio,
#ifdef CONFIG_BLK_CGROUP
int bio_associate_blkcg(struct bio *bio, struct cgroup_subsys_state *blkcg_css);
int bio_associate_blkg(struct bio *bio, struct blkcg_gq *blkg);
+int bio_associate_create_blkg(struct request_queue *q, struct bio *bio);
void bio_disassociate_task(struct bio *bio);
void bio_clone_blkcg_association(struct bio *dst, struct bio *src);
#else /* CONFIG_BLK_CGROUP */
static inline int bio_associate_blkcg(struct bio *bio,
struct cgroup_subsys_state *blkcg_css) { return 0; }
+static inline int bio_associate_blkg(struct bio *bio,
+ struct blkcg_gq *blkg) { return 0; }
+static inline int bio_associate_create_blkg(struct request_queue *q,
+ struct bio *bio) { return 0; }
static inline void bio_disassociate_task(struct bio *bio) { }
static inline void bio_clone_blkcg_association(struct bio *dst,
struct bio *src) { }
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index ea2dd6e6baf2..9931ec2f4e9e 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -824,29 +824,15 @@ static inline bool blk_throtl_bio(struct request_queue *q, struct blkcg_gq *blkg
static inline bool blkcg_bio_issue_check(struct request_queue *q,
struct bio *bio)
{
- struct blkcg *blkcg;
struct blkcg_gq *blkg;
bool throtl = false;

- rcu_read_lock();
-
- /* associate blkcg if bio hasn't attached one */
- bio_associate_blkcg(bio, NULL);
- blkcg = bio_blkcg(bio);
-
- blkg = blkg_lookup(blkcg, q);
- if (unlikely(!blkg)) {
- spin_lock_irq(q->queue_lock);
- blkg = __blkg_lookup_create(blkcg, q);
- if (IS_ERR(blkg))
- blkg = NULL;
- spin_unlock_irq(q->queue_lock);
- }
+ bio_associate_create_blkg(q, bio);
+ blkg = bio->bi_blkg;

throtl = blk_throtl_bio(q, blkg, bio);

if (!throtl) {
- blkg = blkg ?: q->root_blkg;
/*
* If the bio is flagged with BIO_QUEUE_ENTERED it means this
* is a split bio and we would have already accounted for the
@@ -858,7 +844,6 @@ static inline bool blkcg_bio_issue_check(struct request_queue *q,
blkg_rwstat_add(&blkg->stat_ios, bio->bi_opf, 1);
}

- rcu_read_unlock();
return !throtl;
}

--
2.17.1


2018-08-31 01:57:02

by Dennis Zhou

Subject: [PATCH 07/15] blkcg: consolidate bio_issue_init and blkg association

From: "Dennis Zhou (Facebook)" <[email protected]>

This removes the now-duplicate association logic in blk-throttle and
blk-iolatency. bio_issue_init is moved into blkcg_bio_issue_check and
into the bio clone variants to allow for the future addition of a
latency moving average for IOs.

Signed-off-by: Dennis Zhou <[email protected]>
---
block/bio.c | 2 ++
block/blk-iolatency.c | 24 +-----------------------
block/blk-throttle.c | 13 +------------
block/bounce.c | 2 ++
include/linux/blk-cgroup.h | 2 ++
5 files changed, 8 insertions(+), 35 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index e937f9681188..ab41f5b7eb1f 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -610,6 +610,8 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src)
bio->bi_io_vec = bio_src->bi_io_vec;

bio_clone_blkcg_association(bio, bio_src);
+
+ bio_issue_init(&bio->bi_issue, bio_sectors(bio));
}
EXPORT_SYMBOL(__bio_clone_fast);

diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
index 22b2ff0440cc..9d7052bad6f7 100644
--- a/block/blk-iolatency.c
+++ b/block/blk-iolatency.c
@@ -395,34 +395,12 @@ static void blkcg_iolatency_throttle(struct rq_qos *rqos, struct bio *bio,
spinlock_t *lock)
{
struct blk_iolatency *blkiolat = BLKIOLATENCY(rqos);
- struct blkcg *blkcg;
- struct blkcg_gq *blkg;
- struct request_queue *q = rqos->q;
+ struct blkcg_gq *blkg = bio->bi_blkg;
bool issue_as_root = bio_issue_as_root_blkg(bio);

if (!blk_iolatency_enabled(blkiolat))
return;

- rcu_read_lock();
- bio_associate_blkcg(bio, NULL);
- blkcg = bio_blkcg(bio);
- blkg = blkg_lookup(blkcg, q);
- if (unlikely(!blkg)) {
- if (!lock)
- spin_lock_irq(q->queue_lock);
- blkg = __blkg_lookup_create(blkcg, q);
- if (IS_ERR(blkg))
- blkg = NULL;
- if (!lock)
- spin_unlock_irq(q->queue_lock);
- }
- if (!blkg)
- goto out;
-
- bio_issue_init(&bio->bi_issue, bio_sectors(bio));
- bio_associate_blkg(bio, blkg);
-out:
- rcu_read_unlock();
while (blkg && blkg->parent) {
struct iolatency_grp *iolat = blkg_to_lat(blkg);
if (!iolat) {
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index c626e1f7cdcd..f2b355338894 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -2126,21 +2126,11 @@ static inline void throtl_update_latency_buckets(struct throtl_data *td)
}
#endif

-static void blk_throtl_assoc_bio(struct throtl_grp *tg, struct bio *bio)
-{
-#ifdef CONFIG_BLK_DEV_THROTTLING_LOW
- /* fallback to root_blkg if we fail to get a blkg ref */
- if (bio->bi_css && bio_associate_blkg(bio, tg_to_blkg(tg)))
- bio_associate_blkg(bio, bio->bi_disk->queue->root_blkg);
- bio_issue_init(&bio->bi_issue, bio_sectors(bio));
-#endif
-}
-
bool blk_throtl_bio(struct request_queue *q, struct blkcg_gq *blkg,
struct bio *bio)
{
struct throtl_qnode *qn = NULL;
- struct throtl_grp *tg = blkg_to_tg(blkg ?: q->root_blkg);
+ struct throtl_grp *tg = blkg_to_tg(blkg);
struct throtl_service_queue *sq;
bool rw = bio_data_dir(bio);
bool throttled = false;
@@ -2159,7 +2149,6 @@ bool blk_throtl_bio(struct request_queue *q, struct blkcg_gq *blkg,
if (unlikely(blk_queue_bypass(q)))
goto out_unlock;

- blk_throtl_assoc_bio(tg, bio);
blk_throtl_update_idletime(tg);

sq = &tg->service_queue;
diff --git a/block/bounce.c b/block/bounce.c
index bc63b3a2d18c..bea3b0cbe4a7 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -259,6 +259,8 @@ static struct bio *bounce_clone_bio(struct bio *bio_src, gfp_t gfp_mask,

bio_clone_blkcg_association(bio, bio_src);

+ bio_issue_init(&bio->bi_issue, bio_sectors(bio));
+
return bio;
}

diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index 9931ec2f4e9e..55c348d66372 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -844,6 +844,8 @@ static inline bool blkcg_bio_issue_check(struct request_queue *q,
blkg_rwstat_add(&blkg->stat_ios, bio->bi_opf, 1);
}

+ bio_issue_init(&bio->bi_issue, bio_sectors(bio));
+
return !throtl;
}

--
2.17.1


2018-08-31 01:57:13

by Dennis Zhou

[permalink] [raw]
Subject: [PATCH 04/15] blkcg: fix ref count issue with bio_blkcg using task_css

From: "Dennis Zhou (Facebook)" <[email protected]>

The accessor function bio_blkcg either returns the blkcg associated with
the bio or finds one in the current context. This can cause an issue
when trying to associate a bio with a blkcg. Particularly, it's the
third case that is problematic:

return css_to_blkcg(task_css(current, io_cgrp_id));

As the above may race against task migration and the cgroup exiting, it
is not always ok to take a reference on the blkcg returned from
bio_blkcg.

This patch adds association ahead of calling bio_blkcg rather than
after, making association a required and explicit step along the code
paths that call bio_blkcg. blk_get_rl is modified as well to take a
reference to the blkcg it may use, and blk_put_rl will always put the
reference back. Association is also moved above the bio_blkcg call in
blk-iolatency to ensure it will not return NULL.
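
As a sketch of the calling convention this establishes (the concrete
call sites are in the diff below):

	rcu_read_lock();

	/* associate first; a NULL css asks it to find the current one */
	bio_associate_blkcg(bio, NULL);

	/* guaranteed non-NULL now that association has been done */
	blkcg = bio_blkcg(bio);

	/* ... look up and use the blkg under RCU ... */

	rcu_read_unlock();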

Signed-off-by: Dennis Zhou <[email protected]>
---
block/bio.c | 10 +++++--
block/blk-iolatency.c | 2 +-
include/linux/blk-cgroup.h | 53 ++++++++++++++++++++++++++++++++------
3 files changed, 54 insertions(+), 11 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 4473ccd22987..09a31e4d46bb 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1962,13 +1962,19 @@ int bio_associate_blkcg_from_page(struct bio *bio, struct page *page)
*
* This function takes an extra reference of @blkcg_css which will be put
* when @bio is released. The caller must own @bio and is responsible for
- * synchronizing calls to this function.
+ * synchronizing calls to this function. If @blkcg_css is NULL, a call to
+ * blkcg_get_css finds the current css from the kthread or task.
*/
int bio_associate_blkcg(struct bio *bio, struct cgroup_subsys_state *blkcg_css)
{
if (unlikely(bio->bi_css))
return -EBUSY;
- css_get(blkcg_css);
+
+ if (blkcg_css)
+ css_get(blkcg_css);
+ else
+ blkcg_css = blkcg_get_css();
+
bio->bi_css = blkcg_css;
return 0;
}
diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
index 19923f8a029d..62fdd9002c29 100644
--- a/block/blk-iolatency.c
+++ b/block/blk-iolatency.c
@@ -404,8 +404,8 @@ static void blkcg_iolatency_throttle(struct rq_qos *rqos, struct bio *bio,
return;

rcu_read_lock();
+ bio_associate_blkcg(bio, NULL);
blkcg = bio_blkcg(bio);
- bio_associate_blkcg(bio, &blkcg->css);
blkg = blkg_lookup(blkcg, q);
if (unlikely(!blkg)) {
if (!lock)
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index c7386464ec4c..d3cafb1eda48 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -230,22 +230,52 @@ int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol,
char *input, struct blkg_conf_ctx *ctx);
void blkg_conf_finish(struct blkg_conf_ctx *ctx);

+/**
+ * blkcg_get_css - find and get a reference to the css
+ *
+ * Find the css associated with either the kthread or the current task.
+ */
+static inline struct cgroup_subsys_state *blkcg_get_css(void)
+{
+ struct cgroup_subsys_state *css;
+
+ rcu_read_lock();
+
+ css = kthread_blkcg();
+ if (css) {
+ css_get(css);
+ } else {
+ while (true) {
+ css = task_css(current, io_cgrp_id);
+ if (likely(css_tryget(css)))
+ break;
+ cpu_relax();
+ }
+ }
+
+ rcu_read_unlock();
+
+ return css;
+}

static inline struct blkcg *css_to_blkcg(struct cgroup_subsys_state *css)
{
return css ? container_of(css, struct blkcg, css) : NULL;
}

+/**
+ * bio_blkcg - grab the blkcg associated with a bio
+ * @bio: target bio
+ *
+ * This returns the blkcg associated with a bio, NULL if not associated.
+ * Callers are expected to either handle NULL or know association has been
+ * done prior to calling this.
+ */
static inline struct blkcg *bio_blkcg(struct bio *bio)
{
- struct cgroup_subsys_state *css;
-
if (bio && bio->bi_css)
return css_to_blkcg(bio->bi_css);
- css = kthread_blkcg();
- if (css)
- return css_to_blkcg(css);
- return css_to_blkcg(task_css(current, io_cgrp_id));
+ return NULL;
}

static inline bool blk_cgroup_congested(void)
@@ -519,6 +549,11 @@ static inline struct request_list *blk_get_rl(struct request_queue *q,
rcu_read_lock();

blkcg = bio_blkcg(bio);
+ if (blkcg) {
+ css_get(&blkcg->css);
+ } else {
+ blkcg = css_to_blkcg(blkcg_get_css());
+ }

/* bypass blkg lookup and use @q->root_rl directly for root */
if (blkcg == &blkcg_root)
@@ -550,6 +585,8 @@ static inline struct request_list *blk_get_rl(struct request_queue *q,
*/
static inline void blk_put_rl(struct request_list *rl)
{
+ /* an additional ref is always taken for rl */
+ css_put(&rl->blkg->blkcg->css);
if (rl->blkg->blkcg != &blkcg_root)
blkg_put(rl->blkg);
}
@@ -790,10 +827,10 @@ static inline bool blkcg_bio_issue_check(struct request_queue *q,
bool throtl = false;

rcu_read_lock();
- blkcg = bio_blkcg(bio);

/* associate blkcg if bio hasn't attached one */
- bio_associate_blkcg(bio, &blkcg->css);
+ bio_associate_blkcg(bio, NULL);
+ blkcg = bio_blkcg(bio);

blkg = blkg_lookup(blkcg, q);
if (unlikely(!blkg)) {
--
2.17.1


2018-08-31 01:57:21

by Dennis Zhou

[permalink] [raw]
Subject: [PATCH 01/15] Revert "blk-throttle: fix race between blkcg_bio_issue_check() and cgroup_rmdir()"

From: "Dennis Zhou (Facebook)" <[email protected]>

This reverts commit 4c6994806f708559c2812b73501406e21ae5dcd0.

Destroying blkgs is tricky because of the nature of the relationship. A
blkg should go away when either a blkcg or a request_queue goes away.
However, blkgs pin the blkcg to ensure it remains valid. To break this
cycle, when a blkcg is offlined, blkgs put back their css ref. This
eventually lets css_free() get called which frees the blkcg.

The above commit (4c6994806f70) breaks this order of events by trying to
destroy blkgs in css_free(). As the blkgs still hold references to the
blkcg, css_free() is never called.

The race between blkcg_bio_issue_check() and cgroup_rmdir() will be
addressed in the following patch by delaying destruction of a blkg until
all writeback associated with the blkcg has been finished.
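
To make the intended ordering concrete, here is a minimal sketch of the
reference cycle and how it is meant to be broken (an illustrative
pseudo-sequence, not literal kernel code):

	/* creation: each blkg pins its blkcg's css */
	css_get(&blkcg->css);

	/* offline: destroying a blkg puts that css ref back */
	blkg_destroy(blkg);	/* leads to css_put(&blkcg->css) */

	/* only after every blkg ref is returned can this run */
	blkcg_css_free(css);	/* finally frees the blkcg */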

Fixes: 4c6994806f70 ("blk-throttle: fix race between blkcg_bio_issue_check() and cgroup_rmdir()")
Signed-off-by: Dennis Zhou <[email protected]>
Cc: Jiufei Xue <[email protected]>
Cc: Joseph Qi <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Jens Axboe <[email protected]>
---
block/blk-cgroup.c | 78 ++++++++------------------------------
include/linux/blk-cgroup.h | 1 -
2 files changed, 16 insertions(+), 63 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 694595b29b8f..2998e4f095d1 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -310,28 +310,11 @@ struct blkcg_gq *blkg_lookup_create(struct blkcg *blkcg,
}
}

-static void blkg_pd_offline(struct blkcg_gq *blkg)
-{
- int i;
-
- lockdep_assert_held(blkg->q->queue_lock);
- lockdep_assert_held(&blkg->blkcg->lock);
-
- for (i = 0; i < BLKCG_MAX_POLS; i++) {
- struct blkcg_policy *pol = blkcg_policy[i];
-
- if (blkg->pd[i] && !blkg->pd[i]->offline &&
- pol->pd_offline_fn) {
- pol->pd_offline_fn(blkg->pd[i]);
- blkg->pd[i]->offline = true;
- }
- }
-}
-
static void blkg_destroy(struct blkcg_gq *blkg)
{
struct blkcg *blkcg = blkg->blkcg;
struct blkcg_gq *parent = blkg->parent;
+ int i;

lockdep_assert_held(blkg->q->queue_lock);
lockdep_assert_held(&blkcg->lock);
@@ -340,6 +323,13 @@ static void blkg_destroy(struct blkcg_gq *blkg)
WARN_ON_ONCE(list_empty(&blkg->q_node));
WARN_ON_ONCE(hlist_unhashed(&blkg->blkcg_node));

+ for (i = 0; i < BLKCG_MAX_POLS; i++) {
+ struct blkcg_policy *pol = blkcg_policy[i];
+
+ if (blkg->pd[i] && pol->pd_offline_fn)
+ pol->pd_offline_fn(blkg->pd[i]);
+ }
+
if (parent) {
blkg_rwstat_add_aux(&parent->stat_bytes, &blkg->stat_bytes);
blkg_rwstat_add_aux(&parent->stat_ios, &blkg->stat_ios);
@@ -382,7 +372,6 @@ static void blkg_destroy_all(struct request_queue *q)
struct blkcg *blkcg = blkg->blkcg;

spin_lock(&blkcg->lock);
- blkg_pd_offline(blkg);
blkg_destroy(blkg);
spin_unlock(&blkcg->lock);
}
@@ -1058,54 +1047,21 @@ static struct cftype blkcg_legacy_files[] = {
* @css: css of interest
*
* This function is called when @css is about to go away and responsible
- * for offlining all blkgs pd and killing all wbs associated with @css.
- * blkgs pd offline should be done while holding both q and blkcg locks.
- * As blkcg lock is nested inside q lock, this function performs reverse
- * double lock dancing.
+ * for shooting down all blkgs associated with @css. blkgs should be
+ * removed while holding both q and blkcg locks. As blkcg lock is nested
+ * inside q lock, this function performs reverse double lock dancing.
*
* This is the blkcg counterpart of ioc_release_fn().
*/
static void blkcg_css_offline(struct cgroup_subsys_state *css)
{
struct blkcg *blkcg = css_to_blkcg(css);
- struct blkcg_gq *blkg;

spin_lock_irq(&blkcg->lock);

- hlist_for_each_entry(blkg, &blkcg->blkg_list, blkcg_node) {
- struct request_queue *q = blkg->q;
-
- if (spin_trylock(q->queue_lock)) {
- blkg_pd_offline(blkg);
- spin_unlock(q->queue_lock);
- } else {
- spin_unlock_irq(&blkcg->lock);
- cpu_relax();
- spin_lock_irq(&blkcg->lock);
- }
- }
-
- spin_unlock_irq(&blkcg->lock);
-
- wb_blkcg_offline(blkcg);
-}
-
-/**
- * blkcg_destroy_all_blkgs - destroy all blkgs associated with a blkcg
- * @blkcg: blkcg of interest
- *
- * This function is called when blkcg css is about to free and responsible for
- * destroying all blkgs associated with @blkcg.
- * blkgs should be removed while holding both q and blkcg locks. As blkcg lock
- * is nested inside q lock, this function performs reverse double lock dancing.
- */
-static void blkcg_destroy_all_blkgs(struct blkcg *blkcg)
-{
- spin_lock_irq(&blkcg->lock);
while (!hlist_empty(&blkcg->blkg_list)) {
struct blkcg_gq *blkg = hlist_entry(blkcg->blkg_list.first,
- struct blkcg_gq,
- blkcg_node);
+ struct blkcg_gq, blkcg_node);
struct request_queue *q = blkg->q;

if (spin_trylock(q->queue_lock)) {
@@ -1117,7 +1073,10 @@ static void blkcg_destroy_all_blkgs(struct blkcg *blkcg)
spin_lock_irq(&blkcg->lock);
}
}
+
spin_unlock_irq(&blkcg->lock);
+
+ wb_blkcg_offline(blkcg);
}

static void blkcg_css_free(struct cgroup_subsys_state *css)
@@ -1125,8 +1084,6 @@ static void blkcg_css_free(struct cgroup_subsys_state *css)
struct blkcg *blkcg = css_to_blkcg(css);
int i;

- blkcg_destroy_all_blkgs(blkcg);
-
mutex_lock(&blkcg_pol_mutex);

list_del(&blkcg->all_blkcgs_node);
@@ -1480,11 +1437,8 @@ void blkcg_deactivate_policy(struct request_queue *q,

list_for_each_entry(blkg, &q->blkg_list, q_node) {
if (blkg->pd[pol->plid]) {
- if (!blkg->pd[pol->plid]->offline &&
- pol->pd_offline_fn) {
+ if (pol->pd_offline_fn)
pol->pd_offline_fn(blkg->pd[pol->plid]);
- blkg->pd[pol->plid]->offline = true;
- }
pol->pd_free_fn(blkg->pd[pol->plid]);
blkg->pd[pol->plid] = NULL;
}
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index 34aec30e06c7..1615cdd4c797 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -89,7 +89,6 @@ struct blkg_policy_data {
/* the blkg and policy id this per-policy data belongs to */
struct blkcg_gq *blkg;
int plid;
- bool offline;
};

/*
--
2.17.1


2018-08-31 01:57:34

by Dennis Zhou

[permalink] [raw]
Subject: [PATCH 05/15] blkcg: update blkg_lookup_create to do locking

From: "Dennis Zhou (Facebook)" <[email protected]>

To know when to create a blkg, the general pattern is to do a
blkg_lookup; if that fails, take the lock, look up again, and if that
also fails, finally create. It doesn't make much sense for every call
site that wants creation to write this out itself.

This changes blkg_lookup_create to do the locking and implement this
pattern. The old blkg_lookup_create is renamed to __blkg_lookup_create.
If a call site wants to do its own error handling or already owns the
queue lock, it can use __blkg_lookup_create.
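
With this, a typical creation call site collapses to something like the
following sketch (the root fallback shown is only one way a caller
might handle the error; it is not mandated by this patch):

	blkg = blkg_lookup_create(blkcg, q);
	if (IS_ERR(blkg))
		blkg = q->root_blkg;	/* or propagate the error */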

Signed-off-by: Dennis Zhou <[email protected]>
---
block/blk-cgroup.c | 31 ++++++++++++++++++++++++++++---
block/blk-iolatency.c | 2 +-
include/linux/blk-cgroup.h | 4 +++-
3 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index d7114308a480..f678cd555814 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -259,7 +259,7 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
}

/**
- * blkg_lookup_create - lookup blkg, try to create one if not there
+ * __blkg_lookup_create - lookup blkg, try to create one if not there
* @blkcg: blkcg of interest
* @q: request_queue of interest
*
@@ -272,8 +272,8 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
* value on error. If @q is dead, returns ERR_PTR(-EINVAL). If @q is not
* dead and bypassing, returns ERR_PTR(-EBUSY).
*/
-struct blkcg_gq *blkg_lookup_create(struct blkcg *blkcg,
- struct request_queue *q)
+struct blkcg_gq *__blkg_lookup_create(struct blkcg *blkcg,
+ struct request_queue *q)
{
struct blkcg_gq *blkg;

@@ -310,6 +310,31 @@ struct blkcg_gq *blkg_lookup_create(struct blkcg *blkcg,
}
}

+/**
+ * blkg_lookup_create - find or create a blkg
+ * @blkcg: target block cgroup
+ * @q: target request_queue
+ *
+ * This looks up or creates the blkg representing the unique pair
+ * of the blkcg and the request_queue.
+ */
+struct blkcg_gq *blkg_lookup_create(struct blkcg *blkcg,
+ struct request_queue *q)
+{
+ struct blkcg_gq *blkg = blkg_lookup(blkcg, q);
+ unsigned long flags;
+
+ if (unlikely(!blkg)) {
+ spin_lock_irqsave(q->queue_lock, flags);
+
+ blkg = __blkg_lookup_create(blkcg, q);
+
+ spin_unlock_irqrestore(q->queue_lock, flags);
+ }
+
+ return blkg;
+}
+
static void blkg_destroy(struct blkcg_gq *blkg)
{
struct blkcg *blkcg = blkg->blkcg;
diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
index 62fdd9002c29..22b2ff0440cc 100644
--- a/block/blk-iolatency.c
+++ b/block/blk-iolatency.c
@@ -410,7 +410,7 @@ static void blkcg_iolatency_throttle(struct rq_qos *rqos, struct bio *bio,
if (unlikely(!blkg)) {
if (!lock)
spin_lock_irq(q->queue_lock);
- blkg = blkg_lookup_create(blkcg, q);
+ blkg = __blkg_lookup_create(blkcg, q);
if (IS_ERR(blkg))
blkg = NULL;
if (!lock)
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index d3cafb1eda48..ea2dd6e6baf2 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -184,6 +184,8 @@ extern struct cgroup_subsys_state * const blkcg_root_css;

struct blkcg_gq *blkg_lookup_slowpath(struct blkcg *blkcg,
struct request_queue *q, bool update_hint);
+struct blkcg_gq *__blkg_lookup_create(struct blkcg *blkcg,
+ struct request_queue *q);
struct blkcg_gq *blkg_lookup_create(struct blkcg *blkcg,
struct request_queue *q);
int blkcg_init_queue(struct request_queue *q);
@@ -835,7 +837,7 @@ static inline bool blkcg_bio_issue_check(struct request_queue *q,
blkg = blkg_lookup(blkcg, q);
if (unlikely(!blkg)) {
spin_lock_irq(q->queue_lock);
- blkg = blkg_lookup_create(blkcg, q);
+ blkg = __blkg_lookup_create(blkcg, q);
if (IS_ERR(blkg))
blkg = NULL;
spin_unlock_irq(q->queue_lock);
--
2.17.1


2018-08-31 01:57:49

by Dennis Zhou

[permalink] [raw]
Subject: [PATCH 02/15] blkcg: delay blkg destruction until after writeback has finished

From: "Dennis Zhou (Facebook)" <[email protected]>

Currently, blkcg destruction relies on a sequence of events:
1. Destruction starts. blkcg_css_offline() is called and blkgs
release their reference to the blkcg. This immediately destroys
the cgwbs (writeback).
2. With blkgs giving up their reference, the blkcg ref count should
become zero and eventually call blkcg_css_free() which finally
frees the blkcg.

Jiufei Xue reported that there is a race between blkcg_bio_issue_check()
and cgroup_rmdir(). To remedy this, blkg destruction becomes contingent
on the completion of all writeback associated with the blkcg. A count of
the number of cgwbs is maintained and once that goes to zero, blkg
destruction can follow. This should prevent premature blkg destruction.

The new process for blkcg cleanup is as follows:
1. Destruction starts. blkcg_css_offline() is called which offlines
writeback. Blkg destruction is delayed on the nr_cgwbs count to
avoid punting potentially large amounts of outstanding writeback
to root while maintaining any ongoing policies.
2. When the nr_cgwbs becomes zero, blkcg_destroy_blkgs() is called and
handles destruction of blkgs. This is where the css reference held
by each blkg is released.
3. Once the blkcg ref count goes to zero, blkcg_css_free() is called.
This finally frees the blkcg.

It seems that in the past blk-throttle did something hard to follow:
it took data from one blkg while associating with a potentially
different one via current. The simplification and unification of what
blk-throttle does is what exposed this.

Fixes: 08e18eab0c579 ("block: add bi_blkg to the bio for cgroups")
Signed-off-by: Dennis Zhou <[email protected]>
Cc: Jiufei Xue <[email protected]>
Cc: Joseph Qi <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: Jens Axboe <[email protected]>
---
block/blk-cgroup.c | 53 ++++++++++++++++++++++++++++++++------
include/linux/blk-cgroup.h | 29 +++++++++++++++++++++
mm/backing-dev.c | 5 ++++
3 files changed, 79 insertions(+), 8 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 2998e4f095d1..d7114308a480 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1042,21 +1042,59 @@ static struct cftype blkcg_legacy_files[] = {
{ } /* terminate */
};

+/*
+ * blkcg destruction is a three-stage process.
+ *
+ * 1. Destruction starts. The blkcg_css_offline() callback is invoked
+ * which offlines writeback. Here we tie the next stage of blkg destruction
+ * to the completion of writeback associated with the blkcg. This lets us
+ * avoid punting potentially large amounts of outstanding writeback to root
+ * while maintaining any ongoing policies. The next stage is triggered when
+ * the nr_cgwbs count goes to zero.
+ *
+ * 2. When the nr_cgwbs count goes to zero, blkcg_destroy_blkgs() is called
+ * and handles the destruction of blkgs. Here the css reference held by
+ * the blkg is put back eventually allowing blkcg_css_free() to be called.
+ * This work may occur in cgwb_release_workfn() on the cgwb_release
+ * workqueue. Any submitted ios that fail to get the blkg ref will be
+ * punted to the root_blkg.
+ *
+ * 3. Once the blkcg ref count goes to zero, blkcg_css_free() is called.
+ * This finally frees the blkcg.
+ */
+
/**
* blkcg_css_offline - cgroup css_offline callback
* @css: css of interest
*
- * This function is called when @css is about to go away and responsible
- * for shooting down all blkgs associated with @css. blkgs should be
- * removed while holding both q and blkcg locks. As blkcg lock is nested
- * inside q lock, this function performs reverse double lock dancing.
- *
- * This is the blkcg counterpart of ioc_release_fn().
+ * This function is called when @css is about to go away. Here the cgwbs are
+ * offlined first and only once writeback associated with the blkcg has
+ * finished do we start step 2 (see above).
*/
static void blkcg_css_offline(struct cgroup_subsys_state *css)
{
struct blkcg *blkcg = css_to_blkcg(css);

+ /* this prevents anyone from attaching or migrating to this blkcg */
+ wb_blkcg_offline(blkcg);
+
+ /* allow the count to go to zero */
+ blkcg_cgwb_dec(blkcg);
+}
+
+/**
+ * blkcg_destroy_blkgs - responsible for shooting down blkgs
+ * @blkcg: blkcg of interest
+ *
+ * blkgs should be removed while holding both q and blkcg locks. As blkcg lock
+ * is nested inside q lock, this function performs reverse double lock dancing.
+ * Destroying the blkgs releases the reference held on the blkcg's css allowing
+ * blkcg_css_free to eventually be called.
+ *
+ * This is the blkcg counterpart of ioc_release_fn().
+ */
+void blkcg_destroy_blkgs(struct blkcg *blkcg)
+{
spin_lock_irq(&blkcg->lock);

while (!hlist_empty(&blkcg->blkg_list)) {
@@ -1075,8 +1113,6 @@ static void blkcg_css_offline(struct cgroup_subsys_state *css)
}

spin_unlock_irq(&blkcg->lock);
-
- wb_blkcg_offline(blkcg);
}

static void blkcg_css_free(struct cgroup_subsys_state *css)
@@ -1146,6 +1182,7 @@ blkcg_css_alloc(struct cgroup_subsys_state *parent_css)
INIT_HLIST_HEAD(&blkcg->blkg_list);
#ifdef CONFIG_CGROUP_WRITEBACK
INIT_LIST_HEAD(&blkcg->cgwb_list);
+ atomic_set(&blkcg->nr_cgwbs, 1);
#endif
list_add_tail(&blkcg->all_blkcgs_node, &all_blkcgs);

diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index 1615cdd4c797..c7386464ec4c 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -56,6 +56,7 @@ struct blkcg {
struct list_head all_blkcgs_node;
#ifdef CONFIG_CGROUP_WRITEBACK
struct list_head cgwb_list;
+ atomic_t nr_cgwbs;
#endif
};

@@ -386,6 +387,34 @@ static inline struct blkcg *cpd_to_blkcg(struct blkcg_policy_data *cpd)
return cpd ? cpd->blkcg : NULL;
}

+/**
+ * blkcg_cgwb_inc - increment the count for cgwb_list
+ * @blkcg: blkcg of interest
+ *
+ * This is used to count the number of active wbs related to a blkcg.
+ */
+static inline void blkcg_cgwb_inc(struct blkcg *blkcg)
+{
+ atomic_inc(&blkcg->nr_cgwbs);
+}
+
+extern void blkcg_destroy_blkgs(struct blkcg *blkcg);
+
+/**
+ * blkcg_cgwb_dec - decrement the count for cgwb_list
+ * @blkcg: blkcg of interest
+ *
+ * This is used to count the number of active wbs related to a blkcg.
+ * When this count goes to zero, all active wbs have finished, so the
+ * blkcg can be destroyed; this triggers blkg destruction when
+ * nr_cgwbs drops to zero.
+ */
+static inline void blkcg_cgwb_dec(struct blkcg *blkcg)
+{
+ if (atomic_dec_and_test(&blkcg->nr_cgwbs))
+ blkcg_destroy_blkgs(blkcg);
+}
+
/**
* blkg_path - format cgroup path of blkg
* @blkg: blkg of interest
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 2e5d3df0853d..92342d38f0c6 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -494,6 +494,7 @@ static void cgwb_release_workfn(struct work_struct *work)
{
struct bdi_writeback *wb = container_of(work, struct bdi_writeback,
release_work);
+ struct blkcg *blkcg = css_to_blkcg(wb->blkcg_css);

mutex_lock(&wb->bdi->cgwb_release_mutex);
wb_shutdown(wb);
@@ -502,6 +503,9 @@ static void cgwb_release_workfn(struct work_struct *work)
css_put(wb->blkcg_css);
mutex_unlock(&wb->bdi->cgwb_release_mutex);

+ /* this triggers destruction of blkgs if nr_cgwbs becomes zero */
+ blkcg_cgwb_dec(blkcg);
+
fprop_local_destroy_percpu(&wb->memcg_completions);
percpu_ref_exit(&wb->refcnt);
wb_exit(wb);
@@ -600,6 +604,7 @@ static int cgwb_create(struct backing_dev_info *bdi,
list_add_tail_rcu(&wb->bdi_node, &bdi->wb_list);
list_add(&wb->memcg_node, memcg_cgwb_list);
list_add(&wb->blkcg_node, blkcg_cgwb_list);
+ blkcg_cgwb_inc(blkcg);
css_get(memcg_css);
css_get(blkcg_css);
}
--
2.17.1


2018-08-31 01:57:51

by Dennis Zhou

[permalink] [raw]
Subject: [PATCH 03/15] blkcg: use tryget logic when associating a blkg with a bio

From: "Dennis Zhou (Facebook)" <[email protected]>

There is a very small chance a bio gets caught up in a really
unfortunate race between a task migration, a cgroup exiting, and itself
trying to associate with a blkg. This is due to css offlining being
performed after the css->refcnt is killed, which triggers removal of
blkgs whose blkg->refcnt reaches 0.

To avoid this, association with a blkg should use tryget and fallback to
using the root_blkg.

Fixes: 08e18eab0c579 ("block: add bi_blkg to the bio for cgroups")
Signed-off-by: Dennis Zhou <[email protected]>
Cc: Jiufei Xue <[email protected]>
Cc: Joseph Qi <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: Jens Axboe <[email protected]>
---
block/bio.c | 3 ++-
block/blk-throttle.c | 5 +++--
2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 04969b392c72..4473ccd22987 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1987,7 +1987,8 @@ int bio_associate_blkg(struct bio *bio, struct blkcg_gq *blkg)
{
if (unlikely(bio->bi_blkg))
return -EBUSY;
- blkg_get(blkg);
+ if (!blkg_try_get(blkg))
+ return -ENODEV;
bio->bi_blkg = blkg;
return 0;
}
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index a3eede00d302..c626e1f7cdcd 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -2129,8 +2129,9 @@ static inline void throtl_update_latency_buckets(struct throtl_data *td)
static void blk_throtl_assoc_bio(struct throtl_grp *tg, struct bio *bio)
{
#ifdef CONFIG_BLK_DEV_THROTTLING_LOW
- if (bio->bi_css)
- bio_associate_blkg(bio, tg_to_blkg(tg));
+ /* fallback to root_blkg if we fail to get a blkg ref */
+ if (bio->bi_css && bio_associate_blkg(bio, tg_to_blkg(tg)))
+ bio_associate_blkg(bio, bio->bi_disk->queue->root_blkg);
bio_issue_init(&bio->bi_issue, bio_sectors(bio));
#endif
}
--
2.17.1


2018-08-31 09:04:10

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH 06/15] blkcg: always associate a bio with a blkg

Hi Dennis,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on block/for-next]
[also build test WARNING on v4.19-rc1 next-20180831]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Dennis-Zhou/blkcg-ref-count-refactor-cleanup-blkcg-avg_lat/20180831-161742
base: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
config: x86_64-randconfig-x019-201834 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64

All warnings (new ones prefixed by >>):

In file included from include/linux/blkdev.h:21:0,
from block//partitions/check.h:3,
from block//partitions/check.c:22:
>> include/linux/bio.h:566:17: warning: 'struct blkcg_gq' declared inside parameter list will not be visible outside of this definition or declaration
struct blkcg_gq *blkg) { return 0; }
^~~~~~~~

vim +566 include/linux/bio.h

555
556 #ifdef CONFIG_BLK_CGROUP
557 int bio_associate_blkcg(struct bio *bio, struct cgroup_subsys_state *blkcg_css);
558 int bio_associate_blkg(struct bio *bio, struct blkcg_gq *blkg);
559 int bio_associate_create_blkg(struct request_queue *q, struct bio *bio);
560 void bio_disassociate_task(struct bio *bio);
561 void bio_clone_blkcg_association(struct bio *dst, struct bio *src);
562 #else /* CONFIG_BLK_CGROUP */
563 static inline int bio_associate_blkcg(struct bio *bio,
564 struct cgroup_subsys_state *blkcg_css) { return 0; }
565 static inline int bio_associate_blkg(struct bio *bio,
> 566 struct blkcg_gq *blkg) { return 0; }
567 static inline int bio_associate_create_blkg(struct request_queue *q,
568 struct bio *bio) { return 0; }
569 static inline void bio_disassociate_task(struct bio *bio) { }
570 static inline void bio_clone_blkcg_association(struct bio *dst,
571 struct bio *src) { }
572 #endif /* CONFIG_BLK_CGROUP */
573

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation


2018-08-31 09:33:06

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH 07/15] blkcg: consolidate bio_issue_init and blkg association

Hi Dennis,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on block/for-next]
[also build test ERROR on v4.19-rc1 next-20180831]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Dennis-Zhou/blkcg-ref-count-refactor-cleanup-blkcg-avg_lat/20180831-161742
base: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
config: x86_64-randconfig-x017-201834 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64

All errors (new ones prefixed by >>):

In file included from block/bounce.c:13:0:
include/linux/bio.h:566:17: warning: 'struct blkcg_gq' declared inside parameter list will not be visible outside of this definition or declaration
struct blkcg_gq *blkg) { return 0; }
^~~~~~~~
block/bounce.c: In function 'bounce_clone_bio':
>> block/bounce.c:262:23: error: 'struct bio' has no member named 'bi_issue'; did you mean 'bi_disk'?
bio_issue_init(&bio->bi_issue, bio_sectors(bio));
^~~~~~~~
bi_disk

vim +262 block/bounce.c

197
198 static struct bio *bounce_clone_bio(struct bio *bio_src, gfp_t gfp_mask,
199 struct bio_set *bs)
200 {
201 struct bvec_iter iter;
202 struct bio_vec bv;
203 struct bio *bio;
204
205 /*
206 * Pre immutable biovecs, __bio_clone() used to just do a memcpy from
207 * bio_src->bi_io_vec to bio->bi_io_vec.
208 *
209 * We can't do that anymore, because:
210 *
211 * - The point of cloning the biovec is to produce a bio with a biovec
212 * the caller can modify: bi_idx and bi_bvec_done should be 0.
213 *
214 * - The original bio could've had more than BIO_MAX_PAGES biovecs; if
215 * we tried to clone the whole thing bio_alloc_bioset() would fail.
216 * But the clone should succeed as long as the number of biovecs we
217 * actually need to allocate is fewer than BIO_MAX_PAGES.
218 *
219 * - Lastly, bi_vcnt should not be looked at or relied upon by code
220 * that does not own the bio - reason being drivers don't use it for
221 * iterating over the biovec anymore, so expecting it to be kept up
222 * to date (i.e. for clones that share the parent biovec) is just
223 * asking for trouble and would force extra work on
224 * __bio_clone_fast() anyways.
225 */
226
227 bio = bio_alloc_bioset(gfp_mask, bio_segments(bio_src), bs);
228 if (!bio)
229 return NULL;
230 bio->bi_disk = bio_src->bi_disk;
231 bio->bi_opf = bio_src->bi_opf;
232 bio->bi_write_hint = bio_src->bi_write_hint;
233 bio->bi_iter.bi_sector = bio_src->bi_iter.bi_sector;
234 bio->bi_iter.bi_size = bio_src->bi_iter.bi_size;
235
236 switch (bio_op(bio)) {
237 case REQ_OP_DISCARD:
238 case REQ_OP_SECURE_ERASE:
239 case REQ_OP_WRITE_ZEROES:
240 break;
241 case REQ_OP_WRITE_SAME:
242 bio->bi_io_vec[bio->bi_vcnt++] = bio_src->bi_io_vec[0];
243 break;
244 default:
245 bio_for_each_segment(bv, bio_src, iter)
246 bio->bi_io_vec[bio->bi_vcnt++] = bv;
247 break;
248 }
249
250 if (bio_integrity(bio_src)) {
251 int ret;
252
253 ret = bio_integrity_clone(bio, bio_src, gfp_mask);
254 if (ret < 0) {
255 bio_put(bio);
256 return NULL;
257 }
258 }
259
260 bio_clone_blkcg_association(bio, bio_src);
261
> 262 bio_issue_init(&bio->bi_issue, bio_sectors(bio));
263
264 return bio;
265 }
266

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation


2018-08-31 10:07:15

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH 06/15] blkcg: always associate a bio with a blkg

Hi Dennis,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on block/for-next]
[also build test WARNING on v4.19-rc1 next-20180831]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Dennis-Zhou/blkcg-ref-count-refactor-cleanup-blkcg-avg_lat/20180831-161742
base: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
config: i386-randconfig-a1-201834 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
# save the attached .config to linux build tree
make ARCH=i386

All warnings (new ones prefixed by >>):

In file included from include/linux/blkdev.h:21:0,
from drivers/usb//image/microtek.c:135:
>> include/linux/bio.h:566:17: warning: 'struct blkcg_gq' declared inside parameter list
struct blkcg_gq *blkg) { return 0; }
^
>> include/linux/bio.h:566:17: warning: its scope is only this definition or declaration, which is probably not what you want

vim +566 include/linux/bio.h

555
556 #ifdef CONFIG_BLK_CGROUP
557 int bio_associate_blkcg(struct bio *bio, struct cgroup_subsys_state *blkcg_css);
558 int bio_associate_blkg(struct bio *bio, struct blkcg_gq *blkg);
559 int bio_associate_create_blkg(struct request_queue *q, struct bio *bio);
560 void bio_disassociate_task(struct bio *bio);
561 void bio_clone_blkcg_association(struct bio *dst, struct bio *src);
562 #else /* CONFIG_BLK_CGROUP */
563 static inline int bio_associate_blkcg(struct bio *bio,
564 struct cgroup_subsys_state *blkcg_css) { return 0; }
565 static inline int bio_associate_blkg(struct bio *bio,
> 566 struct blkcg_gq *blkg) { return 0; }
567 static inline int bio_associate_create_blkg(struct request_queue *q,
568 struct bio *bio) { return 0; }
569 static inline void bio_disassociate_task(struct bio *bio) { }
570 static inline void bio_clone_blkcg_association(struct bio *dst,
571 struct bio *src) { }
572 #endif /* CONFIG_BLK_CGROUP */
573

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation


2018-08-31 10:35:01

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH 15/15] blkcg: add average latency tracking to blk-cgroup

Hi Dennis,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on block/for-next]
[also build test ERROR on v4.19-rc1 next-20180831]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Dennis-Zhou/blkcg-ref-count-refactor-cleanup-blkcg-avg_lat/20180831-161742
base: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
config: x86_64-randconfig-x019-201834 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64

All errors (new ones prefixed by >>):

In file included from block/bio.c:20:0:
include/linux/bio.h:565:17: warning: 'struct blkcg_gq' declared inside parameter list will not be visible outside of this definition or declaration
struct blkcg_gq *blkg) { return 0; }
^~~~~~~~
block/bio.c: In function '__bio_clone_fast':
block/bio.c:614:23: error: 'struct bio' has no member named 'bi_issue'; did you mean 'bi_disk'?
bio_issue_init(&bio->bi_issue, bio_sectors(bio));
^~~~~~~~
bi_disk
block/bio.c: In function 'bio_endio':
>> block/bio.c:1750:11: error: 'struct bio' has no member named 'bi_blkg'; did you mean 'bi_flags'?
if (bio->bi_blkg && bio->bi_blkg->parent)
^~~~~~~
bi_flags
block/bio.c:1750:27: error: 'struct bio' has no member named 'bi_blkg'; did you mean 'bi_flags'?
if (bio->bi_blkg && bio->bi_blkg->parent)
^~~~~~~
bi_flags

vim +1750 block/bio.c

1727
1728 /**
1729 * bio_endio - end I/O on a bio
1730 * @bio: bio
1731 *
1732 * Description:
1733 * bio_endio() will end I/O on the whole bio. bio_endio() is the preferred
1734 * way to end I/O on a bio. No one should call bi_end_io() directly on a
1735 * bio unless they own it and thus know that it has an end_io function.
1736 *
1737 * bio_endio() can be called several times on a bio that has been chained
1738 * using bio_chain(). The ->bi_end_io() function will only be called the
1739 * last time. At this point the BLK_TA_COMPLETE tracing event will be
1740 * generated if BIO_TRACE_COMPLETION is set.
1741 **/
1742 void bio_endio(struct bio *bio)
1743 {
1744 again:
1745 if (!bio_remaining_done(bio))
1746 return;
1747 if (!bio_integrity_endio(bio))
1748 return;
1749
> 1750 if (bio->bi_blkg && bio->bi_blkg->parent)
1751 blkg_record_latency(bio);
1752
1753 if (bio->bi_disk)
1754 rq_qos_done_bio(bio->bi_disk->queue, bio);
1755
1756 /*
1757 * Need to have a real endio function for chained bios, otherwise
1758 * various corner cases will break (like stacking block devices that
1759 * save/restore bi_end_io) - however, we want to avoid unbounded
1760 * recursion and blowing the stack. Tail call optimization would
1761 * handle this, but compiling with frame pointers also disables
1762 * gcc's sibling call optimization.
1763 */
1764 if (bio->bi_end_io == bio_chain_endio) {
1765 bio = __bio_chain_endio(bio);
1766 goto again;
1767 }
1768
1769 if (bio->bi_disk && bio_flagged(bio, BIO_TRACE_COMPLETION)) {
1770 trace_block_bio_complete(bio->bi_disk->queue, bio,
1771 blk_status_to_errno(bio->bi_status));
1772 bio_clear_flag(bio, BIO_TRACE_COMPLETION);
1773 }
1774
1775 blk_throtl_bio_endio(bio);
1776 /* release cgroup info */
1777 bio_uninit(bio);
1778 if (bio->bi_end_io)
1779 bio->bi_end_io(bio);
1780 }
1781 EXPORT_SYMBOL(bio_endio);
1782

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation


2018-08-31 11:19:40

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH 07/15] blkcg: consolidate bio_issue_init and blkg association

Hi Dennis,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on block/for-next]
[also build test ERROR on v4.19-rc1 next-20180831]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Dennis-Zhou/blkcg-ref-count-refactor-cleanup-blkcg-avg_lat/20180831-161742
base: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
config: i386-randconfig-a0-201834 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
# save the attached .config to linux build tree
make ARCH=i386

All errors (new ones prefixed by >>):

In file included from block/bounce.c:13:0:
include/linux/bio.h:566:17: warning: 'struct blkcg_gq' declared inside parameter list
struct blkcg_gq *blkg) { return 0; }
^
include/linux/bio.h:566:17: warning: its scope is only this definition or declaration, which is probably not what you want
block/bounce.c: In function 'bounce_clone_bio':
>> block/bounce.c:262:21: error: 'struct bio' has no member named 'bi_issue'
bio_issue_init(&bio->bi_issue, bio_sectors(bio));
^

vim +262 block/bounce.c

197
198 static struct bio *bounce_clone_bio(struct bio *bio_src, gfp_t gfp_mask,
199 struct bio_set *bs)
200 {
201 struct bvec_iter iter;
202 struct bio_vec bv;
203 struct bio *bio;
204
205 /*
206 * Pre immutable biovecs, __bio_clone() used to just do a memcpy from
207 * bio_src->bi_io_vec to bio->bi_io_vec.
208 *
209 * We can't do that anymore, because:
210 *
211 * - The point of cloning the biovec is to produce a bio with a biovec
212 * the caller can modify: bi_idx and bi_bvec_done should be 0.
213 *
214 * - The original bio could've had more than BIO_MAX_PAGES biovecs; if
215 * we tried to clone the whole thing bio_alloc_bioset() would fail.
216 * But the clone should succeed as long as the number of biovecs we
217 * actually need to allocate is fewer than BIO_MAX_PAGES.
218 *
219 * - Lastly, bi_vcnt should not be looked at or relied upon by code
220 * that does not own the bio - reason being drivers don't use it for
221 * iterating over the biovec anymore, so expecting it to be kept up
222 * to date (i.e. for clones that share the parent biovec) is just
223 * asking for trouble and would force extra work on
224 * __bio_clone_fast() anyways.
225 */
226
227 bio = bio_alloc_bioset(gfp_mask, bio_segments(bio_src), bs);
228 if (!bio)
229 return NULL;
230 bio->bi_disk = bio_src->bi_disk;
231 bio->bi_opf = bio_src->bi_opf;
232 bio->bi_write_hint = bio_src->bi_write_hint;
233 bio->bi_iter.bi_sector = bio_src->bi_iter.bi_sector;
234 bio->bi_iter.bi_size = bio_src->bi_iter.bi_size;
235
236 switch (bio_op(bio)) {
237 case REQ_OP_DISCARD:
238 case REQ_OP_SECURE_ERASE:
239 case REQ_OP_WRITE_ZEROES:
240 break;
241 case REQ_OP_WRITE_SAME:
242 bio->bi_io_vec[bio->bi_vcnt++] = bio_src->bi_io_vec[0];
243 break;
244 default:
245 bio_for_each_segment(bv, bio_src, iter)
246 bio->bi_io_vec[bio->bi_vcnt++] = bv;
247 break;
248 }
249
250 if (bio_integrity(bio_src)) {
251 int ret;
252
253 ret = bio_integrity_clone(bio, bio_src, gfp_mask);
254 if (ret < 0) {
255 bio_put(bio);
256 return NULL;
257 }
258 }
259
260 bio_clone_blkcg_association(bio, bio_src);
261
> 262 bio_issue_init(&bio->bi_issue, bio_sectors(bio));
263
264 return bio;
265 }
266

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation


2018-08-31 12:54:03

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH 15/15] blkcg: add average latency tracking to blk-cgroup

Hi Dennis,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on block/for-next]
[also build test WARNING on v4.19-rc1 next-20180831]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Dennis-Zhou/blkcg-ref-count-refactor-cleanup-blkcg-avg_lat/20180831-161742
base: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
config: i386-randconfig-x017-201834 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386

All warnings (new ones prefixed by >>):

In file included from block/bio.c:20:0:
include/linux/bio.h:565:17: warning: 'struct blkcg_gq' declared inside parameter list will not be visible outside of this definition or declaration
struct blkcg_gq *blkg) { return 0; }
^~~~~~~~
block/bio.c: In function '__bio_clone_fast':
block/bio.c:614:23: error: 'struct bio' has no member named 'bi_issue'; did you mean 'bi_disk'?
bio_issue_init(&bio->bi_issue, bio_sectors(bio));
^~~~~~~~
bi_disk
In file included from include/asm-generic/bug.h:5:0,
from arch/x86/include/asm/bug.h:83,
from include/linux/bug.h:5,
from include/linux/mmdebug.h:5,
from include/linux/mm.h:9,
from block/bio.c:18:
block/bio.c: In function 'bio_endio':
block/bio.c:1750:11: error: 'struct bio' has no member named 'bi_blkg'; did you mean 'bi_flags'?
if (bio->bi_blkg && bio->bi_blkg->parent)
^
include/linux/compiler.h:58:30: note: in definition of macro '__trace_if'
if (__builtin_constant_p(!!(cond)) ? !!(cond) : \
^~~~
>> block/bio.c:1750:2: note: in expansion of macro 'if'
if (bio->bi_blkg && bio->bi_blkg->parent)
^~
block/bio.c:1750:27: error: 'struct bio' has no member named 'bi_blkg'; did you mean 'bi_flags'?
if (bio->bi_blkg && bio->bi_blkg->parent)
^
include/linux/compiler.h:58:30: note: in definition of macro '__trace_if'
if (__builtin_constant_p(!!(cond)) ? !!(cond) : \
^~~~
>> block/bio.c:1750:2: note: in expansion of macro 'if'
if (bio->bi_blkg && bio->bi_blkg->parent)
^~
block/bio.c:1750:11: error: 'struct bio' has no member named 'bi_blkg'; did you mean 'bi_flags'?
if (bio->bi_blkg && bio->bi_blkg->parent)
^
include/linux/compiler.h:58:42: note: in definition of macro '__trace_if'
if (__builtin_constant_p(!!(cond)) ? !!(cond) : \
^~~~
>> block/bio.c:1750:2: note: in expansion of macro 'if'
if (bio->bi_blkg && bio->bi_blkg->parent)
^~
block/bio.c:1750:27: error: 'struct bio' has no member named 'bi_blkg'; did you mean 'bi_flags'?
if (bio->bi_blkg && bio->bi_blkg->parent)
^
include/linux/compiler.h:58:42: note: in definition of macro '__trace_if'
if (__builtin_constant_p(!!(cond)) ? !!(cond) : \
^~~~
>> block/bio.c:1750:2: note: in expansion of macro 'if'
if (bio->bi_blkg && bio->bi_blkg->parent)
^~
block/bio.c:1750:11: error: 'struct bio' has no member named 'bi_blkg'; did you mean 'bi_flags'?
if (bio->bi_blkg && bio->bi_blkg->parent)
^
include/linux/compiler.h:69:16: note: in definition of macro '__trace_if'
______r = !!(cond); \
^~~~
>> block/bio.c:1750:2: note: in expansion of macro 'if'
if (bio->bi_blkg && bio->bi_blkg->parent)
^~
block/bio.c:1750:27: error: 'struct bio' has no member named 'bi_blkg'; did you mean 'bi_flags'?
if (bio->bi_blkg && bio->bi_blkg->parent)
^
include/linux/compiler.h:69:16: note: in definition of macro '__trace_if'
______r = !!(cond); \
^~~~
>> block/bio.c:1750:2: note: in expansion of macro 'if'
if (bio->bi_blkg && bio->bi_blkg->parent)
^~

vim +/if +1750 block/bio.c

1727
1728 /**
1729 * bio_endio - end I/O on a bio
1730 * @bio: bio
1731 *
1732 * Description:
1733 * bio_endio() will end I/O on the whole bio. bio_endio() is the preferred
1734 * way to end I/O on a bio. No one should call bi_end_io() directly on a
1735 * bio unless they own it and thus know that it has an end_io function.
1736 *
1737 * bio_endio() can be called several times on a bio that has been chained
1738 * using bio_chain(). The ->bi_end_io() function will only be called the
1739 * last time. At this point the BLK_TA_COMPLETE tracing event will be
1740 * generated if BIO_TRACE_COMPLETION is set.
1741 **/
1742 void bio_endio(struct bio *bio)
1743 {
1744 again:
1745 if (!bio_remaining_done(bio))
1746 return;
1747 if (!bio_integrity_endio(bio))
1748 return;
1749
> 1750 if (bio->bi_blkg && bio->bi_blkg->parent)
1751 blkg_record_latency(bio);
1752
1753 if (bio->bi_disk)
1754 rq_qos_done_bio(bio->bi_disk->queue, bio);
1755
1756 /*
1757 * Need to have a real endio function for chained bios, otherwise
1758 * various corner cases will break (like stacking block devices that
1759 * save/restore bi_end_io) - however, we want to avoid unbounded
1760 * recursion and blowing the stack. Tail call optimization would
1761 * handle this, but compiling with frame pointers also disables
1762 * gcc's sibling call optimization.
1763 */
1764 if (bio->bi_end_io == bio_chain_endio) {
1765 bio = __bio_chain_endio(bio);
1766 goto again;
1767 }
1768
1769 if (bio->bi_disk && bio_flagged(bio, BIO_TRACE_COMPLETION)) {
1770 trace_block_bio_complete(bio->bi_disk->queue, bio,
1771 blk_status_to_errno(bio->bi_status));
1772 bio_clear_flag(bio, BIO_TRACE_COMPLETION);
1773 }
1774
1775 blk_throtl_bio_endio(bio);
1776 /* release cgroup info */
1777 bio_uninit(bio);
1778 if (bio->bi_end_io)
1779 bio->bi_end_io(bio);
1780 }
1781 EXPORT_SYMBOL(bio_endio);
1782

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation


2018-08-31 15:29:06

by Josef Bacik

[permalink] [raw]
Subject: Re: [PATCH 02/15] blkcg: delay blkg destruction until after writeback has finished

On Thu, Aug 30, 2018 at 09:53:43PM -0400, Dennis Zhou wrote:
> From: "Dennis Zhou (Facebook)" <[email protected]>
>
> Currently, blkcg destruction relies on a sequence of events:
> 1. Destruction starts. blkcg_css_offline() is called and blkgs
> release their reference to the blkcg. This immediately destroys
> the cgwbs (writeback).
> 2. With blkgs giving up their reference, the blkcg ref count should
> become zero and eventually call blkcg_css_free() which finally
> frees the blkcg.
>
> Jiufei Xue reported that there is a race between blkcg_bio_issue_check()
> and cgroup_rmdir(). To remedy this, blkg destruction becomes contingent
> on the completion of all writeback associated with the blkcg. A count of
> the number of cgwbs is maintained and once that goes to zero, blkg
> destruction can follow. This should prevent premature blkg destruction.
>
> The new process for blkcg cleanup is as follows:
> 1. Destruction starts. blkcg_css_offline() is called which offlines
> writeback. Blkg destruction is delayed on the nr_cgwbs count to
> avoid punting potentially large amounts of outstanding writeback
> to root while maintaining any ongoing policies.
> 2. When the nr_cgwbs becomes zero, blkcg_destroy_blkgs() is called and
> handles destruction of blkgs. This is where the css reference held
> by each blkg is released.
> 3. Once the blkcg ref count goes to zero, blkcg_css_free() is called.
> This finally frees the blkcg.
>
> It seems that in the past blk-throttle did something hard to follow:
> it took data from one blkg while associating with a potentially
> different one via current. The simplification and unification of what
> blk-throttle does is what exposed this.
>

So the general approach is correct, but it's sort of confusing because you are
using nr_cgwbs as a reference counter: it's set to 1 at blkcg creation time
regardless of whether or not there's an associated wb cg. So instead, why not
just have a refcount_t ref, set it to 1 on creation, make the wb cg take a
ref when it's attached, and then just do the get/put like normal and clean up as
you have below? What you are doing is a reference counter masquerading as a
count of the wb cgs; just add full ref counting to the blkcg and call it a day,
it'll be much less confusing. Thanks,

Josef
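
For clarity, a minimal sketch of what Josef is suggesting, assuming
refcount_t (the helper names are hypothetical, not from the posted
patch):

	struct blkcg {
		...
		refcount_t cgwb_refcnt;	/* base ref + one per attached wb */
	};

	/* in blkcg_css_alloc(): refcount_set(&blkcg->cgwb_refcnt, 1); */

	static inline void blkcg_cgwb_get(struct blkcg *blkcg)
	{
		refcount_inc(&blkcg->cgwb_refcnt);
	}

	static inline void blkcg_cgwb_put(struct blkcg *blkcg)
	{
		/* css_offline drops the base ref; the last put destroys
		 * the blkgs */
		if (refcount_dec_and_test(&blkcg->cgwb_refcnt))
			blkcg_destroy_blkgs(blkcg);
	}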

2018-08-31 15:32:05

by Josef Bacik

[permalink] [raw]
Subject: Re: [PATCH 03/15] blkcg: use tryget logic when associating a blkg with a bio

On Thu, Aug 30, 2018 at 09:53:44PM -0400, Dennis Zhou wrote:
> From: "Dennis Zhou (Facebook)" <[email protected]>
>
> There is a very small chance a bio gets caught up in a really
> unfortunate race between a task migration, a cgroup exiting, and itself
> trying to associate with a blkg. This is due to css offlining being
> performed after the css->refcnt is killed, which triggers removal of
> blkgs whose blkg->refcnt reaches 0.
>
> To avoid this, association with a blkg should use tryget and fallback to
> using the root_blkg.
>
> Fixes: 08e18eab0c579 ("block: add bi_blkg to the bio for cgroups")
> Signed-off-by: Dennis Zhou <[email protected]>
> Cc: Jiufei Xue <[email protected]>
> Cc: Joseph Qi <[email protected]>
> Cc: Tejun Heo <[email protected]>
> Cc: Josef Bacik <[email protected]>
> Cc: Jens Axboe <[email protected]>
> ---
> block/bio.c | 3 ++-
> block/blk-throttle.c | 5 +++--
> 2 files changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/block/bio.c b/block/bio.c
> index 04969b392c72..4473ccd22987 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -1987,7 +1987,8 @@ int bio_associate_blkg(struct bio *bio, struct blkcg_gq *blkg)
> {
> if (unlikely(bio->bi_blkg))
> return -EBUSY;
> - blkg_get(blkg);
> + if (!blkg_try_get(blkg))
> + return -ENODEV;
> bio->bi_blkg = blkg;
> return 0;
> }
> diff --git a/block/blk-throttle.c b/block/blk-throttle.c
> index a3eede00d302..c626e1f7cdcd 100644
> --- a/block/blk-throttle.c
> +++ b/block/blk-throttle.c
> @@ -2129,8 +2129,9 @@ static inline void throtl_update_latency_buckets(struct throtl_data *td)
> static void blk_throtl_assoc_bio(struct throtl_grp *tg, struct bio *bio)
> {
> #ifdef CONFIG_BLK_DEV_THROTTLING_LOW
> - if (bio->bi_css)
> - bio_associate_blkg(bio, tg_to_blkg(tg));
> + /* fallback to root_blkg if we fail to get a blkg ref */
> + if (bio->bi_css && bio_associate_blkg(bio, tg_to_blkg(tg)))
> + bio_associate_blkg(bio, bio->bi_disk->queue->root_blkg);

Except if we've already associated a blkg this is just extra. Can we do

if (bio->bi_css && (bio_associate_blkg(bio, tg_to_blkg(tg)) == -ENODEV))

to make it clear that we're only attaching it to the root if we failed to attach
a blkg at all? Thanks,

Josef
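
Put together, Josef's suggested variant would look something like this
sketch (not the posted code):

static void blk_throtl_assoc_bio(struct throtl_grp *tg, struct bio *bio)
{
#ifdef CONFIG_BLK_DEV_THROTTLING_LOW
	/* fall back to root_blkg only when taking the blkg ref failed */
	if (bio->bi_css &&
	    (bio_associate_blkg(bio, tg_to_blkg(tg)) == -ENODEV))
		bio_associate_blkg(bio, bio->bi_disk->queue->root_blkg);
	bio_issue_init(&bio->bi_issue, bio_sectors(bio));
#endif
}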

2018-08-31 15:46:42

by Josef Bacik

[permalink] [raw]
Subject: Re: [PATCH 08/15] blkcg: associate a blkg for pages being evicted by swap

On Thu, Aug 30, 2018 at 09:53:49PM -0400, Dennis Zhou wrote:
> From: "Dennis Zhou (Facebook)" <[email protected]>
>
> A prior patch in this series added blkg association to bios issued by
> cgroups. There are two other paths that we want to attribute work back
> to the appropriate cgroup: swap and writeback. Here we modify the way
> swap tags bios to include the blkg. Writeback will be tackle in the next
> patch.
>
> Signed-off-by: Dennis Zhou <[email protected]>

Reviewed-by: Josef Bacik <[email protected]>

Thanks,

Josef

2018-08-31 15:47:28

by Josef Bacik

[permalink] [raw]
Subject: Re: [PATCH 09/15] blkcg: associate writeback bios with a blkg

On Thu, Aug 30, 2018 at 09:53:50PM -0400, Dennis Zhou wrote:
> From: "Dennis Zhou (Facebook)" <[email protected]>
>
> One of the goals of this series is to remove a separate reference to
> the css of the bio. This can and should be accessed via bio_blkcg. In
> this patch, the wbc_init_bio call is changed such that it must be called
> after a queue has been associated with the bio.
>
> Signed-off-by: Dennis Zhou <[email protected]>

Reviewed-by: Josef Bacik <[email protected]>

Thanks,

Josef

2018-08-31 15:48:49

by Josef Bacik

[permalink] [raw]
Subject: Re: [PATCH 10/15] blkcg: remove bio->bi_css and instead use bio->bi_blkg

On Thu, Aug 30, 2018 at 09:53:51PM -0400, Dennis Zhou wrote:
> From: "Dennis Zhou (Facebook)" <[email protected]>
>
> Prior patches ensured that all bios are now associated with some blkg.
> This now makes bio->bi_css unnecessary as blkg maintains a reference to
> the blkcg already.
>
> This patch removes the field bi_css and transfers corresponding uses to
> access via bi_blkg.
>
> Signed-off-by: Dennis Zhou <[email protected]>

Reviewed-by: Josef Bacik <[email protected]>

Thanks,

Josef

2018-08-31 15:51:19

by Josef Bacik

[permalink] [raw]
Subject: Re: [PATCH 13/15] blkcg: change blkg reference counting to use percpu_ref

On Thu, Aug 30, 2018 at 09:53:54PM -0400, Dennis Zhou wrote:
> From: "Dennis Zhou (Facebook)" <[email protected]>
>
> Now that every bio is associated with a blkg, this puts the use of
> blkg_get, blkg_try_get, and blkg_put on the hot path. This switches over
> the refcnt in blkg to use percpu_ref.
>
> Signed-off-by: Dennis Zhou <[email protected]>

Reviewed-by: Josef Bacik <[email protected]>

Thanks,

Josef

2018-08-31 15:53:27

by Josef Bacik

[permalink] [raw]
Subject: Re: [PATCH 14/15] blkcg: rename blkg_try_get to blkg_tryget

On Thu, Aug 30, 2018 at 09:53:55PM -0400, Dennis Zhou wrote:
> From: "Dennis Zhou (Facebook)" <[email protected]>
>
> blkg reference counting now uses percpu_ref rather than atomic_t. Let's
> make this consistent with css_tryget. This renames blkg_try_get to
> blkg_tryget and now returns a bool rather than the blkg or NULL.
>
> Signed-off-by: Dennis Zhou <[email protected]>

Reviewed-by: Josef Bacik <[email protected]>

Thanks,

Josef

2018-08-31 16:40:46

by Josef Bacik

[permalink] [raw]
Subject: Re: [PATCH 04/15] blkcg: fix ref count issue with bio_blkcg using task_css

On Thu, Aug 30, 2018 at 09:53:45PM -0400, Dennis Zhou wrote:
> From: "Dennis Zhou (Facebook)" <[email protected]>
>
> The accessor function bio_blkcg either returns the blkcg associated with
> the bio or finds one in the current context. This can cause an issue
> when trying to associate a bio with a blkcg. Particularly, it's the
> third case that is problematic:
>
> return css_to_blkcg(task_css(current, io_cgrp_id));
>
> As the above may race against task migration and the cgroup exiting, it
> is not always ok to take a reference on the blkcg returned from
> bio_blkcg.
>
> This patch adds association ahead of calling bio_blkcg rather than
> after. This prevents makes association a required and explicit step
> along the code paths for calling bio_blkcg. blk_get_rl is modified
> as well to get a reference to the blkcg it may use and blk_put_rl
> will always put the reference back. Association is also moved above the
> bio_blkcg call to ensure it will not return NULL in blk-iolatency.
>
> Signed-off-by: Dennis Zhou <[email protected]>
> ---
> block/bio.c | 10 +++++--
> block/blk-iolatency.c | 2 +-
> include/linux/blk-cgroup.h | 53 ++++++++++++++++++++++++++++++++------
> 3 files changed, 54 insertions(+), 11 deletions(-)
>
> diff --git a/block/bio.c b/block/bio.c
> index 4473ccd22987..09a31e4d46bb 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -1962,13 +1962,19 @@ int bio_associate_blkcg_from_page(struct bio *bio, struct page *page)
> *
> * This function takes an extra reference of @blkcg_css which will be put
> * when @bio is released. The caller must own @bio and is responsible for
> - * synchronizing calls to this function.
> + * synchronizing calls to this function. If @blkcg_css is NULL, a call to
> + * blkcg_get_css finds the current css from the kthread or task.
> */
> int bio_associate_blkcg(struct bio *bio, struct cgroup_subsys_state *blkcg_css)
> {
> if (unlikely(bio->bi_css))
> return -EBUSY;
> - css_get(blkcg_css);
> +
> + if (blkcg_css)
> + css_get(blkcg_css);
> + else
> + blkcg_css = blkcg_get_css();
> +
> bio->bi_css = blkcg_css;
> return 0;
> }
> diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
> index 19923f8a029d..62fdd9002c29 100644
> --- a/block/blk-iolatency.c
> +++ b/block/blk-iolatency.c
> @@ -404,8 +404,8 @@ static void blkcg_iolatency_throttle(struct rq_qos *rqos, struct bio *bio,
> return;
>
> rcu_read_lock();
> + bio_associate_blkcg(bio, NULL);
> blkcg = bio_blkcg(bio);
> - bio_associate_blkcg(bio, &blkcg->css);
> blkg = blkg_lookup(blkcg, q);
> if (unlikely(!blkg)) {
> if (!lock)
> diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
> index c7386464ec4c..d3cafb1eda48 100644
> --- a/include/linux/blk-cgroup.h
> +++ b/include/linux/blk-cgroup.h
> @@ -230,22 +230,52 @@ int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol,
> char *input, struct blkg_conf_ctx *ctx);
> void blkg_conf_finish(struct blkg_conf_ctx *ctx);
>
> +/**
> + * blkcg_get_css - find and get a reference to the css
> + *
> + * Find the css associated with either the kthread or the current task.
> + */
> +static inline struct cgroup_subsys_state *blkcg_get_css(void)
> +{
> + struct cgroup_subsys_state *css;
> +
> + rcu_read_lock();
> +
> + css = kthread_blkcg();
> + if (css) {
> + css_get(css);
> + } else {
> + while (true) {
> + css = task_css(current, io_cgrp_id);
> + if (likely(css_tryget(css)))
> + break;
> + cpu_relax();

Does this work? I'm ignorant of what cpu_relax() does, but it seems that if
we're rcu_read_lock()'ed here we aren't going to quiesce, so if we fail to
get the css here we simply aren't going to get it unless we go to sleep,
right? An honest question, because this is all magic to me; I'd like to
understand how this isn't going to infinite loop on us if css_tryget(css)
fails.

> + }
> + }
> +
> + rcu_read_unlock();
> +
> + return css;
> +}
>
> static inline struct blkcg *css_to_blkcg(struct cgroup_subsys_state *css)
> {
> return css ? container_of(css, struct blkcg, css) : NULL;
> }
>
> +/**
> + * bio_blkcg - grab the blkcg associated with a bio
> + * @bio: target bio
> + *
> + * This returns the blkcg associated with a bio, NULL if not associated.
> + * Callers are expected to either handle NULL or know association has been
> + * done prior to calling this.
> + */
> static inline struct blkcg *bio_blkcg(struct bio *bio)
> {
> - struct cgroup_subsys_state *css;
> -
> if (bio && bio->bi_css)
> return css_to_blkcg(bio->bi_css);
> - css = kthread_blkcg();
> - if (css)
> - return css_to_blkcg(css);
> - return css_to_blkcg(task_css(current, io_cgrp_id));
> + return NULL;
> }
>

So this is fine per se, but I know recently I was doing a bio_blkcg(NULL) to get
whatever the blkcg was for the current task. I threw that work away so I'm not
worried about me, but have you made sure nobody else is doing something similar?

> static inline bool blk_cgroup_congested(void)
> @@ -519,6 +549,11 @@ static inline struct request_list *blk_get_rl(struct request_queue *q,
> rcu_read_lock();
>
> blkcg = bio_blkcg(bio);
> + if (blkcg) {
> + css_get(&blkcg->css);
> + } else {
> + blkcg = css_to_blkcg(blkcg_get_css());
> + }

Kill these extra braces please. Thanks,

Josef

2018-08-31 16:40:48

by Josef Bacik

[permalink] [raw]
Subject: Re: [PATCH 05/15] blkcg: update blkg_lookup_create to do locking

On Thu, Aug 30, 2018 at 09:53:46PM -0400, Dennis Zhou wrote:
> From: "Dennis Zhou (Facebook)" <[email protected]>
>
> To know when to create a blkg, the general pattern is to do a
> blkg_lookup and if that fails, lock and then do a lookup again and if
> that fails finally create. It doesn't make much sense for everyone who
> wants to do creation to write this themselves.
>
> This changes blkg_lookup_create to do locking and implement this
> pattern. The old blkg_lookup_create is renamed to __blkg_lookup_create.
> If a call site wants to do its own error handling or already owns the
> queue lock, they can use __blkg_lookup_create.
>
> Signed-off-by: Dennis Zhou <[email protected]>

Reviewed-by: Josef Bacik <[email protected]>

Thanks,

Josef

2018-08-31 16:59:38

by Josef Bacik

[permalink] [raw]
Subject: Re: [PATCH 07/15] blkcg: consolidate bio_issue_init and blkg association

On Thu, Aug 30, 2018 at 09:53:48PM -0400, Dennis Zhou wrote:
> From: "Dennis Zhou (Facebook)" <[email protected]>
>
> This removes the now duplicate association logic in blk-throttle and
> blk-iolatency. bio_issue_init is moved into blkcg_bio_issue_check and
> into the bio clone variants to allow for the future addition of a
> latency moving average for IOs.
>
> Signed-off-by: Dennis Zhou <[email protected]>
> ---
> block/bio.c | 2 ++
> block/blk-iolatency.c | 24 +-----------------------
> block/blk-throttle.c | 13 +------------
> block/bounce.c | 2 ++
> include/linux/blk-cgroup.h | 2 ++
> 5 files changed, 8 insertions(+), 35 deletions(-)
>
> diff --git a/block/bio.c b/block/bio.c
> index e937f9681188..ab41f5b7eb1f 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -610,6 +610,8 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src)
> bio->bi_io_vec = bio_src->bi_io_vec;
>
> bio_clone_blkcg_association(bio, bio_src);
> +
> + bio_issue_init(&bio->bi_issue, bio_sectors(bio));
> }
> EXPORT_SYMBOL(__bio_clone_fast);
>
> diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
> index 22b2ff0440cc..9d7052bad6f7 100644
> --- a/block/blk-iolatency.c
> +++ b/block/blk-iolatency.c
> @@ -395,34 +395,12 @@ static void blkcg_iolatency_throttle(struct rq_qos *rqos, struct bio *bio,
> spinlock_t *lock)
> {
> struct blk_iolatency *blkiolat = BLKIOLATENCY(rqos);
> - struct blkcg *blkcg;
> - struct blkcg_gq *blkg;
> - struct request_queue *q = rqos->q;
> + struct blkcg_gq *blkg = bio->bi_blkg;
> bool issue_as_root = bio_issue_as_root_blkg(bio);
>
> if (!blk_iolatency_enabled(blkiolat))
> return;
>
> - rcu_read_lock();
> - bio_associate_blkcg(bio, NULL);
> - blkcg = bio_blkcg(bio);
> - blkg = blkg_lookup(blkcg, q);
> - if (unlikely(!blkg)) {
> - if (!lock)
> - spin_lock_irq(q->queue_lock);
> - blkg = __blkg_lookup_create(blkcg, q);
> - if (IS_ERR(blkg))
> - blkg = NULL;
> - if (!lock)
> - spin_unlock_irq(q->queue_lock);
> - }
> - if (!blkg)
> - goto out;
> -
> - bio_issue_init(&bio->bi_issue, bio_sectors(bio));
> - bio_associate_blkg(bio, blkg);
> -out:
> - rcu_read_unlock();

Move this removal to the previous patch, so you keep this patch solely about the
bio_issue_init. Thanks,

Josef

2018-08-31 20:20:35

by Dennis Zhou

[permalink] [raw]
Subject: Re: [PATCH 02/15] blkcg: delay blkg destruction until after writeback has finished

Hi Josef,

On Fri, Aug 31, 2018 at 11:27:07AM -0400, Josef Bacik wrote:
> So the general approach is correct, but it's sort of confusing because you are
> using nr_cgwbs as a reference counter, because it's set at 1 at blkg creation
time regardless of whether or not there's an associated wb cg. So instead why not
> just have a refcount_t ref, set it to 1 on creation and make the wb cg take a
> ref when it's attached, and then just do the get/put like normal and cleanup as
> you have below? What you are doing is a reference counter masquerading as a
> count of the wb cg's, just add full ref counting to the blkcg and call it a day,
> it'll be much less confusing. Thanks,

Yeah, that makes more sense. I've switched to using refcount_t and
renamed it to wbcg_refcnt. The corresponding actions have been renamed.

I've also fixed the kbuild error in v2.
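
For v2 this looks roughly like the sketch below. The counter name is as
described above; the helper names and the blkcg_destroy_blkgs() cleanup
hook are only illustrative and may differ in the actual v2:

	static inline void blkcg_wbcg_get(struct blkcg *blkcg)
	{
		refcount_inc(&blkcg->wbcg_refcnt);
	}

	/* the last put, once writeback detaches, triggers blkg destruction */
	static inline void blkcg_wbcg_put(struct blkcg *blkcg)
	{
		if (refcount_dec_and_test(&blkcg->wbcg_refcnt))
			blkcg_destroy_blkgs(blkcg);
	}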

Thanks,
Dennis

2018-08-31 20:22:01

by Dennis Zhou

[permalink] [raw]
Subject: Re: [PATCH 03/15] blkcg: use tryget logic when associating a blkg with a bio

Hi Josef,

On Fri, Aug 31, 2018 at 11:30:08AM -0400, Josef Bacik wrote:
> On Thu, Aug 30, 2018 at 09:53:44PM -0400, Dennis Zhou wrote:
> > diff --git a/block/blk-throttle.c b/block/blk-throttle.c
> > index a3eede00d302..c626e1f7cdcd 100644
> > --- a/block/blk-throttle.c
> > +++ b/block/blk-throttle.c
> > @@ -2129,8 +2129,9 @@ static inline void throtl_update_latency_buckets(struct throtl_data *td)
> > static void blk_throtl_assoc_bio(struct throtl_grp *tg, struct bio *bio)
> > {
> > #ifdef CONFIG_BLK_DEV_THROTTLING_LOW
> > - if (bio->bi_css)
> > - bio_associate_blkg(bio, tg_to_blkg(tg));
> > + /* fallback to root_blkg if we fail to get a blkg ref */
> > + if (bio->bi_css && bio_associate_blkg(bio, tg_to_blkg(tg)))
> > + bio_associate_blkg(bio, bio->bi_disk->queue->root_blkg);
>
> Except if we've already associated a blkg this is just extra, can we do
>
> if (bio->bi_css && (bio_associate_blkg(bio, tg_to_blkg(tg)) == -ENODEV))
>
> to make it clear that we're only attaching it to the root if we failed to attach
> a blkg at all? Thanks,

Done in v2.
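
The hunk in v2 should read roughly as follows (a sketch combining the
original diff with your suggestion; the exact v2 code may differ
slightly):

	static void blk_throtl_assoc_bio(struct throtl_grp *tg, struct bio *bio)
	{
	#ifdef CONFIG_BLK_DEV_THROTTLING_LOW
		/* fall back to root_blkg only if we failed to take a blkg ref */
		if (bio->bi_css &&
		    (bio_associate_blkg(bio, tg_to_blkg(tg)) == -ENODEV))
			bio_associate_blkg(bio, bio->bi_disk->queue->root_blkg);
	#endif
	}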

Thanks,
Dennis

2018-08-31 23:06:48

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 04/15] blkcg: fix ref count issue with bio_blkcg using task_css

Hello,

On Fri, Aug 31, 2018 at 11:35:39AM -0400, Josef Bacik wrote:
> > +static inline struct cgroup_subsys_state *blkcg_get_css(void)
> > +{
> > + struct cgroup_subsys_state *css;
> > +
> > + rcu_read_lock();
> > +
> > + css = kthread_blkcg();
> > + if (css) {
> > + css_get(css);
> > + } else {
> > + while (true) {
> > + css = task_css(current, io_cgrp_id);
> > + if (likely(css_tryget(css)))
> > + break;
> > + cpu_relax();
>
> Does this work? I'm ignorant of what cpu_relax() does, but it seems that if
> we're rcu_read_lock()'ed here we aren't going to quiesce, so if we fail to
> get the css here we simply aren't going to get it unless we go to sleep,
> right? An honest question, because this is all magic to me; I'd like to
> understand how this isn't going to infinite loop on us if css_tryget(css)
> fails.

The only time css_tryget() on task_css(current, xxx) can fail is if it
races against the current thread migrating away from that cgroup and
that cgroup is now getting destroyed. IOW,

1. For css_tryget() to fail, the cgroup must be dying.
2. The cgroup must be empty for it to be dying.
3. current must have already been migrated away to a different cgroup.

So, the above happens only when racing against css_set_move_task() -
it's seeing the old css pointer. As the membership pointer switching
must already have happened, all it's waiting for is the new css
membership pointer to be propagated on the polling cpu, making
a cpu_relax() busy loop the right thing to do.

This pattern is also used in task_get_css() and cgroup_sk_alloc().
Given that it's a bit tricky, it probably would be worthwhile to
factor out and document it.
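
FWIW, task_get_css() already encapsulates exactly this loop. Quoting from
include/linux/cgroup.h, lightly trimmed, so treat this as a reference
sketch rather than gospel:

	static inline struct cgroup_subsys_state *
	task_get_css(struct task_struct *task, int subsys_id)
	{
		struct cgroup_subsys_state *css;

		rcu_read_lock();
		while (true) {
			css = task_css(task, subsys_id);
			/* can only fail while racing a migration off a dying cgroup */
			if (likely(css_tryget_online(css)))
				break;
			cpu_relax();
		}
		rcu_read_unlock();
		return css;
	}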

Once Josef's other concerns are addressed, please feel free to add

Acked-by: Tejun Heo <[email protected]>

Thanks.

--
tejun

2018-08-31 23:11:39

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 05/15] blkcg: update blkg_lookup_create to do locking

On Thu, Aug 30, 2018 at 09:53:46PM -0400, Dennis Zhou wrote:
> From: "Dennis Zhou (Facebook)" <[email protected]>
>
> To know when to create a blkg, the general pattern is to do a
> blkg_lookup and if that fails, lock and then do a lookup again and if
> that fails finally create. It doesn't make much sense for everyone who
> wants to do creation to write this themselves.
>
> This changes blkg_lookup_create to do locking and implement this
> pattern. The old blkg_lookup_create is renamed to __blkg_lookup_create.
> If a call site wants to do its own error handling or already owns the
> queue lock, they can use __blkg_lookup_create.
>
> Signed-off-by: Dennis Zhou <[email protected]>

It looks a bit weird w/o actual users, might be worthwhile to mention
that future patches will add users.
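
For readers following along, the pattern being consolidated is roughly
the below (a sketch; locking details follow the current ->queue_lock
API):

	struct blkcg_gq *blkg_lookup_create(struct blkcg *blkcg,
					    struct request_queue *q)
	{
		struct blkcg_gq *blkg = blkg_lookup(blkcg, q);

		if (unlikely(!blkg)) {
			spin_lock_irq(q->queue_lock);
			/* __blkg_lookup_create() re-checks under the lock */
			blkg = __blkg_lookup_create(blkcg, q);
			spin_unlock_irq(q->queue_lock);
		}

		return blkg;
	}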

Acked-by: Tejun Heo <[email protected]>

Thanks.

--
tejun

2018-08-31 23:17:40

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 06/15] blkcg: always associate a bio with a blkg

Hello,

On Thu, Aug 30, 2018 at 09:53:47PM -0400, Dennis Zhou wrote:
> From: "Dennis Zhou (Facebook)" <[email protected]>
>
> Previously, blkg's were only assigned as needed by blk-iolatency and
> blk-throttle. bio->css was also always being associated while blkg was
> being looked up and then thrown away in blkcg_bio_issue_check.
>
> This patch beings the cleanup of bio->css and bio->bi_blkg by always
^
begins

> +int bio_associate_create_blkg(struct request_queue *q, struct bio *bio)
> +{
> + struct blkcg *blkcg;
> + struct blkcg_gq *blkg;
> + int ret = 0;
> +
> + /* someone has already associated this bio with a blkg */
> + if (bio->bi_blkg)
> + return ret;
> +
> + rcu_read_lock();
> +
> + bio_associate_blkcg(bio, NULL);
> + blkcg = bio_blkcg(bio);
> +
> + if (!blkcg->css.parent) {
> + ret = bio_associate_blkg(bio, q->root_blkg);
> + goto assoc_out;
> + }
> +
> + blkg = blkg_lookup_create(blkcg, q);
> + if (IS_ERR(blkg))
> + blkg = q->root_blkg;
> +
> + ret = bio_associate_blkg(bio, blkg);
> +assoc_out:

Maybe if/else instead of goto?

Other than that,

Acked-by: Tejun Heo <[email protected]>

Thanks.

--
tejun

2018-08-31 23:46:54

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 07/15] blkcg: consolidate bio_issue_init and blkg association

On Thu, Aug 30, 2018 at 09:53:48PM -0400, Dennis Zhou wrote:
> From: "Dennis Zhou (Facebook)" <[email protected]>
>
> This removes the now duplicate association logic in blk-throttle and
> blk-iolatency. bio_issue_init is moved into blkcg_bio_issue_check and
> into the bio clone variants to allow for the future addition of a
> latency moving average for IOs.
>
> Signed-off-by: Dennis Zhou <[email protected]>

I don't have strong feelings about where the removal goes, so please
feel free to reorganize patches per Josef's suggestion.

Acked-by: Tejun Heo <[email protected]>

Thanks.

--
tejun

2018-08-31 23:49:50

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 08/15] blkcg: associate a blkg for pages being evicted by swap

On Thu, Aug 30, 2018 at 09:53:49PM -0400, Dennis Zhou wrote:
> From: "Dennis Zhou (Facebook)" <[email protected]>
>
> A prior patch in this series added blkg association to bios issued by
> cgroups. There are two other paths that we want to attribute work back
> to the appropriate cgroup: swap and writeback. Here we modify the way
> swap tags bios to include the blkg. Writeback will be tackled in the next
> patch.
>
> Signed-off-by: Dennis Zhou <[email protected]>

Acked-by: Tejun Heo <[email protected]>

Thanks.

--
tejun

2018-08-31 23:55:18

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 09/15] blkcg: associate writeback bios with a blkg

On Thu, Aug 30, 2018 at 09:53:50PM -0400, Dennis Zhou wrote:
> From: "Dennis Zhou (Facebook)" <[email protected]>
>
> One of the goals of this series is to remove a separate reference to
> the css of the bio. This can and should be accessed via bio_blkcg. In
> this patch, the wbc_init_bio call is changed such that it must be called
> after a queue has been associated with the bio.
>
> Signed-off-by: Dennis Zhou <[email protected]>

Acked-by: Tejun Heo <[email protected]>

Thanks.

--
tejun

2018-09-01 00:14:34

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 10/15] blkcg: remove bio->bi_css and instead use bio->bi_blkg

On Thu, Aug 30, 2018 at 09:53:51PM -0400, Dennis Zhou wrote:
> From: "Dennis Zhou (Facebook)" <[email protected]>
>
> Prior patches ensured that all bios are now associated with some blkg.
> This now makes bio->bi_css unnecessary as blkg maintains a reference to
> the blkcg already.
>
> This patch removes the field bi_css and transfers corresponding uses to
> access via bi_blkg.
>
> Signed-off-by: Dennis Zhou <[email protected]>

Acked-by: Tejun Heo <[email protected]>

Thanks.

--
tejun

2018-09-01 00:27:52

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 11/15] blkcg: remove additional reference to the css

Hello,

On Thu, Aug 30, 2018 at 09:53:52PM -0400, Dennis Zhou wrote:
> - css = cgroup_get_e_css(page->mem_cgroup->css.cgroup, &io_cgrp_subsys);
>
> - return __bio_associate_blkg_from_css(bio, css);
> + rcu_read_lock();
> +
> + while (true) {
> + css = cgroup_e_css(page->mem_cgroup->css.cgroup,
> + &io_cgrp_subsys);

So, while they seem similar cgroup_e_css() and cgroup_get_e_css()
behave very differently in terms of locking. cgroup_e_css() can only
be used under cgroup_mutex because it is used during migration and has
to test cgroup_ss_mask(). The right thing to do here would be
renaming cgroup_e_css() to something else and add a new implementation
which operates in the same way as cgroup_get_e_css().

BTW, this should have triggered lockdep warning. I'd strongly
recommend testing with lockdep enabled.

Other than that, looks good to me.

Thanks.

--
tejun

2018-09-01 00:30:23

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 12/15] blkcg: cleanup and make blk_get_rl use blkg_lookup_create

On Thu, Aug 30, 2018 at 09:53:53PM -0400, Dennis Zhou wrote:
> From: "Dennis Zhou (Facebook)" <[email protected]>
>
> blk_get_rl is responsible for identifying which request_list a request
> should be allocated to. Try get logic was added earlier, but
> semantically the logic was not changed.
>
> This patch makes better use of the bio already having a reference to the
> blkg in the hot path. The cold path uses a better fallback of
> blkg_lookup_create rather than just blkg_lookup and then falling back to
> the q->root_rl. If lookup_create fails with anything but -ENODEV, it
> falls back to q->root_rl.
>
> A clarifying comment is added to explain why q->root_rl is used rather
> than the root blkg's rl.
>
> Signed-off-by: Dennis Zhou <[email protected]>

Acked-by: Tejun Heo <[email protected]>

We're replicating that retry busy loop a lot. It'd be really great to
factor that out and document what it's doing.

Thanks.

--
tejun

2018-09-01 00:33:22

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 13/15] blkcg: change blkg reference counting to use percpu_ref

Hello,

On Thu, Aug 30, 2018 at 09:53:54PM -0400, Dennis Zhou wrote:
> @@ -217,6 +240,10 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
> blkg_get(blkg->parent);
> }
>
> + ret = percpu_ref_init(&blkg->refcnt, __blkg_release, 0, GFP_KERNEL);

So, while this would work in some configs, you can't depend on an RCU
grace period inside percpu_ref. blkg is now a percpu-reference-counted
and RCU-protected object - it has to explicitly go through an RCU grace
period before release.

Thanks.

--
tejun

2018-09-01 00:34:13

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 14/15] blkcg: rename blkg_try_get to blkg_tryget

On Thu, Aug 30, 2018 at 09:53:55PM -0400, Dennis Zhou wrote:
> From: "Dennis Zhou (Facebook)" <[email protected]>
>
> blkg reference counting now uses percpu_ref rather than atomic_t. Let's
> make this consistent with css_tryget. This renames blkg_try_get to
> blkg_tryget, which now returns a bool rather than the blkg or NULL.
>
> Signed-off-by: Dennis Zhou <[email protected]>

Acked-by: Tejun Heo <[email protected]>

Thanks.

--
tejun

2018-09-01 00:36:36

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 00/15] blkcg ref count refactor/cleanup + blkcg avg_lat

Hello,

On Thu, Aug 30, 2018 at 09:53:41PM -0400, Dennis Zhou wrote:
> This is a fairly lengthy patchset that aims to cleanup reference
> counting for blkcgs and blkgs. There are 4 problems that this patchset
> tries to address:
> 1. fix blkcg destruction
> 2. always associate a bio with a blkg
> 3. remove the extra css ref held by bios and utilize the blkg ref
> 4. add average latency tracking to blkcg core in io.stat.

1 is already merged. Reviewed 2, 3 parts. Generally looks great.
Let's try to address the found issues and get these two parts merged
first.

Thanks.

--
tejun

2018-09-06 15:23:45

by Dennis Zhou

[permalink] [raw]
Subject: Re: [PATCH 04/15] blkcg: fix ref count issue with bio_blkcg using task_css

On Fri, Aug 31, 2018 at 11:35:39AM -0400, Josef Bacik wrote:
> On Thu, Aug 30, 2018 at 09:53:45PM -0400, Dennis Zhou wrote:
> > From: "Dennis Zhou (Facebook)" <[email protected]>
> > +/**
> > + * blkcg_get_css - find and get a reference to the css
> > + *
> > + * Find the css associated with either the kthread or the current task.
> > + */
> > +static inline struct cgroup_subsys_state *blkcg_get_css(void)
> > +{
> > + struct cgroup_subsys_state *css;
> > +
> > + rcu_read_lock();
> > +
> > + css = kthread_blkcg();
> > + if (css) {
> > + css_get(css);
> > + } else {
> > + while (true) {
> > + css = task_css(current, io_cgrp_id);
> > + if (likely(css_tryget(css)))
> > + break;
> > + cpu_relax();
>
> Does this work? I'm ignorant of what cpu_relax() does, but it seems that if
> we're rcu_read_lock()'ed here we aren't going to quiesce, so if we fail to
> get the css here we simply aren't going to get it unless we go to sleep,
> right? An honest question, because this is all magic to me; I'd like to
> understand how this isn't going to infinite loop on us if css_tryget(css)
> fails.
>

Tejun replied earlier with an in-depth answer. Thanks Tejun! I'll make
sure to add a comment detailing what's going on.

> > +/**
> > + * bio_blkcg - grab the blkcg associated with a bio
> > + * @bio: target bio
> > + *
> > + * This returns the blkcg associated with a bio, NULL if not associated.
> > + * Callers are expected to either handle NULL or know association has been
> > + * done prior to calling this.
> > + */
> > static inline struct blkcg *bio_blkcg(struct bio *bio)
> > {
> > - struct cgroup_subsys_state *css;
> > -
> > if (bio && bio->bi_css)
> > return css_to_blkcg(bio->bi_css);
> > - css = kthread_blkcg();
> > - if (css)
> > - return css_to_blkcg(css);
> > - return css_to_blkcg(task_css(current, io_cgrp_id));
> > + return NULL;
> > }
> >
>
> So this is fine per se, but I know recently I was doing a bio_blkcg(NULL) to get
> whatever the blkcg was for the current task. I threw that work away so I'm not
> worried about me, but have you made sure nobody else is doing something similar?
>

Initially I thought the BFQ and CFQ stuff only interacted with bios
which should already be associated. Turns out during init, they rely on
bio_blkcg to read from current and then do the wrong thing of
hard-associating with it (_get vs _tryget).

I've created a __bio_blkcg which is identical to the old function with
notes to not use it. Making changes to BFQ and CFQ would take a good bit
more work to make sure I'm not breaking what they're expecting to do, so
I leave that to future work.
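
For reference, __bio_blkcg is just the old body carried over with a
warning attached (sketch):

	/*
	 * DO NOT USE in new code: the returned css may be mid-migration or
	 * dying, so a plain css_get() on it is not always safe.
	 */
	static inline struct blkcg *__bio_blkcg(struct bio *bio)
	{
		struct cgroup_subsys_state *css;

		if (bio && bio->bi_css)
			return css_to_blkcg(bio->bi_css);
		css = kthread_blkcg();
		if (css)
			return css_to_blkcg(css);
		return css_to_blkcg(task_css(current, io_cgrp_id));
	}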

> > static inline bool blk_cgroup_congested(void)
> > @@ -519,6 +549,11 @@ static inline struct request_list *blk_get_rl(struct request_queue *q,
> > rcu_read_lock();
> >
> > blkcg = bio_blkcg(bio);
> > + if (blkcg) {
> > + css_get(&blkcg->css);
> > + } else {
> > + blkcg = css_to_blkcg(blkcg_get_css());
> > + }
>
> Kill these extra braces please. Thanks,

Done.
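
i.e. the hunk now reads:

	blkcg = bio_blkcg(bio);
	if (blkcg)
		css_get(&blkcg->css);
	else
		blkcg = css_to_blkcg(blkcg_get_css());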

Thanks,
Dennis

2018-09-06 21:30:34

by Dennis Zhou

[permalink] [raw]
Subject: Re: [PATCH 07/15] blkcg: consolidate bio_issue_init and blkg association

On Fri, Aug 31, 2018 at 11:42:26AM -0400, Josef Bacik wrote:
> On Thu, Aug 30, 2018 at 09:53:48PM -0400, Dennis Zhou wrote:
> > diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
> > index 22b2ff0440cc..9d7052bad6f7 100644
> > --- a/block/blk-iolatency.c
> > +++ b/block/blk-iolatency.c
> > @@ -395,34 +395,12 @@ static void blkcg_iolatency_throttle(struct rq_qos *rqos, struct bio *bio,
> > spinlock_t *lock)
> > {
> > struct blk_iolatency *blkiolat = BLKIOLATENCY(rqos);
> > - struct blkcg *blkcg;
> > - struct blkcg_gq *blkg;
> > - struct request_queue *q = rqos->q;
> > + struct blkcg_gq *blkg = bio->bi_blkg;
> > bool issue_as_root = bio_issue_as_root_blkg(bio);
> >
> > if (!blk_iolatency_enabled(blkiolat))
> > return;
> >
> > - rcu_read_lock();
> > - bio_associate_blkcg(bio, NULL);
> > - blkcg = bio_blkcg(bio);
> > - blkg = blkg_lookup(blkcg, q);
> > - if (unlikely(!blkg)) {
> > - if (!lock)
> > - spin_lock_irq(q->queue_lock);
> > - blkg = __blkg_lookup_create(blkcg, q);
> > - if (IS_ERR(blkg))
> > - blkg = NULL;
> > - if (!lock)
> > - spin_unlock_irq(q->queue_lock);
> > - }
> > - if (!blkg)
> > - goto out;
> > -
> > - bio_issue_init(&bio->bi_issue, bio_sectors(bio));
> > - bio_associate_blkg(bio, blkg);
> > -out:
> > - rcu_read_unlock();
>
> Move this removal to the previous patch, so you keep this patch solely about the
> bio_issue_init. Thanks,
>

I've moved this removal to the previous patch and addressed the kbuild
errors.

Thanks,
Dennis

2018-09-06 21:31:34

by Dennis Zhou

[permalink] [raw]
Subject: Re: [PATCH 11/15] blkcg: remove additional reference to the css

On Fri, Aug 31, 2018 at 05:26:03PM -0700, Tejun Heo wrote:
> Hello,
>
> On Thu, Aug 30, 2018 at 09:53:52PM -0400, Dennis Zhou wrote:
> > - css = cgroup_get_e_css(page->mem_cgroup->css.cgroup, &io_cgrp_subsys);
> >
> > - return __bio_associate_blkg_from_css(bio, css);
> > + rcu_read_lock();
> > +
> > + while (true) {
> > + css = cgroup_e_css(page->mem_cgroup->css.cgroup,
> > + &io_cgrp_subsys);
>
> So, while they seem similar cgroup_e_css() and cgroup_get_e_css()
> behave very differently in terms of locking. cgroup_e_css() can only
> be used under cgroup_mutex because it is used during migration and has
> to test cgroup_ss_mask(). The right thing to do here would be
> renaming cgroup_e_css() to something else and add a new implementation
> which operates in the same way as cgroup_get_e_css().
>
> BTW, this should have triggered lockdep warning. I'd strongly
> recommend testing with lockdep enabled.
>
> Other than that, looks good to me.
>

I see. I've renamed the original cgroup_e_css() to
cgroup_e_css_by_mask() and then did what cgroup_get_e_css() did without
the get part in the new cgroup_e_css().
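
The new cgroup_e_css() then looks roughly like the below (a sketch; the
real version needs the matching comments and RCU annotations, and
callers must be in an RCU read section and css_tryget() themselves):

	struct cgroup_subsys_state *cgroup_e_css(struct cgroup *cgrp,
						 struct cgroup_subsys *ss)
	{
		struct cgroup_subsys_state *css;

		do {
			css = cgroup_css(cgrp, ss);
			if (css)
				return css;
			cgrp = cgroup_parent(cgrp);
		} while (cgrp);

		return init_css_set.subsys[ss->id];
	}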

Thanks,
Dennis

2018-09-06 21:32:35

by Dennis Zhou

[permalink] [raw]
Subject: Re: [PATCH 13/15] blkcg: change blkg reference counting to use percpu_ref

On Fri, Aug 31, 2018 at 05:31:59PM -0700, Tejun Heo wrote:
> Hello,
>
> On Thu, Aug 30, 2018 at 09:53:54PM -0400, Dennis Zhou wrote:
> > @@ -217,6 +240,10 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
> > blkg_get(blkg->parent);
> > }
> >
> > + ret = percpu_ref_init(&blkg->refcnt, __blkg_release, 0, GFP_KERNEL);
>
> So, while this would work in some configs, you can't depend on an RCU
> grace period inside percpu_ref. blkg is now a percpu-reference-counted
> and RCU-protected object - it has to explicitly go through an RCU grace
> period before release.
>

Ah ok. I've made it into a call_rcu which should work.
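
i.e. something like the below (a sketch; it assumes an rcu_head embedded
in struct blkcg_gq, and the exact set of refs released may differ):

	static void __blkg_release(struct rcu_head *rcu)
	{
		struct blkcg_gq *blkg = container_of(rcu, struct blkcg_gq, rcu_head);

		/* release the blkcg and parent refs this blkg has been holding */
		css_put(&blkg->blkcg->css);
		if (blkg->parent)
			blkg_put(blkg->parent);

		blkg_free(blkg);
	}

	/* percpu_ref release callback: defer the real free past an RCU GP */
	static void blkg_release(struct percpu_ref *ref)
	{
		struct blkcg_gq *blkg = container_of(ref, struct blkcg_gq, refcnt);

		call_rcu(&blkg->rcu_head, __blkg_release);
	}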

Thanks,
Dennis

2018-09-06 23:01:42

by Dennis Zhou

[permalink] [raw]
Subject: Re: [PATCH 06/15] blkcg: always associate a bio with a blkg

On Fri, Aug 31, 2018 at 04:16:09PM -0700, Tejun Heo wrote:
> Hello,
>
> On Thu, Aug 30, 2018 at 09:53:47PM -0400, Dennis Zhou wrote:
> > From: "Dennis Zhou (Facebook)" <[email protected]>
> >
> > Previously, blkg's were only assigned as needed by blk-iolatency and
> > blk-throttle. bio->css was also always being associated while blkg was
> > being looked up and then thrown away in blkcg_bio_issue_check.
> >
> > This patch beings the cleanup of bio->css and bio->bi_blkg by always
> ^
> begins
>

Thanks!

> > +int bio_associate_create_blkg(struct request_queue *q, struct bio *bio)
> > +{
> > + struct blkcg *blkcg;
> > + struct blkcg_gq *blkg;
> > + int ret = 0;
> > +
> > + /* someone has already associated this bio with a blkg */
> > + if (bio->bi_blkg)
> > + return ret;
> > +
> > + rcu_read_lock();
> > +
> > + bio_associate_blkcg(bio, NULL);
> > + blkcg = bio_blkcg(bio);
> > +
> > + if (!blkcg->css.parent) {
> > + ret = bio_associate_blkg(bio, q->root_blkg);
> > + goto assoc_out;
> > + }
> > +
> > + blkg = blkg_lookup_create(blkcg, q);
> > + if (IS_ERR(blkg))
> > + blkg = q->root_blkg;
> > +
> > + ret = bio_associate_blkg(bio, blkg);
> > +assoc_out:
>
> Maybe if/else instead of goto?
>

Yeah, no need for goto.
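
So v2 will read roughly:

	if (!blkcg->css.parent) {
		ret = bio_associate_blkg(bio, q->root_blkg);
	} else {
		blkg = blkg_lookup_create(blkcg, q);
		if (IS_ERR(blkg))
			blkg = q->root_blkg;

		ret = bio_associate_blkg(bio, blkg);
	}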

Thanks,
Dennis

2018-09-07 03:18:08

by Chen, Rong A

[permalink] [raw]
Subject: [LKP] [blkcg] 6ef69a3a0b: WARNING:suspicious_RCU_usage

FYI, we noticed the following commit (built with gcc-7):

commit: 6ef69a3a0b4ac904f7c3b9cb78b5d51520dc84f4 ("[PATCH 13/15] blkcg: change blkg reference counting to use percpu_ref")
url: https://github.com/0day-ci/linux/commits/Dennis-Zhou/blkcg-ref-count-refactor-cleanup-blkcg-avg_lat/20180831-161742
base: https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git for-next

in testcase: trinity
with following parameters:

runtime: 300s

test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/


on test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -m 256M

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


+------------------------------------------------------------------------------------+------------+------------+
| | 22f657e287 | 6ef69a3a0b |
+------------------------------------------------------------------------------------+------------+------------+
| boot_successes | 0 | 0 |
| boot_failures | 14 | 33 |
| WARNING:at_mm/slab_common.c:#kmalloc_slab | 14 | 33 |
| EIP:kmalloc_slab | 14 | 33 |
| Mem-Info | 14 | 33 |
| WARNING:at_arch/x86/mm/dump_pagetables.c:#note_page | 14 | 31 |
| EIP:note_page | 14 | 31 |
| WARNING:suspicious_RCU_usage | 0 | 33 |
| include/linux/rcupdate.h:#Illegal_context_switch_in_RCU_read-side_critical_section | 0 | 33 |
| BUG:sleeping_function_called_from_invalid_context_at_kernel/locking/mutex.c | 0 | 33 |
+------------------------------------------------------------------------------------+------------+------------+



[ 5.313007] WARNING: suspicious RCU usage
[ 5.313705] 4.19.0-rc1-00175-g6ef69a3 #633 Tainted: G W
[ 5.314812] -----------------------------
[ 5.315231] include/linux/rcupdate.h:302 Illegal context switch in RCU read-side critical section!
[ 5.315231]
[ 5.315231] other info that might help us debug this:
[ 5.315231]
[ 5.315231]
[ 5.315231] rcu_scheduler_active = 2, debug_locks = 1
[ 5.315231] 4 locks held by swapper/1:
[ 5.315231] #0: (ptrval) (&dev->mutex){....}, at: __driver_attach+0x45/0xb0
[ 5.315231] #1: (ptrval) (ide_cfg_mtx){+.+.}, at: ide_port_setup_devices+0x1c/0x120
[ 5.315231] #2: (ptrval) (rcu_read_lock){....}, at: blkcg_init_queue+0x21/0x160
[ 5.315231] #3: (ptrval) (&(&q->__queue_lock)->rlock){....}, at: blkcg_init_queue+0x5e/0x160
[ 5.315231]
[ 5.315231] stack backtrace:
[ 5.315231] CPU: 0 PID: 1 Comm: swapper Tainted: G W 4.19.0-rc1-00175-g6ef69a3 #633
[ 5.315231] Call Trace:
[ 5.315231] ? dump_stack+0x16/0x26
[ 5.315231] ? lockdep_rcu_suspicious+0x91/0xa0
[ 5.315231] ? ___might_sleep+0x182/0x230
[ 5.315231] ? blkg_alloc+0x140/0x140
[ 5.315231] ? __might_sleep+0x2d/0x80
[ 5.315231] ? __mutex_lock+0x21/0x4e0
[ 5.315231] ? kvm_sched_clock_read+0x14/0x30
[ 5.315231] ? sched_clock+0x9/0x10
[ 5.315231] ? sched_clock_local+0x87/0x160
[ 5.315231] ? blkg_alloc+0x140/0x140
[ 5.315231] ? mutex_lock_killable_nested+0x14/0x20
[ 5.315231] ? pcpu_alloc+0x2c5/0x610
[ 5.315231] ? pcpu_alloc+0x2c5/0x610
[ 5.315231] ? kfree+0xdd/0x140
[ 5.315231] ? blkg_alloc+0x140/0x140
[ 5.315231] ? __alloc_percpu_gfp+0xb/0x10
[ 5.315231] ? percpu_ref_init+0x1e/0x90
[ 5.315231] ? blkg_create+0x18f/0x510
[ 5.315231] ? blkcg_init_queue+0x6c/0x160
[ 5.315231] ? blkcg_init_queue+0x21/0x160
[ 5.315231] ? blk_alloc_queue_node+0x2c5/0x370
[ 5.315231] ? ide_port_setup_devices+0x77/0x120
[ 5.315231] ? ide_host_register+0x567/0x5e0
[ 5.315231] ? ide_pci_init_two+0x56b/0x800
[ 5.315231] ? sched_clock_local+0x87/0x160
[ 5.315231] ? _raw_spin_unlock_irqrestore+0x2a/0x50
[ 5.315231] ? lockdep_hardirqs_on+0xec/0x1a0
[ 5.315231] ? _raw_spin_unlock_irqrestore+0x2a/0x50
[ 5.315231] ? trace_hardirqs_on+0x36/0xe0
[ 5.315231] ? __pm_runtime_resume+0x4e/0x80
[ 5.315231] ? ide_pci_init_one+0xd/0x10
[ 5.315231] ? piix_init_one+0x16/0x20
[ 5.315231] ? pci_device_probe+0xb5/0x140
[ 5.315231] ? really_probe+0x19b/0x290
[ 5.315231] ? driver_probe_device+0x49/0x140
[ 5.315231] ? __driver_attach+0xa9/0xb0
[ 5.315231] ? driver_probe_device+0x140/0x140
[ 5.315231] ? bus_for_each_dev+0x4f/0x80
[ 5.315231] ? driver_attach+0x14/0x20
[ 5.315231] ? driver_probe_device+0x140/0x140
[ 5.315231] ? bus_add_driver+0x157/0x1e0
[ 5.315231] ? pci_bus_num_vf+0x10/0x10
[ 5.315231] ? driver_register+0x51/0xe0
[ 5.315231] ? pdc202new_ide_init+0x16/0x16
[ 5.315231] ? __pci_register_driver+0x4b/0x50
[ 5.315231] ? piix_ide_init+0x8f/0x94
[ 5.315231] ? do_one_initcall+0xa1/0x1a7
[ 5.315231] ? rcu_read_lock_sched_held+0x4f/0x70
[ 5.315231] ? trace_initcall_level+0x57/0x80
[ 5.315231] ? kernel_init_freeable+0xdb/0x180
[ 5.315231] ? kernel_init_freeable+0x100/0x180
[ 5.315231] ? rest_init+0x90/0x90
[ 5.315231] ? kernel_init+0x8/0xf0
[ 5.315231] ? ret_from_fork+0x19/0x24
[ 5.315231] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:908
[ 5.315231] in_atomic(): 1, irqs_disabled(): 1, pid: 1, name: swapper
[ 5.315231] 4 locks held by swapper/1:
[ 5.315231] #0: (ptrval) (&dev->mutex){....}, at: __driver_attach+0x45/0xb0
[ 5.315231] #1: (ptrval) (ide_cfg_mtx){+.+.}, at: ide_port_setup_devices+0x1c/0x120
[ 5.315231] #2: (ptrval) (rcu_read_lock){....}, at: blkcg_init_queue+0x21/0x160
[ 5.315231] #3: (ptrval) (&(&q->__queue_lock)->rlock){....}, at: blkcg_init_queue+0x5e/0x160
[ 5.315231] irq event stamp: 996210
[ 5.315231] hardirqs last enabled at (996209): [<47b49149>] kmem_cache_alloc_trace+0xa9/0x250
[ 5.315231] hardirqs last disabled at (996210): [<48cfea62>] _raw_spin_lock_irq+0x12/0x60
[ 5.315231] softirqs last enabled at (996106): [<48d01516>] __do_softirq+0x246/0x344
[ 5.315231] softirqs last disabled at (996097): [<47a0a74c>] do_softirq_own_stack+0x1c/0x30
[ 5.315231] CPU: 0 PID: 1 Comm: swapper Tainted: G W 4.19.0-rc1-00175-g6ef69a3 #633
[ 5.315231] Call Trace:
[ 5.315231] ? dump_stack+0x16/0x26
[ 5.315231] ? ___might_sleep+0x13b/0x230
[ 5.315231] ? blkg_alloc+0x140/0x140
[ 5.315231] ? __might_sleep+0x2d/0x80
[ 5.315231] ? __mutex_lock+0x21/0x4e0
[ 5.315231] ? kvm_sched_clock_read+0x14/0x30
[ 5.315231] ? sched_clock+0x9/0x10
[ 5.315231] ? sched_clock_local+0x87/0x160
[ 5.315231] ? blkg_alloc+0x140/0x140
[ 5.315231] ? mutex_lock_killable_nested+0x14/0x20
[ 5.315231] ? pcpu_alloc+0x2c5/0x610
[ 5.315231] ? pcpu_alloc+0x2c5/0x610
[ 5.315231] ? kfree+0xdd/0x140
[ 5.315231] ? blkg_alloc+0x140/0x140
[ 5.315231] ? __alloc_percpu_gfp+0xb/0x10
[ 5.315231] ? percpu_ref_init+0x1e/0x90
[ 5.315231] ? blkg_create+0x18f/0x510
[ 5.315231] ? blkcg_init_queue+0x6c/0x160
[ 5.315231] ? blkcg_init_queue+0x21/0x160
[ 5.315231] ? blk_alloc_queue_node+0x2c5/0x370
[ 5.315231] ? ide_port_setup_devices+0x77/0x120
[ 5.315231] ? ide_host_register+0x567/0x5e0
[ 5.315231] ? ide_pci_init_two+0x56b/0x800
[ 5.315231] ? sched_clock_local+0x87/0x160
[ 5.315231] ? _raw_spin_unlock_irqrestore+0x2a/0x50
[ 5.315231] ? lockdep_hardirqs_on+0xec/0x1a0
[ 5.315231] ? _raw_spin_unlock_irqrestore+0x2a/0x50
[ 5.315231] ? trace_hardirqs_on+0x36/0xe0
[ 5.315231] ? __pm_runtime_resume+0x4e/0x80
[ 5.315231] ? ide_pci_init_one+0xd/0x10
[ 5.315231] ? piix_init_one+0x16/0x20
[ 5.315231] ? pci_device_probe+0xb5/0x140
[ 5.315231] ? really_probe+0x19b/0x290
[ 5.315231] ? driver_probe_device+0x49/0x140
[ 5.315231] ? __driver_attach+0xa9/0xb0
[ 5.315231] ? driver_probe_device+0x140/0x140
[ 5.315231] ? bus_for_each_dev+0x4f/0x80
[ 5.315231] ? driver_attach+0x14/0x20
[ 5.315231] ? driver_probe_device+0x140/0x140
[ 5.315231] ? bus_add_driver+0x157/0x1e0
[ 5.315231] ? pci_bus_num_vf+0x10/0x10
[ 5.315231] ? driver_register+0x51/0xe0
[ 5.315231] ? pdc202new_ide_init+0x16/0x16
[ 5.315231] ? __pci_register_driver+0x4b/0x50
[ 5.315231] ? piix_ide_init+0x8f/0x94
[ 5.315231] ? do_one_initcall+0xa1/0x1a7
[ 5.315231] ? rcu_read_lock_sched_held+0x4f/0x70
[ 5.315231] ? trace_initcall_level+0x57/0x80
[ 5.315231] ? kernel_init_freeable+0xdb/0x180
[ 5.315231] ? kernel_init_freeable+0x100/0x180
[ 5.315231] ? rest_init+0x90/0x90
[ 5.315231] ? kernel_init+0x8/0xf0
[ 5.315231] ? ret_from_fork+0x19/0x24
[ 5.418590] ide_generic: please use "probe_mask=0x3f" module parameter for probing all legacy ISA IDE ports
[ 5.420208] Loading iSCSI transport class v2.0-870.
[ 5.424773] rdac: device handler registered
[ 5.425612] hp_sw: device handler registered
[ 5.426442] alua: device handler registered
[ 5.427168] st: Version 20160209, fixed bufsize 32768, s/g segs 256
[ 5.428294] osst :I: Tape driver with OnStream support version 0.99.4
[ 5.428294] osst :I: $Id: osst.c,v 1.73 2005/01/01 21:13:34 wriede Exp $
[ 5.431683] Rounding down aligned max_sectors from 4294967295 to 4294967288
[ 5.433091] db_root: cannot open: /etc/target
[ 5.434136] SSFDC read-only Flash Translation layer
[ 5.435107] L440GX flash mapping: failed to find PIIX4 ISA bridge, cannot continue
[ 5.436418] device id = 2440
[ 5.436921] device id = 2480
[ 5.437430] device id = 24c0
[ 5.437931] device id = 24d0
[ 5.438464] device id = 25a1
[ 5.438975] device id = 2670
[ 5.439673] slram: not enough parameters.
[ 5.557989] No valid DiskOnChip devices found
[ 5.575575] [nandsim] warning: read_byte: unexpected data output cycle, state is STATE_READY return 0x0
[ 5.577267] [nandsim] warning: read_byte: unexpected data output cycle, state is STATE_READY return 0x0
[ 5.578759] [nandsim] warning: read_byte: unexpected data output cycle, state is STATE_READY return 0x0
[ 5.580241] [nandsim] warning: read_byte: unexpected data output cycle, state is STATE_READY return 0x0
[ 5.581727] [nandsim] warning: read_byte: unexpected data output cycle, state is STATE_READY return 0x0
[ 5.583206] [nandsim] warning: read_byte: unexpected data output cycle, state is STATE_READY return 0x0
[ 5.584700] nand: device found, Manufacturer ID: 0x98, Chip ID: 0x39
[ 5.585730] nand: Toshiba NAND 128MiB 1,8V 8-bit
[ 5.586474] nand: 128 MiB, SLC, erase size: 16 KiB, page size: 512, OOB size: 16
[ 5.588284] flash size: 128 MiB
[ 5.588800] page size: 512 bytes
[ 5.589327] OOB area size: 16 bytes
[ 5.589887] sector size: 16 KiB
[ 5.591573] pages number: 262144


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email



Thanks,
Rong, Chen


Attachments:
(No filename) (11.55 kB)
config-4.19.0-rc1-00175-g6ef69a3 (132.76 kB)
job-script (3.96 kB)
dmesg.xz (31.46 kB)
Download all attachments

2018-09-07 04:00:04

by Chen, Rong A

[permalink] [raw]
Subject: [LKP] [blkcg] c02c58dab2: WARNING:at_block/blk-throttle.c:#blk_throtl_bio

FYI, we noticed the following commit (built with gcc-6):

commit: c02c58dab2480ec45dc43e1e10970d763e6b7f1f ("[PATCH 06/15] blkcg: always associate a bio with a blkg")
url: https://github.com/0day-ci/linux/commits/Dennis-Zhou/blkcg-ref-count-refactor-cleanup-blkcg-avg_lat/20180831-161742
base: https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git for-next

in testcase: trinity
with following parameters:

runtime: 300s

test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/


on test machine: qemu-system-x86_64 -enable-kvm -cpu host -smp 2 -m 1G

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


+------------------------------------------------------------------+------------+------------+
| | 1a3eeea831 | c02c58dab2 |
+------------------------------------------------------------------+------------+------------+
| boot_successes | 6 | 0 |
| boot_failures | 6 | 16 |
| invoked_oom-killer:gfp_mask=0x | 4 | 6 |
| Mem-Info | 5 | 9 |
| Kernel_panic-not_syncing:Out_of_memory_and_no_killable_processes | 4 | 6 |
| Out_of_memory:Kill_process | 2 | 3 |
| WARNING:at_block/blk-throttle.c:#blk_throtl_bio | 0 | 10 |
| RIP:blk_throtl_bio | 0 | 10 |
+------------------------------------------------------------------+------------+------------+



[ 120.023103] WARNING: CPU: 1 PID: 1 at block/blk-throttle.c:2149 blk_throtl_bio+0xdaf/0x2490
[ 120.051033] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.19.0-rc1-00168-gc02c58d #1
[ 120.074200] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 120.100912] RIP: 0010:blk_throtl_bio+0xdaf/0x2490
[ 120.114291] Code: 08 84 d2 0f 85 1a 13 00 00 66 41 81 4f 14 00 02 e9 ed f3 ff ff 48 83 05 ce 31 ff 05 01 e9 f3 fb ff ff 48 83 05 39 45 ff 05 01 <0f> 0b 48 83 05 37 45 ff 05 01 e9 25 f3 ff ff 49 8d bf 28 02 00 00
[ 120.167531] RSP: 0000:ffff880030e36fd0 EFLAGS: 00010202
[ 120.184191] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 120.204216] RDX: 0000000000000000 RSI: ffffffffa72f04c0 RDI: 0000000000000206
[ 120.224215] RBP: ffff880030e370a0 R08: 0000000002384946 R09: 00000000023a2d90
[ 120.247539] R10: ffffed00062bc4fa R11: ffff8800315e27d3 R12: ffff880030f60150
[ 120.267576] R13: ffff88001c7a5b00 R14: ffff880030794400 R15: ffff880030f60140
[ 120.287560] FS: 0000000000000000(0000) GS:ffff880031400000(0000) knlGS:0000000000000000
[ 120.314198] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 120.330963] CR2: 0000000000000000 CR3: 000000000fc6d000 CR4: 00000000000006a0
[ 120.350859] Call Trace:
[ 120.360930] ? bio_associate_create_blkg+0x30d/0x950
[ 120.374222] ? reacquire_held_locks+0x400/0x400
[ 120.387606] ? bio_associate_blkg+0x240/0x240
[ 120.404166] ? do_raw_spin_unlock+0x16a/0x2d0
[ 120.417570] ? _raw_spin_unlock+0x5c/0xa0
[ 120.430924] generic_make_request_checks+0x8fa/0x14b0
[ 120.444270] ? percpu_ref_put_many+0x1c0/0x1c0
[ 120.460957] ? kasan_check_write+0x24/0x30
[ 120.474243] ? sched_clock_local+0x99/0x1c0
[ 120.487567] generic_make_request+0x237/0xdf0
[ 120.500949] ? sched_clock_cpu+0x20c/0x2a0
[ 120.514268] ? blk_plug_queued_count+0x180/0x180
[ 120.530925] ? debug_smp_processor_id+0x1f/0x30
[ 120.544243] submit_bio+0x2a0/0x410
[ 120.557564] ? submit_bio+0x2a0/0x410
[ 120.568470] ? lock_acquire+0x112/0x1d0
[ 120.580903] ? guard_bio_eod+0xb0/0x420
[ 120.590936] ? direct_make_request+0x240/0x240
[ 120.607501] ? guard_bio_eod+0x1b6/0x420
[ 120.617592] ? bio_add_page+0xd0/0x100
[ 120.630913] submit_bh_wbc+0x526/0x840
[ 120.640978] ? unlock_buffer+0x40/0x40
[ 120.654304] block_read_full_page+0x807/0xba0
[ 120.667591] ? bh_submit_read+0x240/0x240
[ 120.673290] ? create_page_buffers+0x210/0x210
[ 120.691014] ? add_to_page_cache_locked+0x20/0x20
[ 120.707532] ? alloc_page_interleave+0x139/0x1b0
[ 120.724230] ? __next_node_in+0x59/0x70
[ 120.734304] blkdev_readpage+0x1b/0x30
[ 120.747597] do_read_cache_page+0x795/0x1150
[ 120.760923] ? blkdev_writepages+0x20/0x20
[ 120.774234] ? kasan_unpoison_shadow+0x3d/0x60
[ 120.787530] ? preempt_count_add+0x159/0x210
[ 120.800913] ? pagecache_get_page+0x6f0/0x6f0
[ 120.814162] ? __this_cpu_preempt_check+0x1b/0x30
[ 120.830921] ? kasan_unpoison_shadow+0x3d/0x60
[ 120.844270] ? kasan_alloc_pages+0x40/0x50
[ 120.857909] ? get_page_from_freelist+0x2023/0x33b0
[ 120.870909] read_cache_page+0x53/0x90
[ 120.884132] read_dev_sector+0xc8/0x2c0
[ 120.897509] ? set_info+0x110/0x110
[ 120.907509] msdos_partition+0x231/0x2610
[ 120.920894] ? memcpy+0x6d/0x80
[ 120.930915] ? vsnprintf+0x96f/0x1cf0
[ 120.944176] ? set_info+0x110/0x110
[ 120.984325] ? snprintf+0x8f/0xb0
[ 120.997559] ? snprintf+0x8f/0xb0
[ 121.007498] ? vscnprintf+0x40/0x40
[ 121.017520] ? __next_node_in+0x59/0x70
[ 121.030968] ? set_info+0x110/0x110
[ 121.044221] ? set_info+0x110/0x110
[ 121.057506] check_partition+0x3db/0x7d0
[ 121.070920] rescan_partitions+0x192/0xac0
[ 121.080900] ? __might_sleep+0xad/0x1e0
[ 121.094193] ? bd_set_size+0x305/0x3c0
[ 121.107505] __blkdev_get+0x8c3/0x13b0
[ 121.117547] ? bd_set_size+0x3c0/0x3c0
[ 121.130907] ? debug_smp_processor_id+0x1f/0x30
[ 121.144195] blkdev_get+0x41c/0x9f0
[ 121.157679] ? refcount_sub_and_test_checked+0x100/0x1e0
[ 121.170883] ? __blkdev_get+0x13b0/0x13b0
[ 121.184233] ? do_raw_spin_unlock+0x16a/0x2d0
[ 121.197551] ? refcount_dec_and_test_checked+0x19/0x30
[ 121.214214] ? kobject_put+0x61/0x5a0
[ 121.227913] __device_add_disk+0xfc6/0x12d0
[ 121.241719] ? lock_acquire+0x112/0x1d0
[ 121.254191] ? bdget_disk+0xb0/0xb0
[ 121.291132] ? lockdep_init_map+0x11/0x20
[ 121.304263] ? lockdep_init_map+0x11/0x20
[ 121.317606] ? __raw_spin_lock_init+0x3d/0x120
[ 121.331152] ? device_initialize+0x2d3/0x3e0
[ 121.344254] device_add_disk+0x16/0x20
[ 121.357510] null_add_dev+0xcd5/0x1fe0
[ 121.370892] null_init+0x4cb/0x6c6
[ 121.380903] ? pkt_init+0x578/0x578
[ 121.394231] do_one_initcall+0x191/0x3f3
[ 121.407482] ? start_kernel+0xa52/0xa52
[ 121.417522] ? kasan_unpoison_shadow+0x3d/0x60
[ 121.434217] kernel_init_freeable+0x52f/0x6bb
[ 121.447500] ? rest_init+0x1a0/0x1a0
[ 121.457489] kernel_init+0x16/0x220
[ 121.470867] ? rest_init+0x1a0/0x1a0
[ 121.480914] ret_from_fork+0x1f/0x30
[ 121.494348] ---[ end trace bb75a6ff13d6153b ]---


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email



Thanks,
Rong, Chen


Attachments:
(No filename) (7.11 kB)
config-4.19.0-rc1-00168-gc02c58d (119.98 kB)
job-script (4.01 kB)
dmesg.xz (19.66 kB)
trinity (31.07 kB)
Download all attachments

2018-09-11 02:37:14

by Chen, Rong A

[permalink] [raw]
Subject: [LKP] [blkcg] 22f657e287: general_protection_fault:#[##]

FYI, we noticed the following commit (built with gcc-7):

commit: 22f657e2876612270ad346b7f5ba2493ba434d41 ("[PATCH 12/15] blkcg: cleanup and make blk_get_rl use blkg_lookup_create")
url: https://github.com/0day-ci/linux/commits/Dennis-Zhou/blkcg-ref-count-refactor-cleanup-blkcg-avg_lat/20180831-161742
base: https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git for-next

in testcase: trinity
with following parameters:

runtime: 300s

test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/


on test machine: qemu-system-x86_64 -enable-kvm -cpu Haswell,+smep,+smap -smp 2 -m 512M

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


+------------------------------------------------------------------+------------+------------+
| | f743a58719 | 22f657e287 |
+------------------------------------------------------------------+------------+------------+
| boot_successes | 3 | 0 |
| boot_failures | 10 | 16 |
| invoked_oom-killer:gfp_mask=0x | 6 | 6 |
| Mem-Info | 6 | 6 |
| Kernel_panic-not_syncing:Out_of_memory_and_no_killable_processes | 6 | 6 |
| IP-Config:Auto-configuration_of_network_failed | 4 | |
| general_protection_fault:#[##] | 0 | 10 |
| RIP:get_request | 0 | 10 |
| Kernel_panic-not_syncing:Fatal_exception | 0 | 10 |
+------------------------------------------------------------------+------------+------------+



[ 93.607840] SCSI Media Changer driver v0.25
[ 93.667470] scsi host0: scsi_debug: version 0188 [20180128]
[ 93.667470] dev_size_mb=8, opts=0x0, submit_queues=1, statistics=0
[ 93.756552] kasan: CONFIG_KASAN_INLINE enabled
[ 93.766196] kasan: GPF could be caused by NULL-ptr deref or user memory access
[ 93.766196] general protection fault: 0000 [#1] PREEMPT KASAN
[ 93.766196] CPU: 0 PID: 27 Comm: kworker/u2:1 Not tainted 4.19.0-rc1-00174-g22f657e #1
[ 93.766196] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 93.766196] Workqueue: events_unbound async_run_entry_fn
[ 93.766196] RIP: 0010:get_request+0x11f/0xe24
[ 93.766196] Code: 83 b8 f0 00 00 00 00 74 02 0f 0b e8 6b 78 46 ff 48 8b 44 24 10 48 8d 78 60 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 74 05 e8 6d 16 63 ff 48 8b 44 24 10 48 bd 00 00 00 00
[ 93.766196] RSP: 0000:ffff880016c07850 EFLAGS: 00010006
[ 93.766196] RAX: dffffc0000000000 RBX: dffffc0000000000 RCX: 0000000000000008
[ 93.766196] RDX: 000000000000000c RSI: 0000000000000020 RDI: 0000000000000060
[ 93.766196] RBP: ffff88001463b390 R08: 0000000000600000 R09: ffffed0002d80f0f
[ 93.766196] R10: 0000000000000000 R11: ffff880016c07877 R12: 0000000000600000
[ 93.766196] R13: 0000000000000000 R14: 0000000000000020 R15: ffff880014639540
[ 93.766196] FS: 0000000000000000(0000) GS:ffffffff8427e000(0000) knlGS:0000000000000000
[ 93.766196] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 93.766196] CR2: 0000000000000000 CR3: 000000000422c001 CR4: 00000000000206b0
[ 93.766196] Call Trace:
[ 93.766196] ? blk_rq_init+0x27c/0x27c
[ 93.766196] ? blk_exit_rl+0x55/0x55
[ 93.766196] ? __wake_up_common_lock+0x140/0x140
[ 93.766196] ? tracer_preempt_on+0x16/0x25
[ 93.766196] ? preempt_count_sub+0x12d/0x136
[ 93.766196] ? task_unlock+0xa/0x1a
[ 93.766196] ? create_task_io_context+0x2c7/0x2cf
[ 93.766196] blk_get_request+0x14d/0x277
[ 93.766196] __scsi_execute+0x67/0x466
[ 93.766196] scsi_probe_and_add_lun+0x399/0x1d14
[ 93.766196] ? rpm_resume+0xad5/0xb05
[ 93.766196] ? scsi_sanitize_inquiry_string+0x77/0x77
[ 93.766196] ? rpm_put_suppliers+0x10e/0x10e
[ 93.766196] ? scsi_target_reap_ref_release+0x6a/0x6a
[ 93.766196] ? tracer_preempt_on+0x16/0x25
[ 93.766196] ? preempt_count_sub+0x12d/0x136
[ 93.766196] __scsi_scan_target+0x130/0x6af
[ 93.766196] ? __free_pages+0x3c/0x3c
[ 93.766196] ? scsi_probe_and_add_lun+0x1d14/0x1d14
[ 93.766196] ? rpm_resume+0xad5/0xb05
[ 93.766196] ? rpm_put_suppliers+0x10e/0x10e
[ 93.766196] ? __switch_to_asm+0x30/0x60
[ 93.766196] ? ___might_sleep+0xac/0x33e
[ 93.766196] scsi_scan_channel+0xcb/0xe8
[ 93.766196] scsi_scan_host_selected+0x1ca/0x201
[ 93.766196] ? do_scsi_scan_host+0x18a/0x18a
[ 93.766196] do_scan_async+0x3e/0x2ff
[ 93.766196] ? do_scsi_scan_host+0x18a/0x18a
[ 93.766196] async_run_entry_fn+0x1c5/0x33c
[ 93.766196] process_one_work+0x4c0/0x6cd
[ 93.766196] ? preempt_count_sub+0x12d/0x136
[ 93.766196] worker_thread+0x4b3/0x610
[ 93.766196] ? __kthread_parkme+0x9f/0x148
[ 93.766196] kthread+0x2c5/0x2d4
[ 93.766196] ? process_scheduled_works+0x6d/0x6d
[ 93.766196] ? __kthread_cancel_work+0x16b/0x16b
[ 93.766196] ret_from_fork+0x35/0x40
[ 93.766196] ---[ end trace a8869917661828b0 ]---
[ 93.766196] RIP: 0010:get_request+0x11f/0xe24
[ 93.766196] Code: 83 b8 f0 00 00 00 00 74 02 0f 0b e8 6b 78 46 ff 48 8b 44 24 10 48 8d 78 60 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 74 05 e8 6d 16 63 ff 48 8b 44 24 10 48 bd 00 00 00 00
[ 93.766196] RSP: 0000:ffff880016c07850 EFLAGS: 00010006
[ 93.766196] RAX: dffffc0000000000 RBX: dffffc0000000000 RCX: 0000000000000008
[ 93.766196] RDX: 000000000000000c RSI: 0000000000000020 RDI: 0000000000000060
[ 93.766196] RBP: ffff88001463b390 R08: 0000000000600000 R09: ffffed0002d80f0f
[ 93.766196] R10: 0000000000000000 R11: ffff880016c07877 R12: 0000000000600000
[ 93.766196] R13: 0000000000000000 R14: 0000000000000020 R15: ffff880014639540
[ 93.766196] FS: 0000000000000000(0000) GS:ffffffff8427e000(0000) knlGS:0000000000000000
[ 93.766196] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 93.766196] CR2: 0000000000000000 CR3: 000000000422c001 CR4: 00000000000206b0
[ 93.766196] Kernel panic - not syncing: Fatal exception
[ 93.766196] Kernel Offset: disabled

Elapsed time: 100

To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email



Thanks,
Rong, Chen


Attachments:
(No filename) (6.63 kB)
config-4.19.0-rc1-00174-g22f657e (116.92 kB)
job-script (3.77 kB)
dmesg.xz (10.59 kB)
Download all attachments