2017-09-13 21:02:06

by Shaohua Li

[permalink] [raw]
Subject: [PATCH V2 0/4] block: make loop block device cgroup aware

From: Shaohua Li <[email protected]>

Hi,

The IO dispatched to under layer disk by loop block device isn't cloned from
original bio, so the IO loses cgroup information of original bio. These IO
escapes from cgroup control. The patches try to address this issue. The idea is
quite generic, but we currently only make it work for blkcg.

Thanks,
Shaohua

V1->V2:
- Address a couple of issues pointed out by Tejun

Shaohua Li (4):
kthread: add a mechanism to store cgroup info
blkcg: delete unused APIs
block: make blkcg aware of kthread stored original cgroup info
block/loop: make loop cgroup aware

block/bio.c | 31 -------------------------------
drivers/block/loop.c | 13 +++++++++++++
drivers/block/loop.h | 1 +
include/linux/bio.h | 2 --
include/linux/blk-cgroup.h | 25 +++++++------------------
include/linux/kthread.h | 11 +++++++++++
kernel/kthread.c | 44 +++++++++++++++++++++++++++++++++++++++++++-
7 files changed, 75 insertions(+), 52 deletions(-)

--
2.9.5


2017-09-13 21:02:12

by Shaohua Li

[permalink] [raw]
Subject: [PATCH V2 4/4] block/loop: make loop cgroup aware

From: Shaohua Li <[email protected]>

loop block device handles IO in a separate thread. The actual IO
dispatched isn't cloned from the IO loop device received, so the
dispatched IO loses the cgroup context.

I'm ignoring buffer IO case now, which is quite complicated. Making the
loop thread aware cgroup context doesn't really help. The loop device
only writes to a single file. In current writeback cgroup
implementation, the file can only belong to one cgroup.

For direct IO case, we could workaround the issue in theory. For
example, say we assign cgroup1 5M/s BW for loop device and cgroup2
10M/s. We can create a special cgroup for loop thread and assign at
least 15M/s for the underlayer disk. In this way, we correctly throttle
the two cgroups. But this is tricky to setup.

This patch tries to address the issue. We record bio's css in loop
command. When loop thread is handling the command, we then use the API
provided in patch 1 to set the css for current task. The bio layer will
use the css for new IO (from patch 3).

Signed-off-by: Shaohua Li <[email protected]>
---
drivers/block/loop.c | 13 +++++++++++++
drivers/block/loop.h | 1 +
2 files changed, 14 insertions(+)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 8934e25..37c4da7 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -479,6 +479,8 @@ static void lo_rw_aio_complete(struct kiocb *iocb, long ret, long ret2)
{
struct loop_cmd *cmd = container_of(iocb, struct loop_cmd, iocb);

+ if (cmd->css)
+ css_put(cmd->css);
cmd->ret = ret > 0 ? 0 : ret;
lo_rw_aio_do_completion(cmd);
}
@@ -538,6 +540,8 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
cmd->iocb.ki_filp = file;
cmd->iocb.ki_complete = lo_rw_aio_complete;
cmd->iocb.ki_flags = IOCB_DIRECT;
+ if (cmd->css)
+ kthread_associate_blkcg(cmd->css);

if (rw == WRITE)
ret = call_write_iter(file, &cmd->iocb, &iter);
@@ -545,6 +549,7 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
ret = call_read_iter(file, &cmd->iocb, &iter);

lo_rw_aio_do_completion(cmd);
+ kthread_associate_blkcg(NULL);

if (ret != -EIOCBQUEUED)
cmd->iocb.ki_complete(&cmd->iocb, ret, 0);
@@ -1689,6 +1694,14 @@ static blk_status_t loop_queue_rq(struct blk_mq_hw_ctx *hctx,
break;
}

+ /* always use the first bio's css */
+#ifdef CONFIG_CGROUPS
+ if (cmd->use_aio && cmd->rq->bio && cmd->rq->bio->bi_css) {
+ cmd->css = cmd->rq->bio->bi_css;
+ css_get(cmd->css);
+ } else
+#endif
+ cmd->css = NULL;
kthread_queue_work(&lo->worker, &cmd->work);

return BLK_STS_OK;
diff --git a/drivers/block/loop.h b/drivers/block/loop.h
index f68c1d5..d93b669 100644
--- a/drivers/block/loop.h
+++ b/drivers/block/loop.h
@@ -74,6 +74,7 @@ struct loop_cmd {
long ret;
struct kiocb iocb;
struct bio_vec *bvec;
+ struct cgroup_subsys_state *css;
};

/* Support for loadable transfer modules */
--
2.9.5

2017-09-13 21:02:30

by Shaohua Li

[permalink] [raw]
Subject: [PATCH V2 3/4] block: make blkcg aware of kthread stored original cgroup info

From: Shaohua Li <[email protected]>

bio_blkcg is the only API to get cgroup info for a bio right now. If
bio_blkcg finds current task is a kthread and has original blkcg
associated, it will use the css instead of associating the bio to
current task. This makes it possible that kthread dispatches bios on
behalf of other threads.

Signed-off-by: Shaohua Li <[email protected]>
---
include/linux/blk-cgroup.h | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index 0cfa8d2..f57e54d 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -19,6 +19,7 @@
#include <linux/radix-tree.h>
#include <linux/blkdev.h>
#include <linux/atomic.h>
+#include <linux/kthread.h>

/* percpu_counter batch for blkg_[rw]stats, per-cpu drift doesn't matter */
#define BLKG_STAT_CPU_BATCH (INT_MAX / 2)
@@ -223,16 +224,16 @@ static inline struct blkcg *css_to_blkcg(struct cgroup_subsys_state *css)
return css ? container_of(css, struct blkcg, css) : NULL;
}

-static inline struct blkcg *task_blkcg(struct task_struct *tsk)
-{
- return css_to_blkcg(task_css(tsk, io_cgrp_id));
-}
-
static inline struct blkcg *bio_blkcg(struct bio *bio)
{
+ struct cgroup_subsys_state *css;
+
if (bio && bio->bi_css)
return css_to_blkcg(bio->bi_css);
- return task_blkcg(current);
+ css = kthread_blkcg();
+ if (css)
+ return css_to_blkcg(css);
+ return css_to_blkcg(task_css(current, io_cgrp_id));
}

/**
--
2.9.5

2017-09-13 21:02:57

by Shaohua Li

[permalink] [raw]
Subject: [PATCH V2 1/4] kthread: add a mechanism to store cgroup info

From: Shaohua Li <[email protected]>

kthread usually runs jobs on behalf of other threads. The jobs should be
charged to cgroup of original threads. But the jobs run in a kthread,
where we lose the cgroup context of original threads. The patch adds a
machanism to record cgroup info of original threads in kthread context.
Later we can retrieve the cgroup info and attach the cgroup info to jobs.

Since this mechanism is only required by kthread, we store the cgroup
info in kthread data instead of generic task_struct.

Signed-off-by: Shaohua Li <[email protected]>
---
include/linux/kthread.h | 11 +++++++++++
kernel/kthread.c | 44 +++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index 82e197e..bd4369c 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -3,6 +3,7 @@
/* Simple interface for creating and stopping kernel threads without mess. */
#include <linux/err.h>
#include <linux/sched.h>
+#include <linux/cgroup.h>

__printf(4, 5)
struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
@@ -198,4 +199,14 @@ bool kthread_cancel_delayed_work_sync(struct kthread_delayed_work *work);

void kthread_destroy_worker(struct kthread_worker *worker);

+#ifdef CONFIG_CGROUPS
+void kthread_associate_blkcg(struct cgroup_subsys_state *css);
+struct cgroup_subsys_state *kthread_blkcg(void);
+#else
+static inline void kthread_associate_blkcg(struct cgroup_subsys_state *css) { }
+static inline struct cgroup_subsys_state *kthread_blkcg(void)
+{
+ return NULL;
+}
+#endif
#endif /* _LINUX_KTHREAD_H */
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 26db528..3107eee 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -20,7 +20,6 @@
#include <linux/freezer.h>
#include <linux/ptrace.h>
#include <linux/uaccess.h>
-#include <linux/cgroup.h>
#include <trace/events/sched.h>

static DEFINE_SPINLOCK(kthread_create_lock);
@@ -47,6 +46,7 @@ struct kthread {
void *data;
struct completion parked;
struct completion exited;
+ struct cgroup_subsys_state *blkcg_css;
};

enum KTHREAD_BITS {
@@ -1153,3 +1153,45 @@ void kthread_destroy_worker(struct kthread_worker *worker)
kfree(worker);
}
EXPORT_SYMBOL(kthread_destroy_worker);
+
+#ifdef CONFIG_CGROUPS
+/**
+ * kthread_associate_blkcg - associate blkcg to current kthread
+ * @css: the cgroup info
+ *
+ * Current thread must be a kthread. The thread is running jobs on behalf of
+ * other threads. In some cases, we expect the jobs attach cgroup info of
+ * original threads instead of that of current thread. This function stores
+ * original thread's cgroup info in current kthread context for later
+ * retrieval.
+ */
+void kthread_associate_blkcg(struct cgroup_subsys_state *css)
+{
+ struct kthread *kthread;
+
+ if (!(current->flags & PF_KTHREAD) || !current->set_child_tid)
+ return;
+ kthread = to_kthread(current);
+ if (css) {
+ css_get(css);
+ kthread->blkcg_css = css;
+ } else if (kthread->blkcg_css) {
+ css_put(kthread->blkcg_css);
+ kthread->blkcg_css = NULL;
+ }
+}
+EXPORT_SYMBOL(kthread_associate_blkcg);
+
+/**
+ * kthread_blkcg - get associated blkcg css of current kthread
+ *
+ * Current thread must be a kthread.
+ */
+struct cgroup_subsys_state *kthread_blkcg(void)
+{
+ if ((current->flags & PF_KTHREAD) && current->set_child_tid)
+ return to_kthread(current)->blkcg_css;
+ return NULL;
+}
+EXPORT_SYMBOL(kthread_blkcg);
+#endif
--
2.9.5

2017-09-13 21:02:56

by Shaohua Li

[permalink] [raw]
Subject: [PATCH V2 2/4] blkcg: delete unused APIs

From: Shaohua Li <[email protected]>

Nobody uses the APIs right now.

Signed-off-by: Shaohua Li <[email protected]>
---
block/bio.c | 31 -------------------------------
include/linux/bio.h | 2 --
include/linux/blk-cgroup.h | 12 ------------
3 files changed, 45 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 6745759..9271fa3 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -2033,37 +2033,6 @@ int bio_associate_blkcg(struct bio *bio, struct cgroup_subsys_state *blkcg_css)
EXPORT_SYMBOL_GPL(bio_associate_blkcg);

/**
- * bio_associate_current - associate a bio with %current
- * @bio: target bio
- *
- * Associate @bio with %current if it hasn't been associated yet. Block
- * layer will treat @bio as if it were issued by %current no matter which
- * task actually issues it.
- *
- * This function takes an extra reference of @task's io_context and blkcg
- * which will be put when @bio is released. The caller must own @bio,
- * ensure %current->io_context exists, and is responsible for synchronizing
- * calls to this function.
- */
-int bio_associate_current(struct bio *bio)
-{
- struct io_context *ioc;
-
- if (bio->bi_css)
- return -EBUSY;
-
- ioc = current->io_context;
- if (!ioc)
- return -ENOENT;
-
- get_io_context_active(ioc);
- bio->bi_ioc = ioc;
- bio->bi_css = task_get_css(current, io_cgrp_id);
- return 0;
-}
-EXPORT_SYMBOL_GPL(bio_associate_current);
-
-/**
* bio_disassociate_task - undo bio_associate_current()
* @bio: target bio
*/
diff --git a/include/linux/bio.h b/include/linux/bio.h
index a8fe793..d795cdd 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -514,13 +514,11 @@ do { \

#ifdef CONFIG_BLK_CGROUP
int bio_associate_blkcg(struct bio *bio, struct cgroup_subsys_state *blkcg_css);
-int bio_associate_current(struct bio *bio);
void bio_disassociate_task(struct bio *bio);
void bio_clone_blkcg_association(struct bio *dst, struct bio *src);
#else /* CONFIG_BLK_CGROUP */
static inline int bio_associate_blkcg(struct bio *bio,
struct cgroup_subsys_state *blkcg_css) { return 0; }
-static inline int bio_associate_current(struct bio *bio) { return -ENOENT; }
static inline void bio_disassociate_task(struct bio *bio) { }
static inline void bio_clone_blkcg_association(struct bio *dst,
struct bio *src) { }
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index 9d92153..0cfa8d2 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -235,12 +235,6 @@ static inline struct blkcg *bio_blkcg(struct bio *bio)
return task_blkcg(current);
}

-static inline struct cgroup_subsys_state *
-task_get_blkcg_css(struct task_struct *task)
-{
- return task_get_css(task, io_cgrp_id);
-}
-
/**
* blkcg_parent - get the parent of a blkcg
* @blkcg: blkcg of interest
@@ -735,12 +729,6 @@ struct blkcg_policy {

#define blkcg_root_css ((struct cgroup_subsys_state *)ERR_PTR(-EINVAL))

-static inline struct cgroup_subsys_state *
-task_get_blkcg_css(struct task_struct *task)
-{
- return NULL;
-}
-
#ifdef CONFIG_BLOCK

static inline struct blkcg_gq *blkg_lookup(struct blkcg *blkcg, void *key) { return NULL; }
--
2.9.5

2017-09-13 21:38:30

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH V2 1/4] kthread: add a mechanism to store cgroup info

Hello,

On Wed, Sep 13, 2017 at 02:01:26PM -0700, Shaohua Li wrote:
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index 26db528..3107eee 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -20,7 +20,6 @@
> #include <linux/freezer.h>
> #include <linux/ptrace.h>
> #include <linux/uaccess.h>
> -#include <linux/cgroup.h>
> #include <trace/events/sched.h>
>
> static DEFINE_SPINLOCK(kthread_create_lock);
> @@ -47,6 +46,7 @@ struct kthread {
> void *data;
> struct completion parked;
> struct completion exited;

maybe #ifdef CONFIG_CGROUPS?

> + struct cgroup_subsys_state *blkcg_css;
> };
...
> +void kthread_associate_blkcg(struct cgroup_subsys_state *css)
> +{
> + struct kthread *kthread;
> +
> + if (!(current->flags & PF_KTHREAD) || !current->set_child_tid)
> + return;
> + kthread = to_kthread(current);
> + if (css) {
> + css_get(css);
> + kthread->blkcg_css = css;
> + } else if (kthread->blkcg_css) {
> + css_put(kthread->blkcg_css);
> + kthread->blkcg_css = NULL;
> + }
> +}
> +EXPORT_SYMBOL(kthread_associate_blkcg);

Maybe doing sth like the following is less error-prone?

kthread_associate_blkcg(@css)
{
if (current's kthread->blkcg_css)
put kthread->blkcg_css and set it to NULL;
if (@css)
get @css and set kthread->blkcg;
}

Thanks.

--
tejun

2017-09-13 21:40:27

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH V2 2/4] blkcg: delete unused APIs

On Wed, Sep 13, 2017 at 02:01:27PM -0700, Shaohua Li wrote:
> From: Shaohua Li <[email protected]>
>
> Nobody uses the APIs right now.
>
> Signed-off-by: Shaohua Li <[email protected]>

Acked-by: Tejun Heo <[email protected]>

Thanks.

--
tejun

2017-09-13 21:42:09

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH V2 3/4] block: make blkcg aware of kthread stored original cgroup info

On Wed, Sep 13, 2017 at 02:01:28PM -0700, Shaohua Li wrote:
> From: Shaohua Li <[email protected]>
>
> bio_blkcg is the only API to get cgroup info for a bio right now. If
> bio_blkcg finds current task is a kthread and has original blkcg
> associated, it will use the css instead of associating the bio to
> current task. This makes it possible that kthread dispatches bios on
> behalf of other threads.
>
> Signed-off-by: Shaohua Li <[email protected]>

Acked-by: Tejun Heo <[email protected]>

Thanks.

--
tejun

2017-09-13 21:42:35

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH V2 4/4] block/loop: make loop cgroup aware

On Wed, Sep 13, 2017 at 02:01:29PM -0700, Shaohua Li wrote:
> From: Shaohua Li <[email protected]>
>
> loop block device handles IO in a separate thread. The actual IO
> dispatched isn't cloned from the IO loop device received, so the
> dispatched IO loses the cgroup context.
>
> I'm ignoring buffer IO case now, which is quite complicated. Making the
> loop thread aware cgroup context doesn't really help. The loop device
> only writes to a single file. In current writeback cgroup
> implementation, the file can only belong to one cgroup.
>
> For direct IO case, we could workaround the issue in theory. For
> example, say we assign cgroup1 5M/s BW for loop device and cgroup2
> 10M/s. We can create a special cgroup for loop thread and assign at
> least 15M/s for the underlayer disk. In this way, we correctly throttle
> the two cgroups. But this is tricky to setup.
>
> This patch tries to address the issue. We record bio's css in loop
> command. When loop thread is handling the command, we then use the API
> provided in patch 1 to set the css for current task. The bio layer will
> use the css for new IO (from patch 3).
>
> Signed-off-by: Shaohua Li <[email protected]>

Acked-by: Tejun Heo <[email protected]>

Thanks.

--
tejun

2017-09-13 21:43:39

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH V2 1/4] kthread: add a mechanism to store cgroup info

On Wed, Sep 13, 2017 at 02:38:20PM -0700, Tejun Heo wrote:
> Maybe doing sth like the following is less error-prone?
>
> kthread_associate_blkcg(@css)
> {
> if (current's kthread->blkcg_css)
> put kthread->blkcg_css and set it to NULL;
> if (@css)
> get @css and set kthread->blkcg;
> }

Ooh, also, maybe add a WARN_ON_ONCE on non-NULL blkcg_css on kthread
exit?

Thanks.

--
tejun

2017-09-13 22:30:08

by Shaohua Li

[permalink] [raw]
Subject: Re: [PATCH V2 1/4] kthread: add a mechanism to store cgroup info

On Wed, Sep 13, 2017 at 02:38:20PM -0700, Tejun Heo wrote:
> Hello,
>
> On Wed, Sep 13, 2017 at 02:01:26PM -0700, Shaohua Li wrote:
> > diff --git a/kernel/kthread.c b/kernel/kthread.c
> > index 26db528..3107eee 100644
> > --- a/kernel/kthread.c
> > +++ b/kernel/kthread.c
> > @@ -20,7 +20,6 @@
> > #include <linux/freezer.h>
> > #include <linux/ptrace.h>
> > #include <linux/uaccess.h>
> > -#include <linux/cgroup.h>
> > #include <trace/events/sched.h>
> >
> > static DEFINE_SPINLOCK(kthread_create_lock);
> > @@ -47,6 +46,7 @@ struct kthread {
> > void *data;
> > struct completion parked;
> > struct completion exited;
>
> maybe #ifdef CONFIG_CGROUPS?
>
> > + struct cgroup_subsys_state *blkcg_css;

Ah, I thought cgroup_subsys_state is always defined, let me double check.

> > };
> ...
> > +void kthread_associate_blkcg(struct cgroup_subsys_state *css)
> > +{
> > + struct kthread *kthread;
> > +
> > + if (!(current->flags & PF_KTHREAD) || !current->set_child_tid)
> > + return;
> > + kthread = to_kthread(current);
> > + if (css) {
> > + css_get(css);
> > + kthread->blkcg_css = css;
> > + } else if (kthread->blkcg_css) {
> > + css_put(kthread->blkcg_css);
> > + kthread->blkcg_css = NULL;
> > + }
> > +}
> > +EXPORT_SYMBOL(kthread_associate_blkcg);
>
> Maybe doing sth like the following is less error-prone?
>
> kthread_associate_blkcg(@css)
> {
> if (current's kthread->blkcg_css)
> put kthread->blkcg_css and set it to NULL;
> if (@css)
> get @css and set kthread->blkcg;
> }

Sounds good.

Thanks,
Shaohua