This is v2 of blkcg_policy methods cleanup patchset. Changes from the
last take [L] are
* Rebased on top of v4.2-rc1 and other pending patches. v4.2-rc1
added a patch to add per-blkcg policy data which is allocated by
blkcg core as per-blkcg_gq policy data was before this patchset
making the two inconsistent. I'll follow up with per-blkcg policy
data update.
This patchset contains assorted cleanups for blkcg_policy methods.
* alloc/free are added. exit is dropped.
* blk-throttle's async percpu allocation is replaced with direct
allocation.
* all methods now take blkcg_policy_data instead of blkcg_gq.
Nothing too controversial.
Jens, as we're at the beginning of the 4.2 merge window, I'll ping /
repost this patchset along with other pending patchsets once -rc1
drops.
This patchset contains the following seven patches.
0001-blkcg-remove-unnecessary-request_list-blkg-NULL-test.patch
0002-blkcg-use-blkg_free-in-blkcg_init_queue-failure-path.patch
0003-blkcg-make-blkcg_activate_policy-allow-NULL-pd_init_.patch
0004-blkcg-replace-blkcg_policy-pd_size-with-pd_alloc-fre.patch
0005-blk-throttle-remove-asynchrnous-percpu-stats-allocat.patch
0006-blk-throttle-clean-up-blkg_policy_data-alloc-init-ex.patch
0007-blkcg-make-blkcg_policy-methods-take-a-pointer-to-bl.patch
0001-0003 are misc cleanups. 0004-0006 add alloc/free methods and
remove blk-throttle's async percpu allocation mechanism. 0007 makes
all methods take blkcg_policy_data.
This patchset is also available in the following git branch.
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-blkcg-methods-cleanup
and is on top of
d770e558e219 ("Linux 4.2-rc1")
+ [1] [PATCHSET block/for-4.3] writeback: cgroup writeback updates
+ [2] [PATCHSET v2 block/for-4.3] block, cgroup: make cfq charge async IOs to the appropriate blkcgs
diffstat follows, thanks.
block/blk-cgroup.c | 52 +++++--------
block/blk-throttle.c | 173 +++++++++++++--------------------------------
block/cfq-iosched.c | 38 +++++++--
include/linux/blk-cgroup.h | 31 +++-----
4 files changed, 116 insertions(+), 178 deletions(-)
--
tejun
[L] http://lkml.kernel.org/g/[email protected]
[1] http://lkml.kernel.org/g/[email protected]
[2] http://lkml.kernel.org/g/[email protected]
Since ec13b1d6f0a0 ("blkcg: always create the blkcg_gq for the root
blkcg"), a request_list always has its blkg associated. Drop
unnecessary rl->blkg NULL test from blk_put_rl().
Signed-off-by: Tejun Heo <[email protected]>
Cc: Vivek Goyal <[email protected]>
---
include/linux/blk-cgroup.h | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index 58cfab8..2e41f8c 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -399,8 +399,7 @@ static inline struct request_list *blk_get_rl(struct request_queue *q,
*/
static inline void blk_put_rl(struct request_list *rl)
{
- /* root_rl may not have blkg set */
- if (rl->blkg && rl->blkg->blkcg != &blkcg_root)
+ if (rl->blkg->blkcg != &blkcg_root)
blkg_put(rl->blkg);
}
--
2.4.3
When blkcg_init_queue() fails midway after creating a new blkg, it
performs kfree() directly; however, this doesn't free the policy data
areas. Make it use blkg_free() instead. In turn, blkg_free() is
updated to handle root request_list special case.
While this fixes a possible memory leak, it's on an unlikely failure
path of an already cold path and the size leaked per occurrence is
miniscule too. I don't think it needs to be tagged for -stable.
Signed-off-by: Tejun Heo <[email protected]>
Cc: Vivek Goyal <[email protected]>
---
block/blk-cgroup.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index dacd95d..73209a8 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -60,7 +60,8 @@ static void blkg_free(struct blkcg_gq *blkg)
for (i = 0; i < BLKCG_MAX_POLS; i++)
kfree(blkg->pd[i]);
- blk_exit_rl(&blkg->rl);
+ if (blkg->blkcg != &blkcg_root)
+ blk_exit_rl(&blkg->rl);
kfree(blkg);
}
@@ -923,7 +924,7 @@ int blkcg_init_queue(struct request_queue *q)
radix_tree_preload_end();
if (IS_ERR(blkg)) {
- kfree(new_blkg);
+ blkg_free(new_blkg);
return PTR_ERR(blkg);
}
--
2.4.3
blkg_create() allows NULL ->pd_init_fn() but blkcg_activate_policy()
doesn't. As both in-kernel policies implement ->pd_init_fn, it
currently doesn't break anything. Update blkcg_activate_policy() so
that its behavior is consistent with blkg_create().
Signed-off-by: Tejun Heo <[email protected]>
Cc: Vivek Goyal <[email protected]>
---
block/blk-cgroup.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 73209a8..e16d1dc 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1112,7 +1112,8 @@ int blkcg_activate_policy(struct request_queue *q,
blkg->pd[pol->plid] = pd;
pd->blkg = blkg;
pd->plid = pol->plid;
- pol->pd_init_fn(blkg);
+ if (pol->pd_init_fn)
+ pol->pd_init_fn(blkg);
spin_unlock(&blkg->blkcg->lock);
}
--
2.4.3
A blkg (blkcg_gq) represents the relationship between a cgroup and
request_queue. Each active policy has a pd (blkg_policy_data) on each
blkg. The pd's were allocated by blkcg core and each policy could
request to allocate extra space at the end by setting
blkcg_policy->pd_size larger than the size of pd.
This is a bit unusual but was done this way mostly to simplify error
handling and all the existing use cases could be handled this way;
however, this is becoming too restrictive now that percpu memory can
be allocated without blocking.
This introduces two new mandatory blkcg_policy methods - pd_alloc_fn()
and pd_free_fn() - which are used to allocate and release pd for a
given policy. As pd allocation is now done from policy side, it can
simply allocate a larger area which embeds pd at the beginning. This
change makes ->pd_size pointless. Removed.
Signed-off-by: Tejun Heo <[email protected]>
Cc: Vivek Goyal <[email protected]>
---
block/blk-cgroup.c | 19 ++++++++++---------
block/blk-throttle.c | 13 ++++++++++++-
block/cfq-iosched.c | 13 ++++++++++++-
include/linux/blk-cgroup.h | 18 +++++++++---------
4 files changed, 43 insertions(+), 20 deletions(-)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index e16d1dc..0bf76e7 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -58,7 +58,8 @@ static void blkg_free(struct blkcg_gq *blkg)
return;
for (i = 0; i < BLKCG_MAX_POLS; i++)
- kfree(blkg->pd[i]);
+ if (blkg->pd[i])
+ blkcg_policy[i]->pd_free_fn(blkg->pd[i]);
if (blkg->blkcg != &blkcg_root)
blk_exit_rl(&blkg->rl);
@@ -104,7 +105,7 @@ static struct blkcg_gq *blkg_alloc(struct blkcg *blkcg, struct request_queue *q,
continue;
/* alloc per-policy data and attach it to blkg */
- pd = kzalloc_node(pol->pd_size, gfp_mask, q->node);
+ pd = pol->pd_alloc_fn(gfp_mask, q->node);
if (!pd)
goto err_free;
@@ -1059,7 +1060,7 @@ int blkcg_activate_policy(struct request_queue *q,
* for all existing blkgs.
*/
while (cnt--) {
- pd = kzalloc_node(pol->pd_size, GFP_KERNEL, q->node);
+ pd = pol->pd_alloc_fn(GFP_KERNEL, q->node);
if (!pd) {
ret = -ENOMEM;
goto out_free;
@@ -1125,7 +1126,7 @@ int blkcg_activate_policy(struct request_queue *q,
out_free:
blk_queue_bypass_end(q);
list_for_each_entry_safe(pd, nd, &pds, alloc_node)
- kfree(pd);
+ pol->pd_free_fn(pd);
list_for_each_entry_safe(cpd, cnd, &cpds, alloc_node)
kfree(cpd);
return ret;
@@ -1162,8 +1163,11 @@ void blkcg_deactivate_policy(struct request_queue *q,
if (pol->pd_exit_fn)
pol->pd_exit_fn(blkg);
- kfree(blkg->pd[pol->plid]);
- blkg->pd[pol->plid] = NULL;
+ if (blkg->pd[pol->plid]) {
+ pol->pd_free_fn(blkg->pd[pol->plid]);
+ blkg->pd[pol->plid] = NULL;
+ }
+
kfree(blkg->blkcg->pd[pol->plid]);
blkg->blkcg->pd[pol->plid] = NULL;
@@ -1186,9 +1190,6 @@ int blkcg_policy_register(struct blkcg_policy *pol)
{
int i, ret;
- if (WARN_ON(pol->pd_size < sizeof(struct blkg_policy_data)))
- return -EINVAL;
-
mutex_lock(&blkcg_pol_mutex);
/* find an empty slot */
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index b231935..f1dd691 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -403,6 +403,11 @@ static void throtl_service_queue_exit(struct throtl_service_queue *sq)
del_timer_sync(&sq->pending_timer);
}
+static struct blkg_policy_data *throtl_pd_alloc(gfp_t gfp, int node)
+{
+ return kzalloc_node(sizeof(struct throtl_grp), gfp, node);
+}
+
static void throtl_pd_init(struct blkcg_gq *blkg)
{
struct throtl_grp *tg = blkg_to_tg(blkg);
@@ -493,6 +498,11 @@ static void throtl_pd_exit(struct blkcg_gq *blkg)
throtl_service_queue_exit(&tg->service_queue);
}
+static void throtl_pd_free(struct blkg_policy_data *pd)
+{
+ kfree(pd);
+}
+
static void throtl_pd_reset_stats(struct blkcg_gq *blkg)
{
struct throtl_grp *tg = blkg_to_tg(blkg);
@@ -1468,12 +1478,13 @@ static void throtl_shutdown_wq(struct request_queue *q)
}
static struct blkcg_policy blkcg_policy_throtl = {
- .pd_size = sizeof(struct throtl_grp),
.cftypes = throtl_files,
+ .pd_alloc_fn = throtl_pd_alloc,
.pd_init_fn = throtl_pd_init,
.pd_online_fn = throtl_pd_online,
.pd_exit_fn = throtl_pd_exit,
+ .pd_free_fn = throtl_pd_free,
.pd_reset_stats_fn = throtl_pd_reset_stats,
};
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 9c9ec7c..69ce288 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1582,6 +1582,11 @@ static void cfq_cpd_init(const struct blkcg *blkcg)
}
}
+static struct blkg_policy_data *cfq_pd_alloc(gfp_t gfp, int node)
+{
+ return kzalloc_node(sizeof(struct cfq_group), gfp, node);
+}
+
static void cfq_pd_init(struct blkcg_gq *blkg)
{
struct cfq_group *cfqg = blkg_to_cfqg(blkg);
@@ -1618,6 +1623,11 @@ static void cfq_pd_offline(struct blkcg_gq *blkg)
cfqg_stats_xfer_dead(cfqg);
}
+static void cfq_pd_free(struct blkg_policy_data *pd)
+{
+ return kfree(pd);
+}
+
/* offset delta from cfqg->stats to cfqg->dead_stats */
static const int dead_stats_off_delta = offsetof(struct cfq_group, dead_stats) -
offsetof(struct cfq_group, stats);
@@ -4633,13 +4643,14 @@ static struct elevator_type iosched_cfq = {
#ifdef CONFIG_CFQ_GROUP_IOSCHED
static struct blkcg_policy blkcg_policy_cfq = {
- .pd_size = sizeof(struct cfq_group),
.cpd_size = sizeof(struct cfq_group_data),
.cftypes = cfq_blkcg_files,
.cpd_init_fn = cfq_cpd_init,
+ .pd_alloc_fn = cfq_pd_alloc,
.pd_init_fn = cfq_pd_init,
.pd_offline_fn = cfq_pd_offline,
+ .pd_free_fn = cfq_pd_free,
.pd_reset_stats_fn = cfq_pd_reset_stats,
};
#endif
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index 2e41f8c..a4d0998 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -67,13 +67,11 @@ struct blkg_rwstat {
* request_queue (q). This is used by blkcg policies which need to track
* information per blkcg - q pair.
*
- * There can be multiple active blkcg policies and each has its private
- * data on each blkg, the size of which is determined by
- * blkcg_policy->pd_size. blkcg core allocates and frees such areas
- * together with blkg and invokes pd_init/exit_fn() methods.
- *
- * Such private data must embed struct blkg_policy_data (pd) at the
- * beginning and pd_size can't be smaller than pd.
+ * There can be multiple active blkcg policies and each blkg:policy pair is
+ * represented by a blkg_policy_data which is allocated and freed by each
+ * policy's pd_alloc/free_fn() methods. A policy can allocate private data
+ * area by allocating larger data structure which embeds blkg_policy_data
+ * at the beginning.
*/
struct blkg_policy_data {
/* the blkg and policy id this per-policy data belongs to */
@@ -134,16 +132,16 @@ struct blkcg_gq {
};
typedef void (blkcg_pol_init_cpd_fn)(const struct blkcg *blkcg);
+typedef struct blkg_policy_data *(blkcg_pol_alloc_pd_fn)(gfp_t gfp, int node);
typedef void (blkcg_pol_init_pd_fn)(struct blkcg_gq *blkg);
typedef void (blkcg_pol_online_pd_fn)(struct blkcg_gq *blkg);
typedef void (blkcg_pol_offline_pd_fn)(struct blkcg_gq *blkg);
typedef void (blkcg_pol_exit_pd_fn)(struct blkcg_gq *blkg);
+typedef void (blkcg_pol_free_pd_fn)(struct blkg_policy_data *pd);
typedef void (blkcg_pol_reset_pd_stats_fn)(struct blkcg_gq *blkg);
struct blkcg_policy {
int plid;
- /* policy specific private data size */
- size_t pd_size;
/* policy specific per-blkcg data size */
size_t cpd_size;
/* cgroup files for the policy */
@@ -151,10 +149,12 @@ struct blkcg_policy {
/* operations */
blkcg_pol_init_cpd_fn *cpd_init_fn;
+ blkcg_pol_alloc_pd_fn *pd_alloc_fn;
blkcg_pol_init_pd_fn *pd_init_fn;
blkcg_pol_online_pd_fn *pd_online_fn;
blkcg_pol_offline_pd_fn *pd_offline_fn;
blkcg_pol_exit_pd_fn *pd_exit_fn;
+ blkcg_pol_free_pd_fn *pd_free_fn;
blkcg_pol_reset_pd_stats_fn *pd_reset_stats_fn;
};
--
2.4.3
Because percpu allocator couldn't do non-blocking allocations,
blk-throttle was forced to implement an ad-hoc asynchronous allocation
mechanism for its percpu stats for cases where blkg's (blkcg_gq's) are
allocated from an IO path without sleepable context.
Now that percpu allocator can handle gfp_mask and blkg_policy_data
alloc / free are handled by policy methods, the ad-hoc asynchronous
allocation mechanism can be replaced with direct allocation from
tg_stats_alloc_fn(). Rit it out.
This ensures that an active throtl_grp always has valid non-NULL
->stats_cpu. Remove checks on it.
Signed-off-by: Tejun Heo <[email protected]>
Cc: Vivek Goyal <[email protected]>
---
block/blk-throttle.c | 112 ++++++++++++---------------------------------------
1 file changed, 25 insertions(+), 87 deletions(-)
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index f1dd691..3c86976 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -144,9 +144,6 @@ struct throtl_grp {
/* Per cpu stats pointer */
struct tg_stats_cpu __percpu *stats_cpu;
-
- /* List of tgs waiting for per cpu stats memory to be allocated */
- struct list_head stats_alloc_node;
};
struct throtl_data
@@ -168,13 +165,6 @@ struct throtl_data
struct work_struct dispatch_work;
};
-/* list and work item to allocate percpu group stats */
-static DEFINE_SPINLOCK(tg_stats_alloc_lock);
-static LIST_HEAD(tg_stats_alloc_list);
-
-static void tg_stats_alloc_fn(struct work_struct *);
-static DECLARE_DELAYED_WORK(tg_stats_alloc_work, tg_stats_alloc_fn);
-
static void throtl_pending_timer_fn(unsigned long arg);
static inline struct throtl_grp *pd_to_tg(struct blkg_policy_data *pd)
@@ -256,53 +246,6 @@ static struct throtl_data *sq_to_td(struct throtl_service_queue *sq)
} \
} while (0)
-static void tg_stats_init(struct tg_stats_cpu *tg_stats)
-{
- blkg_rwstat_init(&tg_stats->service_bytes);
- blkg_rwstat_init(&tg_stats->serviced);
-}
-
-/*
- * Worker for allocating per cpu stat for tgs. This is scheduled on the
- * system_wq once there are some groups on the alloc_list waiting for
- * allocation.
- */
-static void tg_stats_alloc_fn(struct work_struct *work)
-{
- static struct tg_stats_cpu *stats_cpu; /* this fn is non-reentrant */
- struct delayed_work *dwork = to_delayed_work(work);
- bool empty = false;
-
-alloc_stats:
- if (!stats_cpu) {
- int cpu;
-
- stats_cpu = alloc_percpu(struct tg_stats_cpu);
- if (!stats_cpu) {
- /* allocation failed, try again after some time */
- schedule_delayed_work(dwork, msecs_to_jiffies(10));
- return;
- }
- for_each_possible_cpu(cpu)
- tg_stats_init(per_cpu_ptr(stats_cpu, cpu));
- }
-
- spin_lock_irq(&tg_stats_alloc_lock);
-
- if (!list_empty(&tg_stats_alloc_list)) {
- struct throtl_grp *tg = list_first_entry(&tg_stats_alloc_list,
- struct throtl_grp,
- stats_alloc_node);
- swap(tg->stats_cpu, stats_cpu);
- list_del_init(&tg->stats_alloc_node);
- }
-
- empty = list_empty(&tg_stats_alloc_list);
- spin_unlock_irq(&tg_stats_alloc_lock);
- if (!empty)
- goto alloc_stats;
-}
-
static void throtl_qnode_init(struct throtl_qnode *qn, struct throtl_grp *tg)
{
INIT_LIST_HEAD(&qn->node);
@@ -405,7 +348,27 @@ static void throtl_service_queue_exit(struct throtl_service_queue *sq)
static struct blkg_policy_data *throtl_pd_alloc(gfp_t gfp, int node)
{
- return kzalloc_node(sizeof(struct throtl_grp), gfp, node);
+ struct throtl_grp *tg;
+ int cpu;
+
+ tg = kzalloc_node(sizeof(*tg), gfp, node);
+ if (!tg)
+ return NULL;
+
+ tg->stats_cpu = alloc_percpu_gfp(struct tg_stats_cpu, gfp);
+ if (!tg->stats_cpu) {
+ kfree(tg);
+ return NULL;
+ }
+
+ for_each_possible_cpu(cpu) {
+ struct tg_stats_cpu *stats_cpu = per_cpu_ptr(tg->stats_cpu, cpu);
+
+ blkg_rwstat_init(&stats_cpu->service_bytes);
+ blkg_rwstat_init(&stats_cpu->serviced);
+ }
+
+ return &tg->pd;
}
static void throtl_pd_init(struct blkcg_gq *blkg)
@@ -413,7 +376,6 @@ static void throtl_pd_init(struct blkcg_gq *blkg)
struct throtl_grp *tg = blkg_to_tg(blkg);
struct throtl_data *td = blkg->q->td;
struct throtl_service_queue *parent_sq;
- unsigned long flags;
int rw;
/*
@@ -448,16 +410,6 @@ static void throtl_pd_init(struct blkcg_gq *blkg)
tg->bps[WRITE] = -1;
tg->iops[READ] = -1;
tg->iops[WRITE] = -1;
-
- /*
- * Ugh... We need to perform per-cpu allocation for tg->stats_cpu
- * but percpu allocator can't be called from IO path. Queue tg on
- * tg_stats_alloc_list and allocate from work item.
- */
- spin_lock_irqsave(&tg_stats_alloc_lock, flags);
- list_add(&tg->stats_alloc_node, &tg_stats_alloc_list);
- schedule_delayed_work(&tg_stats_alloc_work, 0);
- spin_unlock_irqrestore(&tg_stats_alloc_lock, flags);
}
/*
@@ -487,20 +439,16 @@ static void throtl_pd_online(struct blkcg_gq *blkg)
static void throtl_pd_exit(struct blkcg_gq *blkg)
{
struct throtl_grp *tg = blkg_to_tg(blkg);
- unsigned long flags;
-
- spin_lock_irqsave(&tg_stats_alloc_lock, flags);
- list_del_init(&tg->stats_alloc_node);
- spin_unlock_irqrestore(&tg_stats_alloc_lock, flags);
-
- free_percpu(tg->stats_cpu);
throtl_service_queue_exit(&tg->service_queue);
}
static void throtl_pd_free(struct blkg_policy_data *pd)
{
- kfree(pd);
+ struct throtl_grp *tg = pd_to_tg(pd);
+
+ free_percpu(tg->stats_cpu);
+ kfree(tg);
}
static void throtl_pd_reset_stats(struct blkcg_gq *blkg)
@@ -508,9 +456,6 @@ static void throtl_pd_reset_stats(struct blkcg_gq *blkg)
struct throtl_grp *tg = blkg_to_tg(blkg);
int cpu;
- if (tg->stats_cpu == NULL)
- return;
-
for_each_possible_cpu(cpu) {
struct tg_stats_cpu *sc = per_cpu_ptr(tg->stats_cpu, cpu);
@@ -973,10 +918,6 @@ static void throtl_update_dispatch_stats(struct blkcg_gq *blkg, u64 bytes,
struct tg_stats_cpu *stats_cpu;
unsigned long flags;
- /* If per cpu stats are not allocated yet, don't do any accounting. */
- if (tg->stats_cpu == NULL)
- return;
-
/*
* Disabling interrupts to provide mutual exclusion between two
* writes on same cpu. It probably is not needed for 64bit. Not
@@ -1302,9 +1243,6 @@ static u64 tg_prfill_cpu_rwstat(struct seq_file *sf,
struct blkg_rwstat rwstat = { }, tmp;
int i, cpu;
- if (tg->stats_cpu == NULL)
- return 0;
-
for_each_possible_cpu(cpu) {
struct tg_stats_cpu *sc = per_cpu_ptr(tg->stats_cpu, cpu);
--
2.4.3
With the recent addition of alloc and free methods, things became
messier. This patch reorganizes them according to the followings.
* ->pd_alloc_fn()
Responsible for allocation and static initializations - the ones
which can be done independent of where the pd might be attached.
* ->pd_init_fn()
Initializations which require the knowledge of where the pd is
attached.
* ->pd_free_fn()
The counter part of pd_alloc_fn(). Static de-init and freeing.
This leaves ->pd_exit_fn() without any users. Removed.
While at it, collapse an one liner function throtl_pd_exit(), which
has only one user, into its user.
Signed-off-by: Tejun Heo <[email protected]>
Cc: Vivek Goyal <[email protected]>
---
block/blk-cgroup.c | 11 ---------
block/blk-throttle.c | 57 ++++++++++++++++------------------------------
block/cfq-iosched.c | 15 ++++++++----
include/linux/blk-cgroup.h | 2 --
4 files changed, 31 insertions(+), 54 deletions(-)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 0bf76e7..9180735 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -392,15 +392,6 @@ static void blkg_destroy_all(struct request_queue *q)
void __blkg_release_rcu(struct rcu_head *rcu_head)
{
struct blkcg_gq *blkg = container_of(rcu_head, struct blkcg_gq, rcu_head);
- int i;
-
- /* tell policies that this one is being freed */
- for (i = 0; i < BLKCG_MAX_POLS; i++) {
- struct blkcg_policy *pol = blkcg_policy[i];
-
- if (blkg->pd[i] && pol->pd_exit_fn)
- pol->pd_exit_fn(blkg);
- }
/* release the blkcg and parent blkg refs this blkg has been holding */
css_put(&blkg->blkcg->css);
@@ -1160,8 +1151,6 @@ void blkcg_deactivate_policy(struct request_queue *q,
if (pol->pd_offline_fn)
pol->pd_offline_fn(blkg);
- if (pol->pd_exit_fn)
- pol->pd_exit_fn(blkg);
if (blkg->pd[pol->plid]) {
pol->pd_free_fn(blkg->pd[pol->plid]);
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 3c86976..c3a235b 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -330,26 +330,19 @@ static struct bio *throtl_pop_queued(struct list_head *queued,
}
/* init a service_queue, assumes the caller zeroed it */
-static void throtl_service_queue_init(struct throtl_service_queue *sq,
- struct throtl_service_queue *parent_sq)
+static void throtl_service_queue_init(struct throtl_service_queue *sq)
{
INIT_LIST_HEAD(&sq->queued[0]);
INIT_LIST_HEAD(&sq->queued[1]);
sq->pending_tree = RB_ROOT;
- sq->parent_sq = parent_sq;
setup_timer(&sq->pending_timer, throtl_pending_timer_fn,
(unsigned long)sq);
}
-static void throtl_service_queue_exit(struct throtl_service_queue *sq)
-{
- del_timer_sync(&sq->pending_timer);
-}
-
static struct blkg_policy_data *throtl_pd_alloc(gfp_t gfp, int node)
{
struct throtl_grp *tg;
- int cpu;
+ int rw, cpu;
tg = kzalloc_node(sizeof(*tg), gfp, node);
if (!tg)
@@ -361,6 +354,19 @@ static struct blkg_policy_data *throtl_pd_alloc(gfp_t gfp, int node)
return NULL;
}
+ throtl_service_queue_init(&tg->service_queue);
+
+ for (rw = READ; rw <= WRITE; rw++) {
+ throtl_qnode_init(&tg->qnode_on_self[rw], tg);
+ throtl_qnode_init(&tg->qnode_on_parent[rw], tg);
+ }
+
+ RB_CLEAR_NODE(&tg->rb_node);
+ tg->bps[READ] = -1;
+ tg->bps[WRITE] = -1;
+ tg->iops[READ] = -1;
+ tg->iops[WRITE] = -1;
+
for_each_possible_cpu(cpu) {
struct tg_stats_cpu *stats_cpu = per_cpu_ptr(tg->stats_cpu, cpu);
@@ -375,8 +381,7 @@ static void throtl_pd_init(struct blkcg_gq *blkg)
{
struct throtl_grp *tg = blkg_to_tg(blkg);
struct throtl_data *td = blkg->q->td;
- struct throtl_service_queue *parent_sq;
- int rw;
+ struct throtl_service_queue *sq = &tg->service_queue;
/*
* If on the default hierarchy, we switch to properly hierarchical
@@ -391,25 +396,10 @@ static void throtl_pd_init(struct blkcg_gq *blkg)
* Limits of a group don't interact with limits of other groups
* regardless of the position of the group in the hierarchy.
*/
- parent_sq = &td->service_queue;
-
+ sq->parent_sq = &td->service_queue;
if (cgroup_on_dfl(blkg->blkcg->css.cgroup) && blkg->parent)
- parent_sq = &blkg_to_tg(blkg->parent)->service_queue;
-
- throtl_service_queue_init(&tg->service_queue, parent_sq);
-
- for (rw = READ; rw <= WRITE; rw++) {
- throtl_qnode_init(&tg->qnode_on_self[rw], tg);
- throtl_qnode_init(&tg->qnode_on_parent[rw], tg);
- }
-
- RB_CLEAR_NODE(&tg->rb_node);
+ sq->parent_sq = &blkg_to_tg(blkg->parent)->service_queue;
tg->td = td;
-
- tg->bps[READ] = -1;
- tg->bps[WRITE] = -1;
- tg->iops[READ] = -1;
- tg->iops[WRITE] = -1;
}
/*
@@ -436,17 +426,11 @@ static void throtl_pd_online(struct blkcg_gq *blkg)
tg_update_has_rules(blkg_to_tg(blkg));
}
-static void throtl_pd_exit(struct blkcg_gq *blkg)
-{
- struct throtl_grp *tg = blkg_to_tg(blkg);
-
- throtl_service_queue_exit(&tg->service_queue);
-}
-
static void throtl_pd_free(struct blkg_policy_data *pd)
{
struct throtl_grp *tg = pd_to_tg(pd);
+ del_timer_sync(&tg->service_queue.pending_timer);
free_percpu(tg->stats_cpu);
kfree(tg);
}
@@ -1421,7 +1405,6 @@ static struct blkcg_policy blkcg_policy_throtl = {
.pd_alloc_fn = throtl_pd_alloc,
.pd_init_fn = throtl_pd_init,
.pd_online_fn = throtl_pd_online,
- .pd_exit_fn = throtl_pd_exit,
.pd_free_fn = throtl_pd_free,
.pd_reset_stats_fn = throtl_pd_reset_stats,
};
@@ -1616,7 +1599,7 @@ int blk_throtl_init(struct request_queue *q)
return -ENOMEM;
INIT_WORK(&td->dispatch_work, blk_throtl_dispatch_work_fn);
- throtl_service_queue_init(&td->service_queue, NULL);
+ throtl_service_queue_init(&td->service_queue);
q->td = td;
td->queue = q;
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 69ce288..4b795c7 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1584,7 +1584,17 @@ static void cfq_cpd_init(const struct blkcg *blkcg)
static struct blkg_policy_data *cfq_pd_alloc(gfp_t gfp, int node)
{
- return kzalloc_node(sizeof(struct cfq_group), gfp, node);
+ struct cfq_group *cfqg;
+
+ cfqg = kzalloc_node(sizeof(*cfqg), gfp, node);
+ if (!cfqg)
+ return NULL;
+
+ cfq_init_cfqg_base(cfqg);
+ cfqg_stats_init(&cfqg->stats);
+ cfqg_stats_init(&cfqg->dead_stats);
+
+ return &cfqg->pd;
}
static void cfq_pd_init(struct blkcg_gq *blkg)
@@ -1592,11 +1602,8 @@ static void cfq_pd_init(struct blkcg_gq *blkg)
struct cfq_group *cfqg = blkg_to_cfqg(blkg);
struct cfq_group_data *cgd = blkcg_to_cfqgd(blkg->blkcg);
- cfq_init_cfqg_base(cfqg);
cfqg->weight = cgd->weight;
cfqg->leaf_weight = cgd->leaf_weight;
- cfqg_stats_init(&cfqg->stats);
- cfqg_stats_init(&cfqg->dead_stats);
}
static void cfq_pd_offline(struct blkcg_gq *blkg)
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index a4d0998..06c0297 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -136,7 +136,6 @@ typedef struct blkg_policy_data *(blkcg_pol_alloc_pd_fn)(gfp_t gfp, int node);
typedef void (blkcg_pol_init_pd_fn)(struct blkcg_gq *blkg);
typedef void (blkcg_pol_online_pd_fn)(struct blkcg_gq *blkg);
typedef void (blkcg_pol_offline_pd_fn)(struct blkcg_gq *blkg);
-typedef void (blkcg_pol_exit_pd_fn)(struct blkcg_gq *blkg);
typedef void (blkcg_pol_free_pd_fn)(struct blkg_policy_data *pd);
typedef void (blkcg_pol_reset_pd_stats_fn)(struct blkcg_gq *blkg);
@@ -153,7 +152,6 @@ struct blkcg_policy {
blkcg_pol_init_pd_fn *pd_init_fn;
blkcg_pol_online_pd_fn *pd_online_fn;
blkcg_pol_offline_pd_fn *pd_offline_fn;
- blkcg_pol_exit_pd_fn *pd_exit_fn;
blkcg_pol_free_pd_fn *pd_free_fn;
blkcg_pol_reset_pd_stats_fn *pd_reset_stats_fn;
};
--
2.4.3
The newly added ->pd_alloc_fn() and ->pd_free_fn() deal with pd
(blkg_policy_data) while the older ones use blkg (blkcg_gq). As using
blkg doesn't make sense for ->pd_alloc_fn() and after allocation pd
can always be mapped to blkg and given that these are policy-specific
methods, it makes sense to converge on pd.
This patch makes all methods deal with pd instead of blkg. Most
conversions are trivial. In blk-cgroup.c, a couple method invocation
sites now test whether pd exists instead of policy state for
consistency. This shouldn't cause any behavioral differences.
Signed-off-by: Tejun Heo <[email protected]>
Cc: Vivek Goyal <[email protected]>
---
block/blk-cgroup.c | 18 ++++++++----------
block/blk-throttle.c | 13 +++++++------
block/cfq-iosched.c | 14 +++++++-------
include/linux/blk-cgroup.h | 8 ++++----
4 files changed, 26 insertions(+), 27 deletions(-)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 9180735..92c2a96 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -232,7 +232,7 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
struct blkcg_policy *pol = blkcg_policy[i];
if (blkg->pd[i] && pol->pd_init_fn)
- pol->pd_init_fn(blkg);
+ pol->pd_init_fn(blkg->pd[i]);
}
/* insert */
@@ -246,7 +246,7 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
struct blkcg_policy *pol = blkcg_policy[i];
if (blkg->pd[i] && pol->pd_online_fn)
- pol->pd_online_fn(blkg);
+ pol->pd_online_fn(blkg->pd[i]);
}
}
blkg->online = true;
@@ -337,7 +337,7 @@ static void blkg_destroy(struct blkcg_gq *blkg)
struct blkcg_policy *pol = blkcg_policy[i];
if (blkg->pd[i] && pol->pd_offline_fn)
- pol->pd_offline_fn(blkg);
+ pol->pd_offline_fn(blkg->pd[i]);
}
blkg->online = false;
@@ -471,9 +471,8 @@ static int blkcg_reset_stats(struct cgroup_subsys_state *css,
for (i = 0; i < BLKCG_MAX_POLS; i++) {
struct blkcg_policy *pol = blkcg_policy[i];
- if (blkcg_policy_enabled(blkg->q, pol) &&
- pol->pd_reset_stats_fn)
- pol->pd_reset_stats_fn(blkg);
+ if (blkg->pd[i] && pol->pd_reset_stats_fn)
+ pol->pd_reset_stats_fn(blkg->pd[i]);
}
}
@@ -1105,7 +1104,7 @@ int blkcg_activate_policy(struct request_queue *q,
pd->blkg = blkg;
pd->plid = pol->plid;
if (pol->pd_init_fn)
- pol->pd_init_fn(blkg);
+ pol->pd_init_fn(pd);
spin_unlock(&blkg->blkcg->lock);
}
@@ -1149,10 +1148,9 @@ void blkcg_deactivate_policy(struct request_queue *q,
/* grab blkcg lock too while removing @pd from @blkg */
spin_lock(&blkg->blkcg->lock);
- if (pol->pd_offline_fn)
- pol->pd_offline_fn(blkg);
-
if (blkg->pd[pol->plid]) {
+ if (pol->pd_offline_fn)
+ pol->pd_offline_fn(blkg->pd[pol->plid]);
pol->pd_free_fn(blkg->pd[pol->plid]);
blkg->pd[pol->plid] = NULL;
}
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index c3a235b..c2c7547 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -377,9 +377,10 @@ static struct blkg_policy_data *throtl_pd_alloc(gfp_t gfp, int node)
return &tg->pd;
}
-static void throtl_pd_init(struct blkcg_gq *blkg)
+static void throtl_pd_init(struct blkg_policy_data *pd)
{
- struct throtl_grp *tg = blkg_to_tg(blkg);
+ struct throtl_grp *tg = pd_to_tg(pd);
+ struct blkcg_gq *blkg = tg_to_blkg(tg);
struct throtl_data *td = blkg->q->td;
struct throtl_service_queue *sq = &tg->service_queue;
@@ -417,13 +418,13 @@ static void tg_update_has_rules(struct throtl_grp *tg)
(tg->bps[rw] != -1 || tg->iops[rw] != -1);
}
-static void throtl_pd_online(struct blkcg_gq *blkg)
+static void throtl_pd_online(struct blkg_policy_data *pd)
{
/*
* We don't want new groups to escape the limits of its ancestors.
* Update has_rules[] after a new group is brought online.
*/
- tg_update_has_rules(blkg_to_tg(blkg));
+ tg_update_has_rules(pd_to_tg(pd));
}
static void throtl_pd_free(struct blkg_policy_data *pd)
@@ -435,9 +436,9 @@ static void throtl_pd_free(struct blkg_policy_data *pd)
kfree(tg);
}
-static void throtl_pd_reset_stats(struct blkcg_gq *blkg)
+static void throtl_pd_reset_stats(struct blkg_policy_data *pd)
{
- struct throtl_grp *tg = blkg_to_tg(blkg);
+ struct throtl_grp *tg = pd_to_tg(pd);
int cpu;
for_each_possible_cpu(cpu) {
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 4b795c7..95e6b0c 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1597,18 +1597,18 @@ static struct blkg_policy_data *cfq_pd_alloc(gfp_t gfp, int node)
return &cfqg->pd;
}
-static void cfq_pd_init(struct blkcg_gq *blkg)
+static void cfq_pd_init(struct blkg_policy_data *pd)
{
- struct cfq_group *cfqg = blkg_to_cfqg(blkg);
- struct cfq_group_data *cgd = blkcg_to_cfqgd(blkg->blkcg);
+ struct cfq_group *cfqg = pd_to_cfqg(pd);
+ struct cfq_group_data *cgd = blkcg_to_cfqgd(pd->blkg->blkcg);
cfqg->weight = cgd->weight;
cfqg->leaf_weight = cgd->leaf_weight;
}
-static void cfq_pd_offline(struct blkcg_gq *blkg)
+static void cfq_pd_offline(struct blkg_policy_data *pd)
{
- struct cfq_group *cfqg = blkg_to_cfqg(blkg);
+ struct cfq_group *cfqg = pd_to_cfqg(pd);
int i;
for (i = 0; i < IOPRIO_BE_NR; i++) {
@@ -1661,9 +1661,9 @@ static struct blkg_rwstat cfqg_rwstat_pd_recursive_sum(struct blkg_policy_data *
return a;
}
-static void cfq_pd_reset_stats(struct blkcg_gq *blkg)
+static void cfq_pd_reset_stats(struct blkg_policy_data *pd)
{
- struct cfq_group *cfqg = blkg_to_cfqg(blkg);
+ struct cfq_group *cfqg = pd_to_cfqg(pd);
cfqg_stats_reset(&cfqg->stats);
cfqg_stats_reset(&cfqg->dead_stats);
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index 06c0297..2db2c62 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -133,11 +133,11 @@ struct blkcg_gq {
typedef void (blkcg_pol_init_cpd_fn)(const struct blkcg *blkcg);
typedef struct blkg_policy_data *(blkcg_pol_alloc_pd_fn)(gfp_t gfp, int node);
-typedef void (blkcg_pol_init_pd_fn)(struct blkcg_gq *blkg);
-typedef void (blkcg_pol_online_pd_fn)(struct blkcg_gq *blkg);
-typedef void (blkcg_pol_offline_pd_fn)(struct blkcg_gq *blkg);
+typedef void (blkcg_pol_init_pd_fn)(struct blkg_policy_data *pd);
+typedef void (blkcg_pol_online_pd_fn)(struct blkg_policy_data *pd);
+typedef void (blkcg_pol_offline_pd_fn)(struct blkg_policy_data *pd);
typedef void (blkcg_pol_free_pd_fn)(struct blkg_policy_data *pd);
-typedef void (blkcg_pol_reset_pd_stats_fn)(struct blkcg_gq *blkg);
+typedef void (blkcg_pol_reset_pd_stats_fn)(struct blkg_policy_data *pd);
struct blkcg_policy {
int plid;
--
2.4.3
On Tue, Jul 07, 2015 at 11:51:26AM -0400, Tejun Heo wrote:
> This is v2 of blkcg_policy methods cleanup patchset. Changes from the
> last take [L] are
>
> * Rebased on top of v4.2-rc1 and other pending patches. v4.2-rc1
> added a patch to add per-blkcg policy data which is allocated by
> blkcg core as per-blkcg_gq policy data was before this patchset
> making the two inconsistent. I'll follow up with per-blkcg policy
> data update.
While updating per-blkcg policy data handling, I discovered a couple
bugs in policy data handling which should be backported through
-stable. I'll put those earlier in the series and repost this
patchset. Please ignore this one for now.
Thanks.
--
tejun
Hi Tejun,
Il 07/07/2015 22:00, Tejun Heo ha scritto:
> On Tue, Jul 07, 2015 at 11:51:26AM -0400, Tejun Heo wrote:
>> This is v2 of blkcg_policy methods cleanup patchset. Changes from the
>> last take [L] are
>>
>> * Rebased on top of v4.2-rc1 and other pending patches. v4.2-rc1
>> added a patch to add per-blkcg policy data which is allocated by
>> blkcg core as per-blkcg_gq policy data was before this patchset
>> making the two inconsistent. I'll follow up with per-blkcg policy
>> data update.
>
> While updating per-blkcg policy data handling, I discovered a couple
> bugs in policy data handling which should be backported through
> -stable. I'll put those earlier in the series and repost this
> patchset. Please ignore this one for now.
>
I think Jens just applied a bugfix for per-blkcg policy data handling. Please
have a look at https://lkml.org/lkml/2015/7/6/950. Should have Cc'd you, sorry
for that.
Thanks,
Arianna
Hello, Arianna.
On Tue, Jul 07, 2015 at 10:31:14PM +0200, Arianna Avanzini wrote:
> I think Jens just applied a bugfix for per-blkcg policy data handling.
> Please have a look at https://lkml.org/lkml/2015/7/6/950. Should have Cc'd
> you, sorry for that.
Yeah, that too but there were a couple more. Just posted patches to
fix them. It'd be great if you can go over those.
Thanks!
--
tejun