2008-11-12 08:19:50

by Satoshi UCHIDA

Subject: [PATCH][RFC][12+2][v3] An expanded CFQ scheduler for cgroups


This patchset expands the traditional CFQ scheduler to support cgroups,
and improves on the previous version.

The improvements are as follows.

* Modularizing our new CFQ scheduler.
The expanded CFQ scheduler is registered/unregistered as a new I/O
elevator scheduler called "cfq-cgroups". As a result, the traditional
CFQ scheduler, which does not handle cgroups, and our new CFQ
scheduler, which does, can be used at the same time for different
devices.

* Allowing parameters to be set per device.
The expanded CFQ scheduler allows users to set parameters per device.
Thus, users can decide the share (priority) of a group per device.
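
For example, the following command sets priority 2 for the 'test1'
group only on the 'sda' device (see section 3 for details):

echo 2 sda > /dev/cgroup/test1/cfq.ioprio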

--- Optional functions ---

* Adding a validation flag for 'think time'. (Opt-1 patch)
CFQ shows poor scalability. One of its causes is the think time.
The think time is used to improve I/O performance by handling queues
with poor I/O as IDLE class. However, when many tasks have I/O
requests, the think time of those tasks becomes long and then all
queues are handled as IDLE class. As a result, the dispatching of I/O
requests is dispersed, and I/O performance falls. The think time valid
flag controls the think time judgment.

* Adding an ioprio class for cgroups. (Opt-2 patch)
The previous expanded CFQ scheduler could not implement ioprio classes.
This optional patch implements a prototype: it provides basic service
tree control for the ioprio class of cgroups, but does not yet provide
the preemption function, the completion function, and so on.



1. Introduction.

This patchset introduces "Yet Another" I/O bandwidth controlling
subsystem for cgroups, based on CFQ (called 2 layer CFQ).

The idea of 2 layer CFQ is to build per-group fairness control on top
of the existing CFQ control.
We added a new data structure called CFQ driver data on top of cfqd in
order to control I/O bandwidth for cgroups.
The CFQ driver data controls the cfq_datas through a service tree
(rb-tree) and the CFQ algorithm for synchronous I/O.
An active cfqd controls its cfq queues through its own service tree.
Namely, the CFQ driver data controls the traditional CFQ data, and
each CFQ data runs conventionally.

         cfqdd       cfqdd     (cfqdd = cfq driver data)
           |           |
 cfqc --- cfqd ------ cfqd     (cfqd  = cfq data,
   |       |                    cfqc  = cfq cgroup data)
 cfqc -- [cfqd] ----- cfqd
           ^
           |
   conventional control

This patchset is against 2.6.28-rc2


2. Build

i. Apply this patchset (series 01 - 12) to kernel 2.6.28-rc2.

If you want to use the optional functions, also apply the opt-1/opt-2
patches to kernel 2.6.28-rc2.
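
For example, a minimal sketch, assuming the patches are saved as
numbered files in the current directory:

for p in 0*.patch; do patch -p1 < "$p"; done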

ii. Build the kernel with the IOSCHED_CFQ_CGROUP=y option.

iii. Boot the new kernel.


3. Usage of 2 layer CFQ

* Preparation for using 2 layer CFQ

i. Mount the cfq cgroup special filesystem on a directory.
ex.
mkdir /dev/cgroup
mount -t cgroup -o cfq cfq /dev/cgroup

ii. Change the elevator scheduler of the device to "cfq-cgroups".
ex.
echo cfq-cgroups > /sys/block/sda/queue/scheduler


* Usage of grouping control.
- Create a new group.
Make a new directory under /dev/cgroup.
For example, the following command generates a 'test1' group.
mkdir /dev/cgroup/test1

- Attach a task to a group.
Write the process id (pid) to the "tasks" entry of the corresponding
group.
For example, the following command puts the task with pid 1100 into the
test1
echo 1100 > /dev/cgroup/test1/tasks

New child tasks of this task are also placed into the test1 group.
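
You can confirm the members of a group by reading the same entry:

cat /dev/cgroup/test1/tasks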

- Change the I/O priority of a group.
Write a priority to the "cfq.ioprio" entry of the corresponding group.
For example, the following command sets a priority of 2 for the 'test1'
group.

echo 2 > /dev/cgroup/test1/cfq.ioprio

The I/O priority for cgroups takes a value from 0 to 7, the same as
the existing per-task CFQ priorities.

If you want to change the I/O priority of a group only on a specific
device, add the device name as a second parameter.
For example, the following command sets a priority of 2 for the 'test1'
group on the 'sda' device.

echo 2 sda > /dev/cgroup/test1/cfq.ioprio


You can also change the I/O priority of a specific device and group
via sysfs. In that case, add the cgroup path as a second parameter.
For example, the following command sets a priority of 2 for the 'test1'
group on the 'sda' device via sysfs.

echo 2 /test1 > /sys/block/sda/queue/iosched/ioprio

Similarly, you can change the cfq_data parameters (slice_sync,
back_seek_penalty and so on) for a specific device and group.
If you write only one parameter via sysfs, the setting is applied to
all groups.
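
For example, the first command below changes slice_sync only for the
'test1' group, while the second changes it for all groups of 'sda'
(the value 100 is only illustrative):

echo 100 /test1 > /sys/block/sda/queue/iosched/slice_sync
echo 100 > /sys/block/sda/queue/iosched/slice_sync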

If you set the elevator scheduler to cfq-cgroups, the I/O priority of
the new device is set to each group's default priority. If you want to
change this default priority, write a priority with "default" as the
second parameter to the "cfq.ioprio" entry of the corresponding group.
For example,

echo 2 default > /dev/cgroup/test1/cfq.ioprio

- Change the I/O priority of a task.
Use existing "ionice" command.


4. Usage of Optional Functions.

i. Usage of the validation flag for 'think time'

This parameter can be set via sysfs in the same way as the other
cfq_data parameters. Its entry name is 'ttime_valid'.

This flag decides whether the think time is checked.
With the value 0, queues are always handled as idle class;
in practice, the idle_window flag is cleared.
With the value 1, queues are handled the same as in traditional CFQ.
With the value 2, the think time is ignored.
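
For example, assuming the opt-1 patch is applied, the following
command disables the think time judgment for the 'sda' device:

echo 2 > /sys/block/sda/queue/iosched/ttime_valid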


ii. Usage of ioprio class for cgroups.

The ioprio class is set via cgroupfs in the same way as ioprio.
Its entry name is 'cfq.ioprio_class'.

The values of the ioprio class are the same as the I/O classes of
traditional CFQ:
0: IOPRIO_CLASS_NONE (is equal to IOPRIO_CLASS_BE)
1: IOPRIO_CLASS_RT
2: IOPRIO_CLASS_BE
3: IOPRIO_CLASS_IDLE
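
For example, assuming the opt-2 patch is applied, the following
command puts the 'test1' group into the RT class:

echo 1 > /dev/cgroup/test1/cfq.ioprio_class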


5. Future work.
We must still implement the following.
* Handle buffered I/O.


2008-11-12 08:33:49

by Satoshi UCHIDA

Subject: [PATCH][cfq-cgroups][01/12] Move basic structure definitions to header file.


The "cfq_data" structure and few definition are moved into header file.


Signed-off-by: Satoshi UCHIDA <[email protected]>

---
block/cfq-iosched.c | 68 +-------------------------------------
include/linux/cfq-iosched.h | 77 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 78 insertions(+), 67 deletions(-)
create mode 100644 include/linux/cfq-iosched.h

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 6a062ee..024d392 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -12,6 +12,7 @@
#include <linux/rbtree.h>
#include <linux/ioprio.h>
#include <linux/blktrace_api.h>
+#include <linux/cfq-iosched.h>

/*
* tunables
@@ -62,73 +63,6 @@ static DEFINE_SPINLOCK(ioc_gone_lock);
#define sample_valid(samples) ((samples) > 80)

/*
- * Most of our rbtree usage is for sorting with min extraction, so
- * if we cache the leftmost node we don't have to walk down the tree
- * to find it. Idea borrowed from Ingo Molnars CFS scheduler. We should
- * move this into the elevator for the rq sorting as well.
- */
-struct cfq_rb_root {
- struct rb_root rb;
- struct rb_node *left;
-};
-#define CFQ_RB_ROOT (struct cfq_rb_root) { RB_ROOT, NULL, }
-
-/*
- * Per block device queue structure
- */
-struct cfq_data {
- struct request_queue *queue;
-
- /*
- * rr list of queues with requests and the count of them
- */
- struct cfq_rb_root service_tree;
- unsigned int busy_queues;
-
- int rq_in_driver;
- int sync_flight;
-
- /*
- * queue-depth detection
- */
- int rq_queued;
- int hw_tag;
- int hw_tag_samples;
- int rq_in_driver_peak;
-
- /*
- * idle window management
- */
- struct timer_list idle_slice_timer;
- struct work_struct unplug_work;
-
- struct cfq_queue *active_queue;
- struct cfq_io_context *active_cic;
-
- /*
- * async queue for each priority case
- */
- struct cfq_queue *async_cfqq[2][IOPRIO_BE_NR];
- struct cfq_queue *async_idle_cfqq;
-
- sector_t last_position;
- unsigned long last_end_request;
-
- /*
- * tunables, see top of file
- */
- unsigned int cfq_quantum;
- unsigned int cfq_fifo_expire[2];
- unsigned int cfq_back_penalty;
- unsigned int cfq_back_max;
- unsigned int cfq_slice[2];
- unsigned int cfq_slice_async_rq;
- unsigned int cfq_slice_idle;
-
- struct list_head cic_list;
-};
-
-/*
* Per process-grouping structure
*/
struct cfq_queue {
diff --git a/include/linux/cfq-iosched.h b/include/linux/cfq-iosched.h
new file mode 100644
index 0000000..adb2410
--- /dev/null
+++ b/include/linux/cfq-iosched.h
@@ -0,0 +1,77 @@
+#ifndef _LINUX_CFQ_IOSCHED_H
+#define _LINUX_CFQ_IOSCHED_H
+
+#include <linux/rbtree.h>
+#include <linux/list.h>
+
+struct request_queue;
+struct cfq_io_context;
+
+/*
+ * Most of our rbtree usage is for sorting with min extraction, so
+ * if we cache the leftmost node we don't have to walk down the tree
+ * to find it. Idea borrowed from Ingo Molnars CFS scheduler. We should
+ * move this into the elevator for the rq sorting as well.
+ */
+struct cfq_rb_root {
+ struct rb_root rb;
+ struct rb_node *left;
+};
+#define CFQ_RB_ROOT (struct cfq_rb_root) { RB_ROOT, NULL, }
+
+/*
+ * Per block device queue structure
+ */
+struct cfq_data {
+ struct request_queue *queue;
+
+ /*
+ * rr list of queues with requests and the count of them
+ */
+ struct cfq_rb_root service_tree;
+ unsigned int busy_queues;
+
+ int rq_in_driver;
+ int sync_flight;
+
+ /*
+ * queue-depth detection
+ */
+ int rq_queued;
+ int hw_tag;
+ int hw_tag_samples;
+ int rq_in_driver_peak;
+
+ /*
+ * idle window management
+ */
+ struct timer_list idle_slice_timer;
+ struct work_struct unplug_work;
+
+ struct cfq_queue *active_queue;
+ struct cfq_io_context *active_cic;
+
+ /*
+ * async queue for each priority case
+ */
+ struct cfq_queue *async_cfqq[2][IOPRIO_BE_NR];
+ struct cfq_queue *async_idle_cfqq;
+
+ sector_t last_position;
+ unsigned long last_end_request;
+
+ /*
+ * tunables, see top of file
+ */
+ unsigned int cfq_quantum;
+ unsigned int cfq_fifo_expire[2];
+ unsigned int cfq_back_penalty;
+ unsigned int cfq_back_max;
+ unsigned int cfq_slice[2];
+ unsigned int cfq_slice_async_rq;
+ unsigned int cfq_slice_idle;
+
+ struct list_head cic_list;
+};
+
+#endif /* _LINUX_CFQ_IOSCHED_H */
--
1.5.6.5

2008-11-12 08:34:38

by Satoshi UCHIDA

Subject: [PATCH][cfq-cgroups][02/12] Introduce "cfq_driver_data" structure.


This patch introduces the "cfq_driver_data" structure.
This structure extracts the driver-unique data from the "cfq_data"
structure.


Signed-off-by: Satoshi UCHIDA <[email protected]>

---
block/cfq-iosched.c | 218 ++++++++++++++++++++++++++-----------------
include/linux/cfq-iosched.h | 32 ++++---
2 files changed, 151 insertions(+), 99 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 024d392..b726e85 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -144,9 +144,10 @@ CFQ_CFQQ_FNS(sync);
#undef CFQ_CFQQ_FNS

#define cfq_log_cfqq(cfqd, cfqq, fmt, args...) \
- blk_add_trace_msg((cfqd)->queue, "cfq%d " fmt, (cfqq)->pid, ##args)
+ blk_add_trace_msg((cfqd)->cfqdd->queue, \
+ "cfq%d " fmt, (cfqq)->pid, ##args)
#define cfq_log(cfqd, fmt, args...) \
- blk_add_trace_msg((cfqd)->queue, "cfq " fmt, ##args)
+ blk_add_trace_msg((cfqd)->cfqdd->queue, "cfq " fmt, ##args)

static void cfq_dispatch_insert(struct request_queue *, struct request *);
static struct cfq_queue *cfq_get_queue(struct cfq_data *, int,
@@ -184,9 +185,11 @@ static inline int cfq_bio_sync(struct bio *bio)
*/
static inline void cfq_schedule_dispatch(struct cfq_data *cfqd)
{
+ struct cfq_driver_data *cfqdd = cfqd->cfqdd;
if (cfqd->busy_queues) {
cfq_log(cfqd, "schedule dispatch");
- kblockd_schedule_work(cfqd->queue, &cfqd->unplug_work);
+ kblockd_schedule_work(cfqdd->queue,
+ &cfqdd->unplug_work);
}
}

@@ -271,7 +274,7 @@ cfq_choose_req(struct cfq_data *cfqd, struct request *rq1, struct request *rq2)
s1 = rq1->sector;
s2 = rq2->sector;

- last = cfqd->last_position;
+ last = cfqd->cfqdd->last_position;

/*
* by definition, 1KiB is 2 sectors
@@ -548,7 +551,7 @@ static void cfq_add_rq_rb(struct request *rq)
* if that happens, put the alias on the dispatch list
*/
while ((__alias = elv_rb_add(&cfqq->sort_list, rq)) != NULL)
- cfq_dispatch_insert(cfqd->queue, __alias);
+ cfq_dispatch_insert(cfqd->cfqdd->queue, __alias);

if (!cfq_cfqq_on_rr(cfqq))
cfq_add_cfqq_rr(cfqd, cfqq);
@@ -591,22 +594,24 @@ cfq_find_rq_fmerge(struct cfq_data *cfqd, struct bio *bio)
static void cfq_activate_request(struct request_queue *q, struct request *rq)
{
struct cfq_data *cfqd = q->elevator->elevator_data;
+ struct cfq_driver_data *cfqdd = cfqd->cfqdd;

- cfqd->rq_in_driver++;
+ cfqdd->rq_in_driver++;
cfq_log_cfqq(cfqd, RQ_CFQQ(rq), "activate rq, drv=%d",
- cfqd->rq_in_driver);
+ cfqdd->rq_in_driver);

- cfqd->last_position = rq->hard_sector + rq->hard_nr_sectors;
+ cfqdd->last_position = rq->hard_sector + rq->hard_nr_sectors;
}

static void cfq_deactivate_request(struct request_queue *q, struct request *rq)
{
struct cfq_data *cfqd = q->elevator->elevator_data;
+ struct cfq_driver_data *cfqdd = cfqd->cfqdd;

- WARN_ON(!cfqd->rq_in_driver);
- cfqd->rq_in_driver--;
+ WARN_ON(!cfqdd->rq_in_driver);
+ cfqdd->rq_in_driver--;
cfq_log_cfqq(cfqd, RQ_CFQQ(rq), "deactivate rq, drv=%d",
- cfqd->rq_in_driver);
+ cfqdd->rq_in_driver);
}

static void cfq_remove_request(struct request *rq)
@@ -619,7 +624,7 @@ static void cfq_remove_request(struct request *rq)
list_del_init(&rq->queuelist);
cfq_del_rq_rb(rq);

- cfqq->cfqd->rq_queued--;
+ cfqq->cfqd->cfqdd->rq_queued--;
if (rq_is_meta(rq)) {
WARN_ON(!cfqq->meta_pending);
cfqq->meta_pending--;
@@ -715,10 +720,12 @@ static void
__cfq_slice_expired(struct cfq_data *cfqd, struct cfq_queue *cfqq,
int timed_out)
{
+ struct cfq_driver_data *cfqdd = cfqd->cfqdd;
+
cfq_log_cfqq(cfqd, cfqq, "slice expired t=%d", timed_out);

if (cfq_cfqq_wait_request(cfqq))
- del_timer(&cfqd->idle_slice_timer);
+ del_timer(&cfqdd->idle_slice_timer);

cfq_clear_cfqq_must_dispatch(cfqq);
cfq_clear_cfqq_wait_request(cfqq);
@@ -736,9 +743,9 @@ __cfq_slice_expired(struct cfq_data *cfqd, struct cfq_queue *cfqq,
if (cfqq == cfqd->active_queue)
cfqd->active_queue = NULL;

- if (cfqd->active_cic) {
- put_io_context(cfqd->active_cic->ioc);
- cfqd->active_cic = NULL;
+ if (cfqdd->active_cic) {
+ put_io_context(cfqdd->active_cic->ioc);
+ cfqdd->active_cic = NULL;
}
}

@@ -777,15 +784,17 @@ static struct cfq_queue *cfq_set_active_queue(struct cfq_data *cfqd)
static inline sector_t cfq_dist_from_last(struct cfq_data *cfqd,
struct request *rq)
{
- if (rq->sector >= cfqd->last_position)
- return rq->sector - cfqd->last_position;
+ struct cfq_driver_data *cfqdd = cfqd->cfqdd;
+
+ if (rq->sector >= cfqdd->last_position)
+ return rq->sector - cfqdd->last_position;
else
- return cfqd->last_position - rq->sector;
+ return cfqdd->last_position - rq->sector;
}

static inline int cfq_rq_close(struct cfq_data *cfqd, struct request *rq)
{
- struct cfq_io_context *cic = cfqd->active_cic;
+ struct cfq_io_context *cic = cfqd->cfqdd->active_cic;

if (!sample_valid(cic->seek_samples))
return 0;
@@ -809,6 +818,7 @@ static int cfq_close_cooperator(struct cfq_data *cfq_data,
static void cfq_arm_slice_timer(struct cfq_data *cfqd)
{
struct cfq_queue *cfqq = cfqd->active_queue;
+ struct cfq_driver_data *cfqdd = cfqd->cfqdd;
struct cfq_io_context *cic;
unsigned long sl;

@@ -817,7 +827,7 @@ static void cfq_arm_slice_timer(struct cfq_data *cfqd)
* for devices that support queuing, otherwise we still have a problem
* with sync vs async workloads.
*/
- if (blk_queue_nonrot(cfqd->queue) && cfqd->hw_tag)
+ if (blk_queue_nonrot(cfqdd->queue) && cfqdd->hw_tag)
return;

WARN_ON(!RB_EMPTY_ROOT(&cfqq->sort_list));
@@ -832,13 +842,13 @@ static void cfq_arm_slice_timer(struct cfq_data *cfqd)
/*
* still requests with the driver, don't idle
*/
- if (cfqd->rq_in_driver)
+ if (cfqdd->rq_in_driver)
return;

/*
* task has exited, don't wait
*/
- cic = cfqd->active_cic;
+ cic = cfqdd->active_cic;
if (!cic || !atomic_read(&cic->ioc->nr_tasks))
return;

@@ -861,7 +871,7 @@ static void cfq_arm_slice_timer(struct cfq_data *cfqd)
if (sample_valid(cic->seek_samples) && CIC_SEEKY(cic))
sl = min(sl, msecs_to_jiffies(CFQ_MIN_TT));

- mod_timer(&cfqd->idle_slice_timer, jiffies + sl);
+ mod_timer(&cfqdd->idle_slice_timer, jiffies + sl);
cfq_log(cfqd, "arm_idle: %lu", sl);
}

@@ -880,7 +890,7 @@ static void cfq_dispatch_insert(struct request_queue *q, struct request *rq)
elv_dispatch_sort(q, rq);

if (cfq_cfqq_sync(cfqq))
- cfqd->sync_flight++;
+ cfqd->cfqdd->sync_flight++;
}

/*
@@ -950,7 +960,7 @@ static struct cfq_queue *cfq_select_queue(struct cfq_data *cfqd)
* flight or is idling for a new request, allow either of these
* conditions to happen (or time out) before selecting a new queue.
*/
- if (timer_pending(&cfqd->idle_slice_timer) ||
+ if (timer_pending(&cfqd->cfqdd->idle_slice_timer) ||
(cfqq->dispatched && cfq_cfqq_idle_window(cfqq))) {
cfqq = NULL;
goto keep_queue;
@@ -972,6 +982,7 @@ static int
__cfq_dispatch_requests(struct cfq_data *cfqd, struct cfq_queue *cfqq,
int max_dispatch)
{
+ struct cfq_driver_data *cfqdd = cfqd->cfqdd;
int dispatched = 0;

BUG_ON(RB_EMPTY_ROOT(&cfqq->sort_list));
@@ -989,13 +1000,13 @@ __cfq_dispatch_requests(struct cfq_data *cfqd, struct cfq_queue *cfqq,
/*
* finally, insert request into driver dispatch list
*/
- cfq_dispatch_insert(cfqd->queue, rq);
+ cfq_dispatch_insert(cfqdd->queue, rq);

dispatched++;

- if (!cfqd->active_cic) {
+ if (!cfqdd->active_cic) {
atomic_inc(&RQ_CIC(rq)->ioc->refcount);
- cfqd->active_cic = RQ_CIC(rq);
+ cfqdd->active_cic = RQ_CIC(rq);
}

if (RB_EMPTY_ROOT(&cfqq->sort_list))
@@ -1022,7 +1033,7 @@ static int __cfq_forced_dispatch_cfqq(struct cfq_queue *cfqq)
int dispatched = 0;

while (cfqq->next_rq) {
- cfq_dispatch_insert(cfqq->cfqd->queue, cfqq->next_rq);
+ cfq_dispatch_insert(cfqq->cfqd->cfqdd->queue, cfqq->next_rq);
dispatched++;
}

@@ -1054,6 +1065,7 @@ static int cfq_dispatch_requests(struct request_queue *q, int force)
{
struct cfq_data *cfqd = q->elevator->elevator_data;
struct cfq_queue *cfqq;
+ struct cfq_driver_data *cfqdd = cfqd->cfqdd;
int dispatched;

if (!cfqd->busy_queues)
@@ -1077,12 +1089,12 @@ static int cfq_dispatch_requests(struct request_queue *q, int force)
break;
}

- if (cfqd->sync_flight && !cfq_cfqq_sync(cfqq))
+ if (cfqdd->sync_flight && !cfq_cfqq_sync(cfqq))
break;

cfq_clear_cfqq_must_dispatch(cfqq);
cfq_clear_cfqq_wait_request(cfqq);
- del_timer(&cfqd->idle_slice_timer);
+ del_timer(&cfqdd->idle_slice_timer);

dispatched += __cfq_dispatch_requests(cfqd, cfqq, max_dispatch);
}
@@ -1248,7 +1260,7 @@ static void cfq_exit_single_io_context(struct io_context *ioc,
struct cfq_data *cfqd = cic->key;

if (cfqd) {
- struct request_queue *q = cfqd->queue;
+ struct request_queue *q = cfqd->cfqdd->queue;
unsigned long flags;

spin_lock_irqsave(q->queue_lock, flags);
@@ -1272,7 +1284,7 @@ cfq_alloc_io_context(struct cfq_data *cfqd, gfp_t gfp_mask)
struct cfq_io_context *cic;

cic = kmem_cache_alloc_node(cfq_ioc_pool, gfp_mask | __GFP_ZERO,
- cfqd->queue->node);
+ cfqd->cfqdd->queue->node);
if (cic) {
cic->last_end_request = jiffies;
INIT_LIST_HEAD(&cic->queue_list);
@@ -1332,12 +1344,13 @@ static void changed_ioprio(struct io_context *ioc, struct cfq_io_context *cic)
{
struct cfq_data *cfqd = cic->key;
struct cfq_queue *cfqq;
+ struct cfq_driver_data *cfqdd = cfqd->cfqdd;
unsigned long flags;

if (unlikely(!cfqd))
return;

- spin_lock_irqsave(cfqd->queue->queue_lock, flags);
+ spin_lock_irqsave(cfqdd->queue->queue_lock, flags);

cfqq = cic->cfqq[ASYNC];
if (cfqq) {
@@ -1353,7 +1366,7 @@ static void changed_ioprio(struct io_context *ioc, struct cfq_io_context *cic)
if (cfqq)
cfq_mark_cfqq_prio_changed(cfqq);

- spin_unlock_irqrestore(cfqd->queue->queue_lock, flags);
+ spin_unlock_irqrestore(cfqdd->queue->queue_lock, flags);
}

static void cfq_ioc_set_ioprio(struct io_context *ioc)
@@ -1367,6 +1380,7 @@ cfq_find_alloc_queue(struct cfq_data *cfqd, int is_sync,
struct io_context *ioc, gfp_t gfp_mask)
{
struct cfq_queue *cfqq, *new_cfqq = NULL;
+ struct cfq_driver_data *cfqdd = cfqd->cfqdd;
struct cfq_io_context *cic;

retry:
@@ -1385,16 +1399,16 @@ retry:
* the allocator to do whatever it needs to attempt to
* free memory.
*/
- spin_unlock_irq(cfqd->queue->queue_lock);
+ spin_unlock_irq(cfqdd->queue->queue_lock);
new_cfqq = kmem_cache_alloc_node(cfq_pool,
gfp_mask | __GFP_NOFAIL | __GFP_ZERO,
- cfqd->queue->node);
- spin_lock_irq(cfqd->queue->queue_lock);
+ cfqdd->queue->node);
+ spin_lock_irq(cfqdd->queue->queue_lock);
goto retry;
} else {
cfqq = kmem_cache_alloc_node(cfq_pool,
gfp_mask | __GFP_ZERO,
- cfqd->queue->node);
+ cfqdd->queue->node);
if (!cfqq)
goto out;
}
@@ -1547,9 +1561,11 @@ cfq_cic_lookup(struct cfq_data *cfqd, struct io_context *ioc)
static int cfq_cic_link(struct cfq_data *cfqd, struct io_context *ioc,
struct cfq_io_context *cic, gfp_t gfp_mask)
{
+ struct cfq_driver_data *cfqdd = cfqd->cfqdd;
unsigned long flags;
int ret;

+
ret = radix_tree_preload(gfp_mask);
if (!ret) {
cic->ioc = ioc;
@@ -1565,9 +1581,11 @@ static int cfq_cic_link(struct cfq_data *cfqd, struct io_context *ioc,
radix_tree_preload_end();

if (!ret) {
- spin_lock_irqsave(cfqd->queue->queue_lock, flags);
+ spin_lock_irqsave(cfqdd->queue->queue_lock,
+ flags);
list_add(&cic->queue_list, &cfqd->cic_list);
- spin_unlock_irqrestore(cfqd->queue->queue_lock, flags);
+ spin_unlock_irqrestore(cfqdd->queue->queue_lock,
+ flags);
}
}

@@ -1590,7 +1608,7 @@ cfq_get_io_context(struct cfq_data *cfqd, gfp_t gfp_mask)

might_sleep_if(gfp_mask & __GFP_WAIT);

- ioc = get_io_context(gfp_mask, cfqd->queue->node);
+ ioc = get_io_context(gfp_mask, cfqd->cfqdd->queue->node);
if (!ioc)
return NULL;

@@ -1676,7 +1694,7 @@ cfq_update_idle_window(struct cfq_data *cfqd, struct cfq_queue *cfqq,
enable_idle = old_idle = cfq_cfqq_idle_window(cfqq);

if (!atomic_read(&cic->ioc->nr_tasks) || !cfqd->cfq_slice_idle ||
- (cfqd->hw_tag && CIC_SEEKY(cic)))
+ (cfqd->cfqdd->hw_tag && CIC_SEEKY(cic)))
enable_idle = 0;
else if (sample_valid(cic->ttime_samples)) {
if (cic->ttime_mean > cfqd->cfq_slice_idle)
@@ -1731,7 +1749,7 @@ cfq_should_preempt(struct cfq_data *cfqd, struct cfq_queue *new_cfqq,
if (rq_is_meta(rq) && !cfqq->meta_pending)
return 1;

- if (!cfqd->active_cic || !cfq_cfqq_wait_request(cfqq))
+ if (!cfqd->cfqdd->active_cic || !cfq_cfqq_wait_request(cfqq))
return 0;

/*
@@ -1774,8 +1792,9 @@ cfq_rq_enqueued(struct cfq_data *cfqd, struct cfq_queue *cfqq,
struct request *rq)
{
struct cfq_io_context *cic = RQ_CIC(rq);
+ struct cfq_driver_data *cfqdd = cfqd->cfqdd;

- cfqd->rq_queued++;
+ cfqdd->rq_queued++;
if (rq_is_meta(rq))
cfqq->meta_pending++;

@@ -1793,8 +1812,8 @@ cfq_rq_enqueued(struct cfq_data *cfqd, struct cfq_queue *cfqq,
*/
if (cfq_cfqq_wait_request(cfqq)) {
cfq_mark_cfqq_must_dispatch(cfqq);
- del_timer(&cfqd->idle_slice_timer);
- blk_start_queueing(cfqd->queue);
+ del_timer(&cfqdd->idle_slice_timer);
+ blk_start_queueing(cfqdd->queue);
}
} else if (cfq_should_preempt(cfqd, cfqq, rq)) {
/*
@@ -1804,7 +1823,7 @@ cfq_rq_enqueued(struct cfq_data *cfqd, struct cfq_queue *cfqq,
*/
cfq_preempt_queue(cfqd, cfqq);
cfq_mark_cfqq_must_dispatch(cfqq);
- blk_start_queueing(cfqd->queue);
+ blk_start_queueing(cfqdd->queue);
}
}

@@ -1827,49 +1846,50 @@ static void cfq_insert_request(struct request_queue *q, struct request *rq)
* Update hw_tag based on peak queue depth over 50 samples under
* sufficient load.
*/
-static void cfq_update_hw_tag(struct cfq_data *cfqd)
+static void cfq_update_hw_tag(struct cfq_driver_data *cfqdd)
{
- if (cfqd->rq_in_driver > cfqd->rq_in_driver_peak)
- cfqd->rq_in_driver_peak = cfqd->rq_in_driver;
+ if (cfqdd->rq_in_driver > cfqdd->rq_in_driver_peak)
+ cfqdd->rq_in_driver_peak = cfqdd->rq_in_driver;

- if (cfqd->rq_queued <= CFQ_HW_QUEUE_MIN &&
- cfqd->rq_in_driver <= CFQ_HW_QUEUE_MIN)
+ if (cfqdd->rq_queued <= CFQ_HW_QUEUE_MIN &&
+ cfqdd->rq_in_driver <= CFQ_HW_QUEUE_MIN)
return;

- if (cfqd->hw_tag_samples++ < 50)
+ if (cfqdd->hw_tag_samples++ < 50)
return;

- if (cfqd->rq_in_driver_peak >= CFQ_HW_QUEUE_MIN)
- cfqd->hw_tag = 1;
+ if (cfqdd->rq_in_driver_peak >= CFQ_HW_QUEUE_MIN)
+ cfqdd->hw_tag = 1;
else
- cfqd->hw_tag = 0;
+ cfqdd->hw_tag = 0;

- cfqd->hw_tag_samples = 0;
- cfqd->rq_in_driver_peak = 0;
+ cfqdd->hw_tag_samples = 0;
+ cfqdd->rq_in_driver_peak = 0;
}

static void cfq_completed_request(struct request_queue *q, struct request *rq)
{
struct cfq_queue *cfqq = RQ_CFQQ(rq);
struct cfq_data *cfqd = cfqq->cfqd;
+ struct cfq_driver_data *cfqdd = cfqd->cfqdd;
const int sync = rq_is_sync(rq);
unsigned long now;

now = jiffies;
cfq_log_cfqq(cfqd, cfqq, "complete");

- cfq_update_hw_tag(cfqd);
+ cfq_update_hw_tag(cfqdd);

- WARN_ON(!cfqd->rq_in_driver);
+ WARN_ON(!cfqdd->rq_in_driver);
WARN_ON(!cfqq->dispatched);
- cfqd->rq_in_driver--;
+ cfqdd->rq_in_driver--;
cfqq->dispatched--;

if (cfq_cfqq_sync(cfqq))
- cfqd->sync_flight--;
+ cfqdd->sync_flight--;

if (!cfq_class_idle(cfqq))
- cfqd->last_end_request = now;
+ cfqdd->last_end_request = now;

if (sync)
RQ_CIC(rq)->last_end_request = now;
@@ -1889,7 +1909,7 @@ static void cfq_completed_request(struct request_queue *q, struct request *rq)
cfq_arm_slice_timer(cfqd);
}

- if (!cfqd->rq_in_driver)
+ if (!cfqdd->rq_in_driver)
cfq_schedule_dispatch(cfqd);
}

@@ -2034,9 +2054,9 @@ queue_fail:

static void cfq_kick_queue(struct work_struct *work)
{
- struct cfq_data *cfqd =
- container_of(work, struct cfq_data, unplug_work);
- struct request_queue *q = cfqd->queue;
+ struct cfq_driver_data *cfqdd =
+ container_of(work, struct cfq_driver_data, unplug_work);
+ struct request_queue *q = cfqdd->queue;
unsigned long flags;

spin_lock_irqsave(q->queue_lock, flags);
@@ -2051,12 +2071,13 @@ static void cfq_idle_slice_timer(unsigned long data)
{
struct cfq_data *cfqd = (struct cfq_data *) data;
struct cfq_queue *cfqq;
+ struct cfq_driver_data *cfqdd = cfqd->cfqdd;
unsigned long flags;
int timed_out = 1;

cfq_log(cfqd, "idle timer fired");

- spin_lock_irqsave(cfqd->queue->queue_lock, flags);
+ spin_lock_irqsave(cfqdd->queue->queue_lock, flags);

cfqq = cfqd->active_queue;
if (cfqq) {
@@ -2088,13 +2109,13 @@ expire:
out_kick:
cfq_schedule_dispatch(cfqd);
out_cont:
- spin_unlock_irqrestore(cfqd->queue->queue_lock, flags);
+ spin_unlock_irqrestore(cfqdd->queue->queue_lock, flags);
}

-static void cfq_shutdown_timer_wq(struct cfq_data *cfqd)
+static void cfq_shutdown_timer_wq(struct cfq_driver_data *cfqdd)
{
- del_timer_sync(&cfqd->idle_slice_timer);
- kblockd_flush_work(&cfqd->unplug_work);
+ del_timer_sync(&cfqdd->idle_slice_timer);
+ kblockd_flush_work(&cfqdd->unplug_work);
}

static void cfq_put_async_queues(struct cfq_data *cfqd)
@@ -2115,9 +2136,10 @@ static void cfq_put_async_queues(struct cfq_data *cfqd)
static void cfq_exit_queue(elevator_t *e)
{
struct cfq_data *cfqd = e->elevator_data;
- struct request_queue *q = cfqd->queue;
+ struct cfq_driver_data *cfqdd = cfqd->cfqdd;
+ struct request_queue *q = cfqdd->queue;

- cfq_shutdown_timer_wq(cfqd);
+ cfq_shutdown_timer_wq(cfqdd);

spin_lock_irq(q->queue_lock);

@@ -2136,11 +2158,37 @@ static void cfq_exit_queue(elevator_t *e)

spin_unlock_irq(q->queue_lock);

- cfq_shutdown_timer_wq(cfqd);
+ cfq_shutdown_timer_wq(cfqdd);

+ kfree(cfqdd);
kfree(cfqd);
}

+static struct cfq_driver_data *
+cfq_init_driver_data(struct request_queue *q, struct cfq_data *cfqd)
+{
+ struct cfq_driver_data *cfqdd;
+
+ cfqdd = kmalloc_node(sizeof(*cfqdd),
+ GFP_KERNEL | __GFP_ZERO, q->node);
+ if (!cfqdd)
+ return NULL;
+
+ cfqdd->queue = q;
+
+ init_timer(&cfqdd->idle_slice_timer);
+ cfqdd->idle_slice_timer.function = cfq_idle_slice_timer;
+ cfqdd->idle_slice_timer.data = (unsigned long) cfqd;
+
+ INIT_WORK(&cfqdd->unplug_work, cfq_kick_queue);
+
+ cfqdd->last_end_request = jiffies;
+
+ cfqdd->hw_tag = 1;
+
+ return cfqdd;
+}
+
static void *cfq_init_queue(struct request_queue *q)
{
struct cfq_data *cfqd;
@@ -2149,18 +2197,15 @@ static void *cfq_init_queue(struct request_queue *q)
if (!cfqd)
return NULL;

+ cfqd->cfqdd = cfq_init_driver_data(q, cfqd);
+ if (!cfqd->cfqdd) {
+ kfree(cfqd);
+ return NULL;
+ }
+
cfqd->service_tree = CFQ_RB_ROOT;
INIT_LIST_HEAD(&cfqd->cic_list);

- cfqd->queue = q;
-
- init_timer(&cfqd->idle_slice_timer);
- cfqd->idle_slice_timer.function = cfq_idle_slice_timer;
- cfqd->idle_slice_timer.data = (unsigned long) cfqd;
-
- INIT_WORK(&cfqd->unplug_work, cfq_kick_queue);
-
- cfqd->last_end_request = jiffies;
cfqd->cfq_quantum = cfq_quantum;
cfqd->cfq_fifo_expire[0] = cfq_fifo_expire[0];
cfqd->cfq_fifo_expire[1] = cfq_fifo_expire[1];
@@ -2170,7 +2215,6 @@ static void *cfq_init_queue(struct request_queue *q)
cfqd->cfq_slice[1] = cfq_slice_sync;
cfqd->cfq_slice_async_rq = cfq_slice_async_rq;
cfqd->cfq_slice_idle = cfq_slice_idle;
- cfqd->hw_tag = 1;

return cfqd;
}
diff --git a/include/linux/cfq-iosched.h b/include/linux/cfq-iosched.h
index adb2410..50003f7 100644
--- a/include/linux/cfq-iosched.h
+++ b/include/linux/cfq-iosched.h
@@ -20,17 +20,11 @@ struct cfq_rb_root {
#define CFQ_RB_ROOT (struct cfq_rb_root) { RB_ROOT, NULL, }

/*
- * Per block device queue structure
+ * Driver unique data structure
*/
-struct cfq_data {
+struct cfq_driver_data {
struct request_queue *queue;

- /*
- * rr list of queues with requests and the count of them
- */
- struct cfq_rb_root service_tree;
- unsigned int busy_queues;
-
int rq_in_driver;
int sync_flight;

@@ -48,18 +42,30 @@ struct cfq_data {
struct timer_list idle_slice_timer;
struct work_struct unplug_work;

- struct cfq_queue *active_queue;
struct cfq_io_context *active_cic;

+ sector_t last_position;
+ unsigned long last_end_request;
+};
+
+/*
+ * Per block device queue structure
+ */
+struct cfq_data {
+ /*
+ * rr list of queues with requests and the count of them
+ */
+ struct cfq_rb_root service_tree;
+ unsigned int busy_queues;
+
+ struct cfq_queue *active_queue;
+
/*
* async queue for each priority case
*/
struct cfq_queue *async_cfqq[2][IOPRIO_BE_NR];
struct cfq_queue *async_idle_cfqq;

- sector_t last_position;
- unsigned long last_end_request;
-
/*
* tunables, see top of file
*/
@@ -72,6 +78,8 @@ struct cfq_data {
unsigned int cfq_slice_idle;

struct list_head cic_list;
+
+ struct cfq_driver_data *cfqdd;
};

#endif /* _LINUX_CFQ_IOSCHED_H */
--
1.5.6.5

2008-11-12 08:34:52

by Satoshi UCHIDA

Subject: [PATCH][cfq-cgroups][03/12] Add cgroup file and modify configure files.


This patch adds the cfq-cgroup.c file and modifies the configuration
files.
The cfq-cgroup.c file contains the functions used to expand the CFQ
scheduler for handling cgroups.
The expanded CFQ scheduler is registered as "cfq-cgroups".


Signed-off-by: Satoshi UCHIDA <[email protected]>

---
block/Kconfig.iosched | 16 ++++++++++++++++
block/Makefile | 1 +
block/cfq-cgroup.c | 32 ++++++++++++++++++++++++++++++++
3 files changed, 49 insertions(+), 0 deletions(-)
create mode 100644 block/cfq-cgroup.c

diff --git a/block/Kconfig.iosched b/block/Kconfig.iosched
index 7e803fc..dc61120 100644
--- a/block/Kconfig.iosched
+++ b/block/Kconfig.iosched
@@ -40,6 +40,18 @@ config IOSCHED_CFQ
working environment, suitable for desktop systems.
This is the default I/O scheduler.

+config IOSCHED_CFQ_CGROUP
+ tristate "expanded CFQ I/O Scheduler for cgroups"
+ default n
+ depends on IOSCHED_CFQ && CGROUPS
+ ---help---
+ The expanded CFQ I/O scheduler for cgroups tries to distribute
+ bandwidth equally among all groups and among all processes within
+ groups in the system. It should provide a fair working environment,
+ suitable for consolidated environments which have some desktop systems.
+ This scheduler expands the CFQ I/O scheduler into two layer control
+ -- per group layer and per task layer --.
+
choice
prompt "Default I/O scheduler"
default DEFAULT_CFQ
@@ -56,6 +68,9 @@ choice
config DEFAULT_CFQ
bool "CFQ" if IOSCHED_CFQ=y

+ config DEFAULT_CFQ_CGROUP
+ bool "CFQ-Cgroups" if IOSCHED_CFQ_CGROUP=y
+
config DEFAULT_NOOP
bool "No-op"

@@ -66,6 +81,7 @@ config DEFAULT_IOSCHED
default "anticipatory" if DEFAULT_AS
default "deadline" if DEFAULT_DEADLINE
default "cfq" if DEFAULT_CFQ
+ default "cfq-cgroups" if DEFAULT_CFQ_CGROUP
default "noop" if DEFAULT_NOOP

endmenu
diff --git a/block/Makefile b/block/Makefile
index bfe7304..3c0f59d 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -12,6 +12,7 @@ obj-$(CONFIG_IOSCHED_NOOP) += noop-iosched.o
obj-$(CONFIG_IOSCHED_AS) += as-iosched.o
obj-$(CONFIG_IOSCHED_DEADLINE) += deadline-iosched.o
obj-$(CONFIG_IOSCHED_CFQ) += cfq-iosched.o
+obj-$(CONFIG_IOSCHED_CFQ_CGROUP) += cfq-cgroup.o

obj-$(CONFIG_BLK_DEV_IO_TRACE) += blktrace.o
obj-$(CONFIG_BLOCK_COMPAT) += compat_ioctl.o
diff --git a/block/cfq-cgroup.c b/block/cfq-cgroup.c
new file mode 100644
index 0000000..3deef41
--- /dev/null
+++ b/block/cfq-cgroup.c
@@ -0,0 +1,32 @@
+/*
+ * CFQ CGROUP disk scheduler.
+ *
+ * This program is a wrapper program that is
+ * extend CFQ disk scheduler for handling
+ * cgroup subsystem.
+ *
+ * This program is based on original CFQ code.
+ *
+ * Copyright (C) 2008 Satoshi UCHIDA <[email protected]>
+ * and NEC Corp.
+ */
+
+#include <linux/blkdev.h>
+#include <linux/cgroup.h>
+#include <linux/cfq-iosched.h>
+
+static int __init cfq_cgroup_init(void)
+{
+ return 0;
+}
+
+static void __exit cfq_cgroup_exit(void)
+{
+}
+
+module_init(cfq_cgroup_init);
+module_exit(cfq_cgroup_exit);
+
+MODULE_AUTHOR("Satoshi UCHIDA");
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Expanded CFQ IO scheduler for CGROUPS");
--
1.5.6.5

2008-11-12 08:35:12

by Satoshi UCHIDA

Subject: [PATCH][cfq-cgroups][04/12] Register or unregister "cfq-cgroups" module.


This patch introduces the register/unregister functions of the
"cfq-cgroups" module.
The elevator_type variable inherits its operations from the original
CFQ scheduler.


Signed-off-by: Satoshi UCHIDA <[email protected]>

---
block/cfq-cgroup.c | 122 +++++++++++++++++++++++++++++++++++++++++++
block/cfq-iosched.c | 2 +-
include/linux/cfq-iosched.h | 2 +
3 files changed, 125 insertions(+), 1 deletions(-)

diff --git a/block/cfq-cgroup.c b/block/cfq-cgroup.c
index 3deef41..aaa00ef 100644
--- a/block/cfq-cgroup.c
+++ b/block/cfq-cgroup.c
@@ -15,13 +15,135 @@
#include <linux/cgroup.h>
#include <linux/cfq-iosched.h>

+/*
+ * sysfs parts below -->
+ */
+static ssize_t
+cfq_cgroup_var_show(char *page, struct cfq_data *cfqd,
+ int (func)(struct cfq_data *))
+{
+ int val, retval = 0;
+
+ val = func(cfqd);
+
+ retval = snprintf(page, PAGE_SIZE, "%d\n", val);
+
+ return retval;
+}
+
+#define SHOW_FUNCTION(__FUNC, __VAR, __CONV) \
+static int val_transrate_##__FUNC(struct cfq_data *cfqd) \
+{ \
+ if (__CONV) \
+ return jiffies_to_msecs(cfqd->__VAR); \
+ else \
+ return cfqd->__VAR; \
+} \
+static ssize_t __FUNC(elevator_t *e, char *page) \
+{ \
+ struct cfq_data *cfqd = e->elevator_data; \
+ \
+ return cfq_cgroup_var_show((page), (cfqd), \
+ val_transrate_##__FUNC); \
+}
+SHOW_FUNCTION(cfq_cgroup_quantum_show, cfq_quantum, 0);
+SHOW_FUNCTION(cfq_cgroup_fifo_expire_sync_show, cfq_fifo_expire[1], 1);
+SHOW_FUNCTION(cfq_cgroup_fifo_expire_async_show, cfq_fifo_expire[0], 1);
+SHOW_FUNCTION(cfq_cgroup_back_seek_max_show, cfq_back_max, 0);
+SHOW_FUNCTION(cfq_cgroup_back_seek_penalty_show, cfq_back_penalty, 0);
+SHOW_FUNCTION(cfq_cgroup_slice_idle_show, cfq_slice_idle, 1);
+SHOW_FUNCTION(cfq_cgroup_slice_sync_show, cfq_slice[1], 1);
+SHOW_FUNCTION(cfq_cgroup_slice_async_show, cfq_slice[0], 1);
+SHOW_FUNCTION(cfq_cgroup_slice_async_rq_show, cfq_slice_async_rq, 0);
+#undef SHOW_FUNCTION
+
+static ssize_t
+cfq_cgroup_var_store(const char *page, size_t count, struct cfq_data *cfqd,
+ void (func)(struct cfq_data *, unsigned int))
+{
+ int err;
+ unsigned long val;
+
+ err = strict_strtoul(page, 10, &val);
+ if (err)
+ return 0;
+
+ func(cfqd, val);
+
+ return count;
+}
+
+#define STORE_FUNCTION(__FUNC, __VAR, MIN, MAX, __CONV) \
+static void val_transrate_##__FUNC(struct cfq_data *cfqd, \
+ unsigned int __data) \
+{ \
+ if (__data < (MIN)) \
+ __data = (MIN); \
+ else if (__data > (MAX)) \
+ __data = (MAX); \
+ if (__CONV) \
+ cfqd->__VAR = msecs_to_jiffies(__data); \
+ else \
+ cfqd->__VAR = __data; \
+} \
+static ssize_t __FUNC(elevator_t *e, const char *page, size_t count) \
+{ \
+ struct cfq_data *cfqd = e->elevator_data; \
+ int ret = cfq_cgroup_var_store((page), count, cfqd, \
+ val_transrate_##__FUNC); \
+ return ret; \
+}
+STORE_FUNCTION(cfq_cgroup_quantum_store, cfq_quantum, 1, UINT_MAX, 0);
+STORE_FUNCTION(cfq_cgroup_fifo_expire_sync_store, cfq_fifo_expire[1], 1,
+ UINT_MAX, 1);
+STORE_FUNCTION(cfq_cgroup_fifo_expire_async_store, cfq_fifo_expire[0], 1,
+ UINT_MAX, 1);
+STORE_FUNCTION(cfq_cgroup_back_seek_max_store, cfq_back_max, 0, UINT_MAX, 0);
+STORE_FUNCTION(cfq_cgroup_back_seek_penalty_store, cfq_back_penalty, 1,
+ UINT_MAX, 0);
+STORE_FUNCTION(cfq_cgroup_slice_idle_store, cfq_slice_idle,
+ 0, UINT_MAX, 1);
+STORE_FUNCTION(cfq_cgroup_slice_sync_store, cfq_slice[1], 1, UINT_MAX, 1);
+STORE_FUNCTION(cfq_cgroup_slice_async_store, cfq_slice[0], 1, UINT_MAX, 1);
+STORE_FUNCTION(cfq_cgroup_slice_async_rq_store, cfq_slice_async_rq, 1,
+ UINT_MAX, 0);
+#undef STORE_FUNCTION
+
+#define CFQ_CGROUP_ATTR(name) \
+ __ATTR(name, S_IRUGO|S_IWUSR, cfq_cgroup_##name##_show, \
+ cfq_cgroup_##name##_store)
+
+static struct elv_fs_entry cfq_cgroup_attrs[] = {
+ CFQ_CGROUP_ATTR(quantum),
+ CFQ_CGROUP_ATTR(fifo_expire_sync),
+ CFQ_CGROUP_ATTR(fifo_expire_async),
+ CFQ_CGROUP_ATTR(back_seek_max),
+ CFQ_CGROUP_ATTR(back_seek_penalty),
+ CFQ_CGROUP_ATTR(slice_sync),
+ CFQ_CGROUP_ATTR(slice_async),
+ CFQ_CGROUP_ATTR(slice_async_rq),
+ CFQ_CGROUP_ATTR(slice_idle),
+ __ATTR_NULL
+};
+
+static struct elevator_type iosched_cfq_cgroup = {
+ .elevator_attrs = cfq_cgroup_attrs,
+ .elevator_name = "cfq-cgroups",
+ .elevator_owner = THIS_MODULE,
+};
+
static int __init cfq_cgroup_init(void)
{
+ iosched_cfq_cgroup.ops = iosched_cfq.ops;
+
+ elv_register(&iosched_cfq_cgroup);
+
return 0;
}

static void __exit cfq_cgroup_exit(void)
{
+ elv_unregister(&iosched_cfq_cgroup);
}

module_init(cfq_cgroup_init);
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index b726e85..e105827 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -2332,7 +2332,7 @@ static struct elv_fs_entry cfq_attrs[] = {
__ATTR_NULL
};

-static struct elevator_type iosched_cfq = {
+struct elevator_type iosched_cfq = {
.ops = {
.elevator_merge_fn = cfq_merge,
.elevator_merged_fn = cfq_merged_request,
diff --git a/include/linux/cfq-iosched.h b/include/linux/cfq-iosched.h
index 50003f7..a28ef00 100644
--- a/include/linux/cfq-iosched.h
+++ b/include/linux/cfq-iosched.h
@@ -82,4 +82,6 @@ struct cfq_data {
struct cfq_driver_data *cfqdd;
};

+extern struct elevator_type iosched_cfq;
+
#endif /* _LINUX_CFQ_IOSCHED_H */
--
1.5.6.5

2008-11-12 08:35:46

by Satoshi UCHIDA

[permalink] [raw]
Subject: [PATCH][cfq-cgroups][05/12] Introduce cgroups structure with ioprio entry.


This patch introduces the cfq_cgroup structure, which is the type used
for group control within the expanded CFQ scheduler.
In addition, the cfq_cgroup structure has an "ioprio" entry, which is
the I/O preference of the group.


Signed-off-by: Satoshi UCHIDA <[email protected]>

---
block/cfq-cgroup.c | 148 +++++++++++++++++++++++++++++++++++++++++
include/linux/cgroup_subsys.h | 6 ++
2 files changed, 154 insertions(+), 0 deletions(-)

diff --git a/block/cfq-cgroup.c b/block/cfq-cgroup.c
index aaa00ef..733980d 100644
--- a/block/cfq-cgroup.c
+++ b/block/cfq-cgroup.c
@@ -15,6 +15,154 @@
#include <linux/cgroup.h>
#include <linux/cfq-iosched.h>

+#define CFQ_CGROUP_MAX_IOPRIO (7)
+
+
+struct cfq_cgroup {
+ struct cgroup_subsys_state css;
+ unsigned int ioprio;
+};
+
+static inline struct cfq_cgroup *cgroup_to_cfq_cgroup(struct cgroup *cont)
+{
+ return container_of(cgroup_subsys_state(cont, cfq_subsys_id),
+ struct cfq_cgroup, css);
+}
+
+static inline struct cfq_cgroup *task_to_cfq_cgroup(struct task_struct *tsk)
+{
+ return container_of(task_subsys_state(tsk, cfq_subsys_id),
+ struct cfq_cgroup, css);
+}
+
+
+static struct cgroup_subsys_state *
+cfq_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
+{
+ struct cfq_cgroup *cfqc;
+
+ if (!capable(CAP_SYS_ADMIN))
+ return ERR_PTR(-EPERM);
+
+ if (!cgroup_is_descendant(cont))
+ return ERR_PTR(-EPERM);
+
+ cfqc = kzalloc(sizeof(struct cfq_cgroup), GFP_KERNEL);
+ if (unlikely(!cfqc))
+ return ERR_PTR(-ENOMEM);
+
+ cfqc->ioprio = 3;
+
+ return &cfqc->css;
+}
+
+static void cfq_cgroup_destroy(struct cgroup_subsys *ss, struct cgroup *cont)
+{
+ kfree(cgroup_to_cfq_cgroup(cont));
+}
+
+static ssize_t cfq_cgroup_read(struct cgroup *cont, struct cftype *cft,
+ struct file *file, char __user *userbuf,
+ size_t nbytes, loff_t *ppos)
+{
+ struct cfq_cgroup *cfqc;
+ char *page;
+ ssize_t ret;
+
+ page = (char *)__get_free_page(GFP_TEMPORARY);
+ if (!page)
+ return -ENOMEM;
+
+ cgroup_lock();
+ if (cgroup_is_removed(cont)) {
+ cgroup_unlock();
+ ret = -ENODEV;
+ goto out;
+ }
+
+ cfqc = cgroup_to_cfq_cgroup(cont);
+
+ cgroup_unlock();
+
+ /* print priority */
+ ret = snprintf(page, PAGE_SIZE, "%d \n", cfqc->ioprio);
+
+ ret = simple_read_from_buffer(userbuf, nbytes, ppos, page, ret);
+
+out:
+ free_page((unsigned long)page);
+ return ret;
+}
+
+static ssize_t cfq_cgroup_write(struct cgroup *cont, struct cftype *cft,
+ struct file *file, const char __user *userbuf,
+ size_t nbytes, loff_t *ppos)
+{
+ struct cfq_cgroup *cfqc;
+ ssize_t ret;
+ long new_prio;
+ int err;
+ char *buffer = NULL;
+
+ cgroup_lock();
+ if (cgroup_is_removed(cont)) {
+ cgroup_unlock();
+ ret = -ENODEV;
+ goto out;
+ }
+
+ cfqc = cgroup_to_cfq_cgroup(cont);
+ cgroup_unlock();
+
+ /* set priority */
+ buffer = kmalloc(nbytes + 1, GFP_KERNEL);
+ if (buffer == NULL)
+ return -ENOMEM;
+
+ if (copy_from_user(buffer, userbuf, nbytes)) {
+ ret = -EFAULT;
+ goto out;
+ }
+ buffer[nbytes] = 0;
+
+ err = strict_strtoul(buffer, 10, &new_prio);
+ if ((err) || ((new_prio < 0) || (new_prio > CFQ_CGROUP_MAX_IOPRIO))) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ cfqc->ioprio = new_prio;
+
+ ret = nbytes;
+
+out:
+ kfree(buffer);
+
+ return ret;
+}
+
+static struct cftype files[] = {
+ {
+ .name = "ioprio",
+ .read = cfq_cgroup_read,
+ .write = cfq_cgroup_write,
+ },
+};
+
+static int cfq_cgroup_populate(struct cgroup_subsys *ss, struct cgroup *cont)
+{
+ return cgroup_add_files(cont, ss, files, ARRAY_SIZE(files));
+}
+
+struct cgroup_subsys cfq_subsys = {
+ .name = "cfq",
+ .create = cfq_cgroup_create,
+ .destroy = cfq_cgroup_destroy,
+ .populate = cfq_cgroup_populate,
+ .subsys_id = cfq_subsys_id,
+};
+
+
/*
* sysfs parts below -->
*/
diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index 9c22396..a9482aa 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -54,3 +54,9 @@ SUBSYS(freezer)
#endif

/* */
+
+#ifdef CONFIG_IOSCHED_CFQ_CGROUP
+SUBSYS(cfq)
+#endif
+
+/* */
--
1.5.6.5

2008-11-12 08:36:14

by Satoshi UCHIDA

Subject: [PATCH][cfq-cgroups][06/12] Add sibling tree control for driver data (cfq_driver_data).


This patch adds a tree that controls the siblings of the driver data
(cfq_driver_data).
This tree manages the cfq data (cfq_data) for the same device, and is
used mainly when a new device is registered or an existing device is
unregistered.


Signed-off-by: Satoshi UCHIDA <[email protected]>

---
block/cfq-cgroup.c | 114 +++++++++++++++++++++++++++++++++++++++++++
block/cfq-iosched.c | 50 +++++++++++++++---
include/linux/cfq-iosched.h | 27 ++++++++++
3 files changed, 182 insertions(+), 9 deletions(-)

diff --git a/block/cfq-cgroup.c b/block/cfq-cgroup.c
index 733980d..ce35af2 100644
--- a/block/cfq-cgroup.c
+++ b/block/cfq-cgroup.c
@@ -17,6 +17,7 @@

#define CFQ_CGROUP_MAX_IOPRIO (7)

+static struct cfq_ops cfq_cgroup_op;

struct cfq_cgroup {
struct cgroup_subsys_state css;
@@ -36,6 +37,70 @@ static inline struct cfq_cgroup *task_to_cfq_cgroup(struct task_struct *tsk)
}


+/*
+ * Add device or cgroup data functions.
+ */
+static void cfq_cgroup_init_driver_data_opt(struct cfq_driver_data *cfqdd,
+ struct cfq_data *cfqd)
+{
+ cfqdd->sibling_tree = RB_ROOT;
+ cfqdd->siblings = 0;
+}
+
+static void cfq_driver_sibling_tree_add(struct cfq_driver_data *cfqdd,
+ struct cfq_data *cfqd)
+{
+ struct rb_node **p;
+ struct rb_node *parent = NULL;
+
+ BUG_ON(!RB_EMPTY_NODE(&cfqd->sib_node));
+
+ p = &cfqdd->sibling_tree.rb_node;
+
+ while (*p) {
+ struct cfq_data *__cfqd;
+ struct rb_node **n;
+
+ parent = *p;
+ __cfqd = rb_entry(parent, struct cfq_data, sib_node);
+
+ if (cfqd < __cfqd)
+ n = &(*p)->rb_left;
+ else
+ n = &(*p)->rb_right;
+ p = n;
+ }
+
+ rb_link_node(&cfqd->sib_node, parent, p);
+ rb_insert_color(&cfqd->sib_node, &cfqdd->sibling_tree);
+ cfqdd->siblings++;
+ cfqd->cfqdd = cfqdd;
+}
+
+static struct cfq_data *
+__cfq_cgroup_init_queue(struct request_queue *q, struct cfq_driver_data *cfqdd)
+{
+ struct cfq_data *cfqd = cfq_init_cfq_data(q, cfqdd, &cfq_cgroup_op);
+
+ if (!cfqd)
+ return NULL;
+
+ RB_CLEAR_NODE(&cfqd->sib_node);
+
+ cfq_driver_sibling_tree_add(cfqd->cfqdd, cfqd);
+
+ return cfqd;
+}
+
+static void *cfq_cgroup_init_queue(struct request_queue *q)
+{
+ struct cfq_data *cfqd = NULL;
+
+ cfqd = __cfq_cgroup_init_queue(q, NULL);
+
+ return cfqd;
+}
+
static struct cgroup_subsys_state *
cfq_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
{
@@ -56,11 +121,53 @@ cfq_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
return &cfqc->css;
}

+
+/*
+ * Remove device or cgroup data functions.
+ */
+static void cfq_cgroup_erase_driver_siblings(struct cfq_driver_data *cfqdd,
+ struct cfq_data *cfqd)
+{
+ rb_erase(&cfqd->sib_node, &cfqdd->sibling_tree);
+ cfqdd->siblings--;
+}
+
+static void cfq_exit_device_group(struct cfq_driver_data *cfqdd)
+{
+ struct rb_node *p, *n;
+ struct cfq_data *cfqd;
+
+ p = rb_first(&cfqdd->sibling_tree);
+
+ while (p) {
+ n = rb_next(p);
+ cfqd = rb_entry(p, struct cfq_data, sib_node);
+
+ cfq_cgroup_erase_driver_siblings(cfqdd, cfqd);
+ cfq_free_cfq_data(cfqd);
+
+ p = n;
+ }
+}
+
+static void cfq_cgroup_exit_queue(elevator_t *e)
+{
+ struct cfq_data *cfqd = e->elevator_data;
+ struct cfq_driver_data *cfqdd = cfqd->cfqdd;
+
+ cfq_exit_device_group(cfqdd);
+ kfree(cfqdd);
+}
+
static void cfq_cgroup_destroy(struct cgroup_subsys *ss, struct cgroup *cont)
{
kfree(cgroup_to_cfq_cgroup(cont));
}

+
+/*
+ * cgroupfs parts below -->
+ */
static ssize_t cfq_cgroup_read(struct cgroup *cont, struct cftype *cft,
struct file *file, char __user *userbuf,
size_t nbytes, loff_t *ppos)
@@ -154,6 +261,7 @@ static int cfq_cgroup_populate(struct cgroup_subsys *ss, struct cgroup *cont)
return cgroup_add_files(cont, ss, files, ARRAY_SIZE(files));
}

+
struct cgroup_subsys cfq_subsys = {
.name = "cfq",
.create = cfq_cgroup_create,
@@ -280,9 +388,15 @@ static struct elevator_type iosched_cfq_cgroup = {
.elevator_owner = THIS_MODULE,
};

+static struct cfq_ops cfq_cgroup_op = {
+ .cfq_init_driver_data_opt_fn = cfq_cgroup_init_driver_data_opt,
+};
+
static int __init cfq_cgroup_init(void)
{
iosched_cfq_cgroup.ops = iosched_cfq.ops;
+ iosched_cfq_cgroup.ops.elevator_init_fn = cfq_cgroup_init_queue;
+ iosched_cfq_cgroup.ops.elevator_exit_fn = cfq_cgroup_exit_queue;

elv_register(&iosched_cfq_cgroup);

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index e105827..fd1ed0c 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -62,6 +62,8 @@ static DEFINE_SPINLOCK(ioc_gone_lock);

#define sample_valid(samples) ((samples) > 80)

+static struct cfq_ops cfq_op;
+
/*
* Per process-grouping structure
*/
@@ -2133,9 +2135,8 @@ static void cfq_put_async_queues(struct cfq_data *cfqd)
cfq_put_queue(cfqd->async_idle_cfqq);
}

-static void cfq_exit_queue(elevator_t *e)
+void cfq_free_cfq_data(struct cfq_data *cfqd)
{
- struct cfq_data *cfqd = e->elevator_data;
struct cfq_driver_data *cfqdd = cfqd->cfqdd;
struct request_queue *q = cfqdd->queue;

@@ -2160,12 +2161,21 @@ static void cfq_exit_queue(elevator_t *e)

cfq_shutdown_timer_wq(cfqdd);

- kfree(cfqdd);
kfree(cfqd);
}

+static void cfq_exit_queue(elevator_t *e)
+{
+ struct cfq_data *cfqd = e->elevator_data;
+ struct cfq_driver_data *cfqdd = cfqd->cfqdd;
+
+ cfq_free_cfq_data(cfqd);
+ kfree(cfqdd);
+}
+
static struct cfq_driver_data *
-cfq_init_driver_data(struct request_queue *q, struct cfq_data *cfqd)
+cfq_init_driver_data(struct request_queue *q, struct cfq_data *cfqd,
+ struct cfq_ops *op)
{
struct cfq_driver_data *cfqdd;

@@ -2186,10 +2196,16 @@ cfq_init_driver_data(struct request_queue *q, struct cfq_data *cfqd)

cfqdd->hw_tag = 1;

+ /* module dependend initialization */
+ cfqdd->op = op;
+ if (op->cfq_init_driver_data_opt_fn)
+ op->cfq_init_driver_data_opt_fn(cfqdd, cfqd);
+
return cfqdd;
}

-static void *cfq_init_queue(struct request_queue *q)
+struct cfq_data *cfq_init_cfq_data(struct request_queue *q,
+ struct cfq_driver_data *cfqdd, struct cfq_ops *op)
{
struct cfq_data *cfqd;

@@ -2197,10 +2213,14 @@ static void *cfq_init_queue(struct request_queue *q)
if (!cfqd)
return NULL;

- cfqd->cfqdd = cfq_init_driver_data(q, cfqd);
- if (!cfqd->cfqdd) {
- kfree(cfqd);
- return NULL;
+ if (cfqdd)
+ cfqd->cfqdd = cfqdd;
+ else {
+ cfqd->cfqdd = cfq_init_driver_data(q, cfqd, op);
+ if (!cfqd->cfqdd) {
+ kfree(cfqd);
+ return NULL;
+ }
}

cfqd->service_tree = CFQ_RB_ROOT;
@@ -2219,6 +2239,15 @@ static void *cfq_init_queue(struct request_queue *q)
return cfqd;
}

+static void *cfq_init_queue(struct request_queue *q)
+{
+ struct cfq_data *cfqd = NULL;
+
+ cfqd = cfq_init_cfq_data(q, NULL, &cfq_op);
+
+ return cfqd;
+}
+
static void cfq_slab_kill(void)
{
/*
@@ -2358,6 +2387,9 @@ struct elevator_type iosched_cfq = {
.elevator_owner = THIS_MODULE,
};

+static struct cfq_ops cfq_op = {
+};
+
static int __init cfq_init(void)
{
/*
diff --git a/include/linux/cfq-iosched.h b/include/linux/cfq-iosched.h
index a28ef00..22d1aed 100644
--- a/include/linux/cfq-iosched.h
+++ b/include/linux/cfq-iosched.h
@@ -6,6 +6,7 @@

struct request_queue;
struct cfq_io_context;
+struct cfq_ops;

/*
* Most of our rbtree usage is for sorting with min extraction, so
@@ -46,6 +47,14 @@ struct cfq_driver_data {

sector_t last_position;
unsigned long last_end_request;
+
+ struct cfq_ops *op;
+
+#ifdef CONFIG_IOSCHED_CFQ_CGROUP
+ /* device siblings */
+ struct rb_root sibling_tree;
+ unsigned int siblings;
+#endif
};

/*
@@ -80,8 +89,26 @@ struct cfq_data {
struct list_head cic_list;

struct cfq_driver_data *cfqdd;
+
+#ifdef CONFIG_IOSCHED_CFQ_CGROUP
+ /* sibling_tree member for cfq_meta_data */
+ struct rb_node sib_node;
+#endif
+};
+
+/*
+ * Module depended optional operations.
+ */
+typedef void (cfq_init_driver_data_opt_fn)(struct cfq_driver_data *,
+ struct cfq_data *);
+struct cfq_ops {
+ cfq_init_driver_data_opt_fn *cfq_init_driver_data_opt_fn;
};

+
extern struct elevator_type iosched_cfq;
+extern struct cfq_data *cfq_init_cfq_data(struct request_queue *,
+ struct cfq_driver_data *, struct cfq_ops *);
+extern void cfq_free_cfq_data(struct cfq_data *cfqd);

#endif /* _LINUX_CFQ_IOSCHED_H */
--
1.5.6.5

2008-11-12 08:36:55

by Satoshi UCHIDA

Subject: [PATCH][cfq-cgroups][08/12] Interface to new cfq data structure in cfq_cgroup module.


This patch modifies the interfaces to the new cfq_data structure in the
cfq_cgroup module. With this patch, the interfaces can take two
parameters and can set a cfq_data value for a particular device and a
particular group.
In cgroupfs,
if the number of arguments is one,
the parameter is the value, which is set for all devices, and
if the number of arguments is two,
the first parameter is the value to set and
the second parameter is the name of the target device.
When the second parameter is "default", the default parameter of the
group is set to the first parameter.
In sysfs,
if the number of arguments is one,
the parameter is the value, which is set for all groups, and
if the number of arguments is two,
the first parameter is the value to set and
the second parameter is the path of the target group.
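
For example, a sketch of the resulting interface (values are
illustrative):

# cgroupfs: all devices / only 'sda' / the group default
echo 2 > /dev/cgroup/test1/cfq.ioprio
echo 2 sda > /dev/cgroup/test1/cfq.ioprio
echo 2 default > /dev/cgroup/test1/cfq.ioprio

# sysfs: all groups / only the '/test1' group
echo 100 > /sys/block/sda/queue/iosched/slice_sync
echo 100 /test1 > /sys/block/sda/queue/iosched/slice_sync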


Signed-off-by: Satoshi UCHIDA <[email protected]>

---
block/cfq-cgroup.c | 192 +++++++++++++++++++++++++++++++++++++++---
include/linux/cfq-iosched.h | 2 +
2 files changed, 180 insertions(+), 14 deletions(-)

diff --git a/block/cfq-cgroup.c b/block/cfq-cgroup.c
index 25da08e..99f3d94 100644
--- a/block/cfq-cgroup.c
+++ b/block/cfq-cgroup.c
@@ -123,6 +123,7 @@ static void *cfq_cgroup_init_cfq_data(struct cfq_cgroup *cfqc,
if (!cfqc) {
cfqc = cgroup_to_cfq_cgroup(get_root_subsys(&cfq_subsys));
cfq_cgroup_sibling_tree_add(cfqc, cfqd);
+ cfqd->ioprio = cfqc->ioprio;
} else {
struct cfq_data *__cfqd;
__cfqd = __cfq_cgroup_init_queue(cfqd->cfqdd->queue,
@@ -130,6 +131,7 @@ static void *cfq_cgroup_init_cfq_data(struct cfq_cgroup *cfqc,
if (!__cfqd)
return NULL;
cfq_cgroup_sibling_tree_add(cfqc, __cfqd);
+ __cfqd->ioprio = cfqc->ioprio;
}

/* check and create cfq_data for children */
@@ -294,6 +296,35 @@ static void cfq_cgroup_destroy(struct cgroup_subsys *ss, struct cgroup *cont)
/*
* cgroupfs parts below -->
*/
+static void
+param_separate(const char *master, char *valbuf, char *pathbuf, int size)
+{
+ int i;
+ char *pc1 = (char *) master, *pc2;
+
+ pc2 = valbuf;
+ for (i = 0 ; i < (size - 1) && (*pc1 != ' ') &&
+ (*pc1 != '\n') && (*pc1 != '\0') ; i++) {
+ *pc2 = *pc1;
+ pc2++;
+ pc1++;
+ }
+ *pc2 = '\n'; pc2++; *pc2 = '\0';
+
+ for ( ; (i < (PAGE_SIZE - 1)) && (*pc1 == ' ') &&
+ (*pc1 != '\n') && (*pc1 != '\0') ; i++)
+ pc1++;
+
+ pc2 = pathbuf;
+ for ( ; i < (PAGE_SIZE - 1) && (*pc1 != ' ') &&
+ (*pc1 != '\n') && (*pc1 != '\0') ; i++) {
+ *pc2 = *pc1;
+ pc2++;
+ pc1++;
+ }
+ *pc2 = '\0';
+}
+
static ssize_t cfq_cgroup_read(struct cgroup *cont, struct cftype *cft,
struct file *file, char __user *userbuf,
size_t nbytes, loff_t *ppos)
@@ -301,6 +332,7 @@ static ssize_t cfq_cgroup_read(struct cgroup *cont, struct cftype *cft,
struct cfq_cgroup *cfqc;
char *page;
ssize_t ret;
+ struct rb_node *p;

page = (char *)__get_free_page(GFP_TEMPORARY);
if (!page)
@@ -318,7 +350,20 @@ static ssize_t cfq_cgroup_read(struct cgroup *cont, struct cftype *cft,
cgroup_unlock();

/* print priority */
- ret = snprintf(page, PAGE_SIZE, "%d \n", cfqc->ioprio);
+ ret = snprintf(page, PAGE_SIZE, "default priority: %d\n", cfqc->ioprio);
+
+ p = rb_first(&cfqc->sibling_tree);
+ while (p) {
+ struct cfq_data *__cfqd;
+
+ __cfqd = rb_entry(p, struct cfq_data, group_node);
+
+ ret += snprintf(page + ret, PAGE_SIZE - ret, " %s %d\n",
+ __cfqd->cfqdd->queue->kobj.parent->name,
+ __cfqd->ioprio);
+
+ p = rb_next(p);
+ }

ret = simple_read_from_buffer(userbuf, nbytes, ppos, page, ret);

@@ -334,8 +379,10 @@ static ssize_t cfq_cgroup_write(struct cgroup *cont, struct cftype *cft,
struct cfq_cgroup *cfqc;
ssize_t ret;
long new_prio;
- int err;
+ int err, sn;
char *buffer = NULL;
+ char *valbuf = NULL, *pathbuf = NULL;
+ struct rb_node *p;

cgroup_lock();
if (cgroup_is_removed(cont)) {
@@ -354,23 +401,64 @@ static ssize_t cfq_cgroup_write(struct cgroup *cont, struct cftype *cft,

if (copy_from_user(buffer, userbuf, nbytes)) {
ret = -EFAULT;
- goto out;
+ goto free_buf;
}
buffer[nbytes] = 0;

- err = strict_strtoul(buffer, 10, &new_prio);
+ valbuf = kmalloc(nbytes + 1, GFP_KERNEL);
+ if (!valbuf) {
+ ret = -ENOMEM;
+ goto free_buf;
+ }
+
+ pathbuf = kmalloc(nbytes + 1, GFP_KERNEL);
+ if (!pathbuf) {
+ ret = -ENOMEM;
+ goto free_val;
+ }
+
+ param_separate(buffer, valbuf, pathbuf, nbytes);
+
+ err = strict_strtoul(valbuf, 10, &new_prio);
if ((err) || ((new_prio < 0) || (new_prio > CFQ_CGROUP_MAX_IOPRIO))) {
ret = -EINVAL;
- goto out;
+ goto free_path;
}

- cfqc->ioprio = new_prio;
+ sn = strlen(pathbuf);
+
+ p = rb_first(&cfqc->sibling_tree);
+ while (p) {
+ struct cfq_data *__cfqd;
+ const char *namep;
+
+ __cfqd = rb_entry(p, struct cfq_data, group_node);
+ namep = __cfqd->cfqdd->queue->kobj.parent->name;
+
+ if (sn == 0) {
+ __cfqd->ioprio = new_prio;
+ } else if ((sn == strlen(namep)) &&
+ (strncmp(pathbuf, namep, sn) == 0)) {
+ __cfqd->ioprio = new_prio;
+ break;
+ }
+
+ p = rb_next(p);
+ }
+
+ if ((sn == 0) ||
+ ((sn == 7) && (strncmp(pathbuf, "default", 7) == 0)))
+ cfqc->ioprio = new_prio;

ret = nbytes;

-out:
+free_path:
+ kfree(pathbuf);
+free_val:
+ kfree(valbuf);
+free_buf:
kfree(buffer);
-
+out:
return ret;
}

@@ -404,11 +492,33 @@ static ssize_t
cfq_cgroup_var_show(char *page, struct cfq_data *cfqd,
int (func)(struct cfq_data *))
{
- int val, retval = 0;
+ int err, val, retval = 0;
+ char *pathbuf = NULL;
+ struct rb_node *p;
+
+ pathbuf = kmalloc(PAGE_SIZE, GFP_KERNEL);
+ if (!pathbuf)
+ return 0;
+
+ p = rb_first(&cfqd->cfqdd->sibling_tree);
+ while (p) {
+ struct cfq_data *__cfqd;
+ struct cgroup *cgrp;

- val = func(cfqd);
+ __cfqd = rb_entry(p, struct cfq_data, sib_node);
+ cgrp = __cfqd->cfqc->css.cgroup;

- retval = snprintf(page, PAGE_SIZE, "%d\n", val);
+ err = cgroup_path(cgrp, pathbuf, PAGE_SIZE);
+ if (err)
+ break;
+ val = func(__cfqd);
+
+ retval += snprintf(page + retval, PAGE_SIZE - retval,
+ "%s %d\n", pathbuf, val);
+ p = rb_next(p);
+ }
+
+ kfree(pathbuf);

return retval;
}
@@ -437,21 +547,73 @@ SHOW_FUNCTION(cfq_cgroup_slice_idle_show, cfq_slice_idle, 1);
SHOW_FUNCTION(cfq_cgroup_slice_sync_show, cfq_slice[1], 1);
SHOW_FUNCTION(cfq_cgroup_slice_async_show, cfq_slice[0], 1);
SHOW_FUNCTION(cfq_cgroup_slice_async_rq_show, cfq_slice_async_rq, 0);
+SHOW_FUNCTION(cfq_cgroup_ioprio_show, ioprio, 0);
#undef SHOW_FUNCTION

static ssize_t
cfq_cgroup_var_store(const char *page, size_t count, struct cfq_data *cfqd,
void (func)(struct cfq_data *, unsigned int))
{
- int err;
+ int err, sn;
unsigned long val;
+ char *valbuf = NULL, *setpathbuf = NULL, *pathbuf = NULL;
+ struct rb_node *p;
+
+ valbuf = kmalloc(PAGE_SIZE, GFP_KERNEL);
+ if (!valbuf) {
+ count = 0;
+ goto out;
+ }

- err = strict_strtoul(page, 10, &val);
+ setpathbuf = kmalloc(PAGE_SIZE, GFP_KERNEL);
+ if (!setpathbuf) {
+ count = 0;
+ goto free_val;
+ }
+
+ pathbuf = kmalloc(PAGE_SIZE, GFP_KERNEL);
+ if (!pathbuf) {
+ count = 0;
+ goto free_setpath;
+ }
+
+ param_separate(page, valbuf, setpathbuf, PAGE_SIZE);
+
+ err = strict_strtoul(valbuf, 10, &val);
- if (err)
- return 0;
+ if (err) {
+ count = 0;
+ goto free_path;
+ }

- func(cfqd, val);
+ sn = strlen(setpathbuf);
+
+ p = rb_first(&cfqd->cfqdd->sibling_tree);
+ while (p) {
+ struct cfq_data *__cfqd;
+ struct cgroup *cgrp;

+ __cfqd = rb_entry(p, struct cfq_data, sib_node);
+ cgrp = __cfqd->cfqc->css.cgroup;
+
+ err = cgroup_path(cgrp, pathbuf, PAGE_SIZE);
+ if (err)
+ break;
+
+ if (sn == 0) {
+ func(__cfqd, val);
+ } else if ((sn == strlen(pathbuf)) &&
+ (strncmp(setpathbuf, pathbuf, sn) == 0)) {
+ func(__cfqd, val);
+ break;
+ }
+
+ p = rb_next(p);
+ }
+
+free_path:
+ kfree(pathbuf);
+free_setpath:
+ kfree(setpathbuf);
+free_val:
+ kfree(valbuf);
+out:
return count;
}

@@ -489,6 +651,7 @@ STORE_FUNCTION(cfq_cgroup_slice_sync_store, cfq_slice[1], 1, UINT_MAX, 1);
STORE_FUNCTION(cfq_cgroup_slice_async_store, cfq_slice[0], 1, UINT_MAX, 1);
STORE_FUNCTION(cfq_cgroup_slice_async_rq_store, cfq_slice_async_rq, 1,
UINT_MAX, 0);
+STORE_FUNCTION(cfq_cgroup_ioprio_store, ioprio, 0, CFQ_CGROUP_MAX_IOPRIO, 0);
#undef STORE_FUNCTION

#define CFQ_CGROUP_ATTR(name) \
@@ -505,6 +668,7 @@ static struct elv_fs_entry cfq_cgroup_attrs[] = {
CFQ_CGROUP_ATTR(slice_async),
CFQ_CGROUP_ATTR(slice_async_rq),
CFQ_CGROUP_ATTR(slice_idle),
+ CFQ_CGROUP_ATTR(ioprio),
__ATTR_NULL
};

diff --git a/include/linux/cfq-iosched.h b/include/linux/cfq-iosched.h
index 382fc0a..b58d476 100644
--- a/include/linux/cfq-iosched.h
+++ b/include/linux/cfq-iosched.h
@@ -91,6 +91,8 @@ struct cfq_data {
struct cfq_driver_data *cfqdd;

#ifdef CONFIG_IOSCHED_CFQ_CGROUP
+ unsigned int ioprio;
+
/* sibling_tree member for cfq_meta_data */
struct rb_node sib_node;

--
1.5.6.5

2008-11-12 08:36:34

by Satoshi UCHIDA

[permalink] [raw]
Subject: [PATCH][cfq-cgroups][07/12] Add sibling tree control for group data(cfq_cgroup).


This patch adds a tree control for siblings of group data (cfq_cgroup).
This tree manages the cfq data (cfq_data) belonging to the same group,
and is used mainly when a new group is registered or an existing group
is unregistered.
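
As an illustration (not part of the patch), a lookup in this tree, keyed
the same way as cfq_cgroup_sibling_tree_add() below, could look like the
following sketch; a function of this shape is added later in this series
as cfq_cgroup_search_data():

    /* Sketch only: find the cfq_data of a given device
     * (cfq_driver_data) in a group's sibling tree.  The tree is
     * ordered by the cfq_driver_data pointer value. */
    static struct cfq_data *
    sibling_tree_lookup(struct cfq_cgroup *cfqc,
                        struct cfq_driver_data *cfqdd)
    {
        struct rb_node *p = cfqc->sibling_tree.rb_node;

        while (p) {
            struct cfq_data *__cfqd;

            __cfqd = rb_entry(p, struct cfq_data, group_node);

            if (cfqdd < __cfqd->cfqdd)
                p = p->rb_left;
            else if (cfqdd > __cfqd->cfqdd)
                p = p->rb_right;
            else
                return __cfqd;
        }
        return NULL;
    }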


Signed-off-by: Satoshi UCHIDA <[email protected]>

---
block/cfq-cgroup.c | 126 +++++++++++++++++++++++++++++++++++++++++++
include/linux/cfq-iosched.h | 5 ++
include/linux/cgroup.h | 1 +
kernel/cgroup.c | 5 ++
4 files changed, 137 insertions(+), 0 deletions(-)

diff --git a/block/cfq-cgroup.c b/block/cfq-cgroup.c
index ce35af2..25da08e 100644
--- a/block/cfq-cgroup.c
+++ b/block/cfq-cgroup.c
@@ -22,8 +22,12 @@ static struct cfq_ops cfq_cgroup_op;
struct cfq_cgroup {
struct cgroup_subsys_state css;
unsigned int ioprio;
+
+ struct rb_root sibling_tree;
+ unsigned int siblings;
};

+
static inline struct cfq_cgroup *cgroup_to_cfq_cgroup(struct cgroup *cont)
{
return container_of(cgroup_subsys_state(cont, cfq_subsys_id),
@@ -77,6 +81,68 @@ static void cfq_driver_sibling_tree_add(struct cfq_driver_data *cfqdd,
cfqd->cfqdd = cfqdd;
}

+static void cfq_cgroup_sibling_tree_add(struct cfq_cgroup *cfqc,
+ struct cfq_data *cfqd)
+{
+ struct rb_node **p;
+ struct rb_node *parent = NULL;
+
+ BUG_ON(!RB_EMPTY_NODE(&cfqd->group_node));
+
+ p = &cfqc->sibling_tree.rb_node;
+
+ while (*p) {
+ struct cfq_data *__cfqd;
+ struct rb_node **n;
+
+ parent = *p;
+ __cfqd = rb_entry(parent, struct cfq_data, group_node);
+
+ if (cfqd->cfqdd < __cfqd->cfqdd)
+ n = &(*p)->rb_left;
+ else
+ n = &(*p)->rb_right;
+ p = n;
+ }
+
+ rb_link_node(&cfqd->group_node, parent, p);
+ rb_insert_color(&cfqd->group_node, &cfqc->sibling_tree);
+ cfqc->siblings++;
+ cfqd->cfqc = cfqc;
+}
+
+static struct cfq_data *
+__cfq_cgroup_init_queue(struct request_queue *, struct cfq_driver_data *);
+
+static void *cfq_cgroup_init_cfq_data(struct cfq_cgroup *cfqc,
+ struct cfq_data *cfqd)
+{
+ struct cgroup *child;
+
+ /* setting cfq_data for cfq_cgroup */
+ if (!cfqc) {
+ cfqc = cgroup_to_cfq_cgroup(get_root_subsys(&cfq_subsys));
+ cfq_cgroup_sibling_tree_add(cfqc, cfqd);
+ } else {
+ struct cfq_data *__cfqd;
+ __cfqd = __cfq_cgroup_init_queue(cfqd->cfqdd->queue,
+ cfqd->cfqdd);
+ if (!__cfqd)
+ return NULL;
+ cfq_cgroup_sibling_tree_add(cfqc, __cfqd);
+ }
+
+ /* check and create cfq_data for children */
+ if (cfqc->css.cgroup)
+ list_for_each_entry(child, &cfqc->css.cgroup->children,
+ sibling) {
+ cfq_cgroup_init_cfq_data(cgroup_to_cfq_cgroup(child),
+ cfqd);
+ }
+
+ return cfqc;
+}
+
static struct cfq_data *
__cfq_cgroup_init_queue(struct request_queue *q, struct cfq_driver_data *cfqdd)
{
@@ -86,9 +152,13 @@ __cfq_cgroup_init_queue(struct request_queue *q, struct cfq_driver_data *cfqdd)
return NULL;

RB_CLEAR_NODE(&cfqd->sib_node);
+ RB_CLEAR_NODE(&cfqd->group_node);

cfq_driver_sibling_tree_add(cfqd->cfqdd, cfqd);

+ if (!cfqdd)
+ cfq_cgroup_init_cfq_data(NULL, cfqd);
+
return cfqd;
}

@@ -101,6 +171,28 @@ static void *cfq_cgroup_init_queue(struct request_queue *q)
return cfqd;
}

+static void *cfq_cgroup_init_cgroup(struct cfq_cgroup *cfqc,
+ struct cgroup *parent)
+{
+ struct rb_node *p;
+ if (parent) {
+ struct cfq_cgroup *cfqc_p = cgroup_to_cfq_cgroup(parent);
+
+ p = rb_first(&cfqc_p->sibling_tree);
+ while (p) {
+ struct cfq_data *__cfqd;
+ __cfqd = rb_entry(p, struct cfq_data, group_node);
+
+ cfq_cgroup_init_cfq_data(cfqc, __cfqd);
+
+ p = rb_next(p);
+ }
+ }
+
+ return cfqc;
+}
+
+
static struct cgroup_subsys_state *
cfq_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
{
@@ -118,6 +210,12 @@ cfq_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)

cfqc->ioprio = 3;

+ cfqc->sibling_tree = RB_ROOT;
+ cfqc->siblings = 0;
+
+ if (!cfq_cgroup_init_cgroup(cfqc, cont->parent))
+ return ERR_PTR(-ENOMEM);
+
return &cfqc->css;
}

@@ -132,6 +230,13 @@ static void cfq_cgroup_erase_driver_siblings(struct cfq_driver_data *cfqdd,
cfqdd->siblings--;
}

+static void cfq_cgroup_erase_cgroup_siblings(struct cfq_cgroup *cfqc,
+ struct cfq_data *cfqd)
+{
+ rb_erase(&cfqd->group_node, &cfqc->sibling_tree);
+ cfqc->siblings--;
+}
+
static void cfq_exit_device_group(struct cfq_driver_data *cfqdd)
{
struct rb_node *p, *n;
@@ -144,6 +249,7 @@ static void cfq_exit_device_group(struct cfq_driver_data *cfqdd)
cfqd = rb_entry(p, struct cfq_data, sib_node);

cfq_cgroup_erase_driver_siblings(cfqdd, cfqd);
+ cfq_cgroup_erase_cgroup_siblings(cfqd->cfqc, cfqd);
cfq_free_cfq_data(cfqd);

p = n;
@@ -159,8 +265,28 @@ static void cfq_cgroup_exit_queue(elevator_t *e)
kfree(cfqdd);
}

+static void cfq_exit_cgroup(struct cfq_cgroup *cfqc)
+{
+ struct rb_node *p, *n;
+ struct cfq_data *cfqd;
+
+ p = rb_first(&cfqc->sibling_tree);
+
+ while (p) {
+ n = rb_next(p);
+ cfqd = rb_entry(p, struct cfq_data, group_node);
+
+ cfq_cgroup_erase_driver_siblings(cfqd->cfqdd, cfqd);
+ cfq_cgroup_erase_cgroup_siblings(cfqc, cfqd);
+ cfq_free_cfq_data(cfqd);
+
+ p = n;
+ }
+}
+
static void cfq_cgroup_destroy(struct cgroup_subsys *ss, struct cgroup *cont)
{
+ cfq_exit_cgroup(cgroup_to_cfq_cgroup(cont));
kfree(cgroup_to_cfq_cgroup(cont));
}

diff --git a/include/linux/cfq-iosched.h b/include/linux/cfq-iosched.h
index 22d1aed..382fc0a 100644
--- a/include/linux/cfq-iosched.h
+++ b/include/linux/cfq-iosched.h
@@ -93,6 +93,11 @@ struct cfq_data {
#ifdef CONFIG_IOSCHED_CFQ_CGROUP
/* sibling_tree member for cfq_meta_data */
struct rb_node sib_node;
+
+ /* cfq_cgroup attribute */
+ struct cfq_cgroup *cfqc;
+ /* group_tree member for cfq_cgroup */
+ struct rb_node group_node;
#endif
};

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 8b00f66..4bfd815 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -402,6 +402,7 @@ struct task_struct *cgroup_iter_next(struct cgroup *cgrp,
void cgroup_iter_end(struct cgroup *cgrp, struct cgroup_iter *it);
int cgroup_scan_tasks(struct cgroup_scanner *scan);
int cgroup_attach_task(struct cgroup *, struct task_struct *);
+struct cgroup *get_root_subsys(struct cgroup_subsys *css);

void cgroup_mm_owner_callbacks(struct task_struct *old,
struct task_struct *new);
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 35eebd5..71bb335 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1316,6 +1316,11 @@ static int cgroup_tasks_write(struct cgroup *cgrp, struct cftype *cft, u64 pid)
return ret;
}

+struct cgroup *get_root_subsys(struct cgroup_subsys *css)
+{
+ return &css->root->top_cgroup;
+}
+
/* The various types of files and directories in a cgroup file system */
enum cgroup_filetype {
FILE_ROOT,
--
1.5.6.5

2008-11-12 08:37:19

by Satoshi UCHIDA

[permalink] [raw]
Subject: [PATCH][cfq-cgroups][09/12] Develop service tree control.


This patch introduces a service tree for cfq data and the code that
controls it, namely the group layer control.
These functions expand the best-effort (IOPRIO_CLASS_BE) handling of
the traditional CFQ scheduler to the group level.
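
To make the slice scaling concrete, here is a small userspace sketch
(not part of the patch) that reproduces cfq_cgroup_prio_slice() from
the hunk below; HZ = 1000 is assumed for the sake of the example:

    /* Sketch only: per-group slice length as computed by
     * cfq_cgroup_prio_slice() for base_slice = HZ / 10. */
    #include <stdio.h>

    #define HZ                     1000
    #define CFQ_CGROUP_SLICE_SCALE 5
    #define CFQ_CGROUP_MAX_IOPRIO  7

    static int prio_slice(int base_slice, int prio)
    {
        return base_slice + (base_slice / CFQ_CGROUP_SLICE_SCALE *
                             (CFQ_CGROUP_MAX_IOPRIO / 2 - prio));
    }

    int main(void)
    {
        int prio;

        /* prio 0 -> 160, prio 3 -> 100, prio 7 -> 20 jiffies */
        for (prio = 0; prio <= CFQ_CGROUP_MAX_IOPRIO; prio++)
            printf("prio %d -> %d jiffies\n",
                   prio, prio_slice(HZ / 10, prio));
        return 0;
    }

So a priority-0 group receives a slice eight times longer than a
priority-7 group, with the default priority 3 exactly at base_slice.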


Signed-off-by: Satoshi UCHIDA <[email protected]>

---
block/cfq-cgroup.c | 266 +++++++++++++++++++++++++++++++++++++++++++
block/cfq-iosched.c | 32 ++++-
include/linux/cfq-iosched.h | 32 +++++-
3 files changed, 323 insertions(+), 7 deletions(-)

diff --git a/block/cfq-cgroup.c b/block/cfq-cgroup.c
index 99f3d94..ff652fe 100644
--- a/block/cfq-cgroup.c
+++ b/block/cfq-cgroup.c
@@ -15,8 +15,11 @@
#include <linux/cgroup.h>
#include <linux/cfq-iosched.h>

+#define CFQ_CGROUP_SLICE_SCALE (5)
#define CFQ_CGROUP_MAX_IOPRIO (7)

+static const int cfq_cgroup_slice = HZ / 10;
+
static struct cfq_ops cfq_cgroup_op;

struct cfq_cgroup {
@@ -27,6 +30,28 @@ struct cfq_cgroup {
unsigned int siblings;
};

+enum cfqd_state_flags {
+ CFQ_CFQD_FLAG_on_rr = 0, /* on round-robin busy list */
+ CFQ_CFQD_FLAG_slice_new, /* no requests dispatched in slice */
+};
+
+#define CFQ_CFQD_FNS(name) \
+static inline void cfq_mark_cfqd_##name(struct cfq_data *cfqd) \
+{ \
+ (cfqd)->flags |= (1 << CFQ_CFQD_FLAG_##name); \
+} \
+static inline void cfq_clear_cfqd_##name(struct cfq_data *cfqd) \
+{ \
+ (cfqd)->flags &= ~(1 << CFQ_CFQD_FLAG_##name); \
+} \
+static inline int cfq_cfqd_##name(const struct cfq_data *cfqd) \
+{ \
+ return ((cfqd)->flags & (1 << CFQ_CFQD_FLAG_##name)) != 0; \
+}
+
+CFQ_CFQD_FNS(on_rr);
+CFQ_CFQD_FNS(slice_new);
+#undef CFQ_CFQD_FNS

static inline struct cfq_cgroup *cgroup_to_cfq_cgroup(struct cgroup *cont)
{
@@ -49,6 +74,11 @@ static void cfq_cgroup_init_driver_data_opt(struct cfq_driver_data *cfqdd,
{
cfqdd->sibling_tree = RB_ROOT;
cfqdd->siblings = 0;
+
+ cfqdd->service_tree = CFQ_RB_ROOT;
+ cfqdd->busy_data = 0;
+
+ cfqdd->cfq_cgroup_slice = cfq_cgroup_slice;
}

static void cfq_driver_sibling_tree_add(struct cfq_driver_data *cfqdd,
@@ -155,6 +185,8 @@ __cfq_cgroup_init_queue(struct request_queue *q, struct cfq_driver_data *cfqdd)

RB_CLEAR_NODE(&cfqd->sib_node);
RB_CLEAR_NODE(&cfqd->group_node);
+ RB_CLEAR_NODE(&cfqd->rb_node);
+ cfqd->rb_key = 0;

cfq_driver_sibling_tree_add(cfqd->cfqdd, cfqd);

@@ -294,6 +326,237 @@ static void cfq_cgroup_destroy(struct cgroup_subsys *ss, struct cgroup *cont)


/*
+ * service tree control.
+ */
+static inline int cfq_cgroup_slice_used(struct cfq_data *cfqd)
+{
+ if (cfq_cfqd_slice_new(cfqd))
+ return 0;
+ if (time_before(jiffies, cfqd->slice_end))
+ return 0;
+
+ return 1;
+}
+
+static inline int
+cfq_cgroup_prio_slice(struct cfq_data *cfqd, unsigned short prio)
+{
+ const int base_slice = cfqd->cfqdd->cfq_cgroup_slice;
+
+ WARN_ON(prio >= IOPRIO_BE_NR);
+
+ return base_slice + (base_slice/CFQ_CGROUP_SLICE_SCALE *
+ (CFQ_CGROUP_MAX_IOPRIO / 2 - prio));
+}
+
+static inline void
+cfq_cgroup_set_prio_slice(struct cfq_data *cfqd)
+{
+ cfqd->slice_end = cfq_cgroup_prio_slice(cfqd, cfqd->ioprio)
+ + jiffies;
+}
+
+static unsigned long cfq_cgroup_slice_offset(struct cfq_data *cfqd)
+{
+ return (cfqd->cfqdd->busy_data - 1) *
+ (cfq_cgroup_prio_slice(cfqd, 0) -
+ cfq_cgroup_prio_slice(cfqd, cfqd->ioprio));
+}
+
+static void cfq_cgroup_service_tree_add(struct cfq_data *cfqd, int add_front)
+{
+ struct rb_node **p, *parent;
+ struct cfq_data *__cfqd;
+ struct cfq_driver_data *cfqdd = cfqd->cfqdd;
+ unsigned long rb_key;
+ int left;
+
+ if (!add_front) {
+ rb_key = cfq_cgroup_slice_offset(cfqd) + jiffies;
+ rb_key += cfqd->slice_resid;
+ cfqd->slice_resid = 0;
+ } else
+ rb_key = 0;
+
+ if (!RB_EMPTY_NODE(&cfqd->rb_node)) {
+ if (rb_key == cfqd->rb_key)
+ return;
+ cfq_rb_erase(&cfqd->rb_node, &cfqdd->service_tree);
+ }
+
+ left = 1;
+ parent = NULL;
+ p = &cfqdd->service_tree.rb.rb_node;
+ while (*p) {
+ struct rb_node **n;
+
+ parent = *p;
+ __cfqd = rb_entry(parent, struct cfq_data, rb_node);
+
+ if (rb_key < __cfqd->rb_key)
+ n = &(*p)->rb_left;
+ else
+ n = &(*p)->rb_right;
+
+ if (n == &(*p)->rb_right)
+ left = 0;
+
+ p = n;
+ }
+
+ if (left)
+ cfqdd->service_tree.left = &cfqd->rb_node;
+
+ cfqd->rb_key = rb_key;
+ rb_link_node(&cfqd->rb_node, parent, p);
+ rb_insert_color(&cfqd->rb_node, &cfqdd->service_tree.rb);
+}
+
+static void __cfq_cgroup_slice_expired(struct cfq_driver_data *cfqdd,
+ struct cfq_data *cfqd, int timed_out)
+{
+ if (timed_out && !cfq_cfqd_slice_new(cfqd))
+ cfqd->slice_resid = cfqd->slice_end - jiffies;
+
+ if (cfq_cfqd_on_rr(cfqd))
+ cfq_cgroup_service_tree_add(cfqd, 0);
+
+ if (cfqd == cfqdd->active_data)
+ cfqdd->active_data = NULL;
+}
+
+static inline void
+cfq_cgroup_slice_expired(struct cfq_driver_data *cfqdd, int timed_out)
+{
+ struct cfq_data *cfqd = cfqdd->active_data;
+
+ if (cfqd) {
+ cfq_slice_expired(cfqd, 1);
+ __cfq_cgroup_slice_expired(cfqdd, cfqd, timed_out);
+ }
+}
+
+static struct cfq_data *cfq_cgroup_rb_first(struct cfq_rb_root *root)
+{
+ if (!root->left)
+ root->left = rb_first(&root->rb);
+
+ if (root->left)
+ return rb_entry(root->left, struct cfq_data, rb_node);
+
+ return NULL;
+}
+
+static struct cfq_data *cfq_cgroup_get_next_data(struct cfq_driver_data *cfqdd)
+{
+ if (RB_EMPTY_ROOT(&cfqdd->service_tree.rb))
+ return NULL;
+
+ return cfq_cgroup_rb_first(&cfqdd->service_tree);
+}
+
+static void __cfq_cgroup_set_active_data(struct cfq_driver_data *cfqdd,
+ struct cfq_data *cfqd)
+{
+ if (cfqd) {
+ cfqd->slice_end = 0;
+ cfq_mark_cfqd_slice_new(cfqd);
+ }
+
+ cfqdd->active_data = cfqd;
+}
+
+static struct cfq_data *
+cfq_cgroup_set_active_data(struct cfq_driver_data *cfqdd)
+{
+ struct cfq_data *cfqd;
+
+ cfqd = cfq_cgroup_get_next_data(cfqdd);
+ __cfq_cgroup_set_active_data(cfqdd, cfqd);
+
+ return cfqd;
+}
+
+struct cfq_data *cfq_cgroup_select_data(struct cfq_driver_data *cfqdd)
+{
+ struct cfq_data *cfqd;
+
+ cfqd = cfqdd->active_data;
+ if (!cfqd)
+ goto new_data;
+
+ if (cfq_cgroup_slice_used(cfqd))
+ goto expire;
+
+ if (!RB_EMPTY_ROOT(&cfqd->service_tree.rb))
+ goto keep_data;
+
+ if (wait_request_checker(cfqd))
+ goto keep_data;
+
+expire:
+ cfq_cgroup_slice_expired(cfqdd, 0);
+new_data:
+ cfqd = cfq_cgroup_set_active_data(cfqdd);
+keep_data:
+ return cfqd;
+}
+
+int cfq_cgroup_forced_dispatch(struct cfq_data *cfqd)
+{
+ struct cfq_driver_data *cfqdd = cfqd->cfqdd;
+ int dispatched = 0;
+
+ while ((cfqd = cfq_cgroup_rb_first(&cfqdd->service_tree)) != NULL)
+ dispatched += cfq_forced_dispatch(cfqd);
+
+ cfq_cgroup_slice_expired(cfqdd, 0);
+
+ BUG_ON(cfqdd->busy_data);
+
+ return dispatched;
+}
+
+int cfq_cgroup_dispatch_requests(struct request_queue *q, int force)
+{
+ struct cfq_data *cfqd = q->elevator->elevator_data;
+ struct cfq_driver_data *cfqdd = cfqd->cfqdd;
+ int dispatched;
+
+ if (!cfqdd->busy_data)
+ return 0;
+
+ if (unlikely(force))
+ return cfq_cgroup_forced_dispatch(cfqd);
+
+ dispatched = 0;
+ cfqd = cfq_cgroup_select_data(cfqdd);
+
+ if (cfqd)
+ dispatched = cfq_queue_dispatch_requests(cfqd, force);
+
+ return dispatched;
+}
+
+int cfq_cgroup_completed_request_opt(struct cfq_data *cfqd)
+{
+ if (cfqd->cfqdd->active_data == cfqd) {
+ if (cfq_cfqd_slice_new(cfqd)) {
+ cfq_cgroup_set_prio_slice(cfqd);
+ cfq_clear_cfqd_slice_new(cfqd);
+
+ }
+ if (cfq_cgroup_slice_used(cfqd)) {
+ cfq_cgroup_slice_expired(cfqd->cfqdd, 1);
+ return 0;
+ }
+ return 1;
+ }
+
+ return 0;
+}
+
+/*
* cgroupfs parts below -->
*/
static void
@@ -680,6 +943,7 @@ static struct elevator_type iosched_cfq_cgroup = {

static struct cfq_ops cfq_cgroup_op = {
.cfq_init_driver_data_opt_fn = cfq_cgroup_init_driver_data_opt,
+ .cfq_completed_request_opt_fn = cfq_cgroup_completed_request_opt,
};

static int __init cfq_cgroup_init(void)
@@ -687,6 +951,8 @@ static int __init cfq_cgroup_init(void)
iosched_cfq_cgroup.ops = iosched_cfq.ops;
iosched_cfq_cgroup.ops.elevator_init_fn = cfq_cgroup_init_queue;
iosched_cfq_cgroup.ops.elevator_exit_fn = cfq_cgroup_exit_queue;
+ iosched_cfq_cgroup.ops.elevator_dispatch_fn =
+ cfq_cgroup_dispatch_requests;

elv_register(&iosched_cfq_cgroup);

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index fd1ed0c..5fbef85 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -354,7 +354,7 @@ static struct cfq_queue *cfq_rb_first(struct cfq_rb_root *root)
return NULL;
}

-static void cfq_rb_erase(struct rb_node *n, struct cfq_rb_root *root)
+void cfq_rb_erase(struct rb_node *n, struct cfq_rb_root *root)
{
if (root->left == n)
root->left = NULL;
@@ -751,7 +751,7 @@ __cfq_slice_expired(struct cfq_data *cfqd, struct cfq_queue *cfqq,
}
}

-static inline void cfq_slice_expired(struct cfq_data *cfqd, int timed_out)
+inline void cfq_slice_expired(struct cfq_data *cfqd, int timed_out)
{
struct cfq_queue *cfqq = cfqd->active_queue;

@@ -932,6 +932,16 @@ cfq_prio_to_maxrq(struct cfq_data *cfqd, struct cfq_queue *cfqq)
return 2 * (base_rq + base_rq * (CFQ_PRIO_LISTS - 1 - cfqq->ioprio));
}

+int wait_request_checker(struct cfq_data *cfqd)
+{
+ struct cfq_queue *cfqq = cfqd->active_queue;
+ if (cfqq)
+ return timer_pending(&cfqd->cfqdd->idle_slice_timer)
+ || (cfqq->dispatched && cfq_cfqq_idle_window(cfqq));
+ else
+ return 0;
+}
+
/*
* Select a queue for service. If we have a current active queue,
* check whether to continue servicing it, or retrieve and set a new one.
@@ -1047,7 +1057,7 @@ static int __cfq_forced_dispatch_cfqq(struct cfq_queue *cfqq)
* Drain our current requests. Used for barriers and when switching
* io schedulers on-the-fly.
*/
-static int cfq_forced_dispatch(struct cfq_data *cfqd)
+int cfq_forced_dispatch(struct cfq_data *cfqd)
{
struct cfq_queue *cfqq;
int dispatched = 0;
@@ -1063,9 +1073,8 @@ static int cfq_forced_dispatch(struct cfq_data *cfqd)
return dispatched;
}

-static int cfq_dispatch_requests(struct request_queue *q, int force)
+int cfq_queue_dispatch_requests(struct cfq_data *cfqd, int force)
{
- struct cfq_data *cfqd = q->elevator->elevator_data;
struct cfq_queue *cfqq;
struct cfq_driver_data *cfqdd = cfqd->cfqdd;
int dispatched;
@@ -1105,6 +1114,13 @@ static int cfq_dispatch_requests(struct request_queue *q, int force)
return dispatched;
}

+static int cfq_dispatch_requests(struct request_queue *q, int force)
+{
+ struct cfq_data *cfqd = q->elevator->elevator_data;
+
+ return cfq_queue_dispatch_requests(cfqd, force);
+}
+
/*
* task holds one reference to the queue, dropped when task exits. each rq
* in-flight on this queue also holds a reference, dropped when rq is freed.
@@ -1876,6 +1892,7 @@ static void cfq_completed_request(struct request_queue *q, struct request *rq)
struct cfq_driver_data *cfqdd = cfqd->cfqdd;
const int sync = rq_is_sync(rq);
unsigned long now;
+ int flag = 1;

now = jiffies;
cfq_log_cfqq(cfqd, cfqq, "complete");
@@ -1900,7 +1917,10 @@ static void cfq_completed_request(struct request_queue *q, struct request *rq)
* If this is the active queue, check if it needs to be expired,
* or if we want to idle in case it has no pending requests.
*/
- if (cfqd->active_queue == cfqq) {
+ if (cfqd->cfqdd->op->cfq_completed_request_opt_fn)
+ flag = cfqd->cfqdd->op->cfq_completed_request_opt_fn(cfqd);
+
+ if ((flag) && (cfqd->active_queue == cfqq)) {
if (cfq_cfqq_slice_new(cfqq)) {
cfq_set_prio_slice(cfqd, cfqq);
cfq_clear_cfqq_slice_new(cfqq);
diff --git a/include/linux/cfq-iosched.h b/include/linux/cfq-iosched.h
index b58d476..30702c9 100644
--- a/include/linux/cfq-iosched.h
+++ b/include/linux/cfq-iosched.h
@@ -54,6 +54,15 @@ struct cfq_driver_data {
/* device siblings */
struct rb_root sibling_tree;
unsigned int siblings;
+
+ /*
+ * rr list of cfq_data with requests and the count of them
+ */
+ struct cfq_rb_root service_tree;
+ unsigned int busy_data;
+ struct cfq_data *active_data;
+
+ unsigned int cfq_cgroup_slice;
#endif
};

@@ -100,6 +109,20 @@ struct cfq_data {
struct cfq_cgroup *cfqc;
/* group_tree member for cfq_cgroup */
struct rb_node group_node;
+
+ /* service_tree member */
+ struct rb_node rb_node;
+ /* service_tree key */
+ unsigned long rb_key;
+
+ /*
+ * slice parameter
+ */
+ unsigned long slice_end;
+ long slice_resid;
+
+ /* various state flags, see below */
+ unsigned int flags;
#endif
};

@@ -108,14 +131,21 @@ struct cfq_data {
*/
typedef void (cfq_init_driver_data_opt_fn)(struct cfq_driver_data *,
struct cfq_data *);
+typedef int (cfq_completed_request_opt_fn)(struct cfq_data *);
struct cfq_ops {
cfq_init_driver_data_opt_fn *cfq_init_driver_data_opt_fn;
+ cfq_completed_request_opt_fn *cfq_completed_request_opt_fn;
};


extern struct elevator_type iosched_cfq;
extern struct cfq_data *cfq_init_cfq_data(struct request_queue *,
struct cfq_driver_data *, struct cfq_ops *);
-extern void cfq_free_cfq_data(struct cfq_data *cfqd);
+extern void cfq_free_cfq_data(struct cfq_data *);
+extern void cfq_rb_erase(struct rb_node *, struct cfq_rb_root *);
+extern void cfq_slice_expired(struct cfq_data *, int);
+extern int wait_request_checker(struct cfq_data *cfqd);
+extern int cfq_forced_dispatch(struct cfq_data *);
+extern int cfq_queue_dispatch_requests(struct cfq_data *, int);

#endif /* _LINUX_CFQ_IOSCHED_H */
--
1.5.6.5

2008-11-12 08:38:03

by Satoshi UCHIDA

[permalink] [raw]
Subject: [PATCH][cfq-cgroups][11/12] Expand idle slice timer function.


This patch expands the idle slice timer function.
The body of the per-queue timer is factored out into
__cfq_idle_slice_timer() so that the cfq-cgroups elevator can wrap it
in its own timer handler, which checks the group slice first.


Signed-off-by: Satoshi UCHIDA <[email protected]>

---
block/cfq-cgroup.c | 45 +++++++++++++++++++++++++++++++++++++++++++
block/cfq-iosched.c | 29 ++++++++++++++++++++++-----
include/linux/cfq-iosched.h | 3 ++
3 files changed, 71 insertions(+), 6 deletions(-)

diff --git a/block/cfq-cgroup.c b/block/cfq-cgroup.c
index f3e9f40..4938fa0 100644
--- a/block/cfq-cgroup.c
+++ b/block/cfq-cgroup.c
@@ -69,9 +69,16 @@ static inline struct cfq_cgroup *task_to_cfq_cgroup(struct task_struct *tsk)
/*
* Add device or cgroup data functions.
*/
+static void cfq_cgroup_idle_slice_timer(unsigned long data);
+
static void cfq_cgroup_init_driver_data_opt(struct cfq_driver_data *cfqdd,
struct cfq_data *cfqd)
{
+ cfqdd->elv_data = cfqd;
+
+ cfqdd->idle_slice_timer.function = cfq_cgroup_idle_slice_timer;
+ cfqdd->idle_slice_timer.data = (unsigned long) cfqdd;
+
cfqdd->sibling_tree = RB_ROOT;
cfqdd->siblings = 0;

@@ -623,6 +630,44 @@ static int cfq_cgroup_is_active_data(struct cfq_data *cfqd)


/*
+ * Timer running if the active_queue is currently idling inside its time slice
+ */
+static void cfq_cgroup_idle_slice_timer(unsigned long data)
+{
+ struct cfq_driver_data *cfqdd = (struct cfq_driver_data *) data;
+ struct cfq_data *cfqd;
+ int timed_out = 1;
+ unsigned long flags;
+
+ spin_lock_irqsave(cfqdd->queue->queue_lock, flags);
+
+ cfqd = cfqdd->active_data;
+ if (cfqd) {
+ timed_out = 0;
+
+ if (cfq_cgroup_slice_used(cfqd))
+ goto expire_cgroup;
+
+ if (!cfqdd->busy_data)
+ goto out_cont;
+
+ if (__cfq_idle_slice_timer(cfqd))
+ goto out_cont;
+ else
+ goto out_kick;
+
+ }
+expire_cgroup:
+ cfq_cgroup_slice_expired(cfqdd, timed_out);
+out_kick:
+ cfq_schedule_dispatch(cfqdd->elv_data);
+out_cont:
+ spin_unlock_irqrestore(cfqdd->queue->queue_lock, flags);
+}
+
+
+/*
* cgroupfs parts below -->
*/
static void
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 5fe0551..edc23e5 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -2122,18 +2122,13 @@ static void cfq_kick_queue(struct work_struct *work)
/*
* Timer running if the active_queue is currently idling inside its time slice
*/
-static void cfq_idle_slice_timer(unsigned long data)
+inline int __cfq_idle_slice_timer(struct cfq_data *cfqd)
{
- struct cfq_data *cfqd = (struct cfq_data *) data;
struct cfq_queue *cfqq;
- struct cfq_driver_data *cfqdd = cfqd->cfqdd;
- unsigned long flags;
int timed_out = 1;

cfq_log(cfqd, "idle timer fired");

- spin_lock_irqsave(cfqdd->queue->queue_lock, flags);
-
cfqq = cfqd->active_queue;
if (cfqq) {
timed_out = 0;
@@ -2163,7 +2158,21 @@ expire:
cfq_slice_expired(cfqd, timed_out);
out_kick:
cfq_schedule_dispatch(cfqd);
+ return 1;
out_cont:
+ return 0;
+}
+
+static void cfq_idle_slice_timer(unsigned long data)
+{
+ struct cfq_data *cfqd = (struct cfq_data *) data;
+ struct cfq_driver_data *cfqdd = cfqd->cfqdd;
+ unsigned long flags;
+
+ spin_lock_irqsave(cfqdd->queue->queue_lock, flags);
+
+ __cfq_idle_slice_timer(cfqd);
+
spin_unlock_irqrestore(cfqdd->queue->queue_lock, flags);
}

@@ -2226,6 +2235,13 @@ static void cfq_exit_queue(elevator_t *e)
kfree(cfqdd);
}

+static void
+cfq_init_driver_data_opt(struct cfq_driver_data *cfqdd, struct cfq_data *cfqd)
+{
+ cfqdd->idle_slice_timer.function = cfq_idle_slice_timer;
+ cfqdd->idle_slice_timer.data = (unsigned long) cfqd;
+}
+
static struct cfq_driver_data *
cfq_init_driver_data(struct request_queue *q, struct cfq_data *cfqd,
struct cfq_ops *op)
@@ -2441,6 +2457,7 @@ struct elevator_type iosched_cfq = {
};

static struct cfq_ops cfq_op = {
+ .cfq_init_driver_data_opt_fn = cfq_init_driver_data_opt,
};

static int __init cfq_init(void)
diff --git a/include/linux/cfq-iosched.h b/include/linux/cfq-iosched.h
index 7287186..920bcb5 100644
--- a/include/linux/cfq-iosched.h
+++ b/include/linux/cfq-iosched.h
@@ -24,6 +24,7 @@ struct cfq_rb_root {
* Driver unique data structure
*/
struct cfq_driver_data {
+ struct cfq_data *elv_data;
struct request_queue *queue;

int rq_in_driver;
@@ -156,5 +157,7 @@ extern void cfq_slice_expired(struct cfq_data *, int);
extern int wait_request_checker(struct cfq_data *cfqd);
extern int cfq_forced_dispatch(struct cfq_data *);
extern int cfq_queue_dispatch_requests(struct cfq_data *, int);
+extern int __cfq_idle_slice_timer(struct cfq_data *cfqd);
+extern void cfq_schedule_dispatch(struct cfq_data *cfqd);

#endif /* _LINUX_CFQ_IOSCHED_H */
--
1.5.6.5

2008-11-12 08:37:45

by Satoshi UCHIDA

[permalink] [raw]
Subject: [PATCH][cfq-cgroups][10/12] Introduce request control for two layer.


This patch controls requests according to the two-layer mechanism.
Concretely, the hooked functions search for the cfq data corresponding
to the current task's group and activate/deactivate that cfq data.
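
The redirection itself is a short pattern repeated at cfq_merge(),
cfq_allow_merge(), cfq_may_queue() and cfq_set_request(); condensed
here into one hypothetical helper (sketch only, not part of the patch):

    /* Sketch only: route a request to the cfq_data of the calling
     * task's cgroup.  The cfq_data in q->elevator->elevator_data is
     * just the root handle for this device. */
    static struct cfq_data *route_to_group(struct request_queue *q,
                                           struct task_struct *tsk)
    {
        struct cfq_data *cfqd = q->elevator->elevator_data;

        if (cfqd->cfqdd->op->cfq_search_data_fn)
            cfqd = cfqd->cfqdd->op->cfq_search_data_fn(cfqd, tsk);

        return cfqd;
    }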


Signed-off-by: Satoshi UCHIDA <[email protected]>

---
block/cfq-cgroup.c | 71 +++++++++++++++++++++++++++++++++++++++++++
block/cfq-iosched.c | 57 +++++++++++++++++++++++++++-------
include/linux/cfq-iosched.h | 9 +++++
3 files changed, 125 insertions(+), 12 deletions(-)

diff --git a/block/cfq-cgroup.c b/block/cfq-cgroup.c
index ff652fe..f3e9f40 100644
--- a/block/cfq-cgroup.c
+++ b/block/cfq-cgroup.c
@@ -557,6 +557,72 @@ int cfq_cgroup_completed_request_opt(struct cfq_data *cfqd)
}

/*
+ * optional functions for two layers
+ */
+struct cfq_data *cfq_cgroup_search_data(void *data,
+ struct task_struct *tsk)
+{
+ struct cfq_data *cfqd = (struct cfq_data *)data;
+ struct cfq_driver_data *cfqdd = cfqd->cfqdd;
+ struct cfq_cgroup *cont = task_to_cfq_cgroup(tsk);
+ struct rb_node *p = cont->sibling_tree.rb_node;
+
+ while (p) {
+ struct cfq_data *__cfqd;
+ __cfqd = rb_entry(p, struct cfq_data, group_node);
+
+
+ if (cfqdd < __cfqd->cfqdd)
+ p = p->rb_left;
+ else if (cfqdd > __cfqd->cfqdd)
+ p = p->rb_right;
+ else
+ return __cfqd;
+ }
+
+ return NULL;
+}
+
+static int cfq_cgroup_queue_empty(struct request_queue *q)
+{
+ struct cfq_data *cfqd = q->elevator->elevator_data;
+ struct cfq_driver_data *cfqdd = cfqd->cfqdd;
+
+ return !cfqdd->busy_data;
+}
+
+static void cfq_cgroup_add_cfqd_rr(struct cfq_data *cfqd)
+{
+ if (!cfq_cfqd_on_rr(cfqd)) {
+ cfq_mark_cfqd_on_rr(cfqd);
+ cfqd->cfqdd->busy_data++;
+
+ cfq_cgroup_service_tree_add(cfqd, 0);
+ }
+}
+
+static void cfq_cgroup_del_cfqd_rr(struct cfq_data *cfqd)
+{
+ if (RB_EMPTY_ROOT(&cfqd->service_tree.rb)) {
+ struct cfq_driver_data *cfqdd = cfqd->cfqdd;
+ BUG_ON(!cfq_cfqd_on_rr(cfqd));
+ cfq_clear_cfqd_on_rr(cfqd);
+ if (!RB_EMPTY_NODE(&cfqd->rb_node)) {
+ cfq_rb_erase(&cfqd->rb_node,
+ &cfqdd->service_tree);
+ }
+ BUG_ON(!cfqdd->busy_data);
+ cfqdd->busy_data--;
+ }
+}
+
+static int cfq_cgroup_is_active_data(struct cfq_data *cfqd)
+{
+ return cfqd->cfqdd->active_data == cfqd;
+}
+
+
+/*
* cgroupfs parts below -->
*/
static void
@@ -944,6 +1010,10 @@ static struct elevator_type iosched_cfq_cgroup = {
static struct cfq_ops cfq_cgroup_op = {
.cfq_init_driver_data_opt_fn = cfq_cgroup_init_driver_data_opt,
.cfq_completed_request_opt_fn = cfq_cgroup_completed_request_opt,
+ .cfq_search_data_fn = cfq_cgroup_search_data,
+ .cfq_add_cfqq_opt_fn = cfq_cgroup_add_cfqd_rr,
+ .cfq_del_cfqq_opt_fn = cfq_cgroup_del_cfqd_rr,
+ .cfq_is_active_data_fn = cfq_cgroup_is_active_data,
};

static int __init cfq_cgroup_init(void)
@@ -953,6 +1023,7 @@ static int __init cfq_cgroup_init(void)
iosched_cfq_cgroup.ops.elevator_exit_fn = cfq_cgroup_exit_queue;
iosched_cfq_cgroup.ops.elevator_dispatch_fn =
cfq_cgroup_dispatch_requests;
+ iosched_cfq_cgroup.ops.elevator_queue_empty_fn = cfq_cgroup_queue_empty;

elv_register(&iosched_cfq_cgroup);

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 5fbef85..5fe0551 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -181,26 +181,30 @@ static inline int cfq_bio_sync(struct bio *bio)
return 0;
}

+
+static int cfq_queue_empty(struct request_queue *q)
+{
+ struct cfq_data *cfqd = q->elevator->elevator_data;
+
+ return !cfqd->busy_queues;
+}
+
/*
* scheduler run of queue, if there are requests pending and no one in the
* driver that will restart queueing
*/
-static inline void cfq_schedule_dispatch(struct cfq_data *cfqd)
+inline void cfq_schedule_dispatch(struct cfq_data *cfqd)
{
struct cfq_driver_data *cfqdd = cfqd->cfqdd;
- if (cfqd->busy_queues) {
+ struct elevator_ops *ops = cfqdd->queue->elevator->ops;
+
+ if (!ops->elevator_queue_empty_fn(cfqdd->queue)) {
cfq_log(cfqd, "schedule dispatch");
kblockd_schedule_work(cfqdd->queue,
&cfqdd->unplug_work);
}
}

-static int cfq_queue_empty(struct request_queue *q)
-{
- struct cfq_data *cfqd = q->elevator->elevator_data;
-
- return !cfqd->busy_queues;
-}

/*
* Scale schedule slice based on io priority. Use the sync time slice only
@@ -503,6 +507,9 @@ static void cfq_add_cfqq_rr(struct cfq_data *cfqd, struct cfq_queue *cfqq)
cfqd->busy_queues++;

cfq_resort_rr_list(cfqd, cfqq);
+
+ if (cfqd->cfqdd->op->cfq_add_cfqq_opt_fn)
+ cfqd->cfqdd->op->cfq_add_cfqq_opt_fn(cfqd);
}

/*
@@ -520,6 +527,9 @@ static void cfq_del_cfqq_rr(struct cfq_data *cfqd, struct cfq_queue *cfqq)

BUG_ON(!cfqd->busy_queues);
cfqd->busy_queues--;
+
+ if (cfqd->cfqdd->op->cfq_del_cfqq_opt_fn)
+ cfqd->cfqdd->op->cfq_del_cfqq_opt_fn(cfqd);
}

/*
@@ -639,6 +649,9 @@ static int cfq_merge(struct request_queue *q, struct request **req,
struct cfq_data *cfqd = q->elevator->elevator_data;
struct request *__rq;

+ if (cfqd->cfqdd->op->cfq_search_data_fn)
+ cfqd = cfqd->cfqdd->op->cfq_search_data_fn(cfqd, current);
+
__rq = cfq_find_rq_fmerge(cfqd, bio);
if (__rq && elv_rq_merge_ok(__rq, bio)) {
*req = __rq;
@@ -679,6 +692,9 @@ static int cfq_allow_merge(struct request_queue *q, struct request *rq,
struct cfq_io_context *cic;
struct cfq_queue *cfqq;

+ if (cfqd->cfqdd->op->cfq_search_data_fn)
+ cfqd = cfqd->cfqdd->op->cfq_search_data_fn(cfqd, current);
+
/*
* Disallow merge of a sync bio into an async request.
*/
@@ -882,8 +898,8 @@ static void cfq_arm_slice_timer(struct cfq_data *cfqd)
*/
static void cfq_dispatch_insert(struct request_queue *q, struct request *rq)
{
- struct cfq_data *cfqd = q->elevator->elevator_data;
struct cfq_queue *cfqq = RQ_CFQQ(rq);
+ struct cfq_data *cfqd = cfqq->cfqd;

cfq_log_cfqq(cfqd, cfqq, "dispatch_insert");

@@ -1739,6 +1755,13 @@ cfq_should_preempt(struct cfq_data *cfqd, struct cfq_queue *new_cfqq,
struct request *rq)
{
struct cfq_queue *cfqq;
+ struct cfq_driver_data *cfqdd = cfqd->cfqdd;
+ int flag = 1;
+
+ if (cfqdd->op->cfq_is_active_data_fn)
+ flag = cfqdd->op->cfq_is_active_data_fn(cfqd);
+ if (!flag)
+ return 0;

cfqq = cfqd->active_queue;
if (!cfqq)
@@ -1767,7 +1790,7 @@ cfq_should_preempt(struct cfq_data *cfqd, struct cfq_queue *new_cfqq,
if (rq_is_meta(rq) && !cfqq->meta_pending)
return 1;

- if (!cfqd->cfqdd->active_cic || !cfq_cfqq_wait_request(cfqq))
+ if (!cfqdd->active_cic || !cfq_cfqq_wait_request(cfqq))
return 0;

/*
@@ -1811,6 +1834,7 @@ cfq_rq_enqueued(struct cfq_data *cfqd, struct cfq_queue *cfqq,
{
struct cfq_io_context *cic = RQ_CIC(rq);
struct cfq_driver_data *cfqdd = cfqd->cfqdd;
+ int flag = 1;

cfqdd->rq_queued++;
if (rq_is_meta(rq))
@@ -1822,7 +1846,10 @@ cfq_rq_enqueued(struct cfq_data *cfqd, struct cfq_queue *cfqq,

cic->last_request_pos = rq->sector + rq->nr_sectors;

- if (cfqq == cfqd->active_queue) {
+ if (cfqdd->op->cfq_is_active_data_fn)
+ flag = cfqdd->op->cfq_is_active_data_fn(cfqd);
+
+ if ((flag) && (cfqq == cfqd->active_queue)) {
/*
* if we are waiting for a request for this queue, let it rip
* immediately and flag that we must not expire this queue
@@ -1847,8 +1874,8 @@ cfq_rq_enqueued(struct cfq_data *cfqd, struct cfq_queue *cfqq,

static void cfq_insert_request(struct request_queue *q, struct request *rq)
{
- struct cfq_data *cfqd = q->elevator->elevator_data;
struct cfq_queue *cfqq = RQ_CFQQ(rq);
+ struct cfq_data *cfqd = cfqq->cfqd;

cfq_log_cfqq(cfqd, cfqq, "insert_request");
cfq_init_prio_data(cfqq, RQ_CIC(rq)->ioc);
@@ -1979,6 +2006,9 @@ static int cfq_may_queue(struct request_queue *q, int rw)
struct cfq_io_context *cic;
struct cfq_queue *cfqq;

+ if (cfqd->cfqdd->op->cfq_search_data_fn)
+ cfqd = cfqd->cfqdd->op->cfq_search_data_fn(cfqd, current);
+
/*
* don't force setup of a queue from here, as a call to may_queue
* does not necessarily imply that a request actually will be queued.
@@ -2035,6 +2065,9 @@ cfq_set_request(struct request_queue *q, struct request *rq, gfp_t gfp_mask)
struct cfq_queue *cfqq;
unsigned long flags;

+ if (cfqd->cfqdd->op->cfq_search_data_fn)
+ cfqd = cfqd->cfqdd->op->cfq_search_data_fn(cfqd, current);
+
might_sleep_if(gfp_mask & __GFP_WAIT);

cic = cfq_get_io_context(cfqd, gfp_mask);
diff --git a/include/linux/cfq-iosched.h b/include/linux/cfq-iosched.h
index 30702c9..7287186 100644
--- a/include/linux/cfq-iosched.h
+++ b/include/linux/cfq-iosched.h
@@ -132,9 +132,18 @@ struct cfq_data {
typedef void (cfq_init_driver_data_opt_fn)(struct cfq_driver_data *,
struct cfq_data *);
typedef int (cfq_completed_request_opt_fn)(struct cfq_data *);
+typedef struct cfq_data *(cfq_search_data_fn)(void *, struct task_struct *);
+typedef void (cfq_add_cfqq_opt_fn)(struct cfq_data *);
+typedef void (cfq_del_cfqq_opt_fn)(struct cfq_data *);
+typedef int (cfq_is_active_data_fn)(struct cfq_data *);
+
struct cfq_ops {
cfq_init_driver_data_opt_fn *cfq_init_driver_data_opt_fn;
cfq_completed_request_opt_fn *cfq_completed_request_opt_fn;
+ cfq_search_data_fn *cfq_search_data_fn;
+ cfq_add_cfqq_opt_fn *cfq_add_cfqq_opt_fn;
+ cfq_del_cfqq_opt_fn *cfq_del_cfqq_opt_fn;
+ cfq_is_active_data_fn *cfq_is_active_data_fn;
};


--
1.5.6.5

2008-11-12 08:38:27

by Satoshi UCHIDA

[permalink] [raw]
Subject: [PATCH][cfq-cgroups][12/12] Interface for parameter of cfq driver data

From 8bdb6a318fbc266cf359aea699c97a964baeff35 Mon Sep 17 00:00:00 2001
From: Satoshi UCHIDA <[email protected]>
Date: Fri, 31 Oct 2008 20:49:57 +0900
Subject: [PATCH][cfq-cgroups] Interface for parameter of cfq driver data

This patch adds an interface for a parameter of the cfq driver data
(the group slice base, cfqdd->cfq_cgroup_slice).
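
Assuming the usual sysfs location of elevator attributes, the new entry
could be used like this (illustrative; the value is in msec and is
converted to jiffies internally):
  ex.
  echo 200 > /sys/block/sda/queue/iosched/slice_cgroup
  cat /sys/block/sda/queue/iosched/slice_cgroup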


Signed-off-by: Satoshi UCHIDA <[email protected]>

---
block/cfq-cgroup.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++++-
1 files changed, 58 insertions(+), 1 deletions(-)

diff --git a/block/cfq-cgroup.c b/block/cfq-cgroup.c
index 4938fa0..776874d 100644
--- a/block/cfq-cgroup.c
+++ b/block/cfq-cgroup.c
@@ -1028,6 +1028,62 @@ STORE_FUNCTION(cfq_cgroup_slice_async_rq_store, cfq_slice_async_rq, 1,
STORE_FUNCTION(cfq_cgroup_ioprio_store, ioprio, 0, CFQ_CGROUP_MAX_IOPRIO, 0);
#undef STORE_FUNCTION

+static ssize_t
+cfq_cgroup_var_show2(unsigned int var, char *page)
+{
+ return snprintf(page, PAGE_SIZE, "%d\n", var);
+}
+
+static ssize_t
+cfq_cgroup_var_store2(unsigned int *var, const char *page, size_t count)
+{
+ int err;
+ char *p = (char *) page;
+ unsigned long new_var;
+
+ err = strict_strtoul(p, 10, &new_var);
+ if (err)
+ return 0;
+
+ *var = new_var;
+
+ return count;
+}
+
+#define SHOW_FUNCTION2(__FUNC, __VAR, __CONV) \
+static ssize_t __FUNC(elevator_t *e, char *page) \
+{ \
+ struct cfq_data *cfqd = e->elevator_data; \
+ struct cfq_driver_data *cfqdd = cfqd->cfqdd; \
+ unsigned int __data = __VAR; \
+ if (__CONV) \
+ __data = jiffies_to_msecs(__data); \
+ return cfq_cgroup_var_show2(__data, (page)); \
+}
+SHOW_FUNCTION2(cfq_cgroup_slice_cgroup_show, cfqdd->cfq_cgroup_slice, 1);
+#undef SHOW_FUNCTION2
+
+#define STORE_FUNCTION2(__FUNC, __PTR, MIN, MAX, __CONV) \
+static ssize_t __FUNC(elevator_t *e, const char *page, size_t count) \
+{ \
+ struct cfq_data *cfqd = e->elevator_data; \
+ struct cfq_driver_data *cfqdd = cfqd->cfqdd; \
+ unsigned int __data; \
+ int ret = cfq_cgroup_var_store2(&__data, (page), count); \
+ if (!ret) \
+ return 0; \
+ if (__data < (MIN)) \
+ __data = (MIN); \
+ else if (__data > (MAX)) \
+ __data = (MAX); \
+ if (__CONV) \
+ *(__PTR) = msecs_to_jiffies(__data); \
+ else \
+ *(__PTR) = __data; \
+ return ret; \
+}
+STORE_FUNCTION2(cfq_cgroup_slice_cgroup_store, &cfqdd->cfq_cgroup_slice, 1,
+ UINT_MAX, 1);
+#undef STORE_FUNCTION2
+
#define CFQ_CGROUP_ATTR(name) \
__ATTR(name, S_IRUGO|S_IWUSR, cfq_cgroup_##name##_show, \
cfq_cgroup_##name##_store)
@@ -1043,6 +1099,7 @@ static struct elv_fs_entry cfq_cgroup_attrs[] = {
CFQ_CGROUP_ATTR(slice_async_rq),
CFQ_CGROUP_ATTR(slice_idle),
CFQ_CGROUP_ATTR(ioprio),
+ CFQ_CGROUP_ATTR(slice_cgroup),
__ATTR_NULL
};

--
1.5.6.5

2008-11-12 08:41:33

by Satoshi UCHIDA

[permalink] [raw]
Subject: [PATCH][cfq-cgroups][Option 1] Introduce a think time valid entry.


This patch introduces a think time valid entry.

The think time check is effective when a queue issues only a few I/O
requests: such a queue is treated like the idle class, so the next
queue can start dispatching right after it. However, when many tasks
issue I/O, their think times grow and almost all queues end up treated
as idle class. Each of them dispatches only a few requests (often just
one) before its slice expires, so the ioprio control over those queues
becomes ineffective.

The think time valid entry decides how the think time is checked.
The value 0 causes every queue to be handled as idle class.
The value 1 gives the same behavior as traditional CFQ.
The value 2 effectively disables the think time check (the idle
threshold is doubled).
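
The flag is exposed as an elevator attribute next to slice_idle; for
example (illustrative path):
  ex.
  echo 0 > /sys/block/sda/queue/iosched/ttime_valid
In the code the flag simply scales the idle threshold, as the hunk
below shows:

    /* think time check in cfq_update_idle_window() */
    if (cic->ttime_mean > cfqd->cfq_slice_idle * cfqd->cfq_ttime_valid)
        enable_idle = 0;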


Signed-off-by: Satoshi UCHIDA <[email protected]>

---
block/cfq-cgroup.c | 2 ++
block/cfq-iosched.c | 9 ++++++++-
include/linux/cfq-iosched.h | 1 +
3 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/block/cfq-cgroup.c b/block/cfq-cgroup.c
index 776874d..b407768 100644
--- a/block/cfq-cgroup.c
+++ b/block/cfq-cgroup.c
@@ -922,6 +922,7 @@ SHOW_FUNCTION(cfq_cgroup_slice_sync_show, cfq_slice[1], 1);
SHOW_FUNCTION(cfq_cgroup_slice_async_show, cfq_slice[0], 1);
SHOW_FUNCTION(cfq_cgroup_slice_async_rq_show, cfq_slice_async_rq, 0);
SHOW_FUNCTION(cfq_cgroup_ioprio_show, ioprio, 0);
+SHOW_FUNCTION(cfq_cgroup_ttime_valid_show, cfq_ttime_valid, 0);
#undef SHOW_FUNCTION

static ssize_t
@@ -1026,6 +1027,7 @@ STORE_FUNCTION(cfq_cgroup_slice_async_store, cfq_slice[0], 1, UINT_MAX, 1);
STORE_FUNCTION(cfq_cgroup_slice_async_rq_store, cfq_slice_async_rq, 1,
UINT_MAX, 0);
STORE_FUNCTION(cfq_cgroup_ioprio_store, ioprio, 0, CFQ_CGROUP_MAX_IOPRIO, 0);
+STORE_FUNCTION(cfq_cgroup_ttime_valid_store, cfq_ttime_valid, 0, 2, 0);
#undef STORE_FUNCTION

static ssize_t
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index edc23e5..51dccad 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -28,6 +28,8 @@ static const int cfq_slice_sync = HZ / 10;
static int cfq_slice_async = HZ / 25;
static const int cfq_slice_async_rq = 2;
static int cfq_slice_idle = HZ / 125;
+/* think time valid flag */
+static int cfq_ttime_valid = 1;

/*
* offset from end of service tree
@@ -1731,7 +1733,8 @@ cfq_update_idle_window(struct cfq_data *cfqd, struct cfq_queue *cfqq,
(cfqd->cfqdd->hw_tag && CIC_SEEKY(cic)))
enable_idle = 0;
else if (sample_valid(cic->ttime_samples)) {
- if (cic->ttime_mean > cfqd->cfq_slice_idle)
+ if (cic->ttime_mean >
+ cfqd->cfq_slice_idle * cfqd->cfq_ttime_valid)
enable_idle = 0;
else
enable_idle = 1;
@@ -2304,6 +2307,7 @@ struct cfq_data *cfq_init_cfq_data(struct request_queue *q,
cfqd->cfq_slice[1] = cfq_slice_sync;
cfqd->cfq_slice_async_rq = cfq_slice_async_rq;
cfqd->cfq_slice_idle = cfq_slice_idle;
+ cfqd->cfq_ttime_valid = cfq_ttime_valid;

return cfqd;
}
@@ -2381,6 +2385,7 @@ SHOW_FUNCTION(cfq_slice_idle_show, cfqd->cfq_slice_idle, 1);
SHOW_FUNCTION(cfq_slice_sync_show, cfqd->cfq_slice[1], 1);
SHOW_FUNCTION(cfq_slice_async_show, cfqd->cfq_slice[0], 1);
SHOW_FUNCTION(cfq_slice_async_rq_show, cfqd->cfq_slice_async_rq, 0);
+SHOW_FUNCTION(cfq_ttime_valid_show, cfqd->cfq_ttime_valid, 0);
#undef SHOW_FUNCTION

#define STORE_FUNCTION(__FUNC, __PTR, MIN, MAX, __CONV) \
@@ -2412,6 +2417,7 @@ STORE_FUNCTION(cfq_slice_sync_store, &cfqd->cfq_slice[1], 1, UINT_MAX, 1);
STORE_FUNCTION(cfq_slice_async_store, &cfqd->cfq_slice[0], 1, UINT_MAX, 1);
STORE_FUNCTION(cfq_slice_async_rq_store, &cfqd->cfq_slice_async_rq, 1,
UINT_MAX, 0);
+STORE_FUNCTION(cfq_ttime_valid_store, &cfqd->cfq_ttime_valid, 0, 2, 0);
#undef STORE_FUNCTION

#define CFQ_ATTR(name) \
@@ -2427,6 +2433,7 @@ static struct elv_fs_entry cfq_attrs[] = {
CFQ_ATTR(slice_async),
CFQ_ATTR(slice_async_rq),
CFQ_ATTR(slice_idle),
+ CFQ_ATTR(ttime_valid),
__ATTR_NULL
};

diff --git a/include/linux/cfq-iosched.h b/include/linux/cfq-iosched.h
index 920bcb5..8fd4b59 100644
--- a/include/linux/cfq-iosched.h
+++ b/include/linux/cfq-iosched.h
@@ -95,6 +95,7 @@ struct cfq_data {
unsigned int cfq_slice[2];
unsigned int cfq_slice_async_rq;
unsigned int cfq_slice_idle;
+ unsigned int cfq_ttime_valid;

struct list_head cic_list;

--
1.5.6.5

2008-11-12 08:43:22

by Satoshi UCHIDA

[permalink] [raw]
Subject: [PATCH][cfq-cgroups][Option 2] Introduce ioprio class for top layer.

From c13547c5758479116b6dcf10c58d0ef4f058351e Mon Sep 17 00:00:00 2001
From: Satoshi UCHIDA <[email protected]>
Date: Fri, 7 Nov 2008 19:21:19 +0900
Subject: [PATCH][cfq-cgroups] Introduce ioprio class for top layer.

This patch introduces an ioprio class for the cfq data control layer.
With this patch, the controller can also handle the RT/IDLE class
properties among groups.
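
A cfq.ioprio_class file appears in each cgroup directory beside
cfq.ioprio. For example (illustrative; the values follow
include/linux/ioprio.h, where 1 = RT, 2 = BE and 3 = IDLE, and a device
name after the value limits the setting to that device):
  ex.
  echo 1 > /dev/cgroup/test1/cfq.ioprio_class        (RT on all devices)
  echo "3 sda" > /dev/cgroup/test1/cfq.ioprio_class  (IDLE on sda only)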

Signed-off-by: Satoshi UCHIDA <[email protected]>

---
block/cfq-cgroup.c | 344 +++++++++++++++++++++++++------------------
include/linux/cfq-iosched.h | 1 +
2 files changed, 203 insertions(+), 142 deletions(-)

diff --git a/block/cfq-cgroup.c b/block/cfq-cgroup.c
index bb8cb6f..993a3b6 100644
--- a/block/cfq-cgroup.c
+++ b/block/cfq-cgroup.c
@@ -20,11 +20,24 @@

static const int cfq_cgroup_slice = HZ / 10;

+/*
+ * offset from end of service tree
+ */
+#define CFQ_CGROUP_IDLE_DELAY (HZ / 5)
+
+#define cfq_data_class_idle(cfqd) \
+ ((cfqd)->ioprio_class == IOPRIO_CLASS_IDLE)
+#define cfq_data_class_rt(cfqd) \
+ ((cfqd)->ioprio_class == IOPRIO_CLASS_RT)
+
+
+
static struct cfq_ops cfq_cgroup_op;

struct cfq_cgroup {
struct cgroup_subsys_state css;
unsigned int ioprio;
+ unsigned short ioprio_class;

struct rb_root sibling_tree;
unsigned int siblings;
@@ -161,6 +174,7 @@ static void *cfq_cgroup_init_cfq_data(struct cfq_cgroup *cfqc,
cfqc = cgroup_to_cfq_cgroup(get_root_subsys(&cfq_subsys));
cfq_cgroup_sibling_tree_add(cfqc, cfqd);
cfqd->ioprio = cfqc->ioprio;
+ cfqd->ioprio_class = cfqc->ioprio_class;
} else {
struct cfq_data *__cfqd;
__cfqd = __cfq_cgroup_init_queue(cfqd->cfqdd->queue,
@@ -168,7 +182,7 @@ static void *cfq_cgroup_init_cfq_data(struct cfq_cgroup *cfqc,
if (!__cfqd)
return NULL;
cfq_cgroup_sibling_tree_add(cfqc, __cfqd);
__cfqd->ioprio = cfqc->ioprio;
+ __cfqd->ioprio_class = cfqc->ioprio_class;
}

/* check and create cfq_data for children */
@@ -250,6 +264,7 @@ cfq_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
return ERR_PTR(-ENOMEM);

cfqc->ioprio = 3;
+ cfqc->ioprio_class = IOPRIO_CLASS_BE;

cfqc->sibling_tree = RB_ROOT;
cfqc->siblings = 0;
@@ -378,7 +393,15 @@ static void cfq_cgroup_service_tree_add(struct cfq_data *cfqd, int add_front)
unsigned long rb_key;
int left;

- if (!add_front) {
+ if (cfq_data_class_idle(cfqd)) {
+ rb_key = CFQ_CGROUP_IDLE_DELAY;
+ parent = rb_last(&cfqdd->service_tree.rb);
+ if (parent && parent != &cfqd->rb_node) {
+ __cfqd = rb_entry(parent, struct cfq_data, rb_node);
+ rb_key += __cfqd->rb_key;
+ } else
+ rb_key += jiffies;
+ } else if (!add_front) {
rb_key = cfq_cgroup_slice_offset(cfqd) + jiffies;
rb_key += cfqd->slice_resid;
cfqd->slice_resid = 0;
@@ -400,7 +423,23 @@ static void cfq_cgroup_service_tree_add(struct cfq_data *cfqd, int add_front)
parent = *p;
__cfqd = rb_entry(parent, struct cfq_data, rb_node);

- if (rb_key < __cfqd->rb_key)
+
+ /*
+ * sort RT cfq_data first, we always want to give
+ * preference to them. IDLE cfq_data goes to the back.
+ * after that, sort on the next service time.
+ */
+ if (cfq_data_class_rt(cfqd) > cfq_data_class_rt(__cfqd))
+ n = &(*p)->rb_left;
+ else if (cfq_data_class_rt(cfqd) < cfq_data_class_rt(__cfqd))
+ n = &(*p)->rb_right;
+ else if (cfq_data_class_idle(cfqd) <
+ cfq_data_class_idle(__cfqd))
+ n = &(*p)->rb_left;
+ else if (cfq_data_class_idle(cfqd) >
+ cfq_data_class_idle(__cfqd))
+ n = &(*p)->rb_right;
+ else if (rb_key < __cfqd->rb_key)
n = &(*p)->rb_left;
else
n = &(*p)->rb_right;
@@ -542,6 +579,14 @@ int cfq_cgroup_dispatch_requests(struct request_queue *q, int force)
if (cfqd)
dispatched = cfq_queue_dispatch_requests(cfqd, force);

+ /*
+ * idle cfq_data always expire after 1 dispatch round.
+ */
+ if (cfqd && cfqdd->busy_data > 1 && cfq_data_class_idle(cfqd)) {
+ cfqd->slice_end = jiffies + 1;
+ cfq_cgroup_slice_expired(cfqdd, 0);
+ }
+
return dispatched;
}

@@ -699,149 +744,164 @@ param_separate(const char *master, char *valbuf, char *pathbuf, int size)
*pc2 = '\0';
}

-static ssize_t cfq_cgroup_read(struct cgroup *cont, struct cftype *cft,
- struct file *file, char __user *userbuf,
- size_t nbytes, loff_t *ppos)
-{
- struct cfq_cgroup *cfqc;
- char *page;
- ssize_t ret;
- struct rb_node *p;
-
- page = (char *)__get_free_page(GFP_TEMPORARY);
- if (!page)
- return -ENOMEM;
-
- cgroup_lock();
- if (cgroup_is_removed(cont)) {
- cgroup_unlock();
- ret = -ENODEV;
- goto out;
- }
-
- cfqc = cgroup_to_cfq_cgroup(cont);
-
- cgroup_unlock();
-
- /* print priority */
- ret = snprintf(page, PAGE_SIZE, "default priority: %d\n", cfqc->ioprio);
-
- p = rb_first(&cfqc->sibling_tree);
- while (p) {
- struct cfq_data *__cfqd;
-
- __cfqd = rb_entry(p, struct cfq_data, group_node);
-
- ret += snprintf(page + ret, PAGE_SIZE - ret, " %s %d\n",
- __cfqd->cfqdd->queue->kobj.parent->name,
- __cfqd->ioprio);
-
- p = rb_next(p);
- }

- ret = simple_read_from_buffer(userbuf, nbytes, ppos, page, ret);
-
-out:
- free_page((unsigned long)page);
- return ret;
+#define READ_FUNCTION(__FUNC, __VAR, __DEF_MSG) \
+static ssize_t __FUNC(struct cgroup *cont, struct cftype *cft, \
+ struct file *file, char __user *userbuf, \
+ size_t nbytes, loff_t *ppos) \
+{ \
+ struct cfq_cgroup *cfqc; \
+ char *page; \
+ ssize_t ret; \
+ struct rb_node *p; \
+ \
+ page = (char *)__get_free_page(GFP_TEMPORARY); \
+ if (!page) \
+ return -ENOMEM; \
+ \
+ cgroup_lock(); \
+ if (cgroup_is_removed(cont)) { \
+ cgroup_unlock(); \
+ ret = -ENODEV; \
+ goto out; \
+ } \
+ \
+ cfqc = cgroup_to_cfq_cgroup(cont); \
+ \
+ cgroup_unlock(); \
+ \
+ /* print */ \
+ ret = snprintf(page, PAGE_SIZE, "default " __DEF_MSG ": %d\n", \
+ cfqc->__VAR); \
+ \
+ p = rb_first(&cfqc->sibling_tree); \
+ while (p) { \
+ struct cfq_data *__cfqd; \
+ \
+ __cfqd = rb_entry(p, struct cfq_data, group_node); \
+ \
+ ret += snprintf(page + ret, PAGE_SIZE - ret, " %s %d\n",\
+ __cfqd->cfqdd->queue->kobj.parent->name, \
+ __cfqd->__VAR); \
+ \
+ p = rb_next(p); \
+ } \
+ \
+ ret = simple_read_from_buffer(userbuf, nbytes, ppos, page, ret);\
+ \
+out: \
+ free_page((unsigned long)page); \
+ return ret; \
}
-
-static ssize_t cfq_cgroup_write(struct cgroup *cont, struct cftype *cft,
- struct file *file, const char __user *userbuf,
- size_t nbytes, loff_t *ppos)
-{
- struct cfq_cgroup *cfqc;
- ssize_t ret;
- long new_prio;
- int err, sn;
- char *buffer = NULL;
- char *valbuf = NULL, *pathbuf = NULL;
- struct rb_node *p;
-
- cgroup_lock();
- if (cgroup_is_removed(cont)) {
- cgroup_unlock();
- ret = -ENODEV;
- goto out;
- }
-
- cfqc = cgroup_to_cfq_cgroup(cont);
- cgroup_unlock();
-
- /* set priority */
- buffer = kmalloc(nbytes + 1, GFP_KERNEL);
- if (buffer == NULL)
- return -ENOMEM;
-
- if (copy_from_user(buffer, userbuf, nbytes)) {
- ret = -EFAULT;
- goto free_buf;
- }
- buffer[nbytes] = 0;
-
- valbuf = kmalloc(nbytes + 1, GFP_KERNEL);
- if (!valbuf) {
- ret = -ENOMEM;
- goto free_buf;
- }
-
- pathbuf = kmalloc(nbytes + 1, GFP_KERNEL);
- if (!pathbuf) {
- ret = -ENOMEM;
- goto free_val;
- }
-
- param_separate(buffer, valbuf, pathbuf, nbytes);
-
- err = strict_strtoul(valbuf, 10, &new_prio);
- if ((err) || ((new_prio < 0) || (new_prio > CFQ_CGROUP_MAX_IOPRIO))) {
- ret = -EINVAL;
- goto free_path;
- }
-
- sn = strlen(pathbuf);
-
- p = rb_first(&cfqc->sibling_tree);
- while (p) {
- struct cfq_data *__cfqd;
- const char *namep;
-
- __cfqd = rb_entry(p, struct cfq_data, group_node);
- namep = __cfqd->cfqdd->queue->kobj.parent->name;
-
- if (sn == 0) {
- __cfqd->ioprio = new_prio;
- } else if ((sn == strlen(namep)) &&
- (strncmp(pathbuf, namep, sn) == 0)) {
- __cfqd->ioprio = new_prio;
- break;
- }
-
- p = rb_next(p);
- }
-
- if ((sn == 0) ||
- ((sn == 7) && (strncmp(pathbuf, "default", 7) == 0)))
- cfqc->ioprio = new_prio;
-
- ret = nbytes;
-
-free_path:
- kfree(pathbuf);
-free_val:
- kfree(valbuf);
-free_buf:
- kfree(buffer);
-out:
- return ret;
+READ_FUNCTION(cfq_cgroup_ioprio_read, ioprio, "priority");
+READ_FUNCTION(cfq_cgroup_ioprio_class_read, ioprio_class, "priority class");
+#undef READ_FUNCTION
+
+#define WRITE_FUNCTION(__FUNC, __VAR, MIN, MAX) \
+static ssize_t __FUNC(struct cgroup *cont, struct cftype *cft, \
+ struct file *file, const char __user *userbuf, \
+ size_t nbytes, loff_t *ppos) \
+{ \
+ struct cfq_cgroup *cfqc; \
+ ssize_t ret; \
+ long new_val; \
+ int err, sn; \
+ char *buffer = NULL; \
+ char *valbuf = NULL, *pathbuf = NULL; \
+ struct rb_node *p; \
+ \
+ cgroup_lock(); \
+ if (cgroup_is_removed(cont)) { \
+ cgroup_unlock(); \
+ ret = -ENODEV; \
+ goto out; \
+ } \
+ \
+ cfqc = cgroup_to_cfq_cgroup(cont); \
+ cgroup_unlock(); \
+ \
+ /* set */ \
+ buffer = kmalloc(nbytes + 1, GFP_KERNEL); \
+ if (buffer == NULL) \
+ return -ENOMEM; \
+ \
+ if (copy_from_user(buffer, userbuf, nbytes)) { \
+ ret = -EFAULT; \
+ goto free_buf; \
+ } \
+ buffer[nbytes] = 0; \
+ \
+ valbuf = kmalloc(nbytes + 1, GFP_KERNEL); \
+ if (!valbuf) { \
+ ret = -ENOMEM; \
+ goto free_buf; \
+ } \
+ \
+ pathbuf = kmalloc(nbytes + 1, GFP_KERNEL); \
+ if (!pathbuf) { \
+ ret = -ENOMEM; \
+ goto free_val; \
+ } \
+ \
+ param_separate(buffer, valbuf, pathbuf, nbytes); \
+ \
+ err = strict_strtoul(valbuf, 10, &new_val); \
+ if ((err) || ((new_val < (MIN)) || (new_val > (MAX)))) { \
+ ret = -EINVAL; \
+ goto free_path; \
+ } \
+ \
+ sn = strlen(pathbuf); \
+ \
+ p = rb_first(&cfqc->sibling_tree); \
+ while (p) { \
+ struct cfq_data *__cfqd; \
+ const char *namep; \
+ \
+ __cfqd = rb_entry(p, struct cfq_data, group_node); \
+ namep = __cfqd->cfqdd->queue->kobj.parent->name; \
+ \
+ if (sn == 0) { \
+ __cfqd->__VAR = new_val; \
+ } else if ((sn == strlen(namep)) && \
+ (strncmp(pathbuf, namep, sn) == 0)) { \
+ __cfqd->__VAR = new_val; \
+ break; \
+ } \
+ \
+ p = rb_next(p); \
+ } \
+ \
+ if ((sn == 0) || \
+ ((sn == 7) && (strncmp(pathbuf, "default", 7) == 0))) \
+ cfqc->__VAR = new_val; \
+ \
+ ret = nbytes; \
+ \
+free_path: \
+ kfree(pathbuf); \
+free_val: \
+ kfree(valbuf); \
+free_buf: \
+ kfree(buffer); \
+out: \
+ return ret; \
}
+WRITE_FUNCTION(cfq_cgroup_ioprio_write, ioprio, 0, CFQ_CGROUP_MAX_IOPRIO);
+WRITE_FUNCTION(cfq_cgroup_ioprio_class_write, ioprio_class, 0,
+ IOPRIO_CLASS_IDLE);
+#undef WRITE_FUNCTION
+
+#define CFQ_CGROUP_CTYPE_ATTR(_name) \
+ { \
+ .name = (__stringify(_name)), \
+ .read = cfq_cgroup_##_name##_read, \
+ .write = cfq_cgroup_##_name##_write, \
+ }

static struct cftype files[] = {
- {
- .name = "ioprio",
- .read = cfq_cgroup_read,
- .write = cfq_cgroup_write,
- },
+ CFQ_CGROUP_CTYPE_ATTR(ioprio),
+ CFQ_CGROUP_CTYPE_ATTR(ioprio_class),
};

static int cfq_cgroup_populate(struct cgroup_subsys *ss, struct cgroup *cont)
diff --git a/include/linux/cfq-iosched.h b/include/linux/cfq-iosched.h
index 920bcb5..ca04ebd 100644
--- a/include/linux/cfq-iosched.h
+++ b/include/linux/cfq-iosched.h
@@ -102,6 +102,7 @@ struct cfq_data {

#ifdef CONFIG_IOSCHED_CFQ_CGROUP
unsigned int ioprio;
+ unsigned short ioprio_class;

/* sibling_tree member for cfq_meta_data */
struct rb_node sib_node;
--
1.5.6.5

2008-11-12 08:59:11

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH][RFC][12+2][v3] A expanded CFQ scheduler for cgroups

Hi,

Just wondering,.. have you lot looked at the recently posted BFQ
patches?

BFQ looks like a very promising elevator, it has tighter bounds than
CFQ and already does the cgroup thing.

2008-11-12 09:23:12

by Satoshi UCHIDA

[permalink] [raw]
Subject: RE: [PATCH][RFC][12+2][v3] A expanded CFQ scheduler for cgroups

Hi, Peter.

> Just wondering,.. have you lot looked at the recently posted BFQ
> patches?
>
> BFQ looks like a very promising elevator, it has tighter bounds than
> CFQ and already does the cgroup thing.


This patchset improves our previous scheduler (sent 04/01/2008) and resolves some problems.

In addition, our scheduler can control I/O at two layers.
Namely, it can control I/O among groups and I/O among tasks within a group.

I mainly follow the containers ML.
There I found the I/O control patches dm-ioband, io-throttle, Naveen's anticipatory
I/O scheduler, Vasily's CFQ scheduler and Vivek's "Another" I/O controller.
I learned of the BFQ scheduler only today, so I have not looked at BFQ yet.
I am looking at its patch now.

Thanks for the comments.


Satoshi UCHIDA


> -----Original Message-----
> From: Peter Zijlstra [mailto:[email protected]]
> Sent: Wednesday, November 12, 2008 5:58 PM
> To: Satoshi UCHIDA
> Cc: [email protected];
> [email protected];
> [email protected]; [email protected]; 'Ryo
> Tsuruta'; 'Andrea Righi'; [email protected]; [email protected];
> [email protected]; 'Hirokazu Takahashi'; [email protected];
> 'Andrew Morton'; [email protected]; SUGAWARA Tomoyoshi;
> [email protected]; Fabio Checconi
> Subject: Re: [PATCH][RFC][12+2][v3] A expanded CFQ scheduler for cgroups
>
> Hi,
>
> Just wondering,.. have you lot looked at the recently posted BFQ
> patches?
>
> BFQ looks like a very promising elevator, it has tighter bounds than
> CFQ and already does the cgroup thing.
>

