RAID5 still doesn't compile....sigh....
gcc -D__KERNEL__ -I/usr/src/linux-2.5.14/include -Wall -Wstrict-prototypes -Wno-trigraphs -O2 -fomit-frame-pointer -fno-strict-aliasing -fno-common -pipe -mpreferred-stack-boundary=2 -march=i686 -DMODULE -DKBUILD_BASENAME=raid5 -c -o raid5.o raid5.c
raid5.c: In function `raid5_plug_device':
raid5.c:1247: `tq_disk' undeclared (first use in this function)
raid5.c:1247: (Each undeclared identifier is reported only once
raid5.c:1247: for each function it appears in.)
raid5.c: In function `run':
raid5.c:1589: sizeof applied to an incomplete type
make[2]: *** [raid5.o] Error 1
make[2]: Leaving directory `/usr/src/linux-2.5.14/drivers/md'
make[1]: *** [_modsubdir_md] Error 2
make[1]: Leaving directory `/usr/src/linux-2.5.14/drivers'
make: *** [_mod_drivers] Error 2
Michael D. Black [email protected]
http://www.csihq.com/
http://www.csihq.com/~mike
321-676-2923, x203
Melbourne FL
On Mon, Jun 03 2002, Mike Black wrote:
> RAID5 still doesn't compile....sigh....
[snip]
Some people do nothing but complain instead of trying to fix things.
Sigh...
--
Jens Axboe
On Tuesday June 4, [email protected] wrote:
> On Mon, Jun 03 2002, Mike Black wrote:
> > RAID5 still doesn't compile....sigh....
>
> [snip]
>
> Some people do nothing but complain instead of trying to fix things.
> Sigh...
I've got fixes.... but I want to suggest some changes to the plugging
mechanism, and as it seems to have changed a bit since 2.5.20, I'll
have to sync up my patch before I show it to you...
NeilBrown
On Tue, Jun 04 2002, Neil Brown wrote:
> On Tuesday June 4, [email protected] wrote:
> > On Mon, Jun 03 2002, Mike Black wrote:
> > > RAID5 still doesn't compile....sigh....
> >
> > [snip]
> >
> > Some people do nothing but complain instead of trying to fix things.
> > Sigh...
>
> I've got fixes.... but I want to suggest some changes to the plugging
> mechanism, and as it seems to have changed a bit since 2.5.20, I'll
> have to sync up my patch before I show it to you...
Excellent. I've sent the last plugging patch to Linus, which appears to
be ok/stable. If you could send changes relative to that, it would be
great.
What changes did you have in mind?
--
Jens Axboe
On Tuesday June 4, [email protected] wrote:
>
> What changes did you have in mind?
http://www.cse.unsw.edu.au/~neilb/patches/linux-devel/2.5.20/patch-A-NewPlug
Is what I had against 2.5.20. A quick look at the mail that you sent
with improvements suggests that I can be even less intrusive... But it
will have to wait until tomorrow (my time).
NeilBrown
On Tue, Jun 04 2002, Neil Brown wrote:
> On Tuesday June 4, [email protected] wrote:
> >
> > What changes did you have in mind?
>
> http://www.cse.unsw.edu.au/~neilb/patches/linux-devel/2.5.20/patch-A-NewPlug
>
> Is what I had against 2.5.20. A quick look at the mail that you sent
> with improvements suggests that I can be even less intrusive... But it
> will have to wait until tomorrow (my time).
Ah ok, I see what you have in mind. Right now you are completely
mimicking the tq_struct setup -- any reason a simple q->plug_fn is not
enough? Do you ever need anything other than the queue passed in with the
plug? Wrt umem, it seems you could keep 'card' in the queuedata. Same
for raid5 and conf.
--
Jens Axboe
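(For reference, the queuedata pattern Jens is suggesting looks roughly
like this - a minimal sketch with made-up "mydrv" names, modelled on the
umem conversion that appears later in this thread, not code from any of
the patches themselves:)

static int mydrv_make_request(request_queue_t *q, struct bio *bio);

struct mydrv_card {
	spinlock_t	lock;
	request_queue_t	queue;		/* one queue per device */
};

/*
 * the unplug callback is handed the queue; the driver's private
 * structure comes back out of q->queuedata, so no per-device
 * tq_struct is needed
 */
static void mydrv_unplug(void *data)
{
	request_queue_t *q = data;
	struct mydrv_card *card = q->queuedata;

	spin_lock_bh(&card->lock);
	blk_remove_plug(q);
	/* ... kick off any pending I/O for this card ... */
	spin_unlock_bh(&card->lock);
}

static void mydrv_init_queue(struct mydrv_card *card)
{
	blk_queue_make_request(&card->queue, mydrv_make_request);
	card->queue.queuedata = card;	/* keep 'card' in the queuedata */
	card->queue.unplug_fn = mydrv_unplug;
}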
On Tue, Jun 04 2002, Jens Axboe wrote:
> plug? Wrt umem, it seems you could keep 'card' in the queuedata. Same
> for raid5 and conf.
Ok by actually looking at it, both card and conf are the queues
themselves. So I'd say your approach is indeed a bit overkill. I'll send
a patch in half an hour for you to review.
--
Jens Axboe
On Tue, Jun 04 2002, Jens Axboe wrote:
> On Tue, Jun 04 2002, Jens Axboe wrote:
> > plug? Wrt umem, it seems you could keep 'card' in the queuedata. Same
> > for raid5 and conf.
>
> Ok by actually looking at it, both card and conf are the queues
> themselves. So I'd say your approach is indeed a bit overkill. I'll send
> a patch in half an hour for you to review.
Alright here's the block part of it, you'll need to merge your umem and
raid5 changes in with that. I'm just interested in knowing if this is
all you want/need. It's actually just a two-line change from the last
version posted -- set q->unplug_fn in blk_init_queue to our default
(you'll override that for umem and raid5, or rather you'll set it
yourself after blk_queue_make_request()), and then call that instead of
__generic_unplug_device directly from blk_run_queues().
--- /opt/kernel/linux-2.5.20/drivers/block/ll_rw_blk.c 2002-06-03 10:35:35.000000000 +0200
+++ linux/drivers/block/ll_rw_blk.c 2002-06-04 14:35:16.000000000 +0200
@@ -795,8 +795,8 @@
* force the transfer to start only after we have put all the requests
* on the list.
*
- * This is called with interrupts off and no requests on the queue.
- * (and with the request spinlock acquired)
+ * This is called with interrupts off and no requests on the queue and
+ * with the queue lock held.
*/
void blk_plug_device(request_queue_t *q)
{
@@ -806,7 +806,7 @@
if (!elv_queue_empty(q))
return;
- if (!test_and_set_bit(QUEUE_FLAG_PLUGGED, &q->queue_flags)) {
+ if (!blk_queue_plugged(q)) {
spin_lock(&blk_plug_lock);
list_add_tail(&q->plug_list, &blk_plug_list);
spin_unlock(&blk_plug_lock);
@@ -814,14 +814,27 @@
}
/*
+ * remove the queue from the plugged list, if present. called with
+ * queue lock held and interrupts disabled.
+ */
+static inline int blk_remove_plug(request_queue_t *q)
+{
+ if (blk_queue_plugged(q)) {
+ spin_lock(&blk_plug_lock);
+ list_del_init(&q->plug_list);
+ spin_unlock(&blk_plug_lock);
+ return 1;
+ }
+
+ return 0;
+}
+
+/*
* remove the plug and let it rip..
*/
static inline void __generic_unplug_device(request_queue_t *q)
{
- /*
- * not plugged
- */
- if (!__test_and_clear_bit(QUEUE_FLAG_PLUGGED, &q->queue_flags))
+ if (!blk_remove_plug(q))
return;
if (test_bit(QUEUE_FLAG_STOPPED, &q->queue_flags))
@@ -849,11 +862,10 @@
void generic_unplug_device(void *data)
{
request_queue_t *q = data;
- unsigned long flags;
- spin_lock_irqsave(q->queue_lock, flags);
+ spin_lock_irq(q->queue_lock);
__generic_unplug_device(q);
- spin_unlock_irqrestore(q->queue_lock, flags);
+ spin_unlock_irq(q->queue_lock);
}
/**
@@ -893,6 +905,12 @@
**/
void blk_stop_queue(request_queue_t *q)
{
+ unsigned long flags;
+
+ spin_lock_irqsave(q->queue_lock, flags);
+ blk_remove_plug(q);
+ spin_unlock_irqrestore(q->queue_lock, flags);
+
set_bit(QUEUE_FLAG_STOPPED, &q->queue_flags);
}
@@ -904,45 +922,33 @@
* are currently stopped are ignored. This is equivalent to the older
* tq_disk task queue run.
**/
+#define blk_plug_entry(entry) list_entry((entry), request_queue_t, plug_list)
void blk_run_queues(void)
{
- struct list_head *n, *tmp, local_plug_list;
- unsigned long flags;
+ struct list_head local_plug_list;
INIT_LIST_HEAD(&local_plug_list);
+ spin_lock_irq(&blk_plug_lock);
+
/*
* this will happen fairly often
*/
- spin_lock_irqsave(&blk_plug_lock, flags);
if (list_empty(&blk_plug_list)) {
- spin_unlock_irqrestore(&blk_plug_lock, flags);
+ spin_unlock_irq(&blk_plug_lock);
return;
}
list_splice(&blk_plug_list, &local_plug_list);
INIT_LIST_HEAD(&blk_plug_list);
- spin_unlock_irqrestore(&blk_plug_lock, flags);
+ spin_unlock_irq(&blk_plug_lock);
+
+ while (!list_empty(&local_plug_list)) {
+ request_queue_t *q = blk_plug_entry(local_plug_list.next);
- /*
- * local_plug_list is now a private copy we can traverse lockless
- */
- list_for_each_safe(n, tmp, &local_plug_list) {
- request_queue_t *q = list_entry(n, request_queue_t, plug_list);
+ BUG_ON(test_bit(QUEUE_FLAG_STOPPED, &q->queue_flags));
- if (!test_bit(QUEUE_FLAG_STOPPED, &q->queue_flags)) {
- list_del(&q->plug_list);
- generic_unplug_device(q);
- }
- }
-
- /*
- * add any remaining queue back to plug list
- */
- if (!list_empty(&local_plug_list)) {
- spin_lock_irqsave(&blk_plug_lock, flags);
- list_splice(&local_plug_list, &blk_plug_list);
- spin_unlock_irqrestore(&blk_plug_lock, flags);
+ q->unplug_fn(q);
}
}
@@ -1085,6 +1091,7 @@
q->front_merge_fn = ll_front_merge_fn;
q->merge_requests_fn = ll_merge_requests_fn;
q->prep_rq_fn = NULL;
+ q->unplug_fn = generic_unplug_device;
q->queue_flags = (1 << QUEUE_FLAG_CLUSTER);
q->queue_lock = lock;
--- /opt/kernel/linux-2.5.20/include/linux/blkdev.h 2002-06-03 10:35:40.000000000 +0200
+++ linux/include/linux/blkdev.h 2002-06-04 14:33:04.000000000 +0200
@@ -116,7 +116,7 @@
typedef request_queue_t * (queue_proc) (kdev_t dev);
typedef int (make_request_fn) (request_queue_t *q, struct bio *bio);
typedef int (prep_rq_fn) (request_queue_t *, struct request *);
-typedef void (unplug_device_fn) (void *q);
+typedef void (unplug_fn) (void *q);
enum blk_queue_state {
Queue_down,
@@ -160,6 +160,7 @@
merge_requests_fn *merge_requests_fn;
make_request_fn *make_request_fn;
prep_rq_fn *prep_rq_fn;
+ unplug_fn *unplug_fn;
struct backing_dev_info backing_dev_info;
@@ -209,13 +210,11 @@
#define RQ_SCSI_DONE 0xfffe
#define RQ_SCSI_DISCONNECTING 0xffe0
-#define QUEUE_FLAG_PLUGGED 0 /* queue is plugged */
-#define QUEUE_FLAG_CLUSTER 1 /* cluster several segments into 1 */
-#define QUEUE_FLAG_QUEUED 2 /* uses generic tag queueing */
-#define QUEUE_FLAG_STOPPED 3 /* queue is stopped */
+#define QUEUE_FLAG_CLUSTER 0 /* cluster several segments into 1 */
+#define QUEUE_FLAG_QUEUED 1 /* uses generic tag queueing */
+#define QUEUE_FLAG_STOPPED 2 /* queue is stopped */
-#define blk_queue_plugged(q) test_bit(QUEUE_FLAG_PLUGGED, &(q)->queue_flags)
-#define blk_mark_plugged(q) set_bit(QUEUE_FLAG_PLUGGED, &(q)->queue_flags)
+#define blk_queue_plugged(q) !list_empty(&(q)->plug_list)
#define blk_queue_tagged(q) test_bit(QUEUE_FLAG_QUEUED, &(q)->queue_flags)
#define blk_queue_empty(q) elv_queue_empty(q)
#define list_entry_rq(ptr) list_entry((ptr), struct request, queuelist)
--
Jens Axboe
Neil,
I tried converting umem to see how it fit together, this is what I came
up with. This does use a queue per umem unit, but I think that's the
right split anyways. Maybe at some point we can use the per-major
statically allocated queues...
diff -ur -X /home/axboe/cdrom/exclude /opt/kernel/linux-2.5.20/drivers/block/ll_rw_blk.c linux/drivers/block/ll_rw_blk.c
--- /opt/kernel/linux-2.5.20/drivers/block/ll_rw_blk.c 2002-06-03 10:35:35.000000000 +0200
+++ linux/drivers/block/ll_rw_blk.c 2002-06-04 16:14:40.000000000 +0200
@@ -49,6 +49,9 @@
*/
static kmem_cache_t *request_cachep;
+/*
+ * plug management
+ */
static struct list_head blk_plug_list;
static spinlock_t blk_plug_lock __cacheline_aligned_in_smp = SPIN_LOCK_UNLOCKED;
@@ -795,8 +798,8 @@
* force the transfer to start only after we have put all the requests
* on the list.
*
- * This is called with interrupts off and no requests on the queue.
- * (and with the request spinlock acquired)
+ * This is called with interrupts off and no requests on the queue and
+ * with the queue lock held.
*/
void blk_plug_device(request_queue_t *q)
{
@@ -806,7 +809,7 @@
if (!elv_queue_empty(q))
return;
- if (!test_and_set_bit(QUEUE_FLAG_PLUGGED, &q->queue_flags)) {
+ if (!blk_queue_plugged(q)) {
spin_lock(&blk_plug_lock);
list_add_tail(&q->plug_list, &blk_plug_list);
spin_unlock(&blk_plug_lock);
@@ -814,14 +817,27 @@
}
/*
+ * remove the queue from the plugged list, if present. called with
+ * queue lock held and interrupts disabled.
+ */
+inline int blk_remove_plug(request_queue_t *q)
+{
+ if (blk_queue_plugged(q)) {
+ spin_lock(&blk_plug_lock);
+ list_del_init(&q->plug_list);
+ spin_unlock(&blk_plug_lock);
+ return 1;
+ }
+
+ return 0;
+}
+
+/*
* remove the plug and let it rip..
*/
static inline void __generic_unplug_device(request_queue_t *q)
{
- /*
- * not plugged
- */
- if (!__test_and_clear_bit(QUEUE_FLAG_PLUGGED, &q->queue_flags))
+ if (!blk_remove_plug(q))
return;
if (test_bit(QUEUE_FLAG_STOPPED, &q->queue_flags))
@@ -849,11 +865,10 @@
void generic_unplug_device(void *data)
{
request_queue_t *q = data;
- unsigned long flags;
- spin_lock_irqsave(q->queue_lock, flags);
+ spin_lock_irq(q->queue_lock);
__generic_unplug_device(q);
- spin_unlock_irqrestore(q->queue_lock, flags);
+ spin_unlock_irq(q->queue_lock);
}
/**
@@ -893,6 +908,12 @@
**/
void blk_stop_queue(request_queue_t *q)
{
+ unsigned long flags;
+
+ spin_lock_irqsave(q->queue_lock, flags);
+ blk_remove_plug(q);
+ spin_unlock_irqrestore(q->queue_lock, flags);
+
set_bit(QUEUE_FLAG_STOPPED, &q->queue_flags);
}
@@ -904,45 +925,31 @@
* are currently stopped are ignored. This is equivalent to the older
* tq_disk task queue run.
**/
+#define blk_plug_entry(entry) list_entry((entry), request_queue_t, plug_list)
void blk_run_queues(void)
{
- struct list_head *n, *tmp, local_plug_list;
- unsigned long flags;
+ struct list_head local_plug_list;
INIT_LIST_HEAD(&local_plug_list);
+ spin_lock_irq(&blk_plug_lock);
+
/*
* this will happen fairly often
*/
- spin_lock_irqsave(&blk_plug_lock, flags);
if (list_empty(&blk_plug_list)) {
- spin_unlock_irqrestore(&blk_plug_lock, flags);
+ spin_unlock_irq(&blk_plug_lock);
return;
}
list_splice(&blk_plug_list, &local_plug_list);
INIT_LIST_HEAD(&blk_plug_list);
- spin_unlock_irqrestore(&blk_plug_lock, flags);
+ spin_unlock_irq(&blk_plug_lock);
+
+ while (!list_empty(&local_plug_list)) {
+ request_queue_t *q = blk_plug_entry(local_plug_list.next);
- /*
- * local_plug_list is now a private copy we can traverse lockless
- */
- list_for_each_safe(n, tmp, &local_plug_list) {
- request_queue_t *q = list_entry(n, request_queue_t, plug_list);
-
- if (!test_bit(QUEUE_FLAG_STOPPED, &q->queue_flags)) {
- list_del(&q->plug_list);
- generic_unplug_device(q);
- }
- }
-
- /*
- * add any remaining queue back to plug list
- */
- if (!list_empty(&local_plug_list)) {
- spin_lock_irqsave(&blk_plug_lock, flags);
- list_splice(&local_plug_list, &blk_plug_list);
- spin_unlock_irqrestore(&blk_plug_lock, flags);
+ q->unplug_fn(q);
}
}
@@ -1085,6 +1092,7 @@
q->front_merge_fn = ll_front_merge_fn;
q->merge_requests_fn = ll_merge_requests_fn;
q->prep_rq_fn = NULL;
+ q->unplug_fn = generic_unplug_device;
q->queue_flags = (1 << QUEUE_FLAG_CLUSTER);
q->queue_lock = lock;
@@ -1352,7 +1360,7 @@
static int __make_request(request_queue_t *q, struct bio *bio)
{
struct request *req, *freereq = NULL;
- int el_ret, rw, nr_sectors, cur_nr_sectors, barrier;
+ int el_ret, rw, nr_sectors, cur_nr_sectors;
struct list_head *insert_here;
sector_t sector;
@@ -1368,16 +1376,12 @@
*/
blk_queue_bounce(q, &bio);
- spin_lock_prefetch(q->queue_lock);
-
- barrier = test_bit(BIO_RW_BARRIER, &bio->bi_rw);
-
spin_lock_irq(q->queue_lock);
again:
req = NULL;
insert_here = q->queue_head.prev;
- if (blk_queue_empty(q) || barrier) {
+ if (blk_queue_empty(q) || bio_barrier(bio)) {
blk_plug_device(q);
goto get_rq;
}
@@ -1477,7 +1481,7 @@
/*
* REQ_BARRIER implies no merging, but lets make it explicit
*/
- if (barrier)
+ if (bio_barrier(bio))
req->flags |= (REQ_BARRIER | REQ_NOMERGE);
req->errors = 0;
@@ -1489,9 +1493,7 @@
req->buffer = bio_data(bio); /* see ->buffer comment above */
req->waiting = NULL;
req->bio = req->biotail = bio;
- if (bio->bi_bdev)
- req->rq_dev = to_kdev_t(bio->bi_bdev->bd_dev);
- else req->rq_dev = NODEV;
+ req->rq_dev = to_kdev_t(bio->bi_bdev->bd_dev);
add_request(q, req, insert_here);
out:
if (freereq)
@@ -2002,6 +2004,8 @@
EXPORT_SYMBOL(generic_make_request);
EXPORT_SYMBOL(blkdev_release_request);
EXPORT_SYMBOL(generic_unplug_device);
+EXPORT_SYMBOL(blk_plug_device);
+EXPORT_SYMBOL(blk_remove_plug);
EXPORT_SYMBOL(blk_attempt_remerge);
EXPORT_SYMBOL(blk_max_low_pfn);
EXPORT_SYMBOL(blk_max_pfn);
diff -ur -X /home/axboe/cdrom/exclude /opt/kernel/linux-2.5.20/drivers/block/umem.c linux/drivers/block/umem.c
--- /opt/kernel/linux-2.5.20/drivers/block/umem.c 2002-05-29 20:42:54.000000000 +0200
+++ linux/drivers/block/umem.c 2002-06-04 16:13:48.000000000 +0200
@@ -128,6 +128,8 @@
*/
struct bio *bio, *currentbio, **biotail;
+ request_queue_t queue;
+
struct mm_page {
dma_addr_t page_dma;
struct mm_dma_desc *desc;
@@ -141,8 +143,6 @@
struct tasklet_struct tasklet;
unsigned int dma_status;
- struct tq_struct plug_tq;
-
struct {
int good;
int warned;
@@ -383,7 +383,10 @@
static void mm_unplug_device(void *data)
{
- struct cardinfo *card = data;
+ request_queue_t *q = data;
+ struct cardinfo *card = q->queuedata;
+
+ blk_remove_plug(q);
spin_lock_bh(&card->lock);
activate(card);
@@ -577,7 +580,7 @@
card->biotail = &bio->bi_next;
spin_unlock_bh(&card->lock);
- queue_task(&card->plug_tq, &tq_disk);
+ blk_plug_device(q);
return 0;
}
@@ -1064,11 +1067,12 @@
card->bio = NULL;
card->biotail = &card->bio;
+ blk_queue_make_request(&card->queue, mm_make_request);
+ card->queue.queuedata = card;
+ card->queue.unplug_fn = mm_unplug_device;
+
tasklet_init(&card->tasklet, process_page, (unsigned long)card);
- card->plug_tq.sync = 0;
- card->plug_tq.routine = &mm_unplug_device;
- card->plug_tq.data = card;
card->check_batteries = 0;
mem_present = readb(card->csr_remap + MEMCTRLSTATUS_MEMORY);
@@ -1277,9 +1281,6 @@
add_gendisk(&mm_gendisk);
- blk_queue_make_request(BLK_DEFAULT_QUEUE(MAJOR_NR),
- mm_make_request);
-
blk_size[MAJOR_NR] = mm_gendisk.sizes;
for (i = 0; i < num_cards; i++) {
register_disk(&mm_gendisk, mk_kdev(MAJOR_NR, i<<MM_SHIFT), MM_SHIFT,
diff -ur -X /home/axboe/cdrom/exclude /opt/kernel/linux-2.5.20/include/linux/bio.h linux/include/linux/bio.h
--- /opt/kernel/linux-2.5.20/include/linux/bio.h 2002-05-29 20:42:57.000000000 +0200
+++ linux/include/linux/bio.h 2002-06-04 16:14:59.000000000 +0200
@@ -123,7 +123,7 @@
#define bio_offset(bio) bio_iovec((bio))->bv_offset
#define bio_sectors(bio) ((bio)->bi_size >> 9)
#define bio_data(bio) (page_address(bio_page((bio))) + bio_offset((bio)))
-#define bio_barrier(bio) ((bio)->bi_rw & (1 << BIO_BARRIER))
+#define bio_barrier(bio) ((bio)->bi_rw & (1 << BIO_RW_BARRIER))
/*
* will die
diff -ur -X /home/axboe/cdrom/exclude /opt/kernel/linux-2.5.20/include/linux/blkdev.h linux/include/linux/blkdev.h
--- /opt/kernel/linux-2.5.20/include/linux/blkdev.h 2002-06-03 10:35:40.000000000 +0200
+++ linux/include/linux/blkdev.h 2002-06-04 16:15:22.000000000 +0200
@@ -116,7 +116,7 @@
typedef request_queue_t * (queue_proc) (kdev_t dev);
typedef int (make_request_fn) (request_queue_t *q, struct bio *bio);
typedef int (prep_rq_fn) (request_queue_t *, struct request *);
-typedef void (unplug_device_fn) (void *q);
+typedef void (unplug_fn) (void *q);
enum blk_queue_state {
Queue_down,
@@ -160,6 +160,7 @@
merge_requests_fn *merge_requests_fn;
make_request_fn *make_request_fn;
prep_rq_fn *prep_rq_fn;
+ unplug_fn *unplug_fn;
struct backing_dev_info backing_dev_info;
@@ -209,13 +210,11 @@
#define RQ_SCSI_DONE 0xfffe
#define RQ_SCSI_DISCONNECTING 0xffe0
-#define QUEUE_FLAG_PLUGGED 0 /* queue is plugged */
-#define QUEUE_FLAG_CLUSTER 1 /* cluster several segments into 1 */
-#define QUEUE_FLAG_QUEUED 2 /* uses generic tag queueing */
-#define QUEUE_FLAG_STOPPED 3 /* queue is stopped */
+#define QUEUE_FLAG_CLUSTER 0 /* cluster several segments into 1 */
+#define QUEUE_FLAG_QUEUED 1 /* uses generic tag queueing */
+#define QUEUE_FLAG_STOPPED 2 /* queue is stopped */
-#define blk_queue_plugged(q) test_bit(QUEUE_FLAG_PLUGGED, &(q)->queue_flags)
-#define blk_mark_plugged(q) set_bit(QUEUE_FLAG_PLUGGED, &(q)->queue_flags)
+#define blk_queue_plugged(q) !list_empty(&(q)->plug_list)
#define blk_queue_tagged(q) test_bit(QUEUE_FLAG_QUEUED, &(q)->queue_flags)
#define blk_queue_empty(q) elv_queue_empty(q)
#define list_entry_rq(ptr) list_entry((ptr), struct request, queuelist)
@@ -295,6 +294,7 @@
extern struct request *blk_get_request(request_queue_t *, int, int);
extern void blk_put_request(struct request *);
extern void blk_plug_device(request_queue_t *);
+extern int blk_remove_plug(request_queue_t *);
extern void blk_recount_segments(request_queue_t *, struct bio *);
extern inline int blk_phys_contig_segment(request_queue_t *q, struct bio *, struct bio *);
extern inline int blk_hw_contig_segment(request_queue_t *q, struct bio *, struct bio *);
--
Jens Axboe
Jens Axboe wrote:
> Neil,
>
> I tried converting umem to see how it fit together, this is what I came
> up with. This does use a queue per umem unit, but I think that's the
> right split anyways. Maybe at some point we can use the per-major
> statically allocated queues...
>
> /*
> + * remove the queue from the plugged list, if present. called with
> + * queue lock held and interrupts disabled.
> + */
> +inline int blk_remove_plug(request_queue_t *q)
Jens - I have noticed some unlikely() tag "optimizations" in
tcq code too.
Please tell me, why do you attribute this exported function as inline?
I highly doubt that it will ever show up on any profile.
Contrary to popular belief, on modern CPUs it generally only pays off
to unroll vector code, not decision code like the above.
On Tue, Jun 04 2002, Martin Dalecki wrote:
> Jens Axboe wrote:
> >Neil,
> >
> >I tried converting umem to see how it fit together, this is what I came
> >up with. This does use a queue per umem unit, but I think that's the
> >right split anyways. Maybe at some point we can use the per-major
> >statically allocated queues...
>
> >
> > /*
> >+ * remove the queue from the plugged list, if present. called with
> >+ * queue lock held and interrupts disabled.
> >+ */
> >+inline int blk_remove_plug(request_queue_t *q)
>
>
> Jens - I have noticed some unlikely() tag "optimizations" in
> tcq code too.
> Please tell me, why do you attribute this exported function as inline?
> I highly doubt that it will ever show up on any profile.
> Contrary to popular belief, on modern CPUs it generally only pays off
> to unroll vector code, not decision code like the above.
I doubt it matters much in this case. But it certainly isn't called
often enough to justify the inline, I'll uninline later.
WRT the unlikely(), if you have the hints available, why not pass them
on?
--
Jens Axboe
Jens Axboe wrote:
> On Tue, Jun 04 2002, Martin Dalecki wrote:
>
>>Jens Axboe wrote:
>>
>>>Neil,
>>>
>>>I tried converting umem to see how it fit together, this is what I came
>>>up with. This does use a queue per umem unit, but I think that's the
>>>right split anyways. Maybe at some point we can use the per-major
>>>statically allocated queues...
>>
>>>/*
>>>+ * remove the queue from the plugged list, if present. called with
>>>+ * queue lock held and interrupts disabled.
>>>+ */
>>>+inline int blk_remove_plug(request_queue_t *q)
>>
>>
>>Jens - I have noticed some unlikely() tag "optimizations" in
>>tcq code too.
>>Please tell me, why do you attribute this exported function as inline?
>>I highly doubt that it will ever show up on any profile.
>>Contrary to popular belief, on modern CPUs it generally only pays off
>>to unroll vector code, not decision code like the above.
>
>
> I doubt it matters much in this case. But it certainly isn't called
> often enough to justify the inline, I'll uninline later.
>
> WRT the unlikely(), if you have the hints available, why not pass them
> on?
Well, it's kind of like the answer to the question: why not do it all in
hand-optimized assembler? Or in other words - let's give the GCC guys good
reasons for more hard work. But more seriously:
Things like unlikely() tricks and other friends seldom really
pay off if applied randomly. But they can:
1. Have quite contrary effects to what one would expect, due to
the fact that one is targeting a single instruction set but in
reality running on multiple very different CPUs or even multiple archs.
2. Be invalidated by changes/improvements to the compiler.
My personal rule of thumb is - don't do something like the
above unless you did:
1. Some real global profiling.
2. Some real CPU cycle counting on the micro level.
3. You really have to. (This should be number 1!)
Unless one of the above cases applies, there is only one rule
for true code optimization which appears to be generally
valid: Write tight code. It will help small and old
systems, which usually have quite narrow memory constraints,
and on bigger systems it will help keep the instruction
caches and internal instruction decode happy.
Think for example about hints for:
1. AMD's and P4's internal instruction predecode/CISC-RISC translation.
Small functions will immediately give the information - "hey buddy,
you saw this code already once...". But inlined stuff will still
trash the predecode engines along the single execution path.
2. Think of on-the-fly instruction set translation (Transmeta).
It's pretty obvious that a small function is actually more likely
to be palatable to software like that...
3. Even the Cyrix 486 already contained very efficient mechanisms to make
call instructions nearly zero-cost in terms of instruction
cycles. (Think return address stack and hints for branch prediction.)
Of course the above is all valid for decision code, which is the case here,
but not necessarily for tight loops of vectorized operations...
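(For context: the likely()/unlikely() hints being debated are thin
wrappers around gcc's __builtin_expect - approximately how
include/linux/compiler.h of this era defines them. The submit_checked()
function below is a made-up illustration, not kernel code:)

#define likely(x)	__builtin_expect((x), 1)
#define unlikely(x)	__builtin_expect((x), 0)

/* typical use: mark the error path as cold, so gcc lays out the
 * common case as the straight-line path */
static int submit_checked(request_queue_t *q, struct bio *bio)
{
	if (unlikely(bio == NULL))
		return -EINVAL;
	return q->make_request_fn(q, bio);
}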
Hi Martin,
Hi Jens,
Hi LKML,
On Tue, Jun 04, 2002 at 03:45:11PM +0200, Martin Dalecki wrote:
> Jens - I have noticed some unlikely() tag "optimizations" in
> tcq code too.
unlikely() shows the reader immediately that this is (considered)
a "rare" condition. This might stop janitors from optimizing these
cases at all, or let people with deeper knowledge check whether
this is REALLY rare.
likely() should lead to the analogous actions.
So this is not only a hint for the compiler and I'm very happy to
see this being used and can stand the code clutter caused by
this ;-)
OTOH your point about inline is very valid. I use it only for
code splitting or for trivial functions with decisions on
compile-time constants. Everything else just bloats the *.o files.
Regards
Ingo Oeser
--
Science is what we can tell a computer. Art is everything else. --- D.E.Knuth
Martin Dalecki <[email protected]> said:
> Well it's kind like the answer to the question: why don't do it all in hand
> optimized assembler? Or in other words - let's give the GCC guys good
> reasons for more hard work. But more seriously:
>
> Things like unlikely() tricks and other friends seldomly really
> pay off if applied randomly. But they can:
>
> 1. Have quite contrary effects to what one would expect due to
> the fact that one is targetting a single instruction set but in
> reality multiple very different CPU archs or even multiple archs.
>
> 2. Changes/improvements to the compiler.
>
> My personal rule of thumb is - don't do something like the
> above unless you did:
>
> 1. Some real global profiling.
> 2. Some real CPU cycle counting on the micro level.
> 3. You really have too. (This should be number 1!)
Anybody trying to tune code should read (and learn by heart):
@Book{bentley82:_writing_eff_progr,
author = {Jon Louis Bentley},
title = {Writing Efficient Programs},
publisher = {Prentice Hall},
year = 1982
}
(sadly out of print AFAIK)
--
Dr. Horst H. von Brand User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513
On Tue, 2002-06-04 at 04:51, Jens Axboe wrote:
> On Mon, Jun 03 2002, Mike Black wrote:
> > RAID5 still doesn't compile....sigh....
>
> [snip]
>
> Some people do nothing but complain instead of trying to fix things.
> Sigh...
Hi Jens.
Maybe Mike Black is a kernel hacking guru and, therefore,
deserves your impatience.
I send in a lot of bug reports. I don't sigh about finding
bugs, because I enjoy testing and want to contribute to the
kernel evolution as much as possible. Testing is my core
competency, so that's what I focus on.
Anyhow, it's helpful when developers are patient with
the limitations of us testers.
Miles
On Tuesday June 4, [email protected] wrote:
> On Tue, Jun 04 2002, Jens Axboe wrote:
> > On Tue, Jun 04 2002, Jens Axboe wrote:
> > > plug? Wrt umem, it seems you could keep 'card' in the queuedata. Same
> > > for raid5 and conf.
> >
> > Ok by actually looking at it, both card and conf are the queues
> > themselves. So I'd say your approach is indeed a bit overkill. I'll send
> > a patch in half an hour for you to review.
>
> Alright here's the block part of it, you'll need to merge your umem and
> raid5 changes in with that. I'm just interested in knowing if this is
> all you want/need. It's actually just a two-line change from the last
> version posted -- set q->unplug_fn in blk_init_queue to our default
> (you'll override that for umem and raid5, or rather you'll set it
> yourself after blk_queue_make_request()), and then call that instead of
> __generic_unplug_device directly from blk_run_queues().
>
I can work with that...
Below is a patch that fixes a couple of problems with umem and adds
support for md/raid5. I haven't tested it yet, but it looks right. It
is on top of your patch.
It irks me, though, that I need to embed a whole request_queue_t when
I don't use the vast majority of it.
What I would like is to have a
struct blk_dev {
make_request_fn *make_request_fn;
unplug_fn *unplug_fn;
struct list_head plug_list;
unsigned long queue_flags;
spinlock_t *queue_lock;
}
which I can embed in mddev_s and umem card, and you can embed in
struct request_queue.
Then struct block_device would have
struct blk_dev bd_bdev;
instead of
struct request_queue bd_queue;
and we could each find the containing structure from the "struct blk_dev"
using a list_entry type macro.
There would be a bit of an issue with this in that it would then not
make sense to have a full request_queue_t in struct block_device, and
users of BLK_DEFAULT_QUEUE would need to allocate their own
request_queue_t.... is that such a bad idea?
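(To illustrate the last step: the containing structure would be
recovered from the embedded struct blk_dev with an offsetof-based
macro, the same construction list_entry() uses. Sketch only, names
hypothetical:)

typedef struct mddev_s mddev_t;

struct mddev_s {
	/* ... md-specific fields ... */
	struct blk_dev bdev;		/* embedded, not pointed to */
};

/* same trick as list_entry(): subtract the member's offset from
 * the member's address to get back to the containing structure */
#define blk_dev_to_mddev(p) \
	((mddev_t *)((char *)(p) - offsetof(mddev_t, bdev)))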
Anyway, I can live with the current situation if you don't like the
above.
NeilBrown
-------------------------------------------------------------
Extend plugging work for md/raid5 and fix umem
We embed a request_queue_t in the mddev structure and so
have a separate one for each mddev.
This is used for plugging (in raid5).
Given this embedded request_queue_t, md_make_request no longer
needs to map from device number to mddev, but can map from
the queue to the mddev instead.
In umem, we move blk_plug_device and blk_remove_plug
inside the lock, and define a ->queue function to return
the right queue for each device.
In ll_rw_blk.c we change blk_plug_device to avoid the check for
the queue being empty, as this may not be well defined for
umem/raid5. Instead, blk_plug_device is not called when
the queue is not empty (which it mostly wasn't anyway).
----------- Diffstat output ------------
./drivers/block/ll_rw_blk.c | 10 +++-------
./drivers/block/umem.c | 20 ++++++++++++++++----
./drivers/md/md.c | 25 ++++++++++++++++++++++---
./drivers/md/raid5.c | 21 +++++++--------------
./include/linux/raid/md_k.h | 2 ++
./include/linux/raid/raid5.h | 5 +----
6 files changed, 51 insertions(+), 32 deletions(-)
--- ./include/linux/raid/md_k.h 2002/06/05 00:15:12 1.1
+++ ./include/linux/raid/md_k.h 2002/06/05 00:53:06 1.2
@@ -214,6 +214,8 @@
atomic_t recovery_active; /* blocks scheduled, but not written */
wait_queue_head_t recovery_wait;
+ request_queue_t queue; /* for plugging ... */
+
struct list_head all_mddevs;
};
--- ./include/linux/raid/raid5.h 2002/06/05 00:15:55 1.1
+++ ./include/linux/raid/raid5.h 2002/06/05 00:53:06 1.2
@@ -176,7 +176,7 @@
* is put on a "delayed" queue until there are no stripes currently
* in a pre-read phase. Further, if the "delayed" queue is empty when
* a stripe is put on it then we "plug" the queue and do not process it
- * until an unplg call is made. (the tq_disk list is run).
+ * until an unplug call is made. (blk_run_queues is run).
*
* When preread is initiated on a stripe, we set PREREAD_ACTIVE and add
* it to the count of prereading stripes.
@@ -228,9 +228,6 @@
* waiting for 25% to be free
*/
spinlock_t device_lock;
-
- int plugged;
- struct tq_struct plug_tq;
};
typedef struct raid5_private_data raid5_conf_t;
--- ./drivers/block/ll_rw_blk.c 2002/06/05 00:13:22 1.2
+++ ./drivers/block/ll_rw_blk.c 2002/06/05 00:53:06 1.3
@@ -803,12 +803,6 @@
*/
void blk_plug_device(request_queue_t *q)
{
- /*
- * common case
- */
- if (!elv_queue_empty(q))
- return;
-
if (!blk_queue_plugged(q)) {
spin_lock(&blk_plug_lock);
list_add_tail(&q->plug_list, &blk_plug_list);
@@ -1381,10 +1375,12 @@
req = NULL;
insert_here = q->queue_head.prev;
- if (blk_queue_empty(q) || bio_barrier(bio)) {
+ if (blk_queue_empty(q)) {
blk_plug_device(q);
goto get_rq;
}
+ if (bio_barrier(bio))
+ goto get_rq;
el_ret = elv_merge(q, &req, bio);
switch (el_ret) {
--- ./drivers/block/umem.c 2002/06/05 00:13:22 1.2
+++ ./drivers/block/umem.c 2002/06/05 00:53:06 1.3
@@ -292,7 +292,7 @@
* Whenever IO on the active page completes, the Ready page is activated
* and the ex-Active page is clean out and made Ready.
* Otherwise the Ready page is only activated when it becomes full, or
- * when mm_unplug_device is called via run_task_queue(&tq_disk).
+ * when mm_unplug_device is called via blk_run_queues().
*
* If a request arrives while both pages a full, it is queued, and b_rdev is
* overloaded to record whether it was a read or a write.
@@ -386,10 +386,10 @@
request_queue_t *q = data;
struct cardinfo *card = q->queuedata;
- blk_remove_plug(q);
spin_lock_bh(&card->lock);
- activate(card);
+ if (blk_remove_plug(q))
+ activate(card);
spin_unlock_bh(&card->lock);
}
@@ -578,9 +578,9 @@
*card->biotail = bio;
bio->bi_next = NULL;
card->biotail = &bio->bi_next;
+ blk_plug_device(q);
spin_unlock_bh(&card->lock);
- blk_plug_device(q);
return 0;
}
@@ -1240,6 +1240,17 @@
-- mm_init
-----------------------------------------------------------------------------------
*/
+
+static request_queue_t * mm_queue_proc(kdev_t dev)
+{
+	int c = DEVICE_NR(dev);
+
+ if (c < MM_MAXCARDS)
+ return &cards[c].queue;
+ else
+ return BLK_DEFAULT_QUEUE(MAJOR_NR);
+}
+
int __init mm_init(void)
{
int retval, i;
@@ -1279,6 +1290,7 @@
mm_gendisk.part = mm_partitions;
mm_gendisk.nr_real = num_cards;
+ blk_dev[MAJOR_NR].queue = mm_queue_proc;
add_gendisk(&mm_gendisk);
blk_size[MAJOR_NR] = mm_gendisk.sizes;
--- ./drivers/md/raid5.c 2002/06/05 00:24:34 1.1
+++ ./drivers/md/raid5.c 2002/06/05 00:53:06 1.2
@@ -1225,14 +1225,14 @@
}
static void raid5_unplug_device(void *data)
{
- raid5_conf_t *conf = (raid5_conf_t *)data;
+ request_queue_t *q = data;
+	raid5_conf_t *conf = mddev_to_conf((mddev_t *)q->queuedata);
unsigned long flags;
spin_lock_irqsave(&conf->device_lock, flags);
- raid5_activate_delayed(conf);
-
- conf->plugged = 0;
+ if (blk_remove_plug(q))
+ raid5_activate_delayed(conf);
md_wakeup_thread(conf->thread);
spin_unlock_irqrestore(&conf->device_lock, flags);
@@ -1241,11 +1241,7 @@
static inline void raid5_plug_device(raid5_conf_t *conf)
{
spin_lock_irq(&conf->device_lock);
- if (list_empty(&conf->delayed_list))
- if (!conf->plugged) {
- conf->plugged = 1;
- queue_task(&conf->plug_tq, &tq_disk);
- }
+	blk_plug_device(&conf->mddev->queue);
spin_unlock_irq(&conf->device_lock);
}
@@ -1352,7 +1348,7 @@
if (list_empty(&conf->handle_list) &&
atomic_read(&conf->preread_active_stripes) < IO_THRESHOLD &&
- !conf->plugged &&
+ !blk_queue_plugged(&mddev->queue) &&
!list_empty(&conf->delayed_list))
raid5_activate_delayed(conf);
@@ -1443,10 +1439,7 @@
atomic_set(&conf->active_stripes, 0);
atomic_set(&conf->preread_active_stripes, 0);
- conf->plugged = 0;
- conf->plug_tq.sync = 0;
- conf->plug_tq.routine = &raid5_unplug_device;
- conf->plug_tq.data = conf;
+ mddev->queue.unplug_fn = raid5_unplug_device;
PRINTK("raid5: run(md%d) called.\n", mdidx(mddev));
--- ./drivers/md/md.c 2002/06/05 00:30:58 1.1
+++ ./drivers/md/md.c 2002/06/05 00:53:06 1.2
@@ -171,7 +171,7 @@
static int md_make_request (request_queue_t *q, struct bio *bio)
{
- mddev_t *mddev = kdev_to_mddev(to_kdev_t(bio->bi_bdev->bd_dev));
+ mddev_t *mddev = q->queuedata;
if (mddev && mddev->pers)
return mddev->pers->make_request(mddev, bio_rw(bio), bio);
@@ -181,6 +181,12 @@
}
}
+static int md_fail_request (request_queue_t *q, struct bio *bio)
+{
+ bio_io_error(bio);
+ return 0;
+}
+
static mddev_t * alloc_mddev(kdev_t dev)
{
mddev_t *mddev;
@@ -1710,6 +1716,9 @@
}
mddev->pers = pers[pnum];
+ blk_queue_make_request(&mddev->queue, md_make_request);
+ mddev->queue.queuedata = mddev;
+
err = mddev->pers->run(mddev);
if (err) {
printk(KERN_ERR "md: pers->run() failed ...\n");
@@ -3615,6 +3624,15 @@
#endif
}
+request_queue_t * md_queue_proc(kdev_t dev)
+{
+ mddev_t *mddev = kdev_to_mddev(dev);
+ if (mddev == NULL)
+ return BLK_DEFAULT_QUEUE(MAJOR_NR);
+ else
+ return &mddev->queue;
+}
+
int __init md_init(void)
{
static char * name = "mdrecoveryd";
@@ -3639,8 +3657,9 @@
S_IFBLK | S_IRUSR | S_IWUSR, &md_fops, NULL);
}
- /* forward all md request to md_make_request */
- blk_queue_make_request(BLK_DEFAULT_QUEUE(MAJOR_NR), md_make_request);
+ /* all requests on an uninitialised device get failed... */
+ blk_queue_make_request(BLK_DEFAULT_QUEUE(MAJOR_NR), md_fail_request);
+ blk_dev[MAJOR_NR].queue = md_queue_proc;
add_gendisk(&md_gendisk);
On Wed, Jun 05 2002, Neil Brown wrote:
> On Tuesday June 4, [email protected] wrote:
> > On Tue, Jun 04 2002, Jens Axboe wrote:
> > > On Tue, Jun 04 2002, Jens Axboe wrote:
> > > > plug? Wrt umem, it seems you could keep 'card' in the queuedata. Same
> > > > for raid5 and conf.
> > >
> > > Ok by actually looking at it, both card and conf are the queues
> > > themselves. So I'd say your approach is indeed a bit overkill. I'll send
> > > a patch in half an hour for you to review.
> >
> > Alright here's the block part of it, you'll need to merge your umem and
> > raid5 changes in with that. I'm just interested in knowing if this is
> > all you want/need. It's actually just a two-line change from the last
> > version posted -- set q->unplug_fn in blk_init_queue to our default
> > (you'll override that for umem and raid5, or rather you'll set it
> > yourself after blk_queue_make_request()), and then call that instead of
> > __generic_unplug_device directly from blk_run_queues().
> >
> I can work with that...
>
> Below is a patch that fixes a couple of problems with umem and add
> support for md/raid5. I haven't tested it yet, but it looks right. It
> is on top of your patch.
>
> It irks me, though, that I need to embed a whole request_queue_t when
> I don't use the vast majority of it.
Yeah this annoys me too...
> What I would like is to have a
>
> struct blk_dev {
> make_request_fn *make_request_fn;
> unplug_fn *unplug_fn;
> struct list_head plug_list;
> unsigned long queue_flags;
> spinlock_t *queue_lock;
> }
>
> which I can embed in mddev_s and umem card, and you can embed in
> struct request_queue.
To some extent, I agree with you. However, I'm not sure it's worth
abstracting this bit out. The size of the request queue right now in
2.5.20 (with plug change) is 196 bytes on i386 SMP. If you really feel
it's worth saving these bytes, then I'd much rather go in a different
direction: only keep the "main" elements of request_queue_t, and have
blk_init_queue() alloc the "remainder".
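(Sketched out, that direction might look something like this - purely
illustrative, no such split existed in 2.5.20, and "request_queue_core"
is a made-up name:)

struct request_queue {
	/* "main" elements every driver needs, always embedded */
	make_request_fn		*make_request_fn;
	unplug_fn		*unplug_fn;
	struct list_head	plug_list;
	spinlock_t		*queue_lock;

	/*
	 * the full request/elevator machinery, allocated by
	 * blk_init_queue() only for drivers that use it
	 */
	struct request_queue_core *core;
};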
> > In ll_rw_blk.c we change blk_plug_device to avoid the check for
> > the queue being empty, as this may not be well defined for
> > umem/raid5. Instead, blk_plug_device is not called when
> > the queue is not empty (which it mostly wasn't anyway).
I left it that way on purpose, and yes I did see you changed that in the
previous patch as well. I think it's a bit of a mess to have both
blk_plug_device and blk_plug_queue. Without looking at it, I have no
idea which does what?! blk_queue_empty() will always be true for
make_request_fn type drivers, so no change is necessary there.
--
Jens Axboe
On Tue, Jun 04 2002, Martin Dalecki wrote:
> >WRT the unlikely(), if you have the hints available, why not pass them
> >on?
>
> Well it's kind like the answer to the question: why don't do it all in hand
> optimized assembler? Or in other words - let's give the GCC guys good
[snip]
If I didn't know better Martin, I would say that you are trolling.
Surely you have better ways to spend your time than to fly fuck [1]
every single patch posted on lkml? :-)
I would agree with your entire email if you were talking about inlining.
Yes we should not inline excessively without a good reason. However, the
likely/unlikely hints provide both clues to the code reader (as someone
else already established) about which is the likely code path or not,
and of course to the compiler. I would agree that profiling would in
some cases be needed before slapping on an unlikely() sticker. These are
the cases where you do not know what the call path is typically like.
When I put unlikely() in there, it's because I know there's a 1:50 ratio
or more. Or maybe a 1:inf ratio, meaning it should only trigger because
of a bug.
[1] To "fuck a fly" is a danish expression, meaning excessive attention
to detail.
--
Jens Axboe
On Wednesday June 5, [email protected] wrote:
>
> To some extent, I agree with you. However, I'm not sure it's worth
> abstracting this bit out. The size of the request queue right now in
> 2.5.20 (with plug change) is 196 bytes on i386 SMP. If you really feel
> it's worth saving these bytes, then I'd much rather go in a different
> direction: only keep the "main" elements of request_queue_t, and have
> blk_init_queue() alloc the "remainder".
It's not really the bytes that worry me. It's more the cleanliness of
the abstraction.
I actually really like the approach of embedding a more general data
structure inside a more specific data structure and using a list_entry
type macro to get from the one to the other... it has a very OO feel
to it (which is, in this case, good I think) but I suspect it's a very
personal-taste sort of thing.
>
> > In ll_rw_blk.c we change blk_plug_device to avoid the check for
> > the queue being empty, as this may not be well defined for
> > umem/raid5. Instead, blk_plug_device is not called when
> > the queue is not empty (which it mostly wasn't anyway).
>
> I left it that way on purpose, and yes I did see you changed that in the
> previous patch as well. I think it's a bit of a mess to have both
> blk_plug_device and blk_plug_queue. Without looking at it, I have no
> idea which does what?! blk_queue_empty() will always be true for
> make_request_fn type drivers, so no change is necessary there.
I'm not pushing the blk_plug_device vs blk_plug_queue distinction.
It is just that I think the current blk_plug_device is wrong... let
me try to show you why...
First, look where it is called, in __make_request (which I always
thought should be spelt "elevator_make_request") - note that this is
the ONLY place that it is called other than the calls that have just
been introduced in md and umem - :
if (blk_queue_empty(q) || bio_barrier(bio)) {
blk_plug_device(q);
goto get_rq;
}
Which says "if the queue is empty or this is a barrier bio, then
plug the queue and go and make a request, skipping the merging".
One then wonders why you would want to plug the queue for a barrier
bio. The queue-empty bit makes sense, but not the barrier bit...
Ok, let's see exactly what blk_plug_device(q) does by (mentally)
inlining it:
if (blk_queue_empty(q) || bio_barrier(bio)) {
if (!elv_queue_empty(q))
goto return;
if (!blk_queue_plugged(q)) {
spin_lock(&blk_plug_lock);
list_add_tail(&q->plug_list, &blk_plug_list);
spin_unlock(&blk_plug_lock);
}
return:
goto get_rq;
}
So we are actually plugging it if it isn't already plugged (makes
sense) and elv_queue_empty, and blk_queue_empty ... or bio_barrier.
I wonder what those two different *_queue_empty functions are...
looks in blkdev.h... Oh, they are the same.
So we can simplify this a bit:
if (the_queue_is_empty(q)) {
plug_if_not_plugged(q);
goto get_rq;
}
if (bio_barrier(bio)) {
/* we know the queue is not empty, so avoid that check and
simply don't plug */
goto get_rq;
}
Now that makes sense. The answer to the question "why would you want
to plug the queue for a barrier bio?" is that you don't.
This is how I came to the change that I suggested. The current code is
confusing, and testing elv_queue_empty in blk_plug_device is really a
layering violation.
You are correct from an operational sense when you say that "no change
is necessary there" (providing the queue_head is initialised, which it
is by blk_init_queue via elevator_init, but isn't by
blk_queue_make_request) but I don't think you are correct from an
abstractional purity perspective.
NeilBrown
On Thu, Jun 06 2002, Neil Brown wrote:
> > > In ll_rw_blk.c we change blk_plug_device to avoid the check for
> > > the queue being empty, as this may not be well defined for
> > > umem/raid5. Instead, blk_plug_device is not called when
> > > the queue is not empty (which it mostly wasn't anyway).
> >
> > I left it that way on purpose, and yes I did see you changed that in the
> > previous patch as well. I think it's a bit of a mess to have both
> > blk_plug_device and blk_plug_queue. Without looking at it, I have no
> > idea which does what?! blk_queue_empty() will always be true for
> > make_request_fn type drivers, so no change is necessary there.
>
> I'm not pushing the blk_plug_device vs blk_plug_queue distinction.
>
> It is just that I think the current blk_plug_device is wrong... let
> me try to show you why...
>
> First, look where it is called, in __make_request (which I always
> thought should be spelt "elevator_make_request") - note that this is
> the ONLY place that it is called other than the calls that have just
> been introduced in md and umem - :
>
>
> if (blk_queue_empty(q) || bio_barrier(bio)) {
> blk_plug_device(q);
> goto get_rq;
> }
>
> Which says "if the queue is empty or this is a barrier bio, then
> plug the queue and go and make a request, skipping the merging".
>
> One then wonders why you would want to plug the queue for a barrier
> bio. The queue-empty bit makes sense, but not the barrier bit...
No, for the barrier bit we need not do the plugging; we just need to skip
the merge step and go directly to grabbing a new request. However, if the
queue is indeed empty, then we do need to plug it. See?
> Ok, let's see exactly what blk_plug_device(q) does by (mentally)
> inlining it:
>
> if (blk_queue_empty(q) || bio_barrier(bio)) {
>
> if (!elv_queue_empty(q))
> goto return;
>
> if (!blk_queue_plugged(q)) {
> spin_lock(&blk_plug_lock);
> list_add_tail(&q->plug_list, &blk_plug_list);
> spin_unlock(&blk_plug_lock);
> }
> return:
> goto get_rq;
> }
>
> So we are actually plugging it if it isn't already plugged (makes
> sense) and elv_queue_empty, and blk_queue_empty ... or bio_barrier.
> I wonder what those two different *_queue_empty functions are...
> looks in blkdev.h... Oh, they are the same.
Oh agreed, I'll get rid of one of them.
> So we can simplify this a bit:
>
> if (the_queue_is_empty(q)) {
> plug_if_not_plugged(q);
> goto get_rq;
> }
> if (bio_barrier(bio)) {
> /* we know the queue is not empty, so avoid that check and
> simply don't plug */
> goto get_rq;
> }
>
> Now that makes sense. The answer to the question "why would you want
> to plug the queue for a barrier bio?" is that you don't.
The answer is that you don't want to plug it any more than you do
for a regular request, but see what I wrote above.
> This is how I came to the change that I suggested. The current code is
> confusing, and testing elv_queue_empty in blk_plug_device is really a
> layering violation.
>
> You are correct from an operational sense when you say that "no change
> is necessary there" (providing the queue_head is initialised, which it
> is by blk_init_queue via elevator_init, but isn't by
> blk_queue_make_request) but I don't think you are correct from an
> abstractional purity perspective.
Alright you've convinced me about that part. Will you send me the
updated patch?
--
Jens Axboe