2004-11-24 07:37:11

by Christopher S. Aker

[permalink] [raw]
Subject: 2.6.10-rc2-bk7 - Badness in cfq_put_request at drivers/block/cfq-iosched.c:1402

Badness in cfq_put_request at drivers/block/cfq-iosched.c:1402
[<c025eed4>] cfq_put_request+0xcc/0x119
[<c0252520>] elv_put_request+0x25/0x27
[<c02553a5>] __blk_put_request+0x66/0xab
[<c0256647>] end_that_request_last+0x6a/0x10b
[<c029d836>] scsi_end_request+0xbf/0xe6
[<c029db11>] scsi_io_completion+0x117/0x4b6
[<c029b2e2>] scsi_delete_timer+0x1a/0x66
[<c02a9290>] sd_rw_intr+0x89/0x30f
[<c0114472>] rebalance_tick+0xac/0xbb
[<c0298e8a>] scsi_finish_command+0x85/0xd9
[<c0298d9d>] scsi_softirq+0xb7/0xdd
[<c011cba7>] __do_softirq+0xb7/0xc6
[<c011cbe3>] do_softirq+0x2d/0x2f
[<c01046b6>] do_IRQ+0x1e/0x24
[<c0102db2>] common_interrupt+0x1a/0x20
[<c01005da>] mwait_idle+0x31/0x48
[<c01005a0>] cpu_idle+0x33/0x3c
[<c046aa49>] start_kernel+0x175/0x1b1
[<c046a4bd>] unknown_bootoption+0x0/0x1ab

-Chris


2004-11-24 13:16:23

by Jens Axboe

[permalink] [raw]
Subject: Re: 2.6.10-rc2-bk7 - Badness in cfq_put_request at drivers/block/cfq-iosched.c:1402

On Wed, Nov 24 2004, Christopher S. Aker wrote:
> Badness in cfq_put_request at drivers/block/cfq-iosched.c:1402
> [<c025eed4>] cfq_put_request+0xcc/0x119
> [<c0252520>] elv_put_request+0x25/0x27
> [<c02553a5>] __blk_put_request+0x66/0xab
> [<c0256647>] end_that_request_last+0x6a/0x10b
> [<c029d836>] scsi_end_request+0xbf/0xe6
> [<c029db11>] scsi_io_completion+0x117/0x4b6
> [<c029b2e2>] scsi_delete_timer+0x1a/0x66
> [<c02a9290>] sd_rw_intr+0x89/0x30f
> [<c0114472>] rebalance_tick+0xac/0xbb
> [<c0298e8a>] scsi_finish_command+0x85/0xd9
> [<c0298d9d>] scsi_softirq+0xb7/0xdd
> [<c011cba7>] __do_softirq+0xb7/0xc6
> [<c011cbe3>] do_softirq+0x2d/0x2f
> [<c01046b6>] do_IRQ+0x1e/0x24
> [<c0102db2>] common_interrupt+0x1a/0x20
> [<c01005da>] mwait_idle+0x31/0x48
> [<c01005a0>] cpu_idle+0x33/0x3c
> [<c046aa49>] start_kernel+0x175/0x1b1
> [<c046a4bd>] unknown_bootoption+0x0/0x1ab

It's a known issue, just not fixed yet... You can ignore the warning,
cfq recovers the condition.

--
Jens Axboe

2004-11-24 15:32:37

by Jens Axboe

[permalink] [raw]
Subject: Re: 2.6.10-rc2-bk7 - Badness in cfq_put_request at drivers/block/cfq-iosched.c:1402

On Wed, Nov 24 2004, Jens Axboe wrote:
> On Wed, Nov 24 2004, Christopher S. Aker wrote:
> > Badness in cfq_put_request at drivers/block/cfq-iosched.c:1402
> > [<c025eed4>] cfq_put_request+0xcc/0x119
> > [<c0252520>] elv_put_request+0x25/0x27
> > [<c02553a5>] __blk_put_request+0x66/0xab
> > [<c0256647>] end_that_request_last+0x6a/0x10b
> > [<c029d836>] scsi_end_request+0xbf/0xe6
> > [<c029db11>] scsi_io_completion+0x117/0x4b6
> > [<c029b2e2>] scsi_delete_timer+0x1a/0x66
> > [<c02a9290>] sd_rw_intr+0x89/0x30f
> > [<c0114472>] rebalance_tick+0xac/0xbb
> > [<c0298e8a>] scsi_finish_command+0x85/0xd9
> > [<c0298d9d>] scsi_softirq+0xb7/0xdd
> > [<c011cba7>] __do_softirq+0xb7/0xc6
> > [<c011cbe3>] do_softirq+0x2d/0x2f
> > [<c01046b6>] do_IRQ+0x1e/0x24
> > [<c0102db2>] common_interrupt+0x1a/0x20
> > [<c01005da>] mwait_idle+0x31/0x48
> > [<c01005a0>] cpu_idle+0x33/0x3c
> > [<c046aa49>] start_kernel+0x175/0x1b1
> > [<c046a4bd>] unknown_bootoption+0x0/0x1ab
>
> It's a known issue, just not fixed yet... You can ignore the warning,
> cfq recovers the condition.

Is this an SMP machine, btw?

--
Jens Axboe

2004-11-24 17:42:45

by Jens Axboe

[permalink] [raw]
Subject: Re: 2.6.10-rc2-bk7 - Badness in cfq_put_request at drivers/block/cfq-iosched.c:1402

On Wed, Nov 24 2004, Christopher S. Aker wrote:
> > On Wed, Nov 24 2004, Jens Axboe wrote:
> > > On Wed, Nov 24 2004, Christopher S. Aker wrote:
> > > > Badness in cfq_put_request at drivers/block/cfq-iosched.c:1402
> > > > [<c025eed4>] cfq_put_request+0xcc/0x119
> > > > [<c0252520>] elv_put_request+0x25/0x27
> > > > [<c02553a5>] __blk_put_request+0x66/0xab
> > > > [<c0256647>] end_that_request_last+0x6a/0x10b
> > > > [<c029d836>] scsi_end_request+0xbf/0xe6
> > > > [<c029db11>] scsi_io_completion+0x117/0x4b6
> > > > [<c029b2e2>] scsi_delete_timer+0x1a/0x66
> > > > [<c02a9290>] sd_rw_intr+0x89/0x30f
> > > > [<c0114472>] rebalance_tick+0xac/0xbb
> > > > [<c0298e8a>] scsi_finish_command+0x85/0xd9
> > > > [<c0298d9d>] scsi_softirq+0xb7/0xdd
> > > > [<c011cba7>] __do_softirq+0xb7/0xc6
> > > > [<c011cbe3>] do_softirq+0x2d/0x2f
> > > > [<c01046b6>] do_IRQ+0x1e/0x24
> > > > [<c0102db2>] common_interrupt+0x1a/0x20
> > > > [<c01005da>] mwait_idle+0x31/0x48
> > > > [<c01005a0>] cpu_idle+0x33/0x3c
> > > > [<c046aa49>] start_kernel+0x175/0x1b1
> > > > [<c046a4bd>] unknown_bootoption+0x0/0x1ab
> > >
> > > It's a known issue, just not fixed yet... You can ignore the warning,
> > > cfq recovers the condition.
> >
> > Is this an SMP machine, btw?
>
> Yep.

Can you try this simple check to see if it triggers anything?

===== cfq-iosched.c 1.13 vs edited =====
--- 1.13/drivers/block/cfq-iosched.c 2004-10-30 01:35:21 +02:00
+++ edited/cfq-iosched.c 2004-11-24 14:40:13 +01:00
@@ -1389,6 +1389,8 @@
struct cfq_data *cfqd = q->elevator->elevator_data;
struct cfq_rq *crq = RQ_DATA(rq);

+ WARN_ON(!spin_is_locked(q->queue_lock));
+
if (crq) {
struct cfq_queue *cfqq = crq->cfq_queue;


--
Jens Axboe

2004-11-24 17:48:10

by Christopher S. Aker

[permalink] [raw]
Subject: Re: 2.6.10-rc2-bk7 - Badness in cfq_put_request at drivers/block/cfq-iosched.c:1402

> On Wed, Nov 24 2004, Jens Axboe wrote:
> > On Wed, Nov 24 2004, Christopher S. Aker wrote:
> > > Badness in cfq_put_request at drivers/block/cfq-iosched.c:1402
> > > [<c025eed4>] cfq_put_request+0xcc/0x119
> > > [<c0252520>] elv_put_request+0x25/0x27
> > > [<c02553a5>] __blk_put_request+0x66/0xab
> > > [<c0256647>] end_that_request_last+0x6a/0x10b
> > > [<c029d836>] scsi_end_request+0xbf/0xe6
> > > [<c029db11>] scsi_io_completion+0x117/0x4b6
> > > [<c029b2e2>] scsi_delete_timer+0x1a/0x66
> > > [<c02a9290>] sd_rw_intr+0x89/0x30f
> > > [<c0114472>] rebalance_tick+0xac/0xbb
> > > [<c0298e8a>] scsi_finish_command+0x85/0xd9
> > > [<c0298d9d>] scsi_softirq+0xb7/0xdd
> > > [<c011cba7>] __do_softirq+0xb7/0xc6
> > > [<c011cbe3>] do_softirq+0x2d/0x2f
> > > [<c01046b6>] do_IRQ+0x1e/0x24
> > > [<c0102db2>] common_interrupt+0x1a/0x20
> > > [<c01005da>] mwait_idle+0x31/0x48
> > > [<c01005a0>] cpu_idle+0x33/0x3c
> > > [<c046aa49>] start_kernel+0x175/0x1b1
> > > [<c046a4bd>] unknown_bootoption+0x0/0x1ab
> >
> > It's a known issue, just not fixed yet... You can ignore the warning,
> > cfq recovers the condition.
>
> Is this an SMP machine, btw?

Yep.

-Chris

2004-11-24 21:00:44

by Christopher S. Aker

[permalink] [raw]
Subject: Re: 2.6.10-rc2-bk7 - Badness in cfq_put_request at drivers/block/cfq-iosched.c:1402

> Can you try this simple check to see if it triggers anything?
>
> ===== cfq-iosched.c 1.13 vs edited =====
> --- 1.13/drivers/block/cfq-iosched.c 2004-10-30 01:35:21 +02:00
> +++ edited/cfq-iosched.c 2004-11-24 14:40:13 +01:00
> @@ -1389,6 +1389,8 @@
> struct cfq_data *cfqd = q->elevator->elevator_data;
> struct cfq_rq *crq = RQ_DATA(rq);
>
> + WARN_ON(!spin_is_locked(q->queue_lock));
> +
> if (crq) {
> struct cfq_queue *cfqq = crq->cfq_queue;

I'd be happy to, but I won't have a free machine for a couple of days. I'll can
probably give it a shot during the weekend...

-Chris

2004-11-27 05:09:31

by Jens Axboe

[permalink] [raw]
Subject: Re: 2.6.10-rc2-bk7 - Badness in cfq_put_request at drivers/block/cfq-iosched.c:1402

On Wed, Nov 24 2004, Christopher S. Aker wrote:
> > Can you try this simple check to see if it triggers anything?
> >
> > ===== cfq-iosched.c 1.13 vs edited =====
> > --- 1.13/drivers/block/cfq-iosched.c 2004-10-30 01:35:21 +02:00
> > +++ edited/cfq-iosched.c 2004-11-24 14:40:13 +01:00
> > @@ -1389,6 +1389,8 @@
> > struct cfq_data *cfqd = q->elevator->elevator_data;
> > struct cfq_rq *crq = RQ_DATA(rq);
> >
> > + WARN_ON(!spin_is_locked(q->queue_lock));
> > +
> > if (crq) {
> > struct cfq_queue *cfqq = crq->cfq_queue;
>
> I'd be happy to, but I won't have a free machine for a couple of days.
> I'll can probably give it a shot during the weekend...

Nevermind, here's a patch to fix it. I was so focused on the decrement
side of things that I forgot to check the increment, pretty silly error
really.

Signed-off-by: Jens Axboe <[email protected]>

===== drivers/block/cfq-iosched.c 1.12 vs edited =====
--- 1.12/drivers/block/cfq-iosched.c 2004-10-28 09:40:02 +02:00
+++ edited/drivers/block/cfq-iosched.c 2004-11-25 19:39:39 +01:00
@@ -1398,10 +1398,7 @@
if (crq->io_context)
put_io_context(crq->io_context->ioc);

- if (!cfqq->allocated[crq->is_write]) {
- WARN_ON(1);
- cfqq->allocated[crq->is_write] = 1;
- }
+ BUG_ON(!cfqq->allocated[crq->is_write]);
cfqq->allocated[crq->is_write]--;

mempool_free(crq, cfqd->crq_pool);
@@ -1442,6 +1439,7 @@
if (cfqq->allocated[rw] >= cfqd->max_queued)
goto out_lock;

+ cfqq->allocated[rw]++;
spin_unlock_irqrestore(q->queue_lock, flags);

/*
@@ -1465,7 +1463,6 @@
crq->in_flight = crq->accounted = crq->is_sync = 0;
crq->is_write = rw;
rq->elevator_private = crq;
- cfqq->allocated[rw]++;
cfqq->alloc_limit[rw] = 0;
return 0;
}
@@ -1473,6 +1470,7 @@
put_io_context(cic->ioc);
err:
spin_lock_irqsave(q->queue_lock, flags);
+ cfqq->allocated[rw]--;
cfq_put_queue(cfqq);
out_lock:
spin_unlock_irqrestore(q->queue_lock, flags);

--
Jens Axboe

2004-11-27 07:19:14

by Jens Axboe

[permalink] [raw]
Subject: Re: 2.6.10-rc2-bk7 - Badness in cfq_put_request at drivers/block/cfq-iosched.c:1402

On Fri, Nov 26 2004, Jens Axboe wrote:
> On Thu, Nov 25 2004, Jens Axboe wrote:
> > On Wed, Nov 24 2004, Christopher S. Aker wrote:
> > > > Can you try this simple check to see if it triggers anything?
> > > >
> > > > ===== cfq-iosched.c 1.13 vs edited =====
> > > > --- 1.13/drivers/block/cfq-iosched.c 2004-10-30 01:35:21 +02:00
> > > > +++ edited/cfq-iosched.c 2004-11-24 14:40:13 +01:00
> > > > @@ -1389,6 +1389,8 @@
> > > > struct cfq_data *cfqd = q->elevator->elevator_data;
> > > > struct cfq_rq *crq = RQ_DATA(rq);
> > > >
> > > > + WARN_ON(!spin_is_locked(q->queue_lock));
> > > > +
> > > > if (crq) {
> > > > struct cfq_queue *cfqq = crq->cfq_queue;
> > >
> > > I'd be happy to, but I won't have a free machine for a couple of days.
> > > I'll can probably give it a shot during the weekend...
> >
> > Nevermind, here's a patch to fix it. I was so focused on the decrement
> > side of things that I forgot to check the increment, pretty silly error
> > really.
>
> And a small fix is needed on top of that, in case of hash type changes
> (writing to hash_key).

... And that one relied on another hashing change I made locally. 3rd
time is the charm,.

Signed-off-by: Jens Axboe <[email protected]>

===== drivers/block/cfq-iosched.c 1.13 vs edited =====
--- 1.13/drivers/block/cfq-iosched.c 2004-10-30 01:35:21 +02:00
+++ edited/drivers/block/cfq-iosched.c 2004-11-26 09:06:38 +01:00
@@ -1398,10 +1398,7 @@
if (crq->io_context)
put_io_context(crq->io_context->ioc);

- if (!cfqq->allocated[crq->is_write]) {
- WARN_ON(1);
- cfqq->allocated[crq->is_write] = 1;
- }
+ BUG_ON(!cfqq->allocated[crq->is_write]);
cfqq->allocated[crq->is_write]--;

mempool_free(crq, cfqd->crq_pool);
@@ -1421,7 +1418,7 @@
struct cfq_data *cfqd = q->elevator->elevator_data;
struct cfq_io_context *cic;
const int rw = rq_data_dir(rq);
- struct cfq_queue *cfqq;
+ struct cfq_queue *cfqq, *saved_cfqq;
struct cfq_rq *crq;
unsigned long flags;

@@ -1439,20 +1436,30 @@
#endif
}

+repeat:
if (cfqq->allocated[rw] >= cfqd->max_queued)
goto out_lock;

+ cfqq->allocated[rw]++;
spin_unlock_irqrestore(q->queue_lock, flags);

/*
- * if hashing type has changed, the cfq_queue might change here. we
- * don't bother rechecking ->allocated since it should be a rare
- * event
+ * if hashing type has changed, the cfq_queue might change here.
*/
+ saved_cfqq = cfqq;
cic = cfq_get_io_context(&cfqq, gfp_mask);
if (!cic)
goto err;

+ /*
+ * repeat allocation checks on queue change
+ */
+ if (unlikely(saved_cfqq != cfqq)) {
+ spin_lock_irqsave(q->queue_lock, flags);
+ saved_cfqq->allocated[rw]--;
+ goto repeat;
+ }
+
crq = mempool_alloc(cfqd->crq_pool, gfp_mask);
if (crq) {
RB_CLEAR(&crq->rb_node);
@@ -1465,7 +1472,6 @@
crq->in_flight = crq->accounted = crq->is_sync = 0;
crq->is_write = rw;
rq->elevator_private = crq;
- cfqq->allocated[rw]++;
cfqq->alloc_limit[rw] = 0;
return 0;
}
@@ -1473,6 +1479,7 @@
put_io_context(cic->ioc);
err:
spin_lock_irqsave(q->queue_lock, flags);
+ cfqq->allocated[rw]--;
cfq_put_queue(cfqq);
out_lock:
spin_unlock_irqrestore(q->queue_lock, flags);

--
Jens Axboe

2004-11-27 07:47:15

by Jens Axboe

[permalink] [raw]
Subject: Re: 2.6.10-rc2-bk7 - Badness in cfq_put_request at drivers/block/cfq-iosched.c:1402

On Thu, Nov 25 2004, Jens Axboe wrote:
> On Wed, Nov 24 2004, Christopher S. Aker wrote:
> > > Can you try this simple check to see if it triggers anything?
> > >
> > > ===== cfq-iosched.c 1.13 vs edited =====
> > > --- 1.13/drivers/block/cfq-iosched.c 2004-10-30 01:35:21 +02:00
> > > +++ edited/cfq-iosched.c 2004-11-24 14:40:13 +01:00
> > > @@ -1389,6 +1389,8 @@
> > > struct cfq_data *cfqd = q->elevator->elevator_data;
> > > struct cfq_rq *crq = RQ_DATA(rq);
> > >
> > > + WARN_ON(!spin_is_locked(q->queue_lock));
> > > +
> > > if (crq) {
> > > struct cfq_queue *cfqq = crq->cfq_queue;
> >
> > I'd be happy to, but I won't have a free machine for a couple of days.
> > I'll can probably give it a shot during the weekend...
>
> Nevermind, here's a patch to fix it. I was so focused on the decrement
> side of things that I forgot to check the increment, pretty silly error
> really.

And a small fix is needed on top of that, in case of hash type changes
(writing to hash_key).

Signed-off-by: Jens Axboe <[email protected]>

===== drivers/block/cfq-iosched.c 1.13 vs edited =====
--- 1.13/drivers/block/cfq-iosched.c 2004-10-30 01:35:21 +02:00
+++ edited/drivers/block/cfq-iosched.c 2004-11-26 08:40:36 +01:00
@@ -1398,10 +1398,7 @@
if (crq->io_context)
put_io_context(crq->io_context->ioc);

- if (!cfqq->allocated[crq->is_write]) {
- WARN_ON(1);
- cfqq->allocated[crq->is_write] = 1;
- }
+ BUG_ON(!cfqq->allocated[crq->is_write]);
cfqq->allocated[crq->is_write]--;

mempool_free(crq, cfqd->crq_pool);
@@ -1439,20 +1436,30 @@
#endif
}

+repeat:
if (cfqq->allocated[rw] >= cfqd->max_queued)
goto out_lock;

+ cfqq->allocated[rw]++;
spin_unlock_irqrestore(q->queue_lock, flags);

/*
- * if hashing type has changed, the cfq_queue might change here. we
- * don't bother rechecking ->allocated since it should be a rare
- * event
+ * if hashing type has changed, the cfq_queue might change here.
*/
cic = cfq_get_io_context(&cfqq, gfp_mask);
if (!cic)
goto err;

+ /*
+ * repeat allocation checks on queue change
+ */
+ if (unlikely(cic->cfqq != cfqq)) {
+ spin_lock_irqsave(q->queue_lock, flags);
+ cfqq->allocated[rw]--;
+ cfqq = cic->cfqq;
+ goto repeat;
+ }
+
crq = mempool_alloc(cfqd->crq_pool, gfp_mask);
if (crq) {
RB_CLEAR(&crq->rb_node);
@@ -1465,7 +1472,6 @@
crq->in_flight = crq->accounted = crq->is_sync = 0;
crq->is_write = rw;
rq->elevator_private = crq;
- cfqq->allocated[rw]++;
cfqq->alloc_limit[rw] = 0;
return 0;
}
@@ -1473,6 +1479,7 @@
put_io_context(cic->ioc);
err:
spin_lock_irqsave(q->queue_lock, flags);
+ cfqq->allocated[rw]--;
cfq_put_queue(cfqq);
out_lock:
spin_unlock_irqrestore(q->queue_lock, flags);

--
Jens Axboe