2004-04-19 11:39:16

by Warren Togami

Subject: [PATCH] i2o_block Fix, possible CFQ elevator problem?

Test Systems:
x86 AMD Duron with PM3754 DPT controller, Two disks
x86-64 2x Opteron with ASR-3010S Adaptec controller, Four disks

Software:
Gentoo 1.4.3.13 x86 with gcc-3.3.2
Fedora Core 2 x86-64 development with gcc-3.3.3-7

The i2o_block driver currently in kernel-2.6.5 has a problem when there
is more than one logical block device on the I2O controller. Two or
more devices cause kernel memory corruption,[1] sometimes accompanied
by an oops, sometimes by a fatal oops in IRQ context and a livelock. In
SMP mode this tended to cause the panic=5 reboot to fail.

http://lwn.net/Articles/58199/
Markus finally figured out what was going wrong when he read in this LWN
article that in the 2.6 kernel, the generic block layer handles
partition code and there is almost nothing the individual drivers have
to do. The attached patch from Markus removes this unused partition
logic and seemed to make things work extremely well.
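
As a rough illustration of what this means for a 2.6 driver (a minimal
sketch with made-up names, not the actual i2o_block code): the driver
just registers a gendisk and calls add_disk(), and the generic block
layer scans the partition table and creates the partition devices
itself.

#include <linux/genhd.h>
#include <linux/blkdev.h>

/* Sketch: register one logical unit.  add_disk() triggers the generic
 * partition scan; no driver-side partition arrays or rescan code. */
static int sketch_register_unit(int major, int unit, sector_t capacity,
                                struct request_queue *q,
                                struct block_device_operations *fops)
{
        struct gendisk *gd = alloc_disk(16);    /* whole disk + 15 partitions */

        if (!gd)
                return -ENOMEM;

        gd->major = major;
        gd->first_minor = unit << 4;
        gd->fops = fops;
        gd->queue = q;
        sprintf(gd->disk_name, "i2o/hd%c", 'a' + unit);
        set_capacity(gd, capacity);
        add_disk(gd);
        return 0;
}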

After some testing we began to realize why the kernel memory corruption
that we saw with 2 or more block arrays was happening. The first array
always had a valid partition table, while the second and subsequent
arrays never did after they were split from the original single large
array. The unused partition logic was copying the corrupted partition
tables from the second and later arrays into kernel memory, causing the trouble in
kernel-2.6.5.

Subsequent testing went extremely well: the system remained stable
under simultaneous heavy bonnie++ disk usage on two I2O block devices.
We then chopped the controller into four I2O block devices, but
unfortunately this ran us into trouble. Running bonnie++ on three
devices simultaneously would panic the kernel. Below is a photo of the
call trace from this panic.

http://togami.com/~warren/archive/2004/i2o_cfq_quad_bonnie.jpg

I noticed that the call trace contained "cfq_next_request", so I was
curious what would happen if we changed to the deadline scheduler.
We booted the same kernel but with "elevator=deadline". To our
surprise, bonnie++ ran simultaneously on all four I2O block devices
without crashing the server. For another test we tried "elevator=as"
and it too remained stable.
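
(For reference, that scheduler selection is just a kernel command-line
switch; a GRUB entry would look roughly like the following, with the
kernel version and root device being placeholders, not our exact setup:)

title Fedora Core (2.6.5, deadline elevator)
        root (hd0,0)
        kernel /vmlinuz-2.6.5-1.326 ro root=/dev/i2o/hda2 elevator=deadline
        initrd /initrd-2.6.5-1.326.img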

Possible CFQ I/O scheduler problem?

This research has been a joint venture of Markus Lidel and Warren
Togami, with thanks to Mid-Pacific Institute for the loan of hardware.

Warren Togami
[email protected]
Markus Lidel
[email protected]

[1]
With two block arrays, ls in the filesystem would sometimes show
corrupted-looking characters for the directory and file names, making
the system unusable. With four block arrays the same thing happened,
with an accompanying kernel oops.


Attachments:
i2o_block-cleanup.patch (12.41 kB)

2004-04-19 12:12:37

by Jens Axboe

Subject: Re: [PATCH] i2o_block Fix, possible CFQ elevator problem?

On Mon, Apr 19 2004, Warren Togami wrote:
> Test Systems:
> x86 AMD Duron with PM3754 DPT controller, Two disks
> x86-64 2x Opteron with ASR-3010S Adaptec controller, Four disks
>
> Software:
> Gentoo 1.4.3.13 x86 with gcc-3.3.2
> Fedora Core 2 x86-64 development with gcc-3.3.3-7
>
> The i2o_block driver currently in kernel-2.6.5 has a problem when there
> is more than one logical block device on the I2O controller. Two or
> more devices cause kernel memory corruption,[1] sometimes accompanied
> by an oops, sometimes by a fatal oops in IRQ context and a livelock. In
> SMP mode this tended to cause the panic=5 reboot to fail.
>
> http://lwn.net/Articles/58199/
> Markus finally figured out what was going wrong when he read in this LWN
> article that in the 2.6 kernel, the generic block layer handles
> partition code and there is almost nothing the individual drivers have
> to do. The attached patch from Markus removes this unused partition
> logic and seemed to make things work extremely well.
>
> After some testing we began to realize why the kernel memory corruption
> that we saw with 2 or more block arrays was happening. The first array
> always had a valid partition table, while the second and subsequent
> arrays never did after they were split from the original single large
> array. The unused partition logic was copying the corrupted partition
> tables from the second and later arrays into kernel memory, causing the trouble in
> kernel-2.6.5.
>
> Subsequent testing went extremely well: the system remained stable
> under simultaneous heavy bonnie++ disk usage on two I2O block devices.
> We then chopped the controller into four I2O block devices, but
> unfortunately this ran us into trouble. Running bonnie++ on three
> devices simultaneously would panic the kernel. Below is a photo of the
> call trace from this panic.
>
> http://togami.com/~warren/archive/2004/i2o_cfq_quad_bonnie.jpg
>
> I noticed that the call trace contained "cfq_next_request", so I was
> curious what would happen if we changed to the deadline scheduler.
> We booted the same kernel but with "elevator=deadline". To our
> surprise, bonnie++ ran simultaneously on all four I2O block devices
> without crashing the server. For another test we tried "elevator=as"
> and it too remained stable.
>
> Possible CFQ I/O scheduler problem?

That looks pretty damn strange. Any chance you can capture a full oops,
with a serial console or similar?

A quick look at i2o_block doesn't show anything that should cause this.
There are a number of request handling badnesses in there, though:

- kill the check for RQ_INACTIVE, it's pointless. Even if it wasn't, the
inactive check return is buggy.

- you must not just clear req->waiting! remove that code.

- ->queue_depth should not be an expensive atomic type, you have the
device lock when you look at/inc/dec it.

> /*
> + * Set up the queue
> + */
> + for(i = 0; i < MAX_I2O_CONTROLLERS; i++)
> + {
> + i2ob_queues[i] = NULL;
> + i2ob_init_iop(i);
> + }

Only initialize queues that will be used. Each allocated but unused
queue wastes memory.
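
To illustrate the idea (hypothetical names, not the actual i2o_block
symbols): allocate a per-IOP queue the first time a unit on that
controller is claimed, instead of walking every possible controller at
module load.

#include <linux/blkdev.h>

#define SKETCH_MAX_IOPS 32  /* stands in for MAX_I2O_CONTROLLERS */

static struct request_queue *sketch_iop_queues[SKETCH_MAX_IOPS];
static void sketch_request_fn(struct request_queue *q);  /* the driver's request handler */

/* Allocate lazily; slots for controllers that never appear stay NULL. */
static struct request_queue *sketch_get_iop_queue(int iop, spinlock_t *lock)
{
        if (!sketch_iop_queues[iop])
                sketch_iop_queues[iop] = blk_init_queue(sketch_request_fn, lock);

        return sketch_iop_queues[iop];
}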

--
Jens Axboe

2004-04-20 00:43:04

by Warren Togami

Subject: Re: [PATCH] i2o_block Fix, possible CFQ elevator problem?

Jens Axboe wrote:
>>
>>http://togami.com/~warren/archive/2004/i2o_cfq_quad_bonnie.jpg
>>
>>I noticed that the call trace contained "cfq_next_request", so I was
>>curious what would happen if we changed to the deadline scheduler.
>>We booted the same kernel but with "elevator=deadline". To our
>>surprise, bonnie++ ran simultaneously on all four I2O block devices
>>without crashing the server. For another test we tried "elevator=as"
>>and it too remained stable.
>>
>>Possible CFQ I/O scheduler problem?
>
>
> That looks pretty damn strange. Any chance you can capture a full oops,
> with a serial console or similar?

Fedora Core 2 x86_64 SMP (2 x Opteron) with kernel-2.6.5-1.326 (based on
2.6.6-rc1)
gcc-3.3.3-7 plus attached patch. This panic was captured by netconsole.

http://togami.com/~warren/archive/2004/i2o_cfq_quad_bonnie.txt
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at cfq_iosched:407
invalid operand: 0000 [1] SMP
CPU 0
Pid: 1523, comm: bonnie++ Not tainted 2.6.5-1.326custom
RIP: 0010:[<ffffffff80234f93>] <ffffffff80234f93>{cfq_next_request+66}
RSP: 0000:ffffffff804a9048 EFLAGS: 00010016
RAX: 000001006cebf578 RBX: 000001007f3f1e00 RCX: 000001006ceb6340
RDX: 000001006cebf540 RSI: 0000000000000001 RDI: 000001007a225000
RBP: 000001007a225000 R08: ffffffff804a8ec8 R09: 0000000000000000
R10: 000001000289cb60 R11: 0000000000000000 R12: 000001007a225000
R13: 000001007a225000 R14: 000001001703bf58 R15: 0000000000509010
FS: 0000002a9555f360(0000) GS:ffffffff804a4e80(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000002a95558000 CR3: 0000000000101000 CR4: 00000000000006e0
Process bonnie++ (pid: 1523, stackpage=1007fd5e1c0)
Stack: 000001007aa01118 000001007d95abc0 000001007aa01118 ffffffff8022b444
000001007d95abc0 000001007aa01118 ffffffffa0137bc0 ffffffffa0133b82
0000000000000001 000001007d95abc0
Call Trace:<IRQ> <ffffffff8022b444>{elv_next_request+238}
<ffffffffa0133b82>{:i2o_block:i2ob_request+233}
<ffffffffa013370e>{:i2o_block:i2o_block_reply+1162}
<ffffffff80113d85>{handle_IRQ_event+44}
<ffffffffa00893ac>{:tg3:tg3_enable_ints+23}
<ffffffff8013e7f9>{update_wall_time+12}
<ffffffffa002908a>{:i2o_core:i2o_run_queue+124}
<ffffffffa002b60e>{:i2o_core:i2o_pci_interrupt+9}
<ffffffff80113d85>{handle_IRQ_event+44}
<ffffffff801140b8>{do_IRQ+285} <ffffffff8013a894>{__do_softirq+76}
<ffffffff8011185f>{ret_from_intr+0} <EOI>

Code: 0f 0b 74 3a 32 80 ff ff ff ff 97 01 48 89 c8 eb 0e 48 89 de
RIP <ffffffff80234f93>{cfq_next_request+66} RSP <ffffffff804a9048>
<0>Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing
<0>Rebooting in 5 seconds..Badness in panic at kernel/panic.c:87

Call Trace:<IRQ> <ffffffff80136024>{panic+416}
<ffffffff8011411d>{do_IRQ+386}
<ffffffff8011185f>{ret_from_intr+0} <ffffffff80138cd3>{do_exit+87}
<ffffffff80112ab5>{oops_end+80} <ffffffff80112bc5>{do_trap+0}
<ffffffff80112cc2>{do_trap+253}
<ffffffff80112f45>{do_invalid_op+145}
<ffffffff80234f93>{cfq_next_request+66}
<ffffffff8012ff68>{recalc_task_prio+343}
<ffffffff8012ff68>{recalc_task_prio+343}
<ffffffff80111d31>{error_exit+0}
<ffffffff80234f93>{cfq_next_request+66}
<ffffffff80156bea>{mempool_free+227}
<ffffffff8022b444>{elv_next_request+238}
<ffffffffa0133b82>{:i2o_block:i2ob_request+233}
<ffffffffa013370e>{:i2o_block:i2o_block_reply+1162}
<ffffffff80113d85>{handle_IRQ_event+44}
<ffffffffa00893ac>{:tg3:tg3_enable_ints+23}
<ffffffff8013e7f9>{update_wall_time+12}
<ffffffffa002908a>{:i2o_core:i2o_run_queue+124}
<ffffffffa002b60e>{:i2o_core:i2o_pci_interrupt+9}
<ffffffff80113d85>{handle_IRQ_event+44}
<ffffffff801140b8>{do_IRQ+285} <ffffffff8013a894>{__do_softirq+76}
<ffffffff8011185f>{ret_from_intr+0} <EOI>


>
> A quick look at i2o_block doesn't show anything that should cause this.
> There are a number of request handling badnesses in there, though:
>
> - kill the check for RQ_INACTIVE, it's pointless. Even if it wasn't, the
> inactive check return is buggy.
>
> - you must not just clear req->waiting! remove that code.
>
> - ->queue_depth should not be an expensive atomic type, you have the
> device lock when you look at/inc/dec it.
>
>
>> /*
>>+ * Set up the queue
>>+ */
>>+ for(i = 0; i < MAX_I2O_CONTROLLERS; i++)
>>+ {
>>+ i2ob_queues[i] = NULL;
>>+ i2ob_init_iop(i);
>>+ }
>
>
> Only initialize queues that will be used. Each allocated but unused
> queue wastes memory.
>

Markus implemented all of your recommendations except the atomic part,
which seemed to cause a kernel panic while loading the i2o_block driver.
Attached is the latest i2o_block.c cleanup, and at the URL below is
his attempt at removing the atomic type. He is not sure whether he
implemented it properly.

http://i2o.shadowconnect.com/i2o_block-cleanup.patch-broke

Your recommendations would be greatly appreciated!

Thank you,
Warren Togami
[email protected]
Markus Lidel
[email protected]


Attachments:
i2o_block-cleanup.patch (13.13 kB)

2004-04-20 07:08:31

by Jens Axboe

Subject: Re: [PATCH] i2o_block Fix, possible CFQ elevator problem?

On Mon, Apr 19 2004, Warren Togami wrote:
> Warren Togami wrote:
> >Jens Axboe wrote:
> >
> >>>
> >>>http://togami.com/~warren/archive/2004/i2o_cfq_quad_bonnie.jpg
> >>>
> >>>I noticed that the call trace contained "cfq_next_request", so I was
> >>>curious what would happen if we changed to the deadline scheduler.
> >>>We booted the same kernel but with "elevator=deadline". To our
> >>>surprise, bonnie++ ran simultaneously on all four I2O block devices
> >>>without crashing the server. For another test we tried "elevator=as"
> >>>and it too remained stable.
> >>>
> >>>Possible CFQ I/O scheduler problem?
> >>
> >>
> >>
> >>That looks pretty damn strange. Any chance you can capture a full oops,
> >>with a serial console or similar?
> >
> >
> >Fedora Core 2 x86_64 SMP (2 x Opteron) with kernel-2.6.5-1.326 (based on
> >2.6.6-rc1)
> >gcc-3.3.3-7 plus attached patch. This panic was captured by netconsole.
> >
> >http://togami.com/~warren/archive/2004/i2o_cfq_quad_bonnie.txt
>
> Next we tested cfq with the following section of code commented out.
> With this change the kernel no longer panics and seems to survive with
> four simultaneous bonnie++'s on all four block devices.
>
> --- cfq-iosched.c 2004-04-20 13:52:55.000000000 -1000
> +++ /root/linux-2.6.5-1.326/drivers/block/cfq-iosched.c 2004-04-20
> 14:09:43.000000000 -1000
> @@ -401,10 +401,12 @@
> dispatch:
> rq = list_entry_rq(cfqd->dispatch->next);
>
> +/*
> BUG_ON(q->last_merge == rq);
> crq = RQ_DATA(rq);
> if (crq)
> BUG_ON(ON_MHASH(crq));
> +*/
>
> return rq;
> }

This is not safe; the BUG_ON is there for a reason. If the request is on
the merge hash when handed to the driver, you risk corrupting data. The
fix would be figuring out why this is happening. Maybe it's looking at
bad data, could you test with this patch applied and see if the oops
still triggers?

===== drivers/block/cfq-iosched.c 1.1 vs edited =====
--- 1.1/drivers/block/cfq-iosched.c Mon Apr 12 19:55:20 2004
+++ edited/drivers/block/cfq-iosched.c Tue Apr 20 09:07:20 2004
@@ -403,7 +403,7 @@

BUG_ON(q->last_merge == rq);
crq = RQ_DATA(rq);
- if (crq)
+ if (blk_fs_request(rq) && crq)
BUG_ON(ON_MHASH(crq));

return rq;

--
Jens Axboe

2004-04-20 07:59:12

by Warren Togami

Subject: Re: [PATCH] i2o_block Fix, possible CFQ elevator problem?

Jens Axboe wrote:
>>>http://togami.com/~warren/archive/2004/i2o_cfq_quad_bonnie.txt
>>
>>Next we tested cfq with the following section of code commented out.
>>With this change the kernel no longer panics and seems to survive with
>>four simultaneous bonnie++'s on all four block devices.
>>
>>--- cfq-iosched.c 2004-04-20 13:52:55.000000000 -1000
>>+++ /root/linux-2.6.5-1.326/drivers/block/cfq-iosched.c 2004-04-20
>>14:09:43.000000000 -1000
>>@@ -401,10 +401,12 @@
>> dispatch:
>> rq = list_entry_rq(cfqd->dispatch->next);
>>
>>+/*
>> BUG_ON(q->last_merge == rq);
>> crq = RQ_DATA(rq);
>> if (crq)
>> BUG_ON(ON_MHASH(crq));
>>+*/
>>
>> return rq;
>> }
>
>
> This is not safe; the BUG_ON is there for a reason. If the request is on
> the merge hash when handed to the driver, you risk corrupting data. The
> fix would be figuring out why this is happening. Maybe it's looking at
> bad data, could you test with this patch applied and see if the oops
> still triggers?
>
> ===== drivers/block/cfq-iosched.c 1.1 vs edited =====
> --- 1.1/drivers/block/cfq-iosched.c Mon Apr 12 19:55:20 2004
> +++ edited/drivers/block/cfq-iosched.c Tue Apr 20 09:07:20 2004
> @@ -403,7 +403,7 @@
>
> BUG_ON(q->last_merge == rq);
> crq = RQ_DATA(rq);
> - if (crq)
> + if (blk_fs_request(rq) && crq)
> BUG_ON(ON_MHASH(crq));
>
> return rq;
>

We figured removing the error handling was not safe; the previous post
was only reporting test results to ask for more suggestions. I have now
tested your suggested patch above and it seems to crash in the same way
as before.

http://togami.com/~warren/archive/2004/i2o_cfq_quad_bonnie2.txt

This makes me curious: the other elevators lacked this type of error
checking. Did this mean they were possibly allowing data corruption to
happen with buggy drivers like this? Kind of scary! We were lucky to
test this now, because this was one of the first FC kernels that
included cfq by default.

Do you have any advice regarding the atomic type removal problem that we
experienced from our previous post?

Warren Togami
[email protected]

2004-04-20 08:03:40

by Jens Axboe

Subject: Re: [PATCH] i2o_block Fix, possible CFQ elevator problem?

On Mon, Apr 19 2004, Warren Togami wrote:
> Jens Axboe wrote:
> >>>http://togami.com/~warren/archive/2004/i2o_cfq_quad_bonnie.txt
> >>
> >>Next we tested cfq with the following section of code commented out.
> >>With this change the kernel no longer panics and seems to survive with
> >>four simultaneous bonnie++'s on all four block devices.
> >>
> >>--- cfq-iosched.c 2004-04-20 13:52:55.000000000 -1000
> >>+++ /root/linux-2.6.5-1.326/drivers/block/cfq-iosched.c 2004-04-20
> >>14:09:43.000000000 -1000
> >>@@ -401,10 +401,12 @@
> >>dispatch:
> >> rq = list_entry_rq(cfqd->dispatch->next);
> >>
> >>+/*
> >> BUG_ON(q->last_merge == rq);
> >> crq = RQ_DATA(rq);
> >> if (crq)
> >> BUG_ON(ON_MHASH(crq));
> >>+*/
> >>
> >> return rq;
> >> }
> >
> >
> >This is not safe; the BUG_ON is there for a reason. If the request is on
> >the merge hash when handed to the driver, you risk corrupting data. The
> >fix would be figuring out why this is happening. Maybe it's looking at
> >bad data, could you test with this patch applied and see if the oops
> >still triggers?
> >
> >===== drivers/block/cfq-iosched.c 1.1 vs edited =====
> >--- 1.1/drivers/block/cfq-iosched.c Mon Apr 12 19:55:20 2004
> >+++ edited/drivers/block/cfq-iosched.c Tue Apr 20 09:07:20 2004
> >@@ -403,7 +403,7 @@
> >
> > BUG_ON(q->last_merge == rq);
> > crq = RQ_DATA(rq);
> >- if (crq)
> >+ if (blk_fs_request(rq) && crq)
> > BUG_ON(ON_MHASH(crq));
> >
> > return rq;
> >
>
> We figured removing the error handling was not safe; the previous post
> was only reporting test results to ask for more suggestions. I have now
> tested your suggested patch above and it seems to crash in the same way
> as before.
>
> http://togami.com/~warren/archive/2004/i2o_cfq_quad_bonnie2.txt

As a temporary safe work-around, you can apply this patch.

> This makes me curious: the other elevators lacked this type of error
> checking. Did this mean they were possibly allowing data corruption to
> happen with buggy drivers like this? Kind of scary! We were lucky to
> test this now, because this was one of the first FC kernels that
> included cfq by default.

Not necessarily, it's most likely a CFQ bug. Otherwise it would have
surfaced before :-)

> Do you have any advice regarding the atomic type removal problem that we
> experienced from our previous post?

Just change the type to an unsigned integer instead. Double check that
all decrements/increments and reads of that integer are inside the
device lock, it looked like they were.
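
To illustrate (a minimal sketch with a hypothetical structure, not the
real i2o_block definitions): the counter is only ever read or changed
while the device lock is held, so a plain unsigned int is enough and
cheaper than an atomic_t.

#include <linux/spinlock.h>

struct sketch_i2ob_device {
        spinlock_t lock;           /* the per-device/queue lock */
        unsigned int queue_depth;  /* plain integer, was atomic_t */
};

/* Helpers for paths that do not already hold the lock; inside the
 * request function (which runs under the queue lock) a bare
 * dev->queue_depth++ is all that is needed. */
static void sketch_inc_depth(struct sketch_i2ob_device *dev)
{
        unsigned long flags;

        spin_lock_irqsave(&dev->lock, flags);
        dev->queue_depth++;
        spin_unlock_irqrestore(&dev->lock, flags);
}

static void sketch_dec_depth(struct sketch_i2ob_device *dev)
{
        unsigned long flags;

        spin_lock_irqsave(&dev->lock, flags);
        if (dev->queue_depth)
                dev->queue_depth--;
        spin_unlock_irqrestore(&dev->lock, flags);
}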

===== drivers/block/cfq-iosched.c 1.1 vs edited =====
--- 1.1/drivers/block/cfq-iosched.c Mon Apr 12 19:55:20 2004
+++ edited/drivers/block/cfq-iosched.c Tue Apr 20 10:02:01 2004
@@ -404,7 +404,7 @@
BUG_ON(q->last_merge == rq);
crq = RQ_DATA(rq);
if (crq)
- BUG_ON(ON_MHASH(crq));
+ cfq_remove_merge_hints(q, crq);

return rq;
}

--
Jens Axboe

2004-04-20 08:59:44

by Warren Togami

Subject: Re: [PATCH] i2o_block Fix, possible CFQ elevator problem?

Jens Axboe wrote:
>>We figured removing the error handling was not safe; the previous post
>>was only reporting test results to ask for more suggestions. I have now
>>tested your suggested patch above and it seems to crash in the same way
>>as before.
>>
>>http://togami.com/~warren/archive/2004/i2o_cfq_quad_bonnie2.txt
>
>
> As a temporary safe work-around, you can apply this patch.
>
>
>>This makes me curious: the other elevators lacked this type of error
>>checking. Did this mean they were possibly allowing data corruption to
>>happen with buggy drivers like this? Kind of scary! We were lucky to
>>test this now, because this was one of the first FC kernels that
>>included cfq by default.
>
>
> Not necessarily, it's most likely a CFQ bug. Otherwise it would have
> surfaced before :-)
>

I forgot to mention in the previous reports:

Prior to the three cleanups of i2o_block that you originally suggested,
four simultaneous bonnie++'s on four independent arrays would almost
immediately cause the crash while running elevator=cfq. After those
three cleanups, four simultaneous bonnie++'s would survive for a while
without crashing... until you run "sync" in another terminal. However,
we did not test it enough times to determine whether it can survive the
test run without "sync". Do you want this tested without "sync"?

With the deadline scheduler "sync" would take maybe 30 seconds and
return. With the cfq scheduler "sync" would be stuck there for much
longer, then trigger the crash. Markus has suspected that it crashes
when sync returns, but we have not confirmed that.

With the cfq scheduler and the error checking completely removed,
"sync" would be stuck there for a minute or more but eventually return
without crashing. (The array is reformatted between every test so we
don't really care about data corruption.)

I hope this data is helpful.

Warren Togami
[email protected]

2004-04-20 09:06:30

by Jens Axboe

Subject: Re: [PATCH] i2o_block Fix, possible CFQ elevator problem?

On Mon, Apr 19 2004, Warren Togami wrote:
> Jens Axboe wrote:
> >>We figured removing the error handling was not safe; the previous post
> >>was only reporting test results to ask for more suggestions. I have now
> >>tested your suggested patch above and it seems to crash in the same way
> >>as before.
> >>
> >>http://togami.com/~warren/archive/2004/i2o_cfq_quad_bonnie2.txt
> >
> >
> >As a temporary safe work-around, you can apply this patch.
> >
> >
> >>This makes me curious: the other elevators lacked this type of error
> >>checking. Did this mean they were possibly allowing data corruption to
> >>happen with buggy drivers like this? Kind of scary! We were lucky to
> >>test this now, because this was one of the first FC kernels that
> >>included cfq by default.
> >
> >
> >Not necessarily, it's most likely a CFQ bug. Otherwise it would have
> >surfaced before :-)
> >
>
> I forgot to mention in the previous reports:
>
> Prior to the three cleanups of i2o_block that you originally suggested,
> four simultaneous bonnie++'s on four independent arrays would almost
> immediately cause the crash while running elevator=cfq. After those
> three cleanups, four simultaneous bonnie++'s would survive for a while
> without crashing... until you run "sync" in another terminal. However,
> we did not test it enough times to determine whether it can survive the
> test run without "sync". Do you want this tested without "sync"?

Repeat the tests that made it crash. The last patch I sent should work
for you, at least until the real issue is found.

> With the deadline scheduler "sync" would take maybe 30 seconds and
> return. With the cfq scheduler "sync" would be stuck there for much
> longer, then trigger the crash. Markus has suspected that it crashes
> when sync returns, but we have not confirmed that.

Probably not the case. Running sync only initiates the dirty data
flushing; the actual writeout happens outside that context. So it's
probably just running the sync that changes the timings a little bit,
triggering the bug.

--
Jens Axboe

2004-04-20 10:54:11

by Warren Togami

Subject: Re: [PATCH] i2o_block Fix, possible CFQ elevator problem?

Jens Axboe wrote:
>>>Not necessarily, it's most likely a CFQ bug. Otherwise it would have
>>>surfaced before :-)
>>>
>>
>>I forgot to mention in the previous reports:
>>
>>Prior to the three cleanups of i2o_block that you originally suggested,
>>four simultaneous bonnie++'s on four independent arrays would almost
>>immediately cause the crash while running elevator=cfq. After those
>>three cleanups, four simultaneous bonnie++'s would survive for a while
>>without crashing... until you run "sync" in another terminal. However,
>>we did not test it enough times to determine whether it can survive the
>>test run without "sync". Do you want this tested without "sync"?
>
>
> Repeat the tests that made it crash. The last patch I sent should work
> for you, at least until the real issue is found.
>

Tested your patch, it indeed does seem to keep the system stable. If I
am understanding it right, the patch disables merging in the case where
it would have caused a BUG condition? (Less efficiency.)

In any case, for now we are doing our i2o development using the other
schedulers like deadline. Let us know if you have updated cfq patches
to try, and we will.

Warren Togami
[email protected]

2004-04-20 10:56:35

by Jens Axboe

Subject: Re: [PATCH] i2o_block Fix, possible CFQ elevator problem?

On Tue, Apr 20 2004, Warren Togami wrote:
> Jens Axboe wrote:
> >>>Not necessarily, it's most likely a CFQ bug. Otherwise it would have
> >>>surfaced before :-)
> >>>
> >>
> >>I forgot to mention in the previous reports:
> >>
> >>Prior to the three cleanups of i2o_block that you originally suggested,
> >>four simultaneous bonnie++'s on four independent arrays would almost
> >>immediately cause the crash while running elevator=cfq. After those
> >>three cleanups, four simultaneous bonnie++'s would survive for a while
> >>without crashing... until you run "sync" in another terminal. However,
> >>we did not test it enough times to determine whether it can survive the
> >>test run without "sync". Do you want this tested without "sync"?
> >
> >
> >Repeat the tests that made it crash. The last patch I sent should work
> >for you, at least until the real issue is found.
> >
>
> Tested your patch, it indeed does seem to keep the system stable. If I
> am understanding it right, the patch disables merging in the case where
> it would have caused a BUG condition? (Less efficiency.)

Not quite, it just removes the crq from the hash before it can do any
damage. It doesn't impact efficiency, the request is removed from the io
scheduler at this point anyway.

> In any case, for now we are doing our i2o development using the other
> schedulers like deadline. Let us know if you have updated cfq patches
> to try, and we will.

I'll see if I can reproduce it here.

--
Jens Axboe

2004-04-20 11:29:43

by Warren Togami

Subject: Re: [PATCH] i2o_block Fix, possible CFQ elevator problem?

Jens Axboe wrote:
>>>
>>>Repeat the tests that made it crash. The last patch I sent should work
>>>for you, at least until the real issue is found.
>>>
>>
>>Tested your patch, it indeed does seem to keep the system stable. If I
>>am understanding it right, the patch disables merging in the case where
>>it would have caused a BUG condition? (Less efficiency.)
>
>

Bad news... much later during the test the system locked up. During
this test we did not use "sync" but just let all four bonnie++'s run.

http://togami.com/~warren/archive/2004/i2o_cfq_quad_bonnie3.txt
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at cfq_iosched:404
invalid operand: 0000 [1] SMP
CPU 0
Pid: 0, comm: swapper Not tainted 2.6.5-1.326changeme
RIP: 0010:[<ffffffff80234f74>] <ffffffff80234f74>{cfq_next_request+35}
RSP: 0018:ffffffff804a9048 EFLAGS: 00010046
RAX: 000001007f2c6000 RBX: 000001004483b340 RCX: 0000000000000000
RDX: 000001007f2ca600 RSI: 0000000000180000 RDI: 000001007f2c6000
RBP: 000001007f2c6000 R08: 0000010037ffa568 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000001000 R12: 000001007f2c6000
R13: 000001007f2c6000 R14: ffffffff804adf08 R15: 0000000000000000
FS: 0000002a9555f360(0000) GS:ffffffff804a4e80(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000856348 CR3: 0000000000101000 CR4: 00000000000006e0
Process swapper (pid: 0, stackpage=ffffffff803cf5c0)
Stack: ffffffffa003a880 00000100348b2300 ffffffffa003a880 ffffffff8022b444
00000100348b2300 ffffffffa003a880 000000000001ac00 ffffffffa0036b82
ffffffff803f40a0 00000100629a5e80
Call Trace:<IRQ> <ffffffff8022b444>{elv_next_request+238}
<ffffffffa0036b82>{:i2o_block:i2ob_request+233}
<ffffffffa003670e>{:i2o_block:i2o_block_reply+1162}
<ffffffff8013e13a>{__mod_timer+807}
<ffffffff801310b5>{load_balance+1046}
<ffffffff8013e7f9>{update_wall_time+12}
<ffffffffa002908a>{:i2o_core:i2o_run_queue+124}
<ffffffffa002b60e>{:i2o_core:i2o_pci_interrupt+9}
<ffffffff80113d85>{handle_IRQ_event+44}
<ffffffff801140b8>{do_IRQ+285} <ffffffff8013a894>{__do_softirq+76}
<ffffffff8010f590>{default_idle+0}
<ffffffff8011185f>{ret_from_intr+0}
<EOI> <ffffffff8010f590>{default_idle+0}
<ffffffff8010f5b4>{default_idle+36}
<ffffffff8010f627>{cpu_idle+24} <ffffffff804af7ea>{start_kernel+512}
<ffffffff804af17c>{x86_64_start_kernel+144}

Code: 0f 0b 78 3a 32 80 ff ff ff ff 94 01 f6 43 10 10 48 8b 73 68
RIP <ffffffff80234f74>{cfq_next_request+35} RSP <ffffffff804a9048>
<0>Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing
NMI Watchdog detected LOCKUP on CPU1, registers
CPU 1
Pid: 1525, comm: bonnie++ Not tainted 2.6.5-1.326changeme
RIP: 0010:[<ffffffff8022f332>] <ffffffff8022f332>{.text.lock.ll_rw_blk+95}
RSP: 0018:000001005b553808 EFLAGS: 00000086
RAX: 0000000000000000 RBX: 0000010037ffb118 RCX: 0000010064b61800
RDX: 0000000000000000 RSI: 000001005b553830 RDI: 000001007f2c6000
RBP: 000001007f2c6000 R08: 0000000000000000 R09: 00000100498666c0
R10: 00000100498666c0 R11: 00000100498666c0 R12: 0000000000000060
R13: 0000000000000008 R14: 00000000007ca9b7 R15: 0000000000000000
FS: 0000002a9555f360(0000) GS:ffffffff804a4f00(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000002a9555a000 CR3: 000000007fe2f000 CR4: 00000000000006e0
Process bonnie++ (pid: 1525, stackpage=1007e5363c0)
Stack: 0000010000000000 0000000000000001 000001005cc8d6e8 0000000000000000
0000000000001000 00000100583a0cc0 000001005b553898 0000000000000000
00000100583a0cc0 0000000000000060
Call Trace:<ffffffff8022e4a7>{generic_make_request+331}
<ffffffff8022e5b1>{submit_bio+247}
<ffffffff801961c6>{mpage_bio_submit+27}
<ffffffff801964d7>{do_mpage_readpage+493}
<ffffffff801b3f63>{ext2_get_block+0}
<ffffffff801b3f63>{ext2_get_block+0}
<ffffffff801b3f63>{ext2_get_block+0}
<ffffffff8015428c>{add_to_page_cache+129}
<ffffffff801b3f63>{ext2_get_block+0}
<ffffffff80196653>{mpage_readpages+161}
<ffffffff8015a5c3>{read_pages+57}
<ffffffff80158035>{buffered_rmqueue+510}
<ffffffff801580fe>{__alloc_pages+158}
<ffffffff8015ab04>{do_page_cache_readahead+515}
<ffffffff801545e8>{__lock_page+213}
<ffffffff8015acbd>{page_cache_readahead+396}
<ffffffff80154c04>{do_generic_mapping_read+193}
<ffffffff80154e28>{file_read_actor+0}
<ffffffff80155084>{__generic_file_aio_read+360}
<ffffffff80155157>{generic_file_read+116}
<ffffffff80181a75>{link_path_walk+2806}
<ffffffff80165bf4>{vma_merge+166}
<ffffffff802f5707>{thread_return+0} <ffffffff80172e01>{vfs_read+208}
<ffffffff80173006>{sys_read+57} <ffffffff801112ba>{system_call+126}


Code: 7e f9 e9 aa ea ff ff 90 90 90 41 57 41 56 41 55 49 89 fd bf
console shuts up ...

2004-04-20 11:34:41

by Jens Axboe

Subject: Re: [PATCH] i2o_block Fix, possible CFQ elevator problem?

On Tue, Apr 20 2004, Warren Togami wrote:
> Jens Axboe wrote:
> >>>
> >>>Repeat the tests that made it crash. The last patch I sent should work
> >>>for you, at least until the real issue is found.
> >>>
> >>
> >>Tested your patch, it indeed does seem to keep the system stable. If I
> >>am understanding it right, the patch disables merging in the case where
> >>it would have caused a BUG condition? (Less efficiency.)
> >
> >
>
> Bad news... much later during the test the system locked up. During
> this test we did not use "sync" but just let all four bonnie++'s run.
>
> http://togami.com/~warren/archive/2004/i2o_cfq_quad_bonnie3.txt
> ----------- [cut here ] --------- [please bite here ] ---------
> Kernel BUG at cfq_iosched:404
> invalid operand: 0000 [1] SMP

Sorry about that, that's actually expected when we know this bug
exists. You need to move the cfq_remove_merge_hints(q, crq) before the
BUG_ON(q->last_merge == rq) check, or (better) just remove it
completely. There's no way that q->last_merge could be set to this
request after cfq_remove_merge_hints() was called.

--
Jens Axboe

2004-04-20 11:39:04

by Jens Axboe

Subject: Re: [PATCH] i2o_block Fix, possible CFQ elevator problem?

On Tue, Apr 20 2004, Jens Axboe wrote:
> On Tue, Apr 20 2004, Warren Togami wrote:
> > Jens Axboe wrote:
> > >>>
> > >>>Repeat the tests that made it crash. The last patch I sent should work
> > >>>for you, at least until the real issue is found.
> > >>>
> > >>
> > >>Tested your patch, it indeed does seem to keep the system stable. If I
> > >>am understanding it right, the patch disables merging in the case where
> > >>it would have caused a BUG condition? (Less efficiency.)
> > >
> > >
> >
> > Bad news... much later during the test the system locked up. During
> > this test we did not use "sync" but just let all four bonnie++'s run.
> >
> > http://togami.com/~warren/archive/2004/i2o_cfq_quad_bonnie3.txt
> > ----------- [cut here ] --------- [please bite here ] ---------
> > Kernel BUG at cfq_iosched:404
> > invalid operand: 0000 [1] SMP
>
> Sorry about that, that's actually expected when we know this bug
> exists. You need to move the cfq_remove_merge_hints(q, crq) before the
> BUG_ON(q->last_merge == rq) check, or (better) just remove it
> completely. There's no way that q->last_merge could be set to this
> request after cfq_remove_merge_hints() was called.

In short, this patch. I can see this happening for an aliased request,
but you should not be hitting that with bonnie (you are not doing any
form of raw or O_DIRECT io, are you?).

===== drivers/block/cfq-iosched.c 1.1 vs edited =====
--- 1.1/drivers/block/cfq-iosched.c Mon Apr 12 19:55:20 2004
+++ edited/drivers/block/cfq-iosched.c Tue Apr 20 13:37:33 2004
@@ -401,10 +401,9 @@
dispatch:
rq = list_entry_rq(cfqd->dispatch->next);

- BUG_ON(q->last_merge == rq);
crq = RQ_DATA(rq);
if (crq)
- BUG_ON(ON_MHASH(crq));
+ cfq_remove_merge_hints(q, crq);

return rq;
}

--
Jens Axboe