I believe this bug has existed for some time now (I think I've seen it on
2.6.9 on another box), but I seem to have hardware that is reproducing it
easily now.
Setup: Dual P3 1 GHz (SMP kernel). 2.6.10-bk7, though it also occurred
with 2.6.10-rc3-bk16 and I believe older kernels. One 2.5 TB linear MD
array consisting of 24 medium-old PATA drives. Four PDC20270 (Promise)
controllers, one 3ware raid, onboard adaptec, etc. The problem is
happening with one of the (Maxtor!) drives on a PDC20270:
hdn: dma_timer_expiry: dma status == 0x61
hdn: DMA timeout error
hdn: dma timeout error: status=0x51 { DriveReady SeekComplete Error }
hdn: dma timeout error: error=0x40 { UncorrectableError }, LBAsect=375917014, high=22, low=6818262, sector=375917012
ide: failed opcode was: unknown
end_request: I/O error, dev hdn, sector 375917012
Unable to handle kernel NULL pointer dereference at virtual address 0000002c
printing eip:
c026ce16
*pde = 00000000
Oops: 0000 [#1]
SMP
Modules linked in: ipt_state ip_conntrack acenic
CPU: 0
EIP: 0060:[<c026ce16>] Not tainted VLI
EFLAGS: 00010092 (2.6.10-bk7-alfie)
EIP is at ide_dma_timeout_retry+0x56/0xf0
eax: dfcbf540 ebx: 00000000 ecx: 00000006 edx: 00001462
esi: c0419b10 edi: 00000000 ebp: ffffffff esp: c03b3f0c
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, threadinfo=c03b2000 task=c0366b40)
Stack: c0419d40 c0337267 00000051 dfcbf540 c0419d40 c0273d20 c026cfe5 c0419d40
ffffffff c0366b40 c0419b10 00000286 dfcbf540 c026ceb0 c1409920 c03b3f60
c012195a dfcbf540 00000000 00000000 c03b3f60 c03b3f60 c03b3f60 c0121845
Call Trace:
[<c0273d20>] ide_dma_intr+0x0/0xb0
[<c026cfe5>] ide_timer_expiry+0x135/0x1e0
[<c026ceb0>] ide_timer_expiry+0x0/0x1e0
[<c012195a>] run_timer_softirq+0xda/0x1a0
[<c0121845>] update_process_times+0x45/0x60
[<c011d5da>] __do_softirq+0xba/0xd0
[<c011d61d>] do_softirq+0x2d/0x30
[<c0103110>] apic_timer_interrupt+0x1c/0x24
[<c0100710>] default_idle+0x0/0x40
[<c010073c>] default_idle+0x2c/0x40
[<c01007cb>] cpu_idle+0x3b/0x50
[<c03b498f>] start_kernel+0x14f/0x170
[<c03b43a0>] unknown_bootoption+0x0/0x1e0
Code: 96 50 04 00 00 fe 83 8f 00 00 00 c6 83 90 00 00 00 01 89 1c 24 ff 96 3c 04 00 00 8b 43 70 8b 40 08 8b 58 20 c7 40 20 00 0
<0>Kernel panic - not syncing: Fatal exception in interrupt
EXT3-fs error (device md0): ext3_get_inode_loc: unable to read inode block - inode=143556922, block=574226453
Aborting journal on device md0.
The disassembly of ide_dma_timeout_retry is:
c026cdc0 <ide_dma_timeout_retry>:
c026cdc0: 57 push %edi
c026cdc1: 31 ff xor %edi,%edi
c026cdc3: 56 push %esi
c026cdc4: 53 push %ebx
c026cdc5: 83 ec 0c sub $0xc,%esp
c026cdc8: 8b 44 24 20 mov 0x20(%esp),%eax
c026cdcc: 8b 5c 24 1c mov 0x1c(%esp),%ebx
c026cdd0: 85 c0 test %eax,%eax
c026cdd2: 8b 73 70 mov 0x70(%ebx),%esi
c026cdd5: 78 7f js c026ce56 <ide_dma_timeout_retry+0x96>
c026cdd7: 89 5c 24 04 mov %ebx,0x4(%esp)
c026cddb: c7 04 24 33 72 33 c0 movl $0xc0337233,(%esp)
c026cde2: e8 a9 b9 ea ff call c0118790 <printk>
c026cde7: 89 1c 24 mov %ebx,(%esp)
c026cdea: ff 96 50 04 00 00 call *0x450(%esi)
c026cdf0: fe 83 8f 00 00 00 incb 0x8f(%ebx)
c026cdf6: c6 83 90 00 00 00 01 movb $0x1,0x90(%ebx)
c026cdfd: 89 1c 24 mov %ebx,(%esp)
c026ce00: ff 96 3c 04 00 00 call *0x43c(%esi)
c026ce06: 8b 43 70 mov 0x70(%ebx),%eax
c026ce09: 8b 40 08 mov 0x8(%eax),%eax
c026ce0c: 8b 58 20 mov 0x20(%eax),%ebx
c026ce0f: c7 40 20 00 00 00 00 movl $0x0,0x20(%eax)
c026ce16: ----> 8b 4b 2c mov 0x2c(%ebx),%ecx <--------
c026ce19: c7 43 40 00 00 00 00 movl $0x0,0x40(%ebx)
c026ce20: 85 c9 test %ecx,%ecx
c026ce22: 74 29 je c026ce4d <ide_dma_timeout_retry+0x8d>
c026ce24: 8b 01 mov (%ecx),%eax
c026ce26: 8b 51 04 mov 0x4(%ecx),%edx
c026ce29: 89 43 0c mov %eax,0xc(%ebx)
c026ce2c: 89 53 10 mov %edx,0x10(%ebx)
c026ce2f: 8b 51 30 mov 0x30(%ecx),%edx
c026ce32: 0f b7 41 1a movzwl 0x1a(%ecx),%eax
c026ce36: 8d 04 40 lea (%eax,%eax,2),%eax
c026ce39: 8b 44 82 04 mov 0x4(%edx,%eax,4),%eax
c026ce3d: c7 43 50 00 00 00 00 movl $0x0,0x50(%ebx)
c026ce44: c1 e8 09 shr $0x9,%eax
c026ce47: 89 43 18 mov %eax,0x18(%ebx)
c026ce4a: 89 43 28 mov %eax,0x28(%ebx)
c026ce4d: 83 c4 0c add $0xc,%esp
c026ce50: 89 f8 mov %edi,%eax
c026ce52: 5b pop %ebx
c026ce53: 5e pop %esi
c026ce54: 5f pop %edi
c026ce55: c3 ret
c026ce56: 89 5c 24 04 mov %ebx,0x4(%esp)
c026ce5a: c7 04 24 4d 72 33 c0 movl $0xc033724d,(%esp)
c026ce61: e8 2a b9 ea ff call c0118790 <printk>
c026ce66: 8b 43 70 mov 0x70(%ebx),%eax
c026ce69: 89 1c 24 mov %ebx,(%esp)
c026ce6c: ff 90 30 04 00 00 call *0x430(%eax)
c026ce72: 8b 43 70 mov 0x70(%ebx),%eax
c026ce75: 8b 40 34 mov 0x34(%eax),%eax
c026ce78: 89 04 24 mov %eax,(%esp)
c026ce7b: ff 96 6c 04 00 00 call *0x46c(%esi)
c026ce81: 8b 53 1c mov 0x1c(%ebx),%edx
c026ce84: b9 67 72 33 c0 mov $0xc0337267,%ecx
c026ce89: 0f b6 c0 movzbl %al,%eax
c026ce8c: 89 44 24 08 mov %eax,0x8(%esp)
c026ce90: 89 4c 24 04 mov %ecx,0x4(%esp)
c026ce94: 89 1c 24 mov %ebx,(%esp)
c026ce97: ff 52 20 call *0x20(%edx)
c026ce9a: 89 c7 mov %eax,%edi
c026ce9c: e9 4f ff ff ff jmp c026cdf0 <ide_dma_timeout_retry+0x30>
c026cea1: eb 0d jmp c026ceb0 <ide_timer_expiry>
It appears the compiler reordered this code in drivers/ide/ide-io.c:
/*
* un-busy drive etc (hwgroup->busy is cleared on return) and
* make sure request is sane
*/
rq = HWGROUP(drive)->rq;
HWGROUP(drive)->rq = NULL;
rq->errors = 0;
if (!rq->bio)
goto out;
...to something like:
ebx = HWGROUP(drive)->rq; (->rq is NULL)
HWGROUP(drive)->rq = NULL; (->rq already was NULL)
ecx = ebx->bio; (ebx is NULL, Oops)
rq->errors = 0;
if (!ecx)
goto out;
...and so it's Oopsing on (!rq->bio) because rq is NULL. But why
is rq NULL in this case?
I added the usual test for NULL, printk, goto out, and the machine
continues without visibly exploding when this case occurs. Is the
correct fix or is it expected that rq should never be NULL?
Simon-
On Mer, 2005-01-05 at 23:33, Simon Kirby wrote:
> I added the usual test for NULL, printk, goto out, and the machine
> continues without visibly exploding when this case occurs. Is the
> correct fix or is it expected that rq should never be NULL?
The initial UncorrectableError is the drive erroring the request due to
real failure of the drive to get the data. There are some races in the
base code when that occurs
rq should never be NULL at that point because after all there has to be
a request which has timed out. If the timeout isn't being cleared on the
error path that would make sense of the trace or if the timeout occurred
at the same time as the error completed it would have raced.
It could also be due to the fact base 2.6.10 will corrupt requests on
errors sometimes (which SGI now fixed)
Alan
On Thu, Jan 06, 2005 at 04:30:14PM +0000, Alan Cox wrote:
> The initial UncorrectableError is the drive erroring the request due to
> real failure of the drive to get the data. There are some races in the
> base code when that occurs
>
> rq should never be NULL at that point because after all there has to be
> a request which has timed out. If the timeout isn't being cleared on the
> error path that would make sense of the trace or if the timeout occurred
> at the same time as the error completed it would have raced.
>
> It could also be due to the fact base 2.6.10 will corrupt requests on
> errors sometimes (which SGI now fixed)
Well, it seems I was able to reproduce it each and every time by trying
to read the one sector. The drive is now out of the box but I can place
it in my desktop or anywhere else and probably do the same.
I can't seem to find a post with the SGI fixes, and I don't see anything
from a quick skim of the bk8 and bk9 logs (problem reproduced in bk7).
Could you point me to the relevant thread?
Simon-
On Iau, 2005-01-06 at 19:23, Simon Kirby wrote:
> On Thu, Jan 06, 2005 at 04:30:14PM +0000, Alan Cox wrote:
> I can't seem to find a post with the SGI fixes, and I don't see anything
> from a quick skim of the bk8 and bk9 logs (problem reproduced in bk7).
> Could you point me to the relevant thread?
Don't think it got discussed on l/k. Dunno if its in the 2.6.10bk code
yet either
rq->errors = 0;
if (!rq->bio)
goto out;
rq->sector = rq->bio->bi_sector;
rq->current_nr_sectors = bio_iovec(rq->bio)->bv_len >> 9;
rq->hard_cur_sectors = rq->current_nr_sectors;
rq->buffer = bio_data(rq->bio);
out:
return ret;
Is how the code should look at the end of ide_dma_timeout_retry (plus
your rq == NULL check I imagine)
On Thu, Jan 06, 2005 at 08:18:22PM +0000, Alan Cox wrote:
> Don't think it got discussed on l/k. Dunno if its in the 2.6.10bk code
> yet either
>
> rq->errors = 0;
>
> if (!rq->bio)
> goto out;
>
> rq->sector = rq->bio->bi_sector;
> rq->current_nr_sectors = bio_iovec(rq->bio)->bv_len >> 9;
> rq->hard_cur_sectors = rq->current_nr_sectors;
> rq->buffer = bio_data(rq->bio);
> out:
> return ret;
>
>
> Is how the code should look at the end of ide_dma_timeout_retry (plus
> your rq == NULL check I imagine)
The only difference with the code in 2.6.10 (ide-io.c is currently
unchanged in bk9) is rq->buffer is set to NULL instead of
bio_data(rq->bio)... I suppose this is the corruption fix?
Simon-