2003-03-21 19:46:51

by Justin T. Gibbs

Subject: Re: aic7(censored) dying horribly in 2.5.65-mm2

> Hi Justin, I got this booting 2.5.65-mm2 (2.5.65 was fine); there is an
> oops right at the end. Is there anything specific you want?

It would be nice to know the devices that are attached to the controller.
Could you also use the latest driver from here:

http://people.FreeBSD.org/~gibbs/linux/SRC/

Be sure to enable "Decode registers during diagnostics" when configuring
the driver and send me the output you get.

Thanks,
Justin


2003-03-21 23:37:29

by Zwane Mwaikambo

Subject: Re: aic7(censored) dying horribly in 2.5.65-mm2

On Fri, 21 Mar 2003, Justin T. Gibbs wrote:

> > Hi Justin, I got this booting 2.5.65-mm2 (2.5.65 was fine); there is an
> > oops right at the end. Is there anything specific you want?
>
> It would be nice to know the devices that are attached to the controller.
> Could you also use the latest driver from here:

This is the device list as reported by a 2.4.18-RH kernel:

scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.5
<Adaptec aic7870 SCSI adapter>
aic7870: Wide Channel A, SCSI Id=7, 16/253 SCBs

scsi1 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.5
<Adaptec aic7870 SCSI adapter>
aic7870: Wide Channel A, SCSI Id=7, 16/253 SCBs

Vendor: DEC Model: DLT2000 Rev: 830A
Type: Sequential-Access ANSI SCSI revision: 02
Vendor: TOSHIBA Model: CD-ROM XM-5401TA Rev: 3605
Type: CD-ROM ANSI SCSI revision: 02
Vendor: SEAGATE Model: ST15150W Rev: 9103
Type: Direct-Access ANSI SCSI revision: 02
Vendor: SEAGATE Model: ST15150W Rev: 9103
Type: Direct-Access ANSI SCSI revision: 02
Vendor: SEAGATE Model: ST15150W Rev: 9103
Type: Direct-Access ANSI SCSI revision: 02
Vendor: SEAGATE Model: ST15150W Rev: 9103
Type: Direct-Access ANSI SCSI revision: 02
Vendor: SEAGATE Model: ST15150W Rev: 9103
Type: Direct-Access ANSI SCSI revision: 02
Vendor: SEAGATE Model: ST15150W Rev: 9103
Type: Direct-Access ANSI SCSI revision: 02
scsi1:A:0:0: Tagged Queuing enabled. Depth 253
scsi1:A:1:0: Tagged Queuing enabled. Depth 253
scsi1:A:2:0: Tagged Queuing enabled. Depth 253
scsi1:A:3:0: Tagged Queuing enabled. Depth 253
scsi1:A:4:0: Tagged Queuing enabled. Depth 253
scsi1:A:5:0: Tagged Queuing enabled. Depth 253
Attached scsi disk sda at scsi1, channel 0, id 0, lun 0
Attached scsi disk sdb at scsi1, channel 0, id 1, lun 0
Attached scsi disk sdc at scsi1, channel 0, id 2, lun 0
Attached scsi disk sdd at scsi1, channel 0, id 3, lun 0
Attached scsi disk sde at scsi1, channel 0, id 4, lun 0
Attached scsi disk sdf at scsi1, channel 0, id 5, lun 0
(scsi1:A:0): 20.000MB/s transfers (10.000MHz, offset 8, 16bit)
SCSI device sda: 8388315 512-byte hdwr sectors (4295 MB)
Partition check:
sda: sda1
(scsi1:A:1): 20.000MB/s transfers (10.000MHz, offset 8, 16bit)
SCSI device sdb: 8388315 512-byte hdwr sectors (4295 MB)
sdb: sdb1 sdb2
(scsi1:A:2): 20.000MB/s transfers (10.000MHz, offset 8, 16bit)
SCSI device sdc: 8388315 512-byte hdwr sectors (4295 MB)
sdc: sdc1
(scsi1:A:3): 20.000MB/s transfers (10.000MHz, offset 8, 16bit)
SCSI device sdd: 8388315 512-byte hdwr sectors (4295 MB)
sdd: sdd1
(scsi1:A:4): 20.000MB/s transfers (10.000MHz, offset 8, 16bit)
SCSI device sde: 8388315 512-byte hdwr sectors (4295 MB)
sde: sde1
(scsi1:A:5): 20.000MB/s transfers (10.000MHz, offset 8, 16bit)
SCSI device sdf: 8388315 512-byte hdwr sectors (4295 MB)
sdf: sdf1


> http://people.FreeBSD.org/~gibbs/linux/SRC/

I'll try that driver this evening.

> the driver and send me the output you get.

Thanks,
Zwane

--
function.linuxpower.ca

2003-03-22 01:35:43

by Zwane Mwaikambo

Subject: Re: aic7(censored) dying horribly in 2.5.65-mm2

Here is a boot with your latest driver from the URL.

Starting xfs: SCSI disk error : host 1 channel 0 id 0 lun 0 return code = 6000000
end_request: I/O error, dev sda, sector 21496
SCSI disk error : host 1 channel 0 id 0 lun 0 return code = 6030000
end_request: I/O error, dev sda, sector 786536
Buffer I/O error on device sd(8,1), logical block 98313
SCSI disk error : host 1 channel 0 id 0 lun 0 return code = 6000000
end_request: I/O error, dev sda, sector 796424
Buffer I/O error on device sd(8,1), logical block 99549
Debug: sleeping function called from illegal context at
include/linux/rwsem.h:43
Call Trace:
[<c011ed18>] __might_sleep+0x58/0x60
[<c028ec00>] __bdevname+0x50/0xc0
[<c015cb41>] buffer_io_error+0x11/0x30
[<c015d406>] end_buffer_async_write+0x166/0x190
[<c0160101>] end_bio_bh_io_sync+0x21/0x30
[<c0161515>] bio_endio+0x35/0x60
[<c028df8f>] __end_that_request_first+0x1ff/0x220
[<c028dfc7>] end_that_request_first+0x17/0x20
[<c028a864>] elv_next_request+0x84/0xe0
[<c02e187f>] scsi_request_fn+0x3f/0x2a0
[<c028c442>] __blk_run_queue+0x12/0x20
[<c02e022d>] scsi_restart_operations+0xbd/0x120
[<c02e05b8>] scsi_error_handler+0x138/0x1a0
[<c02e0480>] scsi_error_handler+0x0/0x1a0
[<c0107095>] kernel_thread_helper+0x5/0x10

Buffer I/O error on device sd(8,1), logical block 720900
Buffer I/O error on device sd(8,1), logical block 65708
Buffer I/O error on device sd(8,1), logical block 753672
Buffer I/O error on device sd(8,1), logical block 98306
Buffer I/O error on device sd(8,1), logical block 98307
Buffer I/O error on device sd(8,1), logical block 98314
Buffer I/O error on device sd(8,1), logical block 524288
Buffer I/O error on device sd(8,1), logical block 524289
Buffer I/O error on device sd(8,1), logical block 524804
Buffer I/O error on device sd(8,1), logical block 524292
Buffer I/O error on device sd(8,1), logical block 524301
Buffer I/O error on device sd(8,1), logical block 884746
Buffer I/O error on device sd(8,1), logical block 720896
Buffer I/O error on device sd(8,1), logical block 720897
Buffer I/O error on device sd(8,1), logical block 722105
Buffer I/O error on device sd(8,1), logical block 720904
Buffer I/O error on device sd(8,1), logical block 720905
Buffer I/O error on device sd(8,1), logical block 720912
Buffer I/O error on device sd(8,1), logical block 720913
Buffer I/O error on device sd(8,1), logical block 393221
Buffer I/O error on device sd(8,1), logical block 786438
Buffer I/O error on device sd(8,1), logical block 557062
Buffer I/O error on device sd(8,1), logical block 589831
Buffer I/O error on device sd(8,1), logical block 327694
Buffer I/O error on device sd(8,1), logical block 65536
Buffer I/O error on device sd(8,1), logical block 65537
Buffer I/O error on device sd(8,1), logical block 66054
Buffer I/O error on device sd(8,1), logical block 65540
Buffer I/O error on device sd(8,1), logical block 425998
Buffer I/O error on device sd(8,1), logical block 425999
Unable to handle kernel paging request at virtual address 6b6b6bd3
printing eip:
c02e92f9
*pde = 00000000
Oops: 0000 [#1]
CPU: 1
EIP: 0060:[<c02e92f9>] Not tainted
EFLAGS: 00010002 VLI
EIP is at aic7xxx_done+0x19/0x570
eax: 6b6b6b6b ebx: c152040c ecx: c1521868 edx: c161e4c0
esi: c152040c edi: c153c54c ebp: c153c54c esp: c93f5a14
ds: 007b es: 007b ss: 0068
Process S90xfs (pid: 917, threadinfo=c93f4000 task=c977c660)
Stack: c161e4c0 c152040c 00000016 c153c54c 00000001 c02e989d c153c54c c152040c
00000001 00000006 c153c54c 00000000 00000000 c02f2550 c153c54c 00000001
c153c54c 00000000 00000000 ffffffff 000000ff 00000000 00000000 c153c54c
Call Trace:
[<c02f2d98>] do_aic7xxx_isr+0x78/0x120
[<c01106c4>] timer_interrupt+0xd4/0x1f0
[<c010bb0d>] handle_IRQ_event+0x2d/0x50
[<c010be49>] do_IRQ+0x109/0x210
[<c010a398>] common_interrupt+0x18/0x20
[<c011ef4e>] .text.lock.sched+0x10a/0x12c
[<c011c5c0>] schedule+0x320/0x610
[<c01a80b6>] do_get_write_access+0x5d6/0x720
[<c011c900>] default_wake_function+0x0/0x20
[<c015e2a4>] __bread+0x14/0x30
[<c01a8249>] journal_get_write_access+0x49/0x70
[<c019efd0>] ext3_reserve_inode_write+0x50/0xb0
[<c01aebec>] __jbd_kmalloc+0x1c/0x70
[<c019f048>] ext3_mark_inode_dirty+0x18/0x40
[<c019f1ba>] ext3_dirty_inode+0x14a/0x180
[<c017e7f3>] __mark_inode_dirty+0x143/0x150
[<c0177fb5>] update_atime+0xb5/0xc0
[<c013c79e>] __generic_file_aio_read+0x18e/0x1d0
[<c013c4d0>] file_read_actor+0x0/0x140
[<c013c821>] generic_file_aio_read+0x41/0x60
[<c015b2cd>] do_sync_read+0x7d/0xb0
[<c011f6f1>] mm_init+0xe1/0x120
[<c015b3b1>] vfs_read+0xb1/0x1b0
[<c01665db>] kernel_read+0x3b/0x50
[<c0167508>] prepare_binprm+0xb8/0xd0
[<c0167c77>] do_execve+0x1a7/0x240
[<c0107865>] sys_execve+0x35/0x70
[<c0109477>] syscall_call+0x7/0xb

Code: 89 01 8b 41 04 85 c0 75 05 8b 01 89 41 04 c3 90 89 f6 55 57 56 53 53 8b 74 24 1c 8b 6c 24 18 8b 46 04 89 04 24 8b
<0>Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing
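
As an aid to reading the "return code" values in the log above, here is a
minimal sketch of how a Linux SCSI result word splits into its four bytes.
It assumes the value is printed in hexadecimal and follows the usual
mid-layer layout (status, message, host, driver, from low byte to high).
The helper names are hypothetical, and the two constants are only meant to
mirror DID_TIME_OUT and DRIVER_TIMEOUT from the kernel's scsi.h.

```c
#include <stdint.h>

/* Hypothetical user-space helpers; the byte layout is assumed to match
 * the SCSI mid-layer result word (status | msg << 8 | host << 16 |
 * driver << 24). */
static inline unsigned result_status(uint32_t r) { return r & 0xffu; }
static inline unsigned result_msg(uint32_t r)    { return (r >> 8) & 0xffu; }
static inline unsigned result_host(uint32_t r)   { return (r >> 16) & 0xffu; }
static inline unsigned result_driver(uint32_t r) { return (r >> 24) & 0xffu; }

/* Illustrative copies of the kernel values, not taken from this thread. */
#define HOST_TIME_OUT 0x03u /* mirrors DID_TIME_OUT */
#define DRV_TIMEOUT   0x06u /* mirrors DRIVER_TIMEOUT */
```

Under those assumptions, 6000000 has a driver byte of 0x06 and a host byte
of 0x00, while 6030000 has a driver byte of 0x06 and a host byte of 0x03,
i.e. both look like command timeouts rather than media errors.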

2003-03-22 02:27:48

by Zwane Mwaikambo

Subject: Re: aic7(censored) dying horribly in 2.5.65-mm2

Actually, please disregard that last oops. I looked at the version numbers
and they can't have been right; I don't think I booted the right image.

Here are some oopses with 6.2.30. There does appear to be a relation to
interrupt routing, because if I boot with noapic it passes the boot test
and appears to be functional. If you have any more suggestions please send
them my way; I shall also be trying a number of things.

Cheers,
Zwane

0xc02ff2dc is in ahc_linux_run_complete_queue
(drivers/scsi/aic7xxx/aic7xxx_osm.c:687).
682 || (cmd->result & 0xFF) != SCSI_STATUS_OK)
683 with_errors++;
684
685 cmd->scsi_done(cmd); <===
686
687 if (with_errors > AHC_LINUX_MAX_RETURNED_ERRORS) {


Bringing up loopback interface: Unable to handle kernel paging request at virtual address 6b6b6b6b
*pde = 00000000
Oops: 0000 [#1]
CPU: 2
EIP: 0060:[<6b6b6b6b>] Not tainted
EFLAGS: 00010002 VLI
EIP is at 0x6b6b6b6b
eax: c168da10 ebx: 00000000 ecx: cbe32290 edx: 00000000
esi: 00000001 edi: cb10dcac ebp: 00000005 esp: cb10dc2c
ds: 007b es: 007b ss: 0068
Process ifup-routes (pid: 317, threadinfo=cb10c000 task=cbaa5360)
Stack: c02ff2dc c168da10 c168da10 cbe32290 c0305807 cbe32290 c168da10 00000296
cb10dcac c04a3dcc cbffe4f4 04000001 c010bb0d 00000005 cbe32290 cb10dcac
c05060a0 cb10c000 cb10c000 00000005 c010be49 00000005 cb10dcac cbffe4f4
Call Trace:
[<c02ff2dc>] ahc_linux_run_complete_queue+0x3c/0x50
[<c0305807>] ahc_linux_isr+0x1d7/0x3a0
[<c010bb0d>] handle_IRQ_event+0x2d/0x50
[<c010be49>] do_IRQ+0x109/0x210
[<c010a398>] common_interrupt+0x18/0x20
[<c011ef53>] .text.lock.sched+0x10f/0x12c
[<c011c5c0>] schedule+0x320/0x610
[<c0119d80>] do_page_fault+0x210/0x47a
[<c01a80b6>] do_get_write_access+0x5d6/0x720
[<c011c900>] default_wake_function+0x0/0x20
[<c015e2a4>] __bread+0x14/0x30
[<c01a8249>] journal_get_write_access+0x49/0x70
[<c019efd0>] ext3_reserve_inode_write+0x50/0xb0
[<c01a7421>] get_transaction+0x91/0x100
[<c019f048>] ext3_mark_inode_dirty+0x18/0x40
[<c019f1ba>] ext3_dirty_inode+0x14a/0x180
[<c017e7f3>] __mark_inode_dirty+0x143/0x150
[<c0177fb5>] update_atime+0xb5/0xc0
[<c013c79e>] __generic_file_aio_read+0x18e/0x1d0
[<c013c4d0>] file_read_actor+0x0/0x140
[<c013c821>] generic_file_aio_read+0x41/0x60
[<c015b2cd>] do_sync_read+0x7d/0xb0
[<c0110c56>] old_mmap+0xe6/0x140
[<c015b3b1>] vfs_read+0xb1/0x1b0
[<c015b73a>] sys_read+0x2a/0x40
[<c0109477>] syscall_call+0x7/0xb

Code: Bad EIP value.
<0>Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing
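
A note on reading the oops above: the faulting EIP, 0x6b6b6b6b, is a single
byte repeated four times, and 0x6b is the byte that the slab allocator's
debug poisoning writes into freed objects (POISON_FREE in later kernels).
Assuming slab debugging was enabled in this build, that would suggest the
cmd->scsi_done pointer was read from memory that had already been freed.
The sketch below is plain user-space C with hypothetical helper names; it
only shows how to recognise such a value, or an address a small offset
above it, in a register or stack dump.

```c
#include <stdint.h>

/* Assumed poison byte for freed slab objects; not stated in this thread. */
#define POISON_FREE_BYTE 0x6bu
#define POISON_FREE_WORD (POISON_FREE_BYTE * 0x01010101u)

/* True if a 32-bit value is the poison byte repeated four times. */
static inline int is_poison_word(uint32_t v)
{
    return v == POISON_FREE_WORD;
}

/* If addr lies within max_off bytes above the poison word (as happens when
 * a structure member is read through a poisoned pointer), return that
 * offset; otherwise return -1. */
static inline long poison_offset(uint32_t addr, uint32_t max_off)
{
    if (addr >= POISON_FREE_WORD && addr - POISON_FREE_WORD <= max_off)
        return (long)(addr - POISON_FREE_WORD);
    return -1;
}
```

For example, is_poison_word(0x6b6b6b6b) is true, and
poison_offset(0x6b6b6bd3, 0x1000) gives 0x68, consistent with a member
access at offset 0x68 through a pointer whose value was 0x6b6b6b6b.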

--
function.linuxpower.ca

2003-03-23 03:41:52

by Zwane Mwaikambo

Subject: Re: aic7(censored) dying horribly in 2.5.65-mm2

On Fri, 21 Mar 2003, Zwane Mwaikambo wrote:

> Actually, please disregard that last oops. I looked at the version numbers
> and they can't have been right; I don't think I booted the right image.
>
> Here are some oopses with 6.2.30. There does appear to be a relation to
> interrupt routing, because if I boot with noapic it passes the boot test
> and appears to be functional. If you have any more suggestions please send
> them my way; I shall also be trying a number of things.

Hi Justin,
OK, I enabled the second IOAPIC on my system, which made no real
difference apart from changing the IRQ assignments. However, when booting
the same SMP kernel with maxcpus=1 (IRQ assignments remain the same and
devices are still serviced by the IOAPIC) I can't reproduce the oopses.
Here is one without maxcpus. The reason noapic worked before is that only
one CPU was servicing interrupts; however, we are serialized per IRQ line...

Starting system logger: Unable to handle kernel paging request at virtual address 6b6b6b6b
*pde = 00000000
Oops: 0000 [#1]
CPU: 0
EIP: 0060:[<6b6b6b6b>] Not tainted
EFLAGS: 00010002 VLI
EIP is at 0x6b6b6b6b
eax: c17bda10 ebx: 00000000 ecx: cbe31290 edx: 00000000
esi: 00000001 edi: c0511fa0 ebp: 0000001d esp: c0511f20
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, threadinfo=c0510000 task=c04a12a0)
Stack: c02ff2dc c17bda10 c17bda10 cbe31290 c0305807 cbe31290 c17bda10 00000296
c011bceb c0511f44 cbffe4f4 04000001 c010bb0d 0000001d cbe31290 c0511fa0
c05063a0 c0510000 c0510000 0000001d c010be49 0000001d c0511fa0 cbffe4f4
Call Trace:
[<c02ff2dc>] ahc_linux_run_complete_queue+0x3c/0x50
[<c0305807>] ahc_linux_isr+0x1d7/0x3a0
[<c011bceb>] rebalance_tick+0x3b/0x100
[<c010bb0d>] handle_IRQ_event+0x2d/0x50
[<c010be49>] do_IRQ+0x109/0x210
[<c0106ea0>] default_idle+0x0/0x40
[<c010a398>] common_interrupt+0x18/0x20
[<c0106ea0>] default_idle+0x0/0x40
[<c0106ece>] default_idle+0x2e/0x40
[<c0106f5a>] cpu_idle+0x3a/0x50
[<c0105000>] rest_init+0x0/0x80

Code: Bad EIP value.
<0>Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing

--
function.linuxpower.ca