2003-03-09 16:34:48

by Marc Zyngier


2003-03-09 16:57:57

by Justin T. Gibbs

Subject: Re: [PATCH] EISA aic7770 broken

> Justin,
>
> I'm having trouble getting an Adaptec AHA-2740 (EISA) running on
> 2.5.64.
>
> First thing is the initial request_region succeeds, but the driver
> thinks it failed... The enclosed patch fixes it.

Take a look in kernel/resource.c. request_region returns *non-zero*
if the region is already in use. The driver doesn't want to try and
probe a region that is in use by another device. Your patch is incorrect.

> But the driver crashes badly while probing the card, somewhere in
> ahc_runq_tasklet.
>
> Any idea ?

Not without more information.

--
Justin

2003-03-09 17:11:17

by Marc Zyngier

Subject: Re: [PATCH] EISA aic7770 broken

>>>>> "Justin" == Justin T Gibbs <[email protected]> writes:

Justin> Take a look in kernel/resource.c. request_region returns
Justin> *non-zero* if the region is already in use. The driver
Justin> doesn't want to try and probe a region that is in use by
Justin> another device. Your patch is incorrect.

request_region returns a pointer to the newly allocated resource when
it succeeds, and NULL when it fails. This is the opposite of the logic
check_region uses.

Or am I *that* blind ?

Without this patch, the driver happily requests all EISA IO space, and
exits without any probing.
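To make the two return conventions concrete, here is a minimal userspace
sketch. Everything prefixed with mock_ is a hypothetical stand-in written
for illustration, not the kernel's implementation, and it tracks a single
region instead of a real resource tree. It only assumes what is said above:
request_region gives back a pointer on success and NULL on failure, while
check_region gives back 0 when the region is free and non-zero when it is
busy.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct resource. */
struct mock_resource {
    unsigned long start;
    unsigned long end;
    int busy;
};

/* One region only; a real resource tree tracks many ranges. */
static struct mock_resource mock_slot;

/* request_region convention: pointer on success, NULL if already in use. */
static struct mock_resource *mock_request_region(unsigned long start,
                                                 unsigned long len)
{
    if (mock_slot.busy)
        return NULL;
    mock_slot.start = start;
    mock_slot.end = start + len - 1;
    mock_slot.busy = 1;
    return &mock_slot;
}

/* check_region convention: 0 if free, non-zero (here -EBUSY) if in use. */
static int mock_check_region(unsigned long start, unsigned long len)
{
    (void)start;
    (void)len;
    return mock_slot.busy ? -EBUSY : 0;
}

static void mock_release_region(void)
{
    mock_slot.busy = 0;
}

/* Probe decision: go ahead only when the request actually succeeded. */
static int should_probe_slot(unsigned long base, unsigned long len)
{
    if (mock_request_region(base, len) == NULL)
        return 0;   /* region owned by someone else: skip this slot */
    return 1;       /* region is ours: safe to probe */
}
```

With this convention, a test written as "if (request_region(...)) skip"
skips exactly the slots that were successfully claimed, which matches the
behaviour described above: every region gets requested and nothing is
probed.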

>> But the driver crashes badly while probing the card, somewhere in
>> ahc_runq_tasklet.
>>
>> Any idea ?

Justin> Not without more information.

Ok, what can I do ?

The system is a dual PII-300 with two integrated PCI AIC-7xxx controllers.
It also has one 1740 and one 2740; the 2740 is the one that crashes at
probe time.

Is there any trace you want me to put in ?

Thanks,

M.
--
Places change, faces change. Life is so very strange.

2003-03-09 21:05:50

by Justin T. Gibbs

Subject: Re: [PATCH] EISA aic7770 broken

> Justin> Take a look in kernel/resource.c. request_region returns
> Justin> *non-zero* if the region is already in use. The driver
> Justin> doesn't want to try and probe a region that is in use by
> Justin> another device. Your patch is incorrect.
>
> request_region returns a pointer to the newly allocated resource when
> it succeeds, and NULL when it fails. This is the opposite of the logic
> check_region uses.

Sorry. I misread the comment in kernel/resource.c.

>
>>> But the driver crashes badly while probing the card, somewhere in
>>> ahc_runq_tasklet.
>>>
>>> Any idea ?
>
> Justin> Not without more information.
>
> Ok, what can I do ?

Define crashes badly. Driver messages or kernel panic strings typically
help.

--
Justin

2003-03-10 07:45:16

by Marc Zyngier

Subject: Re: [PATCH] EISA aic7770 broken

>>>>> "Justin" == Justin T Gibbs <[email protected]> writes:

Justin> Define crashes badly. Driver messages or kernel panic strings
Justin> typically help.

Here it is :

<quote>
[...]
PCI: PCI BIOS revision 2.10 entry at 0xf80cd, last bus=1
PCI: Using configuration type 1
BIO: pool of 256 setup, 14Kb (56 bytes/bio)
biovec pool[0]: 1 bvecs: 256 entries (12 bytes)
biovec pool[1]: 4 bvecs: 256 entries (48 bytes)
biovec pool[2]: 16 bvecs: 256 entries (192 bytes)
biovec pool[3]: 64 bvecs: 256 entries (768 bytes)
biovec pool[4]: 128 bvecs: 256 entries (1536 bytes)
biovec pool[5]: 256 bvecs: 256 entries (3072 bytes)
Linux Plug and Play Support v0.95 (c) Adam Belay
pnp: Enabling Plug and Play Card Services.
block request queues:
128 requests per read queue
128 requests per write queue
8 requests per batch
enter congestion at 15
exit congestion at 17
SCSI subsystem initialized
PCI: Probing PCI hardware
PCI: Probing PCI hardware (bus 00)
Starting balanced_irq
Enabling SEP on CPU 1
Enabling SEP on CPU 0
Journalled Block Device driver loaded
Installing knfsd (copyright (C) 1996 [email protected]).
Limiting direct PCI/PCI transfers.
isapnp: Scanning for PnP cards...
isapnp: No Plug & Play device found
Serial: 8250/16550 driver $Revision: 1.90 $ IRQ sharing disabled
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
pty: 256 Unix98 ptys configured
Real Time Clock Driver v1.11
scsi HBA driver Adaptec 174x (EISA) didn't set a release method, please fix the template
scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.28
<Adaptec aic7880 Ultra SCSI adapter>
aic7880: Ultra Wide Channel A, SCSI Id=7, 16/253 SCBs

(scsi0:A:0): 40.000MB/s transfers (20.000MHz, offset 8, 16bit)
Vendor: WDIGTL Model: WDE4360-1808A3 Rev: 1.80
Type: Direct-Access ANSI SCSI revision: 02
scsi0:A:0:0: Tagged Queuing enabled. Depth 253
Vendor: DELL Model: 6UW BACKPLANE Rev: 9
Type: Processor ANSI SCSI revision: 02
scsi1 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.28
<Adaptec aic7860 Ultra SCSI adapter>
aic7860: Ultra Single Channel A, SCSI Id=7, 3/253 SCBs

(scsi1:A:5): 10.000MB/s transfers (10.000MHz, offset 15)
Vendor: NEC Model: CD-ROM DRIVE:464 Rev: 1.05
Type: CD-ROM ANSI SCSI revision: 02
scsi2 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.28
<Adaptec 274X SCSI adapter>
aic7770: Twin Channel, A SCSI Id=7, B SCSI Id=7, primary A, 4/253 SCBs

Unable to handle kernel NULL pointer dereference at virtual address 00000000
printing eip:
c01f8cec
*pde = 00000000
Oops: 0002
CPU: 0
EIP: 0060:[<c01f8cec>] Not tainted
EFLAGS: 00010046
EIP is at ahc_runq_tasklet+0x54/0x140
eax: 00000000 ebx: dfe00400 ecx: c1535600 edx: 00000000
esi: dfe00600 edi: c02fdf48 ebp: 00000001 esp: c02fdf3c
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, threadinfo=c02fc000 task=c02bdce0)
Stack: dfe00454 00000000 c02fc000 00000286 c011f722 dfe00600 00000001 c02faf08
ffffffdd 00000000 c0352438 c0352438 c011f50a c02faf08 00000000 00000000
c02fdf9c 0008e000 00000046 c0113a83 c02fc000 c0106d90 c0105000 c010974a
Call Trace:
[<c011f722>] tasklet_action+0x86/0xe4
[<c011f50a>] do_softirq+0x5a/0xac
[<c0113a83>] smp_apic_timer_interrupt+0x13f/0x150
[<c0106d90>] default_idle+0x0/0x34
[<c0105000>] _stext+0x0/0x58
[<c010974a>] apic_timer_interrupt+0x1a/0x20
[<c0106d90>] default_idle+0x0/0x34
[<c0105000>] _stext+0x0/0x58
[<c0106db9>] default_idle+0x29/0x34
[<c0106e43>] cpu_idle+0x37/0x48
[<c0105055>] _stext+0x55/0x58

Code: 89 02 8b 41 24 89 c2 83 e2 f7 89 51 24 a8 02 74 0e 83 79 10
<0>Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing
</quote>

Running the dump through ksymoops :

<quote>
>>EIP; c01f8cec <ahc_runq_tasklet+54/140> <=====

>>edi; c02fdf48 <init_thread_union+1f48/2000>
>>esp; c02fdf3c <init_thread_union+1f3c/2000>

Trace; c011f722 <tasklet_action+86/e4>
Trace; c011f50a <do_softirq+5a/ac>
Trace; c0113a83 <smp_apic_timer_interrupt+13f/150>
Trace; c0106d90 <default_idle+0/34>
Trace; c0105000 <_stext+0/0>
Trace; c010974a <apic_timer_interrupt+1a/20>
Trace; c0106d90 <default_idle+0/34>
Trace; c0105000 <_stext+0/0>
Trace; c0106db9 <default_idle+29/34>
Trace; c0106e43 <cpu_idle+37/48>
Trace; c0105055 <rest_init+55/58>

Code; c01f8cec <ahc_runq_tasklet+54/140>
00000000 <_EIP>:
Code; c01f8cec <ahc_runq_tasklet+54/140> <=====
0: 89 02 mov %eax,(%edx) <=====
Code; c01f8cee <ahc_runq_tasklet+56/140>
2: 8b 41 24 mov 0x24(%ecx),%eax
Code; c01f8cf1 <ahc_runq_tasklet+59/140>
5: 89 c2 mov %eax,%edx
Code; c01f8cf3 <ahc_runq_tasklet+5b/140>
7: 83 e2 f7 and $0xfffffff7,%edx
Code; c01f8cf6 <ahc_runq_tasklet+5e/140>
a: 89 51 24 mov %edx,0x24(%ecx)
Code; c01f8cf9 <ahc_runq_tasklet+61/140>
d: a8 02 test $0x2,%al
Code; c01f8cfb <ahc_runq_tasklet+63/140>
f: 74 0e je 1f <_EIP+0x1f> c01f8d0b <ahc_runq_tasklet+73/140>
Code; c01f8cfd <ahc_runq_tasklet+65/140>
11: 83 79 10 00 cmpl $0x0,0x10(%ecx)
</quote>
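
For anyone decoding the dump by hand, the "Oops: 0002" value is the i386
page-fault error code. The sketch below decodes its three low bits (bit 0:
page present, bit 1: write access, bit 2: user mode). The struct and
function names are hypothetical helpers made up for this illustration.

```c
#include <stddef.h>

/* Hypothetical helper type: the three low bits of an i386 fault code. */
struct fault_info {
    int page_present;   /* bit 0: 1 = protection fault, 0 = page not present */
    int is_write;       /* bit 1: 1 = write access, 0 = read access */
    int user_mode;      /* bit 2: 1 = fault in user mode, 0 = kernel mode */
};

/* Split an i386 page-fault error code into its three low flag bits. */
static struct fault_info decode_i386_fault(unsigned long error_code)
{
    struct fault_info f;

    f.page_present = (error_code & 0x1) != 0;
    f.is_write     = (error_code & 0x2) != 0;
    f.user_mode    = (error_code & 0x4) != 0;
    return f;
}
```

Decoding 0x0002 gives a kernel-mode write to a page that is not present.
That matches the faulting instruction, mov %eax,(%edx), with edx equal to
00000000 in the register dump.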

There is a single disk connected to the EISA card (channel A).

Does it help ?

Thanks,

M.
--
Places change, faces change. Life is so very strange.

2003-03-10 15:29:08

by Justin T. Gibbs

Subject: Re: [PATCH] EISA aic7770 broken

>>>>>> "Justin" == Justin T Gibbs <[email protected]> writes:
>
> Justin> Define crashes badly. Driver messages or kernel panic strings
> Justin> typically help.
>
> Here it is :
>
> <quote>
> [...]
> (scsi1:A:5): 10.000MB/s transfers (10.000MHz, offset 15)
> Vendor: NEC Model: CD-ROM DRIVE:464 Rev: 1.05
> Type: CD-ROM ANSI SCSI revision: 02
> scsi2 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.28
> <Adaptec 274X SCSI adapter>
> aic7770: Twin Channel, A SCSI Id=7, B SCSI Id=7, primary A, 4/253 SCBs
>
> Unable to handle kernel NULL pointer dereference at virtual address 00000000
> printing eip:
> c01f8cec
> *pde = 00000000
> Oops: 0002
> CPU: 0
> EIP: 0060:[<c01f8cec>] Not tainted
> EFLAGS: 00010046
> EIP is at ahc_runq_tasklet+0x54/0x140

This is so close to the beginning of the function, that it only makes
sense that "ahc" is NULL. Can you instrument both ahc_runq_tasklet()
and ahc_platform_alloc() to see if it is indeed the case that "ahc"
is NULL, and to verify that "ahc" was valid when we registered the
tasklet?

--
Justin

2003-03-10 21:35:13

by Marc Zyngier

Subject: Re: [PATCH] EISA aic7770 broken

>>>>> "Justin" == Justin T Gibbs <[email protected]> writes:

Justin> This is so close to the beginning of the function, that it
Justin> only makes sense that "ahc" is NULL. Can you instrument both
Justin> ahc_runq_tasklet() and ahc_platform_alloc() to see if it is
Justin> indeed the case that "ahc" is NULL, and to verify that "ahc"
Justin> was valid when we registered the tasklet?

It's a little bit more complicated...

The thing crashes in the TAILQ_REMOVE macro, in ahc_runq_tasklet :

TAILQ_REMOVE(&ahc->platform_data->device_runq, dev, links);

I tracked it down to the last line of TAILQ_REMOVE :

#define TAILQ_REMOVE(head, elm, field) do {                          \
        if ((TAILQ_NEXT((elm), field)) != NULL)                      \
                TAILQ_NEXT((elm), field)->field.tqe_prev =           \
                    (elm)->field.tqe_prev;                           \
        else                                                         \
                (head)->tqh_last = (elm)->field.tqe_prev;            \
        *(elm)->field.tqe_prev = TAILQ_NEXT((elm), field);           \
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
         \ It crashes here
} while (0)

The thing is, if I put enough printks before the macro, it slows things
down (possibly an effect of the 9600 baud serial console), and the
machine comes up.

So it looks like a race of some sort... Concurrent tasklets, perhaps ?
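
One possible reading of the faulting store, offered as a sketch, not a
diagnosis: the last line of TAILQ_REMOVE writes through (elm)->field.tqe_prev,
so it faults at address 0 whenever tqe_prev is NULL, for instance when the
element is not (or is no longer) on the queue. The userspace example below
uses the standard <sys/queue.h> macros. The demo_ structure, the on_runq
flag and both helpers are hypothetical names, not the driver's code, and
the sketch is single-threaded, so it does not address the suspected race
itself.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for the driver's per-device structure. */
struct demo_dev {
    int id;
    int on_runq;                    /* hypothetical "is queued" marker */
    TAILQ_ENTRY(demo_dev) links;    /* tqe_next / tqe_prev live here */
};

TAILQ_HEAD(demo_runq, demo_dev);

/* Queue a device once; a second enqueue of the same device is ignored. */
static void demo_enqueue(struct demo_runq *q, struct demo_dev *dev)
{
    if (dev->on_runq)
        return;
    TAILQ_INSERT_TAIL(q, dev, links);
    dev->on_runq = 1;
}

/*
 * Remove a device only if it is really queued.
 * Returns 1 if it was removed, 0 if there was nothing to do.
 */
static int demo_dequeue(struct demo_runq *q, struct demo_dev *dev)
{
    /* Without this guard, TAILQ_REMOVE would store through a NULL
     * tqe_prev for an element that was never (or is no longer) queued. */
    if (!dev->on_runq || dev->links.tqe_prev == NULL)
        return 0;

    TAILQ_REMOVE(q, dev, links);
    dev->links.tqe_next = NULL;     /* make a stale second removal visible */
    dev->links.tqe_prev = NULL;
    dev->on_runq = 0;
    return 1;
}
```

A zero-initialized demo_dev has tqe_prev == NULL, so demo_dequeue returns 0
for it instead of faulting, and a second demo_dequeue after a successful one
also returns 0. In kernel code both helpers would additionally have to run
under whatever lock protects the queue.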

Any idea ?

M.
--
Places change, faces change. Life is so very strange.