2002-01-16 15:35:44

by Richard Harman

Subject: aic7xxx driver v6.2.4 "queue abort message" questions

I've got a box that will no longer boot off its SCSI disk (dual-booting to Windows works just fine). Did anyone ever get to the bottom of what caused the "attempting to queue an abort message" bug? I've tried booting my normal 2.4.16+preempt kernel and a 2.4.2 kernel that is known to have worked previously, and neither gets past trying to identify the devices on both channels.

Richard G Harman Jr <[email protected]>


2002-01-16 15:48:19

by Justin T. Gibbs

Subject: Re: aic7xxx driver v6.2.4 "queue abort message" questions

>I've got a box that will no longer boot off its SCSI disk (dual-booting
>to Windows works just fine). Did anyone ever get to the bottom of what
>caused the "attempting to queue an abort message" bug?

Those messages don't usually indicate bugs. Without knowing more about
your system, the devices attached to it, whether you happen to have one
of those broken VIA chipsets, etc., it's hard to diagnose your problem.

--
Justin

2002-01-16 15:59:38

by Richard Harman

Subject: Re: aic7xxx driver v6.2.4 "queue abort message" questions

It's a dual P3 600/100 with 512MB of RAM, a Tyan Thunder 100 GX (s1836dulan model) with an onboard aic-7895 dual-channel UW SCSI controller. I'm booting off channel B (the 68-pin-only channel), ID 1, which is my Seagate 36G SCA drive in a 5-bay SCA enclosure. ID 0 is an UltraPlex 40x. Channel A has a 50-pin 8x2x20 PlexWriter. The motherboard has a PIIX4 (GX) chipset. (http://www.tyan.com/products/html/a_thunder100gx.html)

I've hand-copied what I could of the v6.2.4 driver's debug messages, but wasn't able to catch all of them. (I hope to switch to a serial console as soon as I find a null modem cable, and log it that way.) Shall I send the screenful to the list or to you directly?

Thanks,
Richard G Harman Jr <[email protected]>

Quoted from "Justin T. Gibbs":
> >I've got a box that will no longer boot off its SCSI disk (dual-booting
> >to Windows works just fine). Did anyone ever get to the bottom of what
> >caused the "attempting to queue an abort message" bug?
>
> Those messages don't usually indicate bugs. Without knowing more about
> your system, the devices attached to it, whether you happen to have one
> of those broken VIA chipsets, etc., it's hard to diagnose your problem.
>
> --
> Justin
>

2002-03-15 19:09:51

by Len Sorensen

Subject: Re: aic7xxx driver v6.2.4 "queue abort message" questions

On Wed, Jan 16, 2002 at 11:01:11AM -0500, Richard Harman wrote:
> It's a dual P3 600/100 with 512MB of RAM, a Tyan Thunder 100 GX (s1836dulan model) with an onboard aic-7895 dual-channel UW SCSI controller. I'm booting off channel B (the 68-pin-only channel), ID 1, which is my Seagate 36G SCA drive in a 5-bay SCA enclosure. ID 0 is an UltraPlex 40x. Channel A has a 50-pin 8x2x20 PlexWriter. The motherboard has a PIIX4 (GX) chipset. (http://www.tyan.com/products/html/a_thunder100gx.html)
>
> I've hand-copied what I could of the v6.2.4 driver's debug messages, but wasn't able to catch all of them. (I hope to switch to a serial console as soon as I find a null modem cable, and log it that way.) Shall I send the screenful to the list or to you directly?

I was having this problem as well on an IBM M-Pro P2 450 with the aic7895
onboard (dual channel), while an identical P2 400 did not seem to have
the same problem with the same kernel build.

I think the problem started around 2.4.13 or so. I can boot from a warm
reboot, but not from a cold boot.

I just tried applying the aic7xxx 6.2.5 driver patch to replace 6.2.4
that is in 2.4.18, and it actually appears to have removed the problem.
I know the new version asks in the config whether you want to probe for
EISA/VLB cards, which I set to no, so either that fixed it (I should
try aic7xxx=no_probe with the other kernel), or something else in the
code changes has fixed it. I personally suspect a marginal timing issue
during init, given that the 400MHz machine is fine and the 450MHz machine
was not. Having not read through all the code changes in the patch,
I am not sure.

Len Sorensen

2002-03-15 19:31:55

by Justin T. Gibbs

Subject: Re: aic7xxx driver v6.2.4 "queue abort message" questions

>I just tried applying the aic7xxx 6.2.5 driver patch to replace 6.2.4
>that is in 2.4.18, and it actually appears to have removed the problem.

This was a known issue that was corrected in 6.2.5. The driver was
referencing an uninitialized register on the card, which caused the
parity error. The uninitialized reference was harmless, as the value
was ignored in the cases where it was uninitialized, but the panic it
created was a bit rough on users. 8-)

--
Justin

2002-03-18 12:59:01

by Andrey Slepuhin

Subject: Re: aic7xxx driver v6.2.4 "queue abort message" questions

On Fri, Mar 15, 2002 at 12:31:22PM -0700, Justin T. Gibbs wrote:
> >I just tried applying the aic7xxx 6.2.5 driver patch to replace 6.2.4
> >that is in 2.4.18, and it actually appears to have removed the problem.
>
> This was a known issue that was corrected in 6.2.5. The driver was
> referencing an uninitialized register on the card, which caused the
> parity error. The uninitialized reference was harmless, as the value
> was ignored in the cases where it was uninitialized, but the panic it
> created was a bit rough on users. 8-)

This weekend I ran into exactly the same problem with parity errors,
but after updating to driver version 6.2.5, the kernel stalls completely
just after the line
SCSI subsystem driver Revision: 1.00

The system in question is:

Dual PIII-1266,
SuperMicro P3TDER motherboard,
onboard aic7899 SCSI controller:
Bus 0, device 5, function 1:
SCSI storage controller: Adaptec 7899P (#2) (rev 1).
IRQ 27.
Master Capable. Latency=64. Min Gnt=40.Max Lat=25.
I/O at 0xd800 [0xd8ff].
Non-prefetchable 64 bit memory at 0xfeaff000 [0xfeafffff].
Bus 0, device 5, function 0:
SCSI storage controller: Adaptec 7899P (rev 1).
IRQ 26.
Master Capable. Latency=64. Min Gnt=40.Max Lat=25.
I/O at 0xd000 [0xd0ff].
Non-prefetchable 64 bit memory at 0xfeafc000 [0xfeafcfff].


I tried both updating the driver for kernel 2.4.18-ac3 and switching to
kernel 2.4.19-pre3-ac1, with the same effect. On another computer with an
Asus P2B-DS motherboard (onboard aic7890), however, kernel 2.4.19-pre3-ac1
works fine.

Regards,
Andrey.

--
A right thing should be simple (tm)

2002-03-18 16:25:01

by Andrey Slepuhin

Subject: aic7xxx driver v6.2.5 freezes the kernel

On Mon, Mar 18, 2002 at 03:58:32PM +0300, Andrey Slepuhin wrote:
> On Fri, Mar 15, 2002 at 12:31:22PM -0700, Justin T. Gibbs wrote:
> > >I just tried applying the aic7xxx 6.2.5 driver patch to replace 6.2.4
> > >that is in 2.4.18, and it actually appears to have removed the problem.
> >
> > This was a known issue that was corrected in 6.2.5. The driver was
> > referencing an uninitialized register on the card, which caused the
> > parity error. The uninitialized reference was harmless, as the value
> > was ignored in the cases where it was uninitialized, but the panic it
> > created was a bit rough on users. 8-)
>
> This weekend I ran into exactly the same problem with parity errors,
> but after updating to driver version 6.2.5, the kernel stalls completely
> just after the line
> SCSI subsystem driver Revision: 1.00

[snip]

I tracked the problem down to ahc_read_seeprom(), which hangs in
CLOCK_PULSE() at aic7xxx_93cx6.c:161. But I have no idea what is going on,
because this code is the same as in version 6.2.4 of the driver.

Regards,
Andrey.

--
A right thing should be simple (tm)

2002-03-18 18:26:42

by Justin T. Gibbs

Subject: Re: aic7xxx driver v6.2.5 freezes the kernel

>I tracked the problem down to ahc_read_seeprom(), which hangs in
>CLOCK_PULSE() at aic7xxx_93cx6.c:161. But I have no idea what is going on,
>because this code is the same as in version 6.2.4 of the driver.

Is the driver using memory mapped I/O with the new driver but I/O
mapped in the old? I will add a timeout to the CLOCK_PULSE() code,
but that still doesn't explain why the hang is happening now.

--
Justin
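
A minimal sketch of the kind of bounded wait mentioned above, assuming the hang comes from polling a ready bit that never asserts. This is not the driver's CLOCK_PULSE() code: `fake_dev`, `fake_read_status()`, `FAKE_READY` and `wait_ready()` are hypothetical names that stand in for the real register access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_READY 0x01u

/* Hypothetical stand-in for a device whose ready bit may never assert. */
struct fake_dev {
    uint8_t  status;       /* current register contents */
    unsigned reads;        /* how many times the register was read */
    unsigned ready_after;  /* set FAKE_READY on this read; 0 means never */
};

/* Simulated register read; a real driver would read the hardware here. */
static uint8_t fake_read_status(struct fake_dev *dev)
{
    dev->reads++;
    if (dev->ready_after != 0 && dev->reads >= dev->ready_after)
        dev->status |= FAKE_READY;
    return dev->status;
}

/*
 * Poll for the ready bit at most max_polls times instead of forever.
 * Returns true if the bit was seen, false if the poll budget ran out.
 */
static bool wait_ready(struct fake_dev *dev, unsigned max_polls)
{
    assert(dev != NULL);
    while (max_polls-- > 0) {
        if (fake_read_status(dev) & FAKE_READY)
            return true;
        /* A real driver would insert a short delay between polls. */
    }
    return false;
}
```

With a timeout like this, a device that never reports ready turns into an error return the caller can log, rather than a silent freeze during boot.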

2002-03-18 18:31:41

by Justin T. Gibbs

Subject: Re: aic7xxx driver v6.2.4 "queue abort message" questions

>The system in question is:
>
>Dual PIII-1266,
>SuperMicro P3TDER motherboard,

BTW, I have done extensive testing on a P3TDE6 which uses the
same chipset. Can you send me your kernel configuration in private
email so I can try to reproduce this?

--
Justin

2002-04-17 11:15:25

by Andrey Slepuhin

Subject: Re: aic7xxx driver v6.2.5 freezes the kernel

On Tue, Mar 19, 2002 at 02:33:47PM -0700, Justin T. Gibbs wrote:
> >lspci output attached. BTW, I tried new driver on another computer with
> >the same hardware configuration - effect is repeatable, so the problem is
> >unlikely a hardware bug.
>
> No, but it is certainly hardware dependent. As soon as I get a break here
> at work, I'll see what I can dig out from your lspci output.

Hi Justin,
I tracked the problem down and found that the following change between
versions 6.2.4 and 6.2.5 causes the system freeze:

--- aic7xxx/aic7xxx_core.c Wed Apr 17 14:36:21 2002
+++ aic7xxx.new/aic7xxx_core.c Mon Mar 18 12:54:23 2002
@@ -3770,9 +3770,8 @@
 	 * Ensure that the reset has finished
 	 */
 	wait = 1000;
-	do {
+	while (--wait && !(ahc_inb(ahc, HCNTRL) & CHIPRSTACK))
 		ahc_delay(1000);
-	} while (--wait && !(ahc_inb(ahc, HCNTRL) & CHIPRSTACK));

 	if (wait == 0) {
 		printf("%s: WARNING - Failed chip reset! "

All other changes were successfully merged without any problems.
BTW, version 6.2.6 of the driver from 2.4.19-pre7 freezes the system too.

Regards,
Andrey.

--
A right thing should be simple (tm)
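
The two loop forms in the diff above differ only in whether the first register read happens before or after the first delay. The sketch below records that ordering against a simulated chip. `trace`, `fake_delay()` and `fake_ack()` are hypothetical helpers standing in for ahc_delay() and the HCNTRL/CHIPRSTACK test; nothing here touches real hardware or claims which form belongs to which driver version.

```c
#include <assert.h>
#include <stddef.h>

/* Event log: 'D' for a delay, 'R' for a register read. */
struct trace {
    char   ev[16];
    size_t n;
};

static void log_ev(struct trace *t, char c)
{
    assert(t != NULL);
    if (t->n + 1 < sizeof(t->ev)) {
        t->ev[t->n++] = c;
        t->ev[t->n] = '\0';
    }
}

/* Stand-in for ahc_delay(): only records that a delay happened. */
static void fake_delay(struct trace *t)
{
    log_ev(t, 'D');
}

/* Stand-in for the reset-ack test: acknowledges on the ack_on-th read. */
static int fake_ack(struct trace *t, int *reads, int ack_on)
{
    log_ev(t, 'R');
    return ++*reads >= ack_on;
}

/* do/while form: delay first, then read the register. */
static int poll_do_while(struct trace *t, int ack_on)
{
    int wait = 1000, reads = 0;

    do {
        fake_delay(t);
    } while (--wait && !fake_ack(t, &reads, ack_on));
    return wait;
}

/* while form: read the register first, delay only if not yet acknowledged. */
static int poll_while(struct trace *t, int ack_on)
{
    int wait = 1000, reads = 0;

    while (--wait && !fake_ack(t, &reads, ack_on))
        fake_delay(t);
    return wait;
}
```

With an acknowledgement on the second read, the do/while form logs a delay before its first read and the while form logs a read before any delay. On hardware that needs settling time after a reset write, that first early read is the only behavioural difference between the two.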

2002-04-17 14:09:00

by Mr. James W. Laferriere

Subject: Re: aic7xxx driver v6.2.5 freezes the kernel


Hello Andrey, version 6.2.6 on my kernel, patched up to pre6,
does not lock up my present system. HTH, JimL

Linux version 2.4.19-pre6 (root@(none)) (gcc version 2.95.3 20010315
(release)) #1 SMP Sat Apr 13 22:17:13 UTC 2002
...
SCSI subsystem driver Revision: 1.00
scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.6
<Adaptec aic7899 Ultra160 SCSI adapter>
aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs

scsi1 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.6
<Adaptec aic7899 Ultra160 SCSI adapter>
aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs

Vendor: HITACHI Model: DK32CJ-36MC Rev: JBBB
Type: Direct-Access ANSI SCSI revision: 03
(scsi0:A:0): 160.000MB/s transfers (80.000MHz DT, offset 126, 16bit)
Vendor: HITACHI Model: DK32CJ-36MC Rev: JBBB
Type: Direct-Access ANSI SCSI revision: 03
(scsi0:A:1): 160.000MB/s transfers (80.000MHz DT, offset 126, 16bit)
scsi0:A:0:0: Tagged Queuing enabled. Depth 8
scsi0:A:1:0: Tagged Queuing enabled. Depth 8
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Attached scsi disk sdb at scsi0, channel 0, id 1, lun 0
SCSI device sda: 72205440 512-byte hdwr sectors (36969 MB)
Partition check:
sda: sda1 sda2 sda3
SCSI device sdb: 72205440 512-byte hdwr sectors (36969 MB)
sdb: sdb1 sdb2 sdb3


On Wed, 17 Apr 2002, Andrey Slepuhin wrote:

> On Tue, Mar 19, 2002 at 02:33:47PM -0700, Justin T. Gibbs wrote:
> > >lspci output attached. BTW, I tried new driver on another computer with
> > >the same hardware configuration - effect is repeatable, so the problem is
> > >unlikely a hardware bug.
> >
> > No, but it is certainly hardware dependent. As soon as I get a break here
> > at work, I'll see what I can dig out from your lspci output.

> Hi Justin,
> I tracked the problem down and found that the following change between
> versions 6.2.4 and 6.2.5 causes the system freeze:

+------------------------------------------------------------------+
| James W. Laferriere | System Techniques | Give me VMS |
| Network Engineer | P.O. Box 854 | Give me Linux |
| [email protected] | Coudersport PA 16915 | only on AXP |
+------------------------------------------------------------------+


2002-04-17 14:55:00

by Justin T. Gibbs

Subject: Re: aic7xxx driver v6.2.5 freezes the kernel

>All other changes were successfully merged without any problems.
>BTW, version 6.2.6 of the driver from 2.4.19-pre7 freezes the system too.

What motherboard is this again? Perhaps your PCI bus is running just
a hair faster than 66MHz? A similar issue was discovered with the
U320 controllers running at 133MHz PCI-X, where some amount of delay is
required prior to accessing chip registers again after setting
CHIPRST.

The code was flipped so that the delay was accurate. In PCI, you
are only guaranteed that the write has been flushed all the way to the
device by performing a read to that device. I guess we'll just have to
hope that our write transaction isn't stalled.

I'll make a 6.2.7 <sigh> drop later today.

--
Justin
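
A toy model of the posted-write point above, as a sketch only: a write may sit in a bridge buffer until a later read to the same device pushes it through, so a delay timed from the write alone may start too early. `fake_bus`, `bus_write()` and `bus_read()` are hypothetical names, not driver or kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single-register device behind a posting bridge. */
struct fake_bus {
    uint8_t dev_reg;   /* value the device actually holds */
    uint8_t posted;    /* write still sitting in the bridge buffer */
    int     pending;   /* nonzero while the write has not reached the device */
};

/* A write returns to the CPU as soon as it has been posted. */
static void bus_write(struct fake_bus *b, uint8_t v)
{
    assert(b != NULL);
    b->posted = v;
    b->pending = 1;
}

/* A read forces any posted write out to the device before it completes. */
static uint8_t bus_read(struct fake_bus *b)
{
    assert(b != NULL);
    if (b->pending) {
        b->dev_reg = b->posted;
        b->pending = 0;
    }
    return b->dev_reg;
}
```

In this model, the device only sees the written value once a read has completed. That is the trade-off described above: reading back makes the delay accurate but touches the chip right after the reset write, while delaying first avoids the early access but cannot guarantee when the write arrived.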

2002-04-17 15:32:05

by Andrey Slepuhin

Subject: Re: aic7xxx driver v6.2.5 freezes the kernel

On Wed, Apr 17, 2002 at 08:54:11AM -0600, Justin T. Gibbs wrote:
> >All other changes were successfully merged without any problems.
> >BTW, version 6.2.6 of the driver from 2.4.19-pre7 freezes the system too.
>
> What motherboard is this again?

P3TDER with dual channel U160 aic7899 controller onboard:

00:05.0 SCSI storage controller: Adaptec 7899P (rev 01)
Subsystem: Unknown device 9d15:0001
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 40 min, 25 max, 64 set, cache line size 04
Interrupt: pin A routed to IRQ 26
BIST result: 00
Region 0: I/O ports at d000 [disabled] [size=256]
Region 1: Memory at feafc000 (64-bit, non-prefetchable) [size=4K]
Expansion ROM at feaa0000 [disabled] [size=128K]
Capabilities: [dc] Power Management version 2
Flags: PMEClk- AuxPwr- DSI- D1- D2- PME-
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:05.1 SCSI storage controller: Adaptec 7899P (rev 01)
Subsystem: Unknown device 9d15:0001
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 40 min, 25 max, 64 set, cache line size 04
Interrupt: pin B routed to IRQ 27
BIST result: 00
Region 0: I/O ports at d800 [disabled] [size=256]
Region 1: Memory at feaff000 (64-bit, non-prefetchable) [size=4K]
Expansion ROM at feac0000 [disabled] [size=128K]
Capabilities: [dc] Power Management version 2
Flags: PMEClk- AuxPwr- DSI- D1- D2- PME-
Status: D0 PME-Enable- DSel=0 DScale=0 PME-



> Perhaps your PCI bus is running just
> a hair faster than 66MHz?

I doubt it.

> A similar issue was discovered with the
> U320 controllers running at 133MHz PCI-X where some amount of delay is
> required prior to accessing chip registers again after setting
> CHIPRST.
>
> The code was flipped so that the delay was accurate. In PCI, you
> are only guaranteed that the write has been flushed all the way to the
> device by performing a read to that device. I guess we'll just have to
> hope that our write transaction isn't stalled.
>
> I'll make a 6.2.7 <sigh> drop later today.

Ok, I'll test it.

Andrey.

--
A right thing should be simple (tm)