2005-11-22 21:34:10

by Andreas Haumer

[permalink] [raw]
Subject: [2.4.31 + aic79xx] SCSI error: Infinite interrupt loop, INTSTAT = 0

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi!

I'm in the process of setting up a new fileserver and
have some troubles with an Adaptec ASC-29320ALP U320
SCSI card and an external Infortrend EonStor RAID!

This is a Tyan TA26 barebone system (dual opteron CPU,
4GB RAM) with two on-board AIC-7902B SCSI controllers
(Tyan Thunder K8SD Pro motherboard) for internal system disks
(SW-RAID1) and two additional Adaptec 29320ALP U320 cards
for externally connected RAID (Infortrend EonStor A16U-G2421
RAID subsystem) and backup hardware.

I'm running linux-2.4.31 in 32 bit mode.

root@setup:~ {521} $ lspci
00:06.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8111 PCI (rev 07)
00:07.0 ISA bridge: Advanced Micro Devices [AMD] AMD-8111 LPC (rev 05)
00:07.1 IDE interface: Advanced Micro Devices [AMD] AMD-8111 IDE (rev 03)
00:07.2 SMBus: Advanced Micro Devices [AMD] AMD-8111 SMBus 2.0 (rev 02)
00:07.3 Bridge: Advanced Micro Devices [AMD] AMD-8111 ACPI (rev 05)
00:0a.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
00:0a.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01)
00:0b.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
00:0b.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:03.0 SCSI storage controller: Adaptec ASC-29320ALP U320 (rev 10)
01:04.0 SCSI storage controller: Adaptec ASC-29320ALP U320 (rev 10)
02:06.0 SCSI storage controller: Adaptec AIC-7902B U320 (rev 10)
02:06.1 SCSI storage controller: Adaptec AIC-7902B U320 (rev 10)
02:09.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)
02:09.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)
03:00.0 USB Controller: Advanced Micro Devices [AMD] AMD-8111 USB (rev 0b)
03:00.1 USB Controller: Advanced Micro Devices [AMD] AMD-8111 USB (rev 0b)
03:05.0 Mass storage controller: Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)
03:06.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
03:08.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 10)

The SCSI devices are connected as follows:

root@setup:~ {520} $ cat /proc/scsi/scsi
Attached devices:
Host: scsi1 Channel: 00 Id: 00 Lun: 00
Vendor: IFT Model: A16U-G2421 Rev: 342A
Type: Direct-Access ANSI SCSI revision: 03
Host: scsi1 Channel: 00 Id: 00 Lun: 01
Vendor: IFT Model: A16U-G2421 Rev: 342A
Type: Direct-Access ANSI SCSI revision: 03
Host: scsi2 Channel: 00 Id: 00 Lun: 00
Vendor: MAXTOR Model: ATLAS10K5_73SCA Rev: JNZH
Type: Direct-Access ANSI SCSI revision: 03
Host: scsi3 Channel: 00 Id: 05 Lun: 00
Vendor: MAXTOR Model: ATLAS10K5_73SCA Rev: JNZH
Type: Direct-Access ANSI SCSI revision: 03

SCSI driver boot messages:
[...]
Nov 22 19:53:52 setup kernel: SCSI subsystem driver Revision: 1.00
Nov 22 19:53:52 setup kernel: scsi0 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.10
Nov 22 19:53:52 setup kernel: <Adaptec 29320ALP Ultra320 SCSI adapter>
Nov 22 19:53:52 setup kernel: aic7901: Ultra320 Wide Channel A, SCSI Id=7, PCI-X 67-100Mhz, 512 SCBs
Nov 22 19:53:52 setup kernel:
Nov 22 19:53:52 setup kernel: scsi1 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.10
Nov 22 19:53:52 setup kernel: <Adaptec 29320ALP Ultra320 SCSI adapter>
Nov 22 19:53:52 setup kernel: aic7901: Ultra320 Wide Channel A, SCSI Id=7, PCI-X 67-100Mhz, 512 SCBs
Nov 22 19:53:52 setup kernel:
Nov 22 19:53:52 setup kernel: scsi2 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.10
Nov 22 19:53:52 setup kernel: <Adaptec AIC7902 Ultra320 SCSI adapter>
Nov 22 19:53:52 setup kernel: aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI-X 67-100Mhz, 512 SCBs
Nov 22 19:53:52 setup kernel:
Nov 22 19:53:52 setup kernel: scsi3 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.10
Nov 22 19:53:52 setup kernel: <Adaptec AIC7902 Ultra320 SCSI adapter>
Nov 22 19:53:52 setup kernel: aic7902: Ultra320 Wide Channel B, SCSI Id=7, PCI-X 67-100Mhz, 512 SCBs
Nov 22 19:53:52 setup kernel:
Nov 22 19:53:52 setup kernel: blk: queue f7ace618, I/O limit 4095Mb (mask 0xffffffff)
Nov 22 19:53:52 setup kernel: (scsi2:A:0): 320.000MB/s transfers (160.000MHz DT|IU|RTI|QAS, 16bit)
Nov 22 19:53:52 setup kernel: (scsi1:A:0): 320.000MB/s transfers (160.000MHz DT|IU|QAS, 16bit)
Nov 22 19:53:52 setup kernel: (scsi3:A:5): 320.000MB/s transfers (160.000MHz DT|IU|RTI|QAS, 16bit)
Nov 22 19:53:52 setup kernel: Vendor: IFT Model: A16U-G2421 Rev: 342A
Nov 22 19:53:52 setup kernel: Type: Direct-Access ANSI SCSI revision: 03
Nov 22 19:53:52 setup kernel: blk: queue f7ace418, I/O limit 4095Mb (mask 0xffffffff)
Nov 22 19:53:52 setup kernel: Vendor: IFT Model: A16U-G2421 Rev: 342A
Nov 22 19:53:52 setup kernel: Type: Direct-Access ANSI SCSI revision: 03
Nov 22 19:53:52 setup kernel: blk: queue f7ace018, I/O limit 4095Mb (mask 0xffffffff)
Nov 22 19:53:52 setup kernel: scsi1:A:0:0: Tagged Queuing enabled. Depth 32
Nov 22 19:53:52 setup kernel: scsi1:A:0:1: Tagged Queuing enabled. Depth 32
Nov 22 19:53:52 setup kernel: Vendor: MAXTOR Model: ATLAS10K5_73SCA Rev: JNZH
Nov 22 19:53:52 setup kernel: Type: Direct-Access ANSI SCSI revision: 03
Nov 22 19:53:52 setup kernel: blk: queue f7261c18, I/O limit 4095Mb (mask 0xffffffff)
Nov 22 19:53:52 setup kernel: scsi2:A:0:0: Tagged Queuing enabled. Depth 32
Nov 22 19:53:52 setup kernel: Vendor: MAXTOR Model: ATLAS10K5_73SCA Rev: JNZH
Nov 22 19:53:52 setup kernel: Type: Direct-Access ANSI SCSI revision: 03
Nov 22 19:53:52 setup kernel: blk: queue f7261a18, I/O limit 4095Mb (mask 0xffffffff)
Nov 22 19:53:52 setup kernel: scsi3:A:5:0: Tagged Queuing enabled. Depth 32
Nov 22 19:53:52 setup kernel: Attached scsi disk sda at scsi1, channel 0, id 0, lun 0
Nov 22 19:53:52 setup kernel: Attached scsi disk sdb at scsi1, channel 0, id 0, lun 1
Nov 22 19:53:52 setup kernel: Attached scsi disk sdc at scsi2, channel 0, id 0, lun 0
Nov 22 19:53:52 setup kernel: Attached scsi disk sdd at scsi3, channel 0, id 5, lun 0
[...]

Both Maxtor discs are used in a SW-RAID1 for the system
volume group and they work fine for a few days now.

Today I tried to integrate the external EonStor RAID and first
it seemd to work fine, too. The system did find the devices
and I could create a new volume group with several logical
volumes out of them.

But as soon as I try to create a filesystem on the new logical
volumes or do some other work with the devices, the SCSI driver
goes berserk:
[...]
Nov 22 19:56:14 setup kernel: scsi1:0:0:0: Attempting to abort cmd f71dec00: 0x2a 0x0 0x1 0x71 0x3 0x0 0x0 0x0 0x8 0x0
Nov 22 19:56:14 setup kernel: Infinite interrupt loop, INTSTAT = 0scsi1: At time of recovery, card was not paused
Nov 22 19:56:14 setup kernel: >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
Nov 22 19:56:14 setup kernel: scsi1: Dumping Card State at program address 0x26 Mode 0x22
Nov 22 19:56:14 setup kernel: Card was paused
Nov 22 19:56:14 setup kernel: HS_MAILBOX[0x0] INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11]
Nov 22 19:56:14 setup kernel: DFFSTAT[0x33] SCSISIGI[0x25] SCSIPHASE[0x0] SCSIBUS[0x0]
Nov 22 19:56:14 setup kernel: LASTPHASE[0x1] SCSISEQ0[0x40] SCSISEQ1[0x12] SEQCTL0[0x0]
Nov 22 19:56:14 setup kernel: SEQINTCTL[0x0] SEQ_FLAGS[0x0] SEQ_FLAGS2[0x0] SSTAT0[0x10]
Nov 22 19:56:14 setup kernel: SSTAT1[0x0] SSTAT2[0x0] SSTAT3[0x0] PERRDIAG[0xc0]
Nov 22 19:56:14 setup kernel: SIMODE1[0xac] LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x80]
Nov 22 19:56:14 setup kernel: LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x40]
Nov 22 19:56:14 setup kernel:
Nov 22 19:56:14 setup kernel: SCB Count = 32 CMDS_PENDING = 32 LASTSCB 0x10 CURRSCB 0x7 NEXTSCB 0xff80
Nov 22 19:56:14 setup kernel: qinstart = 268 qinfifonext = 268
Nov 22 19:56:14 setup kernel: QINFIFO:
Nov 22 19:56:14 setup kernel: WAITING_TID_QUEUES:
Nov 22 19:56:14 setup kernel: 0 ( 0x7 )
Nov 22 19:56:14 setup kernel: Pending list:
Nov 22 19:56:14 setup kernel: 7 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 22 19:56:14 setup kernel: 19 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 22 19:56:14 setup kernel: 6 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 22 19:56:14 setup kernel: 15 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 22 19:56:14 setup kernel: 10 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 22 19:56:14 setup kernel: 11 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 22 19:56:14 setup kernel: 12 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 22 19:56:14 setup kernel: 5 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 22 19:56:14 setup kernel: 13 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 22 19:56:14 setup kernel: 4 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 22 19:56:14 setup kernel: 3 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 22 19:56:14 setup kernel: 2 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 22 19:56:14 setup kernel: 1 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 22 19:56:14 setup kernel: 0 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 22 19:56:14 setup kernel: 9 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 22 19:56:14 setup kernel: 8 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 22 19:56:14 setup kernel: 14 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 22 19:56:14 setup kernel: 30 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 22 19:56:14 setup kernel: 31 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 22 19:56:14 setup kernel: 25 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 22 19:56:14 setup kernel: 26 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 22 19:56:14 setup kernel: 27 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 22 19:56:14 setup kernel: 28 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 22 19:56:14 setup kernel: 29 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 22 19:56:14 setup kernel: 20 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 22 19:56:14 setup kernel: 21 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 22 19:56:14 setup kernel: 22 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 22 19:56:14 setup kernel: 23 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 22 19:56:14 setup kernel: 24 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 22 19:56:14 setup kernel: 16 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 22 19:56:14 setup kernel: 17 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 22 19:56:14 setup kernel: 18 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 22 19:56:14 setup kernel: Total 32
Nov 22 19:56:14 setup kernel: Kernel Free SCB list:
Nov 22 19:56:14 setup kernel: Sequencer Complete DMA-inprog list:
Nov 22 19:56:14 setup kernel: Sequencer Complete list:
Nov 22 19:56:14 setup kernel: Sequencer DMA-Up and Complete list:
Nov 22 19:56:14 setup kernel:
Nov 22 19:56:14 setup kernel: scsi1: FIFO0 Free, LONGJMP == 0x8251, SCB 0x11
Nov 22 19:56:14 setup kernel: SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x4] DFSTATUS[0x89]
Nov 22 19:56:14 setup kernel: SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
Nov 22 19:56:14 setup kernel: SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
Nov 22 19:56:15 setup kernel: HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
[...]

And so on, until the external SCSI devices become unusable.
The system is still running on the internally connected
SCSI drives, though.

I found some messages reporting similar problems on this
list, a few weeks ago (beginning of October 2005). There
was also a patch for the aic79xx driver mentioned, but I
haven't found any report about it since then, so I don't
know the status of the patch (it was for the 2.6 kernel,
anyway, as far as I remember)

What can I do to make the external RAID usable?
Dump the Adaptec cards and replace them with something better?
Patch the driver?

Any help is appreciated!

Thanks!

- - andreas

- --
Andreas Haumer | mailto:[email protected]
*x Software + Systeme | http://www.xss.co.at/
Karmarschgasse 51/2/20 | Tel: +43-1-6060114-0
A-1100 Vienna, Austria | Fax: +43-1-6060114-71
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFDg47JxJmyeGcXPhERAmHJAKDDneUcGWBG/DO6BmErT+EFm3WDUgCfYrW7
jjGW+en9tiILjo5XhcFa5Cc=
=GR+f
-----END PGP SIGNATURE-----


2005-11-24 05:39:56

by Willy Tarreau

[permalink] [raw]
Subject: Re: [2.4.31 + aic79xx] SCSI error: Infinite interrupt loop, INTSTAT = 0

Hello Andreas,

On Tue, Nov 22, 2005 at 10:34:04PM +0100, Andreas Haumer wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi!
>
> I'm in the process of setting up a new fileserver and
> have some troubles with an Adaptec ASC-29320ALP U320
> SCSI card and an external Infortrend EonStor RAID!
>
> This is a Tyan TA26 barebone system (dual opteron CPU,
> 4GB RAM) with two on-board AIC-7902B SCSI controllers
> (Tyan Thunder K8SD Pro motherboard) for internal system disks
> (SW-RAID1) and two additional Adaptec 29320ALP U320 cards
> for externally connected RAID (Infortrend EonStor A16U-G2421
> RAID subsystem) and backup hardware.
>
> I'm running linux-2.4.31 in 32 bit mode.

just for the record, I've checked 2.4.32 and the driver is exactly the
same as in 2.4.31.

> root@setup:~ {521} $ lspci
(...)
> 01:03.0 SCSI storage controller: Adaptec ASC-29320ALP U320 (rev 10)
> 01:04.0 SCSI storage controller: Adaptec ASC-29320ALP U320 (rev 10)
> 02:06.0 SCSI storage controller: Adaptec AIC-7902B U320 (rev 10)
> 02:06.1 SCSI storage controller: Adaptec AIC-7902B U320 (rev 10)
(...)

I've never tried an adaptec U320 yet, only a few 29160 in various servers.

(...)
> Today I tried to integrate the external EonStor RAID and first
> it seemd to work fine, too. The system did find the devices
> and I could create a new volume group with several logical
> volumes out of them.
>
> But as soon as I try to create a filesystem on the new logical
> volumes or do some other work with the devices, the SCSI driver
> goes berserk:

So could we say when you have very low traffic (device identification,
write a few sectors to create the volume), everything's OK, and when
you write larger amounts of data, the problem strikes ?

It may be possible that you have a termination and/or cable problem
and that the driver does not correctly recover from such a condition.

> [...]
>
> And so on, until the external SCSI devices become unusable.
> The system is still running on the internally connected
> SCSI drives, though.
>
> I found some messages reporting similar problems on this
> list, a few weeks ago (beginning of October 2005). There
> was also a patch for the aic79xx driver mentioned, but I
> haven't found any report about it since then, so I don't
> know the status of the patch (it was for the 2.6 kernel,
> anyway, as far as I remember)

would you please send a link to this patch, or even the
whole thread if there were responses ?

> What can I do to make the external RAID usable?
> Dump the Adaptec cards and replace them with something better?

I've heard several people tell me that they have no problem with LSI
logic cards, but as I don't have problems either with AIC79xx, I don't
know how that should be interpreted.

> Patch the driver?

There is a large patch from the driver's author on his site. In fact,
it's not really a patch, it's the whole driver directory. I've used
it for a long time now (a few years) in my kernels without any problem.
You may want to try it :

http://people.freebsd.org/~gibbs/linux/

You can also get it as a patch from my tree :

http://w.ods.org/kernel/2.4-wt/2.4.31-wt1/patches-2.4.31-wt1/pool/aic79xx-20040522-linux-2.4.30-pre3.rediff

> Any help is appreciated!

good luck !

Regards,
Willy

> Thanks!
>
> - - andreas
>
> - --
> Andreas Haumer | mailto:[email protected]
> *x Software + Systeme | http://www.xss.co.at/
> Karmarschgasse 51/2/20 | Tel: +43-1-6060114-0
> A-1100 Vienna, Austria | Fax: +43-1-6060114-71
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.2 (GNU/Linux)
> Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
>
> iD8DBQFDg47JxJmyeGcXPhERAmHJAKDDneUcGWBG/DO6BmErT+EFm3WDUgCfYrW7
> jjGW+en9tiILjo5XhcFa5Cc=
> =GR+f
> -----END PGP SIGNATURE-----
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2005-11-24 13:05:30

by Andreas Haumer

[permalink] [raw]
Subject: Re: [2.4.31 + aic79xx] SCSI error: Infinite interrupt loop, INTSTAT = 0

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Willy,

many thanks for your reply!

Willy Tarreau schrieb:
> Hello Andreas,
>
> On Tue, Nov 22, 2005 at 10:34:04PM +0100, Andreas Haumer wrote:
>
[...]
>
> I'm running linux-2.4.31 in 32 bit mode.
>
>
>> just for the record, I've checked 2.4.32 and the driver is exactly the
>> same as in 2.4.31.
>
Ok.

>
> root@setup:~ {521} $ lspci
>
>> (...)
>
> 01:03.0 SCSI storage controller: Adaptec ASC-29320ALP U320 (rev 10)
> 01:04.0 SCSI storage controller: Adaptec ASC-29320ALP U320 (rev 10)
> 02:06.0 SCSI storage controller: Adaptec AIC-7902B U320 (rev 10)
> 02:06.1 SCSI storage controller: Adaptec AIC-7902B U320 (rev 10)
>
>> (...)
>
>> I've never tried an adaptec U320 yet, only a few 29160 in various servers.
>
I have tried the external RAID with the following controllers now:

Adaptec 29160 - works fine (with the aic7xxx driver)
Adaptec 29320ALP - does not work (tried with two different cards)
Adaptec 29320A - does not work

>> (...)
>
> Today I tried to integrate the external EonStor RAID and first
> it seemd to work fine, too. The system did find the devices
> and I could create a new volume group with several logical
> volumes out of them.
>
> But as soon as I try to create a filesystem on the new logical
> volumes or do some other work with the devices, the SCSI driver
> goes berserk:
>
>
>> So could we say when you have very low traffic (device identification,
>> write a few sectors to create the volume), everything's OK, and when
>> you write larger amounts of data, the problem strikes ?
>

The probability for SCSI timeouts and bus resets is higher with
higher bus activity. I did a lot of testing in the past few days
and sometimes (but not always) I get timeouts and bus resets even
when scanning the partition table in the initial ramdisk...

>> It may be possible that you have a termination and/or cable problem
>> and that the driver does not correctly recover from such a condition.
>
I can not completely rule out bad SCSI cabling, but:

* I replaced cables
-> problem remains the same
* I tried with three different 29320 controller boards (LP and non-LP)
-> problem remains the same
* I tried with the original setup (cables, RAID device, server, software),
but used a 29160 controller and it worked. But of course at a lower data
transfer speed on the SCSI bus and with a different SCSI driver (aic7xxx)!

This is not a cheap setup: high quality SCSI cables and VHDCI connectors,
not-so-cheap external RAID and server, everything connected to an 2200VA
UPS, air conditioned computer room. SCSI bus termination is internal and
automatically handled by the RAID subsystem.
I can't rule out bad hardware, but to me it seems unlikely. I have set up
several similar systems in the past year (same server, same RAID subsystem)
and never had any problems.
This is the first one where I want to use those Adaptec 29320 controllers,
though... ;-)

[...]
>
> I found some messages reporting similar problems on this
> list, a few weeks ago (beginning of October 2005). There
> was also a patch for the aic79xx driver mentioned, but I
> haven't found any report about it since then, so I don't
> know the status of the patch (it was for the 2.6 kernel,
> anyway, as far as I remember)
>
>
>> would you please send a link to this patch, or even the
>> whole thread if there were responses ?
>
This was a thread crossposted on both linux-kernel and linux-scsi,
starting on September 28th, 2005 going until October 4th, 2005.
Subject was "Infinite interrupt loop, INTSTAT = 0"
(See http://marc.theaimsgroup.com/?l=linux-scsi&m=112791530210044&w=2)

A patch was posted by James Bottomley on October 3rd, 2004
(See http://marc.theaimsgroup.com/?l=linux-scsi&m=112837144508743&w=2)

>
> What can I do to make the external RAID usable?
> Dump the Adaptec cards and replace them with something better?
>
>
>> I've heard several people tell me that they have no problem with LSI
>> logic cards, but as I don't have problems either with AIC79xx, I don't
>> know how that should be interpreted.
>
I also have good experience with LSI Logic cards (Fusion MPT driver).
Yesterday I ordered several of them, I hope I'll get them soon so I
can do further tests.

>
> Patch the driver?
>
>
>> There is a large patch from the driver's author on his site. In fact,
>> it's not really a patch, it's the whole driver directory. I've used
>> it for a long time now (a few years) in my kernels without any problem.
>> You may want to try it :
>
>> http://people.freebsd.org/~gibbs/linux/
>
>> You can also get it as a patch from my tree :
>
>> http://w.ods.org/kernel/2.4-wt/2.4.31-wt1/patches-2.4.31-wt1/pool/aic79xx-20040522-linux-2.4.30-pre3.rediff
>
I downloaded the driver from Justin's site (aic79xx-linux-2.4-20040522-tar.gz)
and compiled the driver for kernel 2.4.31. Compilation went well and without
errors or warnings. Driver version is 2.0.12 for the aic79xx driver.

With the new driver (v2.0.12) the problem basically remains the same, though the
messages are a little bit different:
[...]
Nov 24 13:15:37 setup kernel: SCSI subsystem driver Revision: 1.00
Nov 24 13:15:37 setup kernel: scsi0 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 2.0.12
Nov 24 13:15:37 setup kernel: <Adaptec 29320A Ultra320 SCSI adapter>
Nov 24 13:15:37 setup kernel: aic7901: Ultra320 Wide Channel A, SCSI Id=7, PCI-X 67-100Mhz, 512 SCBs
Nov 24 13:15:37 setup kernel:
Nov 24 13:15:37 setup kernel: scsi1 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 2.0.12
Nov 24 13:15:38 setup kernel: <Adaptec AIC7902 Ultra320 SCSI adapter>
Nov 24 13:15:38 setup kernel: aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI-X 67-100Mhz, 512 SCBs
Nov 24 13:15:38 setup kernel:
Nov 24 13:15:38 setup kernel: scsi2 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 2.0.12
Nov 24 13:15:38 setup kernel: <Adaptec AIC7902 Ultra320 SCSI adapter>
Nov 24 13:15:38 setup kernel: aic7902: Ultra320 Wide Channel B, SCSI Id=7, PCI-X 67-100Mhz, 512 SCBs
Nov 24 13:15:38 setup kernel:
Nov 24 13:15:38 setup kernel: blk: queue f7ae9a18, I/O limit 4095Mb (mask 0xffffffff)
Nov 24 13:15:38 setup kernel: (scsi1:A:0): 320.000MB/s transfers (160.000MHz DT|IU|RTI|QAS, 16bit)
Nov 24 13:15:38 setup kernel: (scsi2:A:5): 320.000MB/s transfers (160.000MHz DT|IU|RTI|QAS, 16bit)
Nov 24 13:15:38 setup kernel: Vendor: MAXTOR Model: ATLAS10K5_73SCA Rev: JNZH
Nov 24 13:15:38 setup kernel: Type: Direct-Access ANSI SCSI revision: 03
Nov 24 13:15:38 setup kernel: blk: queue f7ae9818, I/O limit 4095Mb (mask 0xffffffff)
Nov 24 13:15:38 setup kernel: scsi1:A:0:0: Tagged Queuing enabled. Depth 32
Nov 24 13:15:38 setup kernel: Vendor: MAXTOR Model: ATLAS10K5_73SCA Rev: JNZH
Nov 24 13:15:38 setup kernel: Type: Direct-Access ANSI SCSI revision: 03
Nov 24 13:15:38 setup kernel: blk: queue f7ae9618, I/O limit 4095Mb (mask 0xffffffff)
Nov 24 13:15:38 setup kernel: scsi2:A:5:0: Tagged Queuing enabled. Depth 32
Nov 24 13:15:38 setup kernel: Attached scsi disk sda at scsi1, channel 0, id 0, lun 0
Nov 24 13:15:38 setup kernel: Attached scsi disk sdb at scsi2, channel 0, id 5, lun 0
Nov 24 13:15:38 setup kernel: SCSI device sda: 143666192 512-byte hdwr sectors (73557 MB)
Nov 24 13:15:38 setup kernel: Partition check:
Nov 24 13:15:38 setup kernel: /dev/scsi/host1/bus0/target0/lun0: p1 p2
Nov 24 13:15:38 setup kernel: SCSI device sdb: 143666192 512-byte hdwr sectors (73557 MB)
Nov 24 13:15:38 setup kernel: /dev/scsi/host2/bus0/target5/lun0: p1 p2
[...]
Messages for SW-RAID, LVM and network setup omitted.
[...]
Nov 24 13:17:27 setup kernel: scsi singledevice 0 0 0 0
Nov 24 13:17:27 setup kernel: blk: queue eef37618, I/O limit 4095Mb (mask 0xffffffff)
Nov 24 13:17:27 setup kernel: Vendor: IFT scsi0:A:0:0: Tagged Queuing enabled. Depth 32
Nov 24 13:17:27 setup kernel: Model: A16U-G2421 Rev: 342A
Nov 24 13:17:27 setup kernel: Type: Direct-Access ANSI SCSI revision: 03
Nov 24 13:17:27 setup kernel: blk: queue eef37418, I/O limit 4095Mb (mask 0xffffffff)
Nov 24 13:17:27 setup kernel: scsi0:A:0:0: Tagged Queuing enabled. Depth 32
Nov 24 13:17:27 setup kernel: Attached scsi disk sdc at scsi0, channel 0, id 0, lun 0
Nov 24 13:17:32 setup kernel: (scsi0:A:0): 320.000MB/s transfers (160.000MHz DT|IU|QAS, 16bit)
Nov 24 13:17:32 setup kernel: SCSI device sdc: 4096000000 512-byte hdwr sectors (2097152 MB)
Nov 24 13:17:32 setup kernel: /dev/scsi/host0/bus0/target0/lun0: p1
Nov 24 13:18:56 setup kernel: scsi0: Recovery Initiated - Card was not paused
Nov 24 13:18:56 setup kernel: >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
Nov 24 13:18:56 setup kernel: scsi0: Dumping Card State at program address 0x22 Mode 0x33
Nov 24 13:18:56 setup kernel: Card was paused
Nov 24 13:18:56 setup kernel: HS_MAILBOX[0x0] INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11]
Nov 24 13:18:56 setup kernel: DFFSTAT[0x24] SCSISIGI[0x24] SCSIPHASE[0x0] SCSIBUS[0x0]
Nov 24 13:18:56 setup kernel: LASTPHASE[0x1] SCSISEQ0[0x0] SCSISEQ1[0x12] SEQCTL0[0x0]
Nov 24 13:18:56 setup kernel: SEQINTCTL[0x0] SEQ_FLAGS[0x0] SEQ_FLAGS2[0x4] SSTAT0[0x0]
Nov 24 13:18:56 setup kernel: SSTAT1[0x8] SSTAT2[0x0] SSTAT3[0x80] PERRDIAG[0xc0]
Nov 24 13:18:56 setup kernel: SIMODE1[0xa4] LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x80]
Nov 24 13:18:56 setup kernel: LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0xe1]
Nov 24 13:18:56 setup kernel:
Nov 24 13:18:56 setup kernel: SCB Count = 32 CMDS_PENDING = 32 LASTSCB 0x1f CURRSCB 0x5 NEXTSCB 0xff80
Nov 24 13:18:56 setup kernel: qinstart = 236 qinfifonext = 236
Nov 24 13:18:56 setup kernel: QINFIFO:
Nov 24 13:18:56 setup kernel: WAITING_TID_QUEUES:
Nov 24 13:18:56 setup kernel: Pending list:
Nov 24 13:18:56 setup kernel: 5 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:56 setup kernel: 6 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:56 setup kernel: 7 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:56 setup kernel: 30 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:56 setup kernel: 26 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:56 setup kernel: 31 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:56 setup kernel: 25 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:56 setup kernel: 29 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:56 setup kernel: 27 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:56 setup kernel: 28 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:56 setup kernel: 21 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:56 setup kernel: 20 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:56 setup kernel: 22 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:56 setup kernel: 18 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:56 setup kernel: 14 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:56 setup kernel: 23 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:56 setup kernel: 17 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:56 setup kernel: 16 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:56 setup kernel: 24 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:56 setup kernel: 19 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:56 setup kernel: 10 FIFO_USE[0x1] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:56 setup kernel: 15 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:56 setup kernel: 11 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:56 setup kernel: 12 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:56 setup kernel: 13 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:56 setup kernel: 4 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:56 setup kernel: 3 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:56 setup kernel: 1 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:56 setup kernel: 2 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:56 setup kernel: 0 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:56 setup kernel: 8 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:56 setup kernel: 9 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:56 setup kernel: Total 32
Nov 24 13:18:56 setup kernel: Kernel Free SCB list:
Nov 24 13:18:56 setup kernel: Sequencer Complete DMA-inprog list:
Nov 24 13:18:56 setup kernel: Sequencer Complete list:
Nov 24 13:18:56 setup kernel: Sequencer DMA-Up and Complete list:
Nov 24 13:18:56 setup kernel: Sequencer On QFreeze and Complete list:
Nov 24 13:18:56 setup kernel:
Nov 24 13:18:56 setup kernel: scsi0: FIFO0 Active, LONGJMP == 0x261, SCB 0x13
Nov 24 13:18:56 setup kernel: SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0xc] DFSTATUS[0x89]
Nov 24 13:18:56 setup kernel: SG_CACHE_SHADOW[0xcb] SG_STATE[0x0] DFFSXFRCTL[0x0]
Nov 24 13:18:56 setup kernel: SOFFCNT[0x0] MDFFSTAT[0x16] SHADDR = 0x03287b000, SHCNT = 0x0
Nov 24 13:18:56 setup kernel: HADDR = 0x03287b000, HCNT = 0x0 CCSGCTL[0x10]
Nov 24 13:18:56 setup kernel: scsi0: FIFO1 Free, LONGJMP == 0x80ff, SCB 0x4
Nov 24 13:18:56 setup kernel: SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
Nov 24 13:18:56 setup kernel: SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
Nov 24 13:18:56 setup kernel: SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
Nov 24 13:18:56 setup kernel: HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
Nov 24 13:18:56 setup kernel: LQIN: 0x4 0x0 0x0 0x13 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x10 0x0 0x0 0x0 0x0 0x0
Nov 24 13:18:56 setup kernel: scsi0: LQISTATE = 0x2a, LQOSTATE = 0x0, OPTIONMODE = 0x52
Nov 24 13:18:56 setup kernel: scsi0: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x1
Nov 24 13:18:56 setup kernel: SIMODE0[0xc]
Nov 24 13:18:56 setup kernel: CCSCBCTL[0x4]
Nov 24 13:18:56 setup kernel: scsi0: REG0 == 0x6, SINDEX = 0x102, DINDEX = 0x102
Nov 24 13:18:56 setup kernel: scsi0: SCBPTR == 0x5, SCB_NEXT == 0xff80, SCB_NEXT2 == 0xff2b
Nov 24 13:18:56 setup kernel: CDB 2a 0 1 6c 0 3f
Nov 24 13:18:56 setup kernel: STACK: 0x1e 0x0 0x0 0x0 0x0 0x0 0x0 0x0
Nov 24 13:18:57 setup kernel: <<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
Nov 24 13:18:57 setup kernel: scsi0: Host Status: Failed(0)
Nov 24 13:18:57 setup kernel: DevQ(0:0:0): 0 waiting
Nov 24 13:18:57 setup kernel: (scsi0:A:0:0): SCB 0x5 - timed out
Nov 24 13:18:57 setup kernel: (scsi0:A:0:0): BDR message in message buffer
Nov 24 13:18:58 setup kernel: scsi0: Recovery Initiated - Card was not paused
Nov 24 13:18:58 setup kernel: >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
Nov 24 13:18:58 setup kernel: scsi0: Dumping Card State at program address 0x9 Mode 0x33
Nov 24 13:18:58 setup kernel: Card was paused
Nov 24 13:18:58 setup kernel: HS_MAILBOX[0x0] INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11]
Nov 24 13:18:58 setup kernel: DFFSTAT[0x24] SCSISIGI[0x34] SCSIPHASE[0x0] SCSIBUS[0x0]
Nov 24 13:18:58 setup kernel: LASTPHASE[0x1] SCSISEQ0[0x0] SCSISEQ1[0x12] SEQCTL0[0x0]
Nov 24 13:18:58 setup kernel: SEQINTCTL[0x0] SEQ_FLAGS[0x0] SEQ_FLAGS2[0x4] SSTAT0[0x0]
Nov 24 13:18:58 setup kernel: SSTAT1[0x8] SSTAT2[0x0] SSTAT3[0x80] PERRDIAG[0xc0]
Nov 24 13:18:58 setup kernel: SIMODE1[0xa4] LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x80]
Nov 24 13:18:58 setup kernel: LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0xe1]
Nov 24 13:18:58 setup kernel:
Nov 24 13:18:58 setup kernel: SCB Count = 32 CMDS_PENDING = 32 LASTSCB 0x1f CURRSCB 0x5 NEXTSCB 0xff80
Nov 24 13:18:58 setup kernel: qinstart = 236 qinfifonext = 236
Nov 24 13:18:58 setup kernel: QINFIFO:
Nov 24 13:18:58 setup kernel: WAITING_TID_QUEUES:
Nov 24 13:18:58 setup kernel: Pending list:
Nov 24 13:18:58 setup kernel: 5 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:58 setup kernel: 6 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:58 setup kernel: 7 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:58 setup kernel: 30 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:58 setup kernel: 26 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:58 setup kernel: 31 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:58 setup kernel: 25 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:58 setup kernel: 29 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:58 setup kernel: 27 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:58 setup kernel: 28 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:58 setup kernel: 21 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:58 setup kernel: 20 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:58 setup kernel: 22 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:58 setup kernel: 18 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:58 setup kernel: 14 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:58 setup kernel: 23 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:58 setup kernel: 17 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:58 setup kernel: 16 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:58 setup kernel: 24 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:58 setup kernel: 19 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:58 setup kernel: 10 FIFO_USE[0x1] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:58 setup kernel: 15 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:58 setup kernel: 11 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:58 setup kernel: 12 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:58 setup kernel: 13 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:58 setup kernel: 4 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:58 setup kernel: 3 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:58 setup kernel: 1 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:58 setup kernel: 2 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:58 setup kernel: 0 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:58 setup kernel: 8 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:58 setup kernel: 9 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x7]
Nov 24 13:18:58 setup kernel: Total 32
Nov 24 13:18:58 setup kernel: Kernel Free SCB list:
Nov 24 13:18:58 setup kernel: Sequencer Complete DMA-inprog list:
Nov 24 13:18:58 setup kernel: Sequencer Complete list:
Nov 24 13:18:58 setup kernel: Sequencer DMA-Up and Complete list:
Nov 24 13:18:58 setup kernel: Sequencer On QFreeze and Complete list:
Nov 24 13:18:58 setup kernel:
Nov 24 13:18:58 setup kernel: scsi0: FIFO0 Active, LONGJMP == 0x261, SCB 0x13
Nov 24 13:18:58 setup kernel: SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0xc] DFSTATUS[0x89]
Nov 24 13:18:59 setup kernel: SG_CACHE_SHADOW[0xcb] SG_STATE[0x0] DFFSXFRCTL[0x0]
Nov 24 13:18:59 setup kernel: SOFFCNT[0x0] MDFFSTAT[0x16] SHADDR = 0x03287b000, SHCNT = 0x0
Nov 24 13:18:59 setup kernel: HADDR = 0x03287b000, HCNT = 0x0 CCSGCTL[0x10]
Nov 24 13:18:59 setup kernel: scsi0: FIFO1 Free, LONGJMP == 0x80ff, SCB 0x4
Nov 24 13:18:59 setup kernel: SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
Nov 24 13:18:59 setup kernel: SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
Nov 24 13:18:59 setup kernel: SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
Nov 24 13:18:59 setup kernel: HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
Nov 24 13:18:59 setup kernel: LQIN: 0x4 0x0 0x0 0x13 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x10 0x0 0x0 0x0 0x0 0x0
Nov 24 13:18:59 setup kernel: scsi0: LQISTATE = 0x2a, LQOSTATE = 0x0, OPTIONMODE = 0x52
Nov 24 13:18:59 setup kernel: scsi0: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x1
Nov 24 13:18:59 setup kernel: SIMODE0[0xc]
Nov 24 13:18:59 setup kernel: CCSCBCTL[0x4]
Nov 24 13:18:59 setup kernel: scsi0: REG0 == 0x6, SINDEX = 0x102, DINDEX = 0x102
Nov 24 13:18:59 setup kernel: scsi0: SCBPTR == 0x5, SCB_NEXT == 0xff80, SCB_NEXT2 == 0xff2b
Nov 24 13:18:59 setup kernel: CDB 2a 0 1 6c 0 3f
Nov 24 13:18:59 setup kernel: STACK: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
Nov 24 13:18:59 setup kernel: <<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
Nov 24 13:18:59 setup kernel: scsi0: Host Status: Failed(0) host_self_blocked
Nov 24 13:18:59 setup kernel: DevQ(0:0:0): 0 waiting
Nov 24 13:18:59 setup kernel: (scsi0:A:0:0): SCB 0x5 - timed out
Nov 24 13:18:59 setup kernel: Recovery SCB completes
Nov 24 13:18:59 setup kernel: scsi0: Issued Channel A Bus Reset. 32 SCBs aborted
Nov 24 13:19:13 setup kernel: (scsi0:A:0): 3.300MB/s transfers
Nov 24 13:19:13 setup kernel: scsi0: Returning to Idle Loop
Nov 24 13:19:38 setup kernel: (scsi0:A:0): 320.000MB/s transfers (160.000MHz DT|IU|QAS, 16bit)
Nov 24 13:20:08 setup kernel: scsi0: Recovery Initiated - Card was not paused
[...]

And so on...

I have now run out of test hardware. For further testing
I'll have to wait until the new LSI Logic controllers arrive,
hopefully until tomorrow.

Any other idea?

- - andreas

- --
Andreas Haumer | mailto:[email protected]
*x Software + Systeme | http://www.xss.co.at/
Karmarschgasse 51/2/20 | Tel: +43-1-6060114-0
A-1100 Vienna, Austria | Fax: +43-1-6060114-71
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFDhbqSxJmyeGcXPhERAr2VAJ9VnlWM8jMR/UXNaDqVeg/aZr7bEwCgm5ED
fld7SP7skPCRwQ9N0z/QNKw=
=xonw
-----END PGP SIGNATURE-----

2005-11-24 23:22:14

by Willy Tarreau

[permalink] [raw]
Subject: Re: [2.4.31 + aic79xx] SCSI error: Infinite interrupt loop, INTSTAT = 0

Hi Andreas,

On Thu, Nov 24, 2005 at 02:05:26PM +0100, Andreas Haumer wrote:

> >> I've never tried an adaptec U320 yet, only a few 29160 in various servers.
> >
> I have tried the external RAID with the following controllers now:
>
> Adaptec 29160 - works fine (with the aic7xxx driver)
> Adaptec 29320ALP - does not work (tried with two different cards)
> Adaptec 29320A - does not work

OK so the problem is only related to 29320 + aic79xx driver.

> > Today I tried to integrate the external EonStor RAID and first
> > it seemd to work fine, too. The system did find the devices
> > and I could create a new volume group with several logical
> > volumes out of them.
> >
> > But as soon as I try to create a filesystem on the new logical
> > volumes or do some other work with the devices, the SCSI driver
> > goes berserk:
> >
> >
> >> So could we say when you have very low traffic (device identification,
> >> write a few sectors to create the volume), everything's OK, and when
> >> you write larger amounts of data, the problem strikes ?
> >
>
> The probability for SCSI timeouts and bus resets is higher with
> higher bus activity. I did a lot of testing in the past few days
> and sometimes (but not always) I get timeouts and bus resets even
> when scanning the partition table in the initial ramdisk...
>
> >> It may be possible that you have a termination and/or cable problem
> >> and that the driver does not correctly recover from such a condition.
> >
> I can not completely rule out bad SCSI cabling, but:
>
> * I replaced cables
> -> problem remains the same
> * I tried with three different 29320 controller boards (LP and non-LP)
> -> problem remains the same
> * I tried with the original setup (cables, RAID device, server, software),
> but used a 29160 controller and it worked. But of course at a lower data
> transfer speed on the SCSI bus and with a different SCSI driver (aic7xxx)!
>
> This is not a cheap setup: high quality SCSI cables and VHDCI connectors,
> not-so-cheap external RAID and server, everything connected to an 2200VA
> UPS, air conditioned computer room. SCSI bus termination is internal and
> automatically handled by the RAID subsystem.
> I can't rule out bad hardware, but to me it seems unlikely. I have set up
> several similar systems in the past year (same server, same RAID subsystem)
> and never had any problems.

OK, I'll fully trust you on your setup then and won't ask you is the power
plug is connected to the wall :-)

> This is the first one where I want to use those Adaptec 29320 controllers,
> though... ;-)

:-)

> [...]
> >
> > I found some messages reporting similar problems on this
> > list, a few weeks ago (beginning of October 2005). There
> > was also a patch for the aic79xx driver mentioned, but I
> > haven't found any report about it since then, so I don't
> > know the status of the patch (it was for the 2.6 kernel,
> > anyway, as far as I remember)
> >
> >
> >> would you please send a link to this patch, or even the
> >> whole thread if there were responses ?
> >
> This was a thread crossposted on both linux-kernel and linux-scsi,
> starting on September 28th, 2005 going until October 4th, 2005.
> Subject was "Infinite interrupt loop, INTSTAT = 0"
> (See http://marc.theaimsgroup.com/?l=linux-scsi&m=112791530210044&w=2)
>
> A patch was posted by James Bottomley on October 3rd, 2004
> (See http://marc.theaimsgroup.com/?l=linux-scsi&m=112837144508743&w=2)

Interesting, I've archived it. James presented it as a workaround,
waiting for something cleaner, but I've not seen any followup (may
be I've not searched well).

> > What can I do to make the external RAID usable?
> > Dump the Adaptec cards and replace them with something better?
> >
> >
> >> I've heard several people tell me that they have no problem with LSI
> >> logic cards, but as I don't have problems either with AIC79xx, I don't
> >> know how that should be interpreted.
> >
> I also have good experience with LSI Logic cards (Fusion MPT driver).
> Yesterday I ordered several of them, I hope I'll get them soon so I
> can do further tests.

OK, it will definitely rule out bad cables and RAID array.

> > Patch the driver?
> >
> >
> >> There is a large patch from the driver's author on his site. In fact,
> >> it's not really a patch, it's the whole driver directory. I've used
> >> it for a long time now (a few years) in my kernels without any problem.
> >> You may want to try it :
> >
> >> http://people.freebsd.org/~gibbs/linux/
> >
> >> You can also get it as a patch from my tree :
> >
> >> http://w.ods.org/kernel/2.4-wt/2.4.31-wt1/patches-2.4.31-wt1/pool/aic79xx-20040522-linux-2.4.30-pre3.rediff
> >
> I downloaded the driver from Justin's site (aic79xx-linux-2.4-20040522-tar.gz)
> and compiled the driver for kernel 2.4.31. Compilation went well and without
> errors or warnings. Driver version is 2.0.12 for the aic79xx driver.
>
> With the new driver (v2.0.12) the problem basically remains the same, though the
> messages are a little bit different:

Often (in my experience), when different versions of a driver find different
error conditions, it is caused by timing problems. Perhaps the driver has to
respect some pauses on the bus that are not quite correctly respected. Have
you tried to lower the speed to 160 MB/s ?

> [...]
... log output ...
> [...]
>
> And so on...
>
> I have now run out of test hardware. For further testing
> I'll have to wait until the new LSI Logic controllers arrive,
> hopefully until tomorrow.
>
> Any other idea?

Unfortunately not. I had thought about running a "verify media" test
from the adaptec bios on one of the RAID disks, but I then realized
that it will only transfer commands and no data, so the test will be
useless.

Regards,
Willy

2005-11-26 15:44:11

by Andreas Haumer

[permalink] [raw]
Subject: Re: [2.4.31 + aic79xx] SCSI error: Infinite interrupt loop, INTSTAT = 0

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi!

Yesterday the new LSI Logic hostadapters arrived. I replaced the
Adaptec 29320 controllers with the new LSI boards and did various
tests.

Executive summary: with the LSI controllers everything works fine!

Willy Tarreau schrieb:
> Hi Andreas,
>
> On Thu, Nov 24, 2005 at 02:05:26PM +0100, Andreas Haumer wrote:
>
>
>>>>I've never tried an adaptec U320 yet, only a few 29160 in various servers.
>>>
>>I have tried the external RAID with the following controllers now:
>>
>>Adaptec 29160 - works fine (with the aic7xxx driver)
>>Adaptec 29320ALP - does not work (tried with two different cards)
>>Adaptec 29320A - does not work
>
>
> OK so the problem is only related to 29320 + aic79xx driver.
>
It indeed looks that way.

I just replaced the Adaptec 29320ALP controller with a LSI-22320
board and all my problems are gone. No SCSI errors, no timeouts.

Here's some information about my current setup:

[lspci]
00:06.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8111 PCI (rev 07)
00:07.0 ISA bridge: Advanced Micro Devices [AMD] AMD-8111 LPC (rev 05)
00:07.1 IDE interface: Advanced Micro Devices [AMD] AMD-8111 IDE (rev 03)
00:07.2 SMBus: Advanced Micro Devices [AMD] AMD-8111 SMBus 2.0 (rev 02)
00:07.3 Bridge: Advanced Micro Devices [AMD] AMD-8111 ACPI (rev 05)
00:0a.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
00:0a.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01)
00:0b.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
00:0b.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:03.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 08)
01:03.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 08)
02:06.0 SCSI storage controller: Adaptec AIC-7902B U320 (rev 10)
02:06.1 SCSI storage controller: Adaptec AIC-7902B U320 (rev 10)
02:09.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)
02:09.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)
03:00.0 USB Controller: Advanced Micro Devices [AMD] AMD-8111 USB (rev 0b)
03:00.1 USB Controller: Advanced Micro Devices [AMD] AMD-8111 USB (rev 0b)
03:05.0 Mass storage controller: Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)
03:06.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
03:08.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 10)

[lsmod]
Module Size Used by Not tainted
nfsd 67824 16 (autoclean)
nfs 74072 1 (autoclean)
lockd 45616 1 (autoclean) [nfsd nfs]
sunrpc 71072 1 (autoclean) [nfsd nfs lockd]
tg3 59308 1 (autoclean)
w83627hf 13144 0 (unused)
eeprom 3500 0 (unused)
lm85 17064 0 (unused)
i2c-isa 808 0 (unused)
i2c-amd756 3082 0 (unused)
i2c-proc 6052 0 [w83627hf eeprom lm85]
i2c-core 15364 0 [w83627hf eeprom lm85 i2c-isa i2c-amd756 i2c-proc]
processor 8528 0 (unused)
button 2572 0 (unused)
keybdev 1796 0 (unused)
mousedev 3960 0 (unused)
hid 20356 0 (unused)
input 3584 0 [keybdev mousedev hid]
usb-ohci 19048 0 (unused)
usbcore 60940 0 [hid usb-ohci]
xfs 481764 6 (autoclean)
ext2 35424 1 (autoclean)
unix 16080 14 (autoclean)
reiserfs 175312 6 (autoclean)
lvm-mod 59544 34 (autoclean)
raid1 14360 2 (autoclean)
md 57856 4 (autoclean) [raid1]
sd_mod 11144 12 (autoclean)
mptscsih 34640 2 (autoclean)
mptbase 31968 3 (autoclean) [mptscsih]
aic79xx 164220 4 (autoclean)
scsi_mod 94932 3 (autoclean) [sd_mod mptscsih aic79xx]

The aic79xx now drives the internal Maxtor SCSI disks, the
mptscsih driver is used for the external RAID subsystem.


[cat /proc/scsi/scsi]
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: MAXTOR Model: ATLAS10K5_73SCA Rev: JNZH
Type: Direct-Access ANSI SCSI revision: 03
Host: scsi1 Channel: 00 Id: 05 Lun: 00
Vendor: MAXTOR Model: ATLAS10K5_73SCA Rev: JNZH
Type: Direct-Access ANSI SCSI revision: 03
Host: scsi2 Channel: 00 Id: 00 Lun: 00
Vendor: IFT Model: A16U-G2421 Rev: 342A
Type: Direct-Access ANSI SCSI revision: 03
Host: scsi2 Channel: 00 Id: 00 Lun: 01
Vendor: IFT Model: A16U-G2421 Rev: 342A
Type: Direct-Access ANSI SCSI revision: 03


[dmesg]
SCSI subsystem driver Revision: 1.00
scsi0 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.10
<Adaptec AIC7902 Ultra320 SCSI adapter>
aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI-X 67-100Mhz, 512 SCBs

scsi1 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.10
<Adaptec AIC7902 Ultra320 SCSI adapter>
aic7902: Ultra320 Wide Channel B, SCSI Id=7, PCI-X 67-100Mhz, 512 SCBs

blk: queue f7ae9a18, I/O limit 4095Mb (mask 0xffffffff)
(scsi0:A:0): 320.000MB/s transfers (160.000MHz DT|IU|RTI|QAS, 16bit)
(scsi1:A:5): 320.000MB/s transfers (160.000MHz DT|IU|RTI|QAS, 16bit)
Vendor: MAXTOR Model: ATLAS10K5_73SCA Rev: JNZH
Type: Direct-Access ANSI SCSI revision: 03
blk: queue f7ae9818, I/O limit 4095Mb (mask 0xffffffff)
scsi0:A:0:0: Tagged Queuing enabled. Depth 32
Vendor: MAXTOR Model: ATLAS10K5_73SCA Rev: JNZH
Type: Direct-Access ANSI SCSI revision: 03
blk: queue f7ae9618, I/O limit 4095Mb (mask 0xffffffff)
scsi1:A:5:0: Tagged Queuing enabled. Depth 32
Fusion MPT base driver 2.05.16
Copyright (c) 1999-2004 LSI Logic Corporation
mptbase: Initiating ioc0 bringup
ioc0: 53C1030: Capabilities={Initiator}
mptbase: Initiating ioc1 bringup
ioc1: 53C1030: Capabilities={Initiator}
mptbase: 2 MPT adapters found, 2 installed.
Fusion MPT SCSI Host driver 2.05.16
scsi2 : ioc0: LSI53C1030, FwRev=01030a00h, Ports=1, MaxQ=222, IRQ=28
scsi3 : ioc1: LSI53C1030, FwRev=01030a00h, Ports=1, MaxQ=222, IRQ=29
Vendor: IFT Model: A16U-G2421 Rev: 342A
Type: Direct-Access ANSI SCSI revision: 03
blk: queue f7ae9018, I/O limit 4294967295Mb (mask 0xffffffffffffffff)
Vendor: IFT Model: A16U-G2421 Rev: 342A
Type: Direct-Access ANSI SCSI revision: 03
blk: queue f723de18, I/O limit 4294967295Mb (mask 0xffffffffffffffff)
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Attached scsi disk sdb at scsi1, channel 0, id 5, lun 0
Attached scsi disk sdc at scsi2, channel 0, id 0, lun 0
Attached scsi disk sdd at scsi2, channel 0, id 0, lun 1
SCSI device sda: 143666192 512-byte hdwr sectors (73557 MB)
Partition check:
/dev/scsi/host0/bus0/target0/lun0: p1 p2
SCSI device sdb: 143666192 512-byte hdwr sectors (73557 MB)
/dev/scsi/host1/bus0/target5/lun0: p1 p2
SCSI device sdc: 4096000000 512-byte hdwr sectors (2097152 MB)
/dev/scsi/host2/bus0/target0/lun0: p1
SCSI device sdd: 2734071808 512-byte hdwr sectors (1399845 MB)
/dev/scsi/host2/bus0/target0/lun1: p1


[performance test]
root@setup:~ {504} $ time dd if=/dev/zero of=/platten/gisdat/bigfile bs=1M count=20000
20000+0 records in
20000+0 records out

real 1m27.764s
user 0m0.080s
sys 0m32.500s
root@setup:~ {505} $ umount /platten/gisdat/
root@setup:~ {506} $ mount /platten/gisdat/
root@setup:~ {507} $ time dd if=/platten/gisdat/bigfile of=/dev/null bs=1M
20000+0 records in
20000+0 records out

real 2m29.681s
user 0m0.030s
sys 0m31.510s

This simple performance test gives a rough estimation of
write throughput at 229MB/s and read throughput at 134MB/s

This is not bad IMHO

(Note: I did the umount/mount in order to clear the OS cache
before executing the read throughput test)

[...]
>>
>>A patch was posted by James Bottomley on October 3rd, 2004
>>(See http://marc.theaimsgroup.com/?l=linux-scsi&m=112837144508743&w=2)
>
>
> Interesting, I've archived it. James presented it as a workaround,
> waiting for something cleaner, but I've not seen any followup (may
> be I've not searched well).
>
I haven't found anything, either.

[...]
>>I also have good experience with LSI Logic cards (Fusion MPT driver).
>>Yesterday I ordered several of them, I hope I'll get them soon so I
>>can do further tests.
>
>
> OK, it will definitely rule out bad cables and RAID array.
>
Yes, and indeed it does as everything works fine now with just the
Adaptec controller replaced with the LSI controller. So the problem
is with almost absolute certainty the Adaptec driver (I don't believe
that three out of three new Adaptec controller boards show the same
hardware defect)

[...]
>
>
> Often (in my experience), when different versions of a driver find different
> error conditions, it is caused by timing problems. Perhaps the driver has to
> respect some pauses on the bus that are not quite correctly respected. Have
> you tried to lower the speed to 160 MB/s ?
>
No, I haven't.

But I tried to reduce the TCQ queue depth from 32 (default) to 8
but this did not solve the problem (this was a suggestion by
ari@http://www.goron.de. Thanks for your mail, Ari, but your mail address
bounces so I couldn't send you a reply!)


I will have this particular hardware setup available for
testing for about two or three days (until end of next Tuesday).
If anyone wants me to try any patches for the aic79xx driver in
this timeframe I'm willing to do so if time permits.
But the system will go into production by the end of next
week and after that there is no way for me to do any further
tests with this hardware.

- - andreas

- --
Andreas Haumer | mailto:[email protected]
*x Software + Systeme | http://www.xss.co.at/
Karmarschgasse 51/2/20 | Tel: +43-1-6060114-0
A-1100 Vienna, Austria | Fax: +43-1-6060114-71
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFDiILCxJmyeGcXPhERAnJ6AJ9368FfWJanLvzU9Wv3Ts8L5BG2NACghI9O
XYkHANb/iSHAM6Bx6+Oz5Jw=
=B0RM
-----END PGP SIGNATURE-----

2005-11-26 16:32:18

by Roberto Nibali

[permalink] [raw]
Subject: Re: [2.4.31 + aic79xx] SCSI error: Infinite interrupt loop, INTSTAT = 0

Hello,

> Yesterday the new LSI Logic hostadapters arrived. I replaced the
> Adaptec 29320 controllers with the new LSI boards and did various
> tests.
>
> Executive summary: with the LSI controllers everything works fine!

During the past 2 years we have also changed our HW configuration
regarding SCSI from Adaptec to LSI. Always patching the kernel (2.2.x
and 2.4.x) with gibbs' code was too cumbersome and on top of that we
experienced similar interrupts and I/O aborts. Not being too familiar
with indepth SCSI technology we had to go with what proved (seemed) to
work more reliable in the long term, which were LSI based SCSI controllers.

> 01:03.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 08)
> 01:03.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 08)

[...]

> Fusion MPT base driver 2.05.16
> Copyright (c) 1999-2004 LSI Logic Corporation
> mptbase: Initiating ioc0 bringup
> ioc0: 53C1030: Capabilities={Initiator}
> mptbase: Initiating ioc1 bringup
> ioc1: 53C1030: Capabilities={Initiator}
> mptbase: 2 MPT adapters found, 2 installed.
> Fusion MPT SCSI Host driver 2.05.16
> scsi2 : ioc0: LSI53C1030, FwRev=01030a00h, Ports=1, MaxQ=222, IRQ=28
> scsi3 : ioc1: LSI53C1030, FwRev=01030a00h, Ports=1, MaxQ=222, IRQ=29
> Vendor: IFT Model: A16U-G2421 Rev: 342A
> Type: Direct-Access ANSI SCSI revision: 03
> blk: queue f7ae9018, I/O limit 4294967295Mb (mask 0xffffffffffffffff)
> Vendor: IFT Model: A16U-G2421 Rev: 342A
> Type: Direct-Access ANSI SCSI revision: 03
> blk: queue f723de18, I/O limit 4294967295Mb (mask 0xffffffffffffffff)

[...]

> I will have this particular hardware setup available for
> testing for about two or three days (until end of next Tuesday).
> If anyone wants me to try any patches for the aic79xx driver in
> this timeframe I'm willing to do so if time permits.

More related to LSI, but would you be willing to try following for me
and report back (in private, since this is getting off-topic), please:

http://www.drugphish.ch/~ratz/mpt-status/mpt-status-1.1.4.tar.bz2

And if it does not compile on your 2.4.x system, use the following patch:

http://www.drugphish.ch/~ratz/mpt-status/mpt-status-1.1.4-fix-compilation-on-2.4.x-1.diff

It should allow you to do basic HW raid monitoring on LSI-53C1030 chipsets.

Thanks and good luck,
Roberto Nibali, ratz
--
echo
'[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc