2002-09-20 05:23:34

by Ville Herva

Subject: 2.4.20pre7, aic7xxx-6.2.8: Panic: HOST_MSG_LOOP with invalid SCB 0

Celeron 1.3GHz, Intel i815 chipset, 512MB ram.

AIC-2640 PCI card with uw and narrow connectors. A Seagate scsi disk
(rootfs) attached to uw, and a HP tape drive attached to narrow. Tape drive
never used.

I only ran 2.4.20pre7 (no other patches) for a night and it crashed:

-------------------------------------------------------------------
Kernel panic: HOST_MSG_LOOP with invalid SCB 0

In interrupt handler, not syncing
-------------------------------------------------------------------


Boot log snippet:
-------------------------------------------------------------------
scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.8
<Adaptec 2940 SCSI adapter>
aic7870: Wide Channel A, SCSI Id=7, 16/253 SCBs

Vendor: SEAGATE Model: ST19171W Rev: 0024
Type: Direct-Access ANSI SCSI revision: 02
(scsi0:A:0): 20.000MB/s transfers (10.000MHz, offset 8, 16bit)
Vendor: HP Model: C1537A Rev: L708
Type: Sequential-Access ANSI SCSI revision: 02
(scsi0:A:2): 10.000MB/s transfers (10.000MHz, offset 15)
scsi0:A:0:0: Tagged Queuing enabled. Depth 8
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
SCSI device sda: 17783112 512-byte hdwr sectors (9105 MB)
sda: sda1 sda2 < sda5 sda6 >
-------------------------------------------------------------------

.config
-------------------------------------------------------------------
CONFIG_SCSI_AIC7XXX=y
CONFIG_AIC7XXX_CMDS_PER_DEVICE=8
CONFIG_AIC7XXX_RESET_DELAY_MS=15000
# CONFIG_AIC7XXX_PROBE_EISA_VL is not set
# CONFIG_AIC7XXX_BUILD_FIRMWARE is not set
-------------------------------------------------------------------

2.2.18pre18 with aic7xxx-5.1.31 was solid on this box (the motherboard has
been changed since, though).


-- v --



2002-09-20 15:32:27

by Justin T. Gibbs

Subject: Re: 2.4.20pre7, aic7xxx-6.2.8: Panic: HOST_MSG_LOOP with invalid SCB 0

> Celeron 1.3GHz, Intel i815 chipset, 512MB ram.
>
> AIC-2640 PCI card with uw and narrow connectors. A Seagate scsi disk
> (rootfs) attached to uw, and a HP tape drive attached to narrow. Tape
> drive never used.
>
> I only ran 2.4.20pre7 (no other patches) for a night and it crashed:
>
> -------------------------------------------------------------------
> Kernel panic: HOST_MSG_LOOP with invalid SCB 0
>
> In interrupt handler, not syncing

I need all of the messages leading up to the panic in order to
diagnose this. You may need to use a serial console to get
them all.

--
Justin

2002-09-20 18:14:25

by Stephan von Krawczynski

Subject: 2.4.19, 2.4.20pre7, problem with aic7xxx driver

Hello Justin, hello all,

I just came across an interesting phenomenon regarding 2.4.19 / 2.4.20-pre7 and
adaptec scsi. Scene is this:

board: Asus SP97-V with Pentium 200 (non-MMX) (I know it is old)
controllers tried: adaptec 29160, 29160N, 2940 U2W
kernel: 2.4.18-SuSE (distribution 8.0), 2.4.19, 2.4.20-pre7

From all possible configurations of the above the following work:

kernel 2.4.18-SuSE: with all controllers
kernel 2.4.19 : only with 2940 U2W
kernel 2.4.20-pre7: only with 2040 U2W

All other configurations with newer adaptecs and recent kernels fail during
init of controller. Last message in sight: "PCI: Sharing interrupt xxx"

I tried all interrupts from 5-14 and configurations with other pci devices
plugged in or not. The problem stays the same.
I really wonder what they did to 2.4.18 so that this one works... ??
Any suggestions?

Regards,
Stephan


2002-09-20 19:43:57

by Phil Brutsche

Subject: Re: 2.4.19, 2.4.20pre7, problem with aic7xxx driver

Stephan von Krawczynski wrote:
> Hello Justin, hello all,
>
> I just came across an interesting phenomenon regarding 2.4.19 / 2.4.20-pre7 and
> adaptec scsi. Scene is this:
>
> board: Asus SP97-V with Pentium 200 (non-MMX) (I know it is old)
> controllers tried: adaptec 29160, 29160N, 2940 U2W
> kernel: 2.4.18-SuSE (distribution 8.0), 2.4.19, 2.4.20-pre7
>
> From all possible configurations of the above the following work:
>
> kernel 2.4.18-SuSE: with all controllers
> kernel 2.4.19 : only with 2940 U2W
> kernel 2.4.20-pre7: only with 2040 U2W

The aic7xxx driver works like a champ here in 2.4.17 (vanilla and with
rmap-11c), vanilla 2.4.19, and early vanilla 2.5.x (last I used was 2.5.9).

This is a 29160 (the 64-bit dual-channel card, not the 19160 or 29160N)
controller on a MSI 694D-Pro motherboard - dual 1GHz PIIIs.


Phil

2002-09-20 19:51:47

by Stephan von Krawczynski

Subject: Re: 2.4.19, 2.4.20pre7, problem with aic7xxx driver

On Fri, 20 Sep 2002 14:48:56 -0500
Phil Brutsche wrote:

> Stephan von Krawczynski wrote:
> > Hello Justin, hello all,
> >
> > I just came across an interesting phenomenon regarding 2.4.19 / 2.4.20-pre7
> > and adaptec scsi. Scene is this:
> >
> > board: Asus SP97-V with Pentium 200 (non-MMX) (I know it is old)
> > controllers tried: adaptec 29160, 29160N, 2940 U2W
> > kernel: 2.4.18-SuSE (distribution 8.0), 2.4.19, 2.4.20-pre7
> >
> > From all possible configurations of the above the following work:
> >
> > kernel 2.4.18-SuSE: with all controllers
> > kernel 2.4.19 : only with 2940 U2W
> > kernel 2.4.20-pre7: only with 2040 U2W

Uh, here is a typo, it should be "2940 U2W" of course ...

> The aic7xxx driver works like a champ here in 2.4.17 (vanilla and with
> rmap-11c), vanilla 2.4.19, and early vanilla 2.5.x (last I used was 2.5.9).
>
> This is a 29160 (the 64-bit dual-channel card, not the 19160 or 29160N)
> controller on a MSI 694D-Pro motherboard - dual 1GHz PIIIs.

I know this.
I use it myself in quite a number of boards (all comparably new). The thing is:
why doesn't it work in an _old_ board like SP97-V, and _only_ regarding 2.4.19
/ 2.4.20-pre7 (I did not try all in between, but I dare a good guess ;-)

Regards,
Stephan

2002-09-23 18:41:14

by Marcelo Tosatti

Subject: Re: 2.4.20pre7, aic7xxx-6.2.8: Panic: HOST_MSG_LOOP with invalid SCB 0



On Mon, 23 Sep 2002, Justin T. Gibbs wrote:

> > Justin,
> >
> > I guess this is the second or third report of problems with the new aic7xxx :(
>
> This issue has already been resolved as a chipset issue requiring
> I/O mapped register access to work around. The "old" aic7xxx driver
> avoids these issues by issuing a register read after every register
> write. This stops up your PCI bus with wasted cycles even if you have
> a perfectly working chipset.
>
> So, how would you like me to resolve this? We can do the same thing
> as Adaptec's Windows drivers and just always use the slower, less
> efficient I/O mapped method for accessing registers. This will "fix"
> the problems people have with broken VIA and Intel chipsets. I can
> make this a compile and run-time option, but should we default to
> I/O mapped or memory mapped?
>
> Don't you just love broken PC hardware?

It's all fine, then: I thought the problems were caused by some bug in the
driver itself.

Thanks for explaining the issue to me clearly. :)

2002-09-23 19:11:06

by Justin T. Gibbs

Subject: Re: 2.4.20pre7, aic7xxx-6.2.8: Panic: HOST_MSG_LOOP with invalid SCB 0

> Justin,
>
> I guess this is the second or third report of problems with the new aic7xxx :(

This issue has already been resolved as a chipset issue requiring
I/O mapped register access to work around. The "old" aic7xxx driver
avoids these issues by issuing a register read after every register
write. This stops up your PCI bus with wasted cycles even if you have
a perfectly working chipset.

So, how would you like me to resolve this? We can do the same thing
as Adaptec's Windows drivers and just always use the slower, less
efficient I/O mapped method for accessing registers. This will "fix"
the problems people have with broken VIA and Intel chipsets. I can
make this a compile and run-time option, but should we default to
I/O mapped or memory mapped?

Don't you just love broken PC hardware?

--
Justin
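
The trade-off between the access methods described above can be
illustrated with a toy model. This is only a sketch under stated
assumptions, not driver code: Device, FlakyBridge and program() are
hypothetical names, and the "faulty merge" rule (an earlier buffered
write to a neighbouring register gets lost) is an assumed stand-in for
whatever the affected chipsets really do. The cycle count is a count of
bus transactions only; it says nothing about how much slower a port I/O
cycle is than a memory cycle.

```python
class Device:
    """A device with a small bank of registers (hypothetical)."""

    def __init__(self, size=8):
        self.regs = [0] * size
        self.bus_cycles = 0  # transactions that actually reached the device


class FlakyBridge:
    """Assumed model of a chipset that mishandles adjacent memory writes."""

    def __init__(self, dev):
        self.dev = dev
        self.pending = None  # at most one buffered memory write: (addr, value)

    def flush(self):
        # Push the buffered write, if any, out to the device.
        if self.pending is not None:
            addr, value = self.pending
            self.dev.regs[addr] = value
            self.dev.bus_cycles += 1
            self.pending = None

    def mem_write(self, addr, value):
        if self.pending is not None and abs(self.pending[0] - addr) == 1:
            # Faulty merge (assumption): the earlier write to the
            # neighbouring register is dropped instead of delivered.
            self.pending = None
        else:
            self.flush()
        self.pending = (addr, value)

    def mem_read(self, addr):
        # A read forces any buffered write out first.
        self.flush()
        self.dev.bus_cycles += 1
        return self.dev.regs[addr]

    def io_write(self, addr, value):
        # Port I/O writes are modelled as never buffered or merged.
        self.flush()
        self.dev.regs[addr] = value
        self.dev.bus_cycles += 1


def program(strategy):
    """Write two adjacent registers and report (register values, cycles)."""
    dev = Device()
    bridge = FlakyBridge(dev)
    for addr, value in [(0, 0xAA), (1, 0xBB)]:
        if strategy == "memio":
            bridge.mem_write(addr, value)
        elif strategy == "memio+readback":
            bridge.mem_write(addr, value)
            bridge.mem_read(addr)  # read after every write
        elif strategy == "pio":
            bridge.io_write(addr, value)
        else:
            raise ValueError(strategy)
    bridge.flush()
    return dev.regs[:2], dev.bus_cycles
```

In this model, plain memory-mapped writes lose the first register
value, while both the read-after-write strategy and port I/O deliver
both values; the read-after-write strategy does so at twice the number
of bus transactions, which is the "wasted cycles" cost mentioned above.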

2002-09-23 19:32:09

by Marcelo Tosatti

Subject: Re: 2.4.20pre7, aic7xxx-6.2.8: Panic: HOST_MSG_LOOP with invalid SCB 0



On Fri, 20 Sep 2002, Justin T. Gibbs wrote:

> > Celeron 1.3GHz, Intel i815 chipset, 512MB ram.
> >
> > AIC-2640 PCI card with uw and narrow connectors. A Seagate scsi disk
> > (rootfs) attached to uw, and a HP tape drive attached to narrow. Tape
> > drive never used.
> >
> > I only ran 2.4.20pre7 (no other patches) for a night and it crashed:
> >
> > -------------------------------------------------------------------
> > Kernel panic: HOST_MSG_LOOP with invalid SCB 0
> >
> > In interrupt handler, not syncing
>
> I need all of the messages leading up to the panic in order to
> diagnose this. You may need to use a serial console to get
> them all.

Justin,

I guess this is the second or third report of problems with the new aic7xxx :(

2002-09-23 21:18:33

by Benjamin Herrenschmidt

Subject: Re: 2.4.20pre7, aic7xxx-6.2.8: Panic: HOST_MSG_LOOP with invalid SCB 0

>> This issue has already been resolved as a chipset issue requiring
>> I/O mapped register access to work around. The "old" aic7xxx driver
>> avoids these issues by issuing a register read after every register
>> write. This stops up your PCI bus with wasted cycles even if you have
>> a perfectly working chipset.
>>
>> So, how would you like me to resolve this? We can do the same thing
>> as Adaptec's Windows drivers and just always use the slower, less
>> efficient I/O mapped method for accessing registers. This will "fix"
>> the problems people have with broken VIA and Intel chipsets. I can
>> make this a compile and run-time option, but should we default to
>> I/O mapped or memory mapped?
>>
>> Don't you just love broken PC hardware?
>
>It's all fine, then: I thought the problems were caused by some bug in the
>driver itself.
>
>Thanks for explaining the issue to me clearly. :)

Hi Justin! What is the actual breakage here? Is this just PCI write
posting (that is, PCI writes staying in a bridge write buffer for
some time until you flush the whole path with a read)? In this
case those Intel & VIA chipsets aren't at fault, as this is perfectly
legal per the PCI spec, and we'll have problems with all other sorts of
machines, especially machines with stacked PCI<->PCI bridges, as
is the case for most pmacs.

Or is there a real Intel/VIA bug regarding PCI write buffers ?

I doubt it would affect only Adaptec cards then...

Ben.


2002-09-23 21:41:51

by Justin T. Gibbs

Subject: Re: 2.4.20pre7, aic7xxx-6.2.8: Panic: HOST_MSG_LOOP with invalid SCB 0

>> Thanks for explaining me the issue clearly. :)
>
> Hi Justin! What is the actual breakage here? Is this just PCI write
> posting (that is, PCI writes staying in a bridge write buffer for
> some time until you flush the whole path with a read)? In this
> case those Intel & VIA chipsets aren't at fault, as this is perfectly
> legal per the PCI spec, and we'll have problems with all other sorts of
> machines, especially machines with stacked PCI<->PCI bridges, as
> is the case for most pmacs.

No, it is not write posting. It is usually a problem with write
combining/merging and/or read prefetch on devices that do not
support this feature. The memory BAR on the aic7xxx chips does
not have the PREFETCH bit set, so these types of operations are
forbidden by the spec. The end result is missed writes and
stale read data, leading to all kinds of driver confusion.

Often these issues are really register layout dependent. If
you never have to access two registers that are right next to
each other, the chipset can't write combine, etc.

--
Justin
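
For reference, the PREFETCH bit mentioned above is bit 3 of a memory
base address register in PCI configuration space. A minimal sketch of
decoding a 32-bit BAR value follows; decode_bar() is a hypothetical
helper and the sample values in the usage note are made up for
illustration, not read from any aic7xxx card.

```python
def decode_bar(bar):
    """Decode the low flag bits of a 32-bit PCI base address register."""
    if bar & 0x1:
        # Bit 0 set: I/O space BAR; the low two bits are flags.
        return {"space": "io", "base": bar & 0xFFFFFFFC}
    return {
        "space": "memory",
        # Bits 2:1 give the type; 0b10 means a 64-bit BAR.
        "is_64bit": ((bar >> 1) & 0x3) == 0x2,
        # Bit 3 set: the region may be prefetched and writes merged.
        "prefetchable": bool(bar & 0x8),
        # The low four bits are flags, not part of the address.
        "base": bar & 0xFFFFFFF0,
    }
```

For example, decode_bar(0xFEBFF000) reports a non-prefetchable 32-bit
memory region at 0xFEBFF000, while decode_bar(0xE0000008) reports a
prefetchable one. A bridge is only permitted to prefetch or merge
accesses to regions of the second kind.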

2002-09-24 07:34:19

by Benjamin Herrenschmidt

Subject: Re: 2.4.20pre7, aic7xxx-6.2.8: Panic: HOST_MSG_LOOP with invalid SCB 0

>No, it is not write posting. It is usually a problem with write
>combining/merging and/or read prefetch on devices that do not
>support this feature. The memory BAR on the aic7xxx chips does
>not have the PREFETCH bit set, so these types of operations are
>forbidden by the spec. The end result is missed writes and
>stale read data, leading to all kinds of driver confusion.
>
>Often these issues are really register layout dependent. If
>you never have to access two registers that are right next to
>each other, the chipset can't write combine, etc.

OK, well. Indeed, adding a read after every write may help here.
Does this affect performance significantly?

Ben.