2002-03-20 16:21:24

by Bradley McLean

Subject: Hard hang on 3Ware7850, Dual AthlonMP, Tyan2462

I've been following the various discussions of athlon MP problems.

We too have systems that consistently hard lock up.

Running RH7.2 with kernel.org kernels, versions 2.4.17, 2.4.18,
or 2.4.18 plus the IO-APIC patch posted for 2.4.19pre3.
Using the latest (release 7.4, driver version 19) 3ware code.

Tyan 2462, 3.5 GB
(2) AMD MP1900+
(6) WD1200JB

Symptoms: Under either heavy read or heavy write load, the system locks up hard. No ping, no keyboard.
Tests:
(T1) On single disks, run five simultaneous mke2fs, one each disk.
(T2) On single disks, run five simultaneous bonnie++, one each disk.
(T3) On 4 disk raid5, run three simultaneous bonnie++
(T4) On 4 disk raid5, run four simultaneous postgresql dump/restore/vacuum.
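
For anyone wanting to reproduce the "several simultaneous jobs" part of these tests, here is a minimal sketch in Python. It is not the script used for the runs above; the function names and the mount points in the comment are hypothetical, and it only assumes bonnie++'s standard -d (directory) and -u (user) options.

```python
import subprocess


def bonnie_commands(mount_points, user="root"):
    """Build one bonnie++ command line per mount point (-d dir, -u user)."""
    return [["bonnie++", "-d", mp, "-u", user] for mp in mount_points]


def run_concurrently(commands):
    """Start every command at once, wait for all of them, return exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in procs]


# Example (hypothetical mount points, one per single disk, as in T2):
#   run_concurrently(bonnie_commands(["/mnt/d1", "/mnt/d2", "/mnt/d3",
#                                     "/mnt/d4", "/mnt/d5"]))
```

The same run_concurrently helper would work for T1 with one mke2fs command per device, but mke2fs destroys whatever is on the device, so the device list is deliberately left out here.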

T3 and T4 have failed under everything except stock 2.4.7 and 2.4.9 from RH,
where they run agonizingly slowly.

T1 failed on all 2.4.17 and 2.4.18 attempts (including earlier versions
of the 3ware firmware and driver), until the IO-APIC patch was added.
Then T1 and T2 both passed over several repetitions.

T3 and T4 both continue to fail, always during high bandwidth disk reads.
It appears the write portion is solved, either by the upgrade to the 7.4
driver and firmware, or by the IO-APIC patch, or both. The read failure
still occurs in a predictable time range, though not at exactly the same
point each time: once the card is up to full capacity with 256 outstanding
commands, it will fail within the next few minutes.

When the system is booted with nosmp, all tests pass, although we've had one
unexplained lockup even under these conditions. Load data was not
available for that lockup.

noapic seems to have no effect.
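
When comparing runs with and without nosmp/noapic, it helps to record which options the kernel was actually booted with. A minimal sketch, assuming /proc/cmdline is available; the function names are made up for illustration:

```python
def read_cmdline(path="/proc/cmdline"):
    """Return the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read().strip()


def boot_flags(cmdline, watched=("nosmp", "noapic")):
    """Report which of the watched kernel parameters appear on a command line."""
    params = cmdline.split()
    return {flag: flag in params for flag in watched}


# Example: log this next to each test result.
#   print(boot_flags(read_cmdline()))
```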

Dual PIII Xeon 550 based systems do not exhibit the symptom.

APIC seems highly suspect.

Anyone with suggestions, or test cases?

Thanks,

-Brad McLean


2002-03-20 16:32:34

by Alan

Subject: Re: Hard hang on 3Ware7850, Dual AthlonMP, Tyan2462

> Running RH7.2 with kernel.org kernels, versions 2.4.17, 2.4.18,
> or 2.4.18 plus the IO-APIC patch posted for 2.4.19pre3.
> Using the latest (release 7.4, driver version 19) 3ware code.
>
> Tyan 2462, 3.5 GB
> (2) AMD MP1900+
> (6) WD1200JB

OK, that's the fourth report of this 3ware + 2462 SMP-only breakage.

> Anyone with suggestions, or test cases?

Apparently if you swap the Tyan for something like the ASUS dual Athlon
board it works. Dunno if it's hardware, BIOS or software.

Alan

2002-03-20 17:09:54

by Bradley McLean

Subject: Re: Hard hang on 3Ware7850, Dual AthlonMP, Tyan2462

* Alan Cox [020320 11:34]:
>
> Ok thats the fourth report of this 3ware + 2462 SMP only breakage
>
> > Anyone with suggestions, or test cases?
>
> Apparently if you swap the Tyan for something like the ASUS dual athlon
> board it works. Dunno if its hardware, bios or software.

Thanks, Alan. Is there anybody out there with the ASUS dual board who can
send me a boot log? I'm primarily interested in the processor and APIC init
sections, digging for clues by comparing.
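
A rough way to do that comparison, sketched in Python: filter each boot log down to lines mentioning the CPU, SMP, or APIC setup, then diff the two. The keyword list and function names are assumptions for illustration, not a definitive filter, and timestamps or addresses in the logs will show up as noise in the diff.

```python
import difflib
import re

# Assumed keywords; widen or narrow to taste.
KEYWORDS = re.compile(r"apic|cpu|smp|processor", re.IGNORECASE)


def relevant_lines(log_text):
    """Keep only the boot-log lines that mention CPU/SMP/APIC setup."""
    return [line.strip() for line in log_text.splitlines()
            if KEYWORDS.search(line)]


def compare_boot_logs(log_a, log_b, name_a="tyan", name_b="asus"):
    """Return a unified diff of the relevant lines of two boot logs."""
    return list(difflib.unified_diff(
        relevant_lines(log_a), relevant_lines(log_b),
        fromfile=name_a, tofile=name_b, lineterm=""))


# Example, with dmesg output saved to two files (hypothetical names):
#   diff = compare_boot_logs(open("tyan.log").read(), open("asus.log").read())
#   print("\n".join(diff))
```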

-Brad

2002-03-20 17:51:45

by Jason L Tibbitts III

Subject: Re: Hard hang on 3Ware7850, Dual AthlonMP, Tyan2462

>>>>> "AC" == Alan Cox writes:

AC> Apparently if you swap the Tyan for something like the ASUS dual
AC> athlon board it works. Dunno if its hardware, bios or software.

But the 2462 has the MP chipset, while Asus has only MPX boards,
right? Does the 2466 (Tyan MPX board) have the same problems? I may
be able to test this in a few hours if nobody knows. I'm running 2466
boards with 3w7850 cards, but uniprocessor, since I only want the 64-bit
slot.

- J<

2002-03-30 20:50:08

by Bradley McLean

Subject: Re: Hard hang on 3Ware7850, Dual AthlonMP, Tyan2462

* Alan Cox [020320 11:34]:

> > Running RH7.2 with kernel.org kernels, versions 2.4.17, 2.4.18,
> > or 2.4.18 plus the IO-APIC patch posted for 2.4.19pre3.
> > Using the latest (release 7.4, driver version 19) 3ware code.
> >
> > Tyan 2462, 3.5 GB
> > (2) AMD MP1900+
> > (6) WD1200JB
>
> Ok thats the fourth report of this 3ware + 2462 SMP only breakage
>
> > Anyone with suggestions, or test cases?
>
> Apparently if you swap the Tyan for something like the ASUS dual athlon
> board it works. Dunno if its hardware, bios or software.

Well, in our case it was hardware, BIOS, firmware, and software.

I was able to get an ASUS, do side by side testing, and then
eventually get the Tyan working as well.

I'll post details once I complete full stress tests in various
configurations, but there *are* configurations that work correctly.

The hardware issue was a passive PCI riser card that only supports
the 3ware in one slot on the Tyan. Works fine in the ASUS. Sigh.

The BIOS is currently set to MP1.1 (will test with MP1.4 soon).

The .019 driver (with matching firmware) from 3Ware seems to be
the minimum required.

I'd like to thank 3ware's support, AngieN and AdamR specifically, for
their patience, diligence and assistance. Wish all vendors were like
them.

-Brad