2003-03-19 22:05:21

by Wolfram Schlich

Subject: Hardlocks with 2.4.21-pre5, pdc202xx_new (PDC20269) and shared IRQs

Hi,

I am experiencing system hardlocks under the following conditions:
- Hardware:
  - Tyan Thunder K7 w/ 2x Athlon MP 1.2GHz (5x PCI)
  - 2x Onboard Adaptec 7899P SCSI adapter
    IRQ 16, IRQ 17
  - 2x Onboard 3Com 3C982 100Mb 32bit PCI NIC
    IRQ 18, IRQ 19
  - 1x National Semiconductor DP83820 1000Mb 64bit PCI NIC
    IRQ 16
  - 2x Promise Ultra 133TX2 PDC20269
    IRQ 16, IRQ 17
- Software:
  - Linux 2.4.21-pre5:
    CONFIG_IDE=y
    CONFIG_BLK_DEV_IDEDISK=y
    CONFIG_BLK_DEV_IDEPCI=y
    CONFIG_BLK_DEV_GENERIC=y
    CONFIG_IDEPCI_SHARE_IRQ=y
    CONFIG_BLK_DEV_IDEDMA_PCI=y
    CONFIG_IDEDMA_PCI_AUTO=y
    CONFIG_BLK_DEV_IDEDMA=y
    CONFIG_BLK_DEV_ADMA=y
    CONFIG_BLK_DEV_PDC202XX_NEW=y
    CONFIG_IDEDMA_IVB=y
    CONFIG_BLK_DEV_PDC202XX=y
    CONFIG_BLK_DEV_IDE_MODES=y

When one of the Promise controllers is sharing the same IRQ with one of
the NICs (it doesn't matter which; I tried all of them) and data is
copied *to* the machine over the network, the system deadlocks. When
data is copied *from* the system over the network, everything works
fine. Unfortunately the system BIOS doesn't let me set the IRQ
channels by hand, so all I can do is move the cards to other slots.
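Whether two devices really end up on the same line can be checked in /proc/interrupts, where each IRQ row ends with the names of every handler registered on it. A minimal Python sketch, assuming the usual layout (a header row of CPU columns, then rows of the form `16: counts... IO-APIC-level name, name`); the function name `shared_irqs` and the sample text, which only mirrors the IRQ assignments listed above, are made up for illustration and are not output from this machine.

```python
def shared_irqs(text):
    """Return {irq: [handler names]} for IRQ lines used by more than one device.

    Assumes the /proc/interrupts layout: a header line of CPU columns, then
    rows like "16:  1234  5678  IO-APIC-level  eth0, ide2".
    """
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())              # header: CPU0 CPU1 ...
    result = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        if not label.isdigit():
            continue                           # skip NMI, LOC, ERR, ...
        # per-CPU counts, then the controller type, then the handler list
        fields = rest.split(None, ncpus + 1)
        if len(fields) < ncpus + 2:
            continue                           # row without handler names
        names = [n.strip() for n in fields[ncpus + 1].split(",")]
        if len(names) > 1:
            result[int(label)] = names
    return result


# Invented sample (hypothetical handler names), NOT taken from this machine.
SAMPLE = """\
           CPU0       CPU1
  0:     123456     123789    IO-APIC-edge  timer
  1:        100        120    IO-APIC-edge  keyboard
 16:      50000      49000   IO-APIC-level  aic7xxx, eth2, ide2, ide3
 17:      40000      41000   IO-APIC-level  aic7xxx, ide4, ide5
 18:       3000       3100   IO-APIC-level  eth0
NMI:          0          0
LOC:     246000     246010
ERR:          0
"""

if __name__ == "__main__":
    print(shared_irqs(SAMPLE))
```

On a live box, `shared_irqs(open("/proc/interrupts").read())` does the same for the real table.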

Ah, at boot time the kernel spits out this message:
--8<--
I/O APIC: AMD Errata #22 may be present. In the event of instability try
: booting with the "noapic" option.
--8<--
I've not yet tried that, but will do now.
--
Wolfram Schlich; Friedhofstr. 8, D-88069 Tettnang; +49-(0)178-SCHLICH


2003-03-20 00:20:24

by Alan

Subject: Re: Hardlocks with 2.4.21-pre5, pdc202xx_new (PDC20269) and shared IRQs

On Wed, 2003-03-19 at 22:16, Wolfram Schlich wrote:
> When one of the Promise controllers is sharing the same IRQ with one of
> the NICs (it doesn't matter which; I tried all of them) and data is
> copied *to* the machine over the network, the system deadlocks. When
> data is copied *from* the system over the network, everything works
> fine. Unfortunately the system BIOS doesn't let me set the IRQ
> channels by hand, so all I can do is move the cards to other slots.
>

That's very useful information. There certainly have been (and it seems
there still are) some cases with shared IRQs that are not quite handled
right. The 2.4.21pre5/pre5-ac work has partly been about fixing them.
Deadlocks surprise me, however, since the problems I've seen have been
I/O errors.
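Why a mishandled shared line can hang a box rather than merely misbehave: PCI interrupts are level-triggered, so the line stays asserted until every device on it has been serviced and acknowledged. The toy model below is plain Python, not kernel code, and all names in it are hypothetical. Each handler checks whether its own device is pending before acknowledging it; if one driver never clears its device's interrupt, the line never goes quiet. This is only one plausible mechanism for a hard lock; the thread does not establish that it is what happens with the PDC20269.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    pending: bool = False  # device is currently asserting the shared line
    acks: bool = True      # does its (hypothetical) driver clear the interrupt?


def handle(dev):
    """One driver's handler on a shared line: check first, then acknowledge."""
    if not dev.pending:
        return False         # not our interrupt; leave it to the other handlers
    if dev.acks:
        dev.pending = False  # acknowledging makes the device deassert the line
    return True


def dispatch(devices, max_passes=1000):
    """Run every handler until the level-triggered line goes quiet.

    Returns the number of passes; reaching max_passes stands in for an
    interrupt storm that would keep a real CPU busy indefinitely.
    """
    passes = 0
    while any(d.pending for d in devices) and passes < max_passes:
        for d in devices:
            handle(d)
        passes += 1
    return passes


if __name__ == "__main__":
    good = [Device("nic", pending=True), Device("ide", pending=True)]
    bad = [Device("nic", pending=True), Device("ide", pending=True, acks=False)]
    print(dispatch(good))  # 1
    print(dispatch(bad))   # 1000
```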

However, there is another known problem that does cause deadlocks with
the AMD76x, especially if the onboard IDE is used. If you don't already
have one, shove a PS/2 mouse in the box, reboot and retest.

2003-03-20 07:12:01

by Wolfram Schlich

Subject: Re: Hardlocks with 2.4.21-pre5, pdc202xx_new (PDC20269) and shared IRQs

* Alan Cox <[email protected]> [2003-03-20 01:31]:
> On Wed, 2003-03-19 at 22:16, Wolfram Schlich wrote:
> > When one of the Promise controllers is sharing the same IRQ with one of
> > the NICs (it doesn't matter which; I tried all of them) and data is
> > copied *to* the machine over the network, the system deadlocks. When
> > data is copied *from* the system over the network, everything works
> > fine. Unfortunately the system BIOS doesn't let me set the IRQ
> > channels by hand, so all I can do is move the cards to other slots.
> >
>
> That's very useful information. There certainly have been (and it seems
> there still are) some cases with shared IRQs that are not quite handled
> right. The 2.4.21pre5/pre5-ac work has partly been about fixing them.
> Deadlocks surprise me, however, since the problems I've seen have been
> I/O errors.

Well, now I have trashed my array :-)
-> http://marc.theaimsgroup.com/?l=linux-raid&m=104811878405765&w=2

Btw., it spits out *lots* of messages when IRQ sharing is *disabled*
in the kernel config and just dies quietly when it's *enabled*
(at least when it just died before, it didn't mess up my array... ;)).

> However, there is another known problem that does cause deadlocks with
> the AMD76x, especially if the onboard IDE is used. If you don't already
> have one, shove a PS/2 mouse in the box, reboot and retest.

?! I'm using the onboard IDE for two CDROM drives and one smaller
hard disk which I use rarely... and I didn't use any of these devices
in the cases in which I had the described problems... Anyway, why should I
connect a PS/2 mouse to the machine? Is it gonna solve all my
problems at once? ;-)
--
Mit freundlichen Gruessen / Yours sincerely
Wolfram Schlich; Friedhofstr. 8, D-88069 Tettnang; +49-(0)178-SCHLICH

2003-03-20 08:41:11

by Stephan von Krawczynski

Subject: Re: Hardlocks with 2.4.21-pre5, pdc202xx_new (PDC20269) and shared IRQs

On Thu, 20 Mar 2003 08:22:59 +0100
Wolfram Schlich <[email protected]> wrote:

> [IDE dying with shared interrupts]

I don't know if it is related, but I experienced the same thing with a PDC
sharing an IRQ with a 3Com GBit (Broadcom) NIC, and it was indeed solved by the
latest version of the tg3 driver from Jeff. Maybe there are analogies between
the two cases concerning the NIC drivers, too.

--
Regards,
Stephan

2003-03-20 10:14:02

by Chris Newland

Subject: RE: Hardlocks with 2.4.21-pre5, pdc202xx_new (PDC20269) and shared IRQs

Hi Wolfram,

I had the same hard-lock problem with dual Athlons on an MSI K7D Master, a Promise
TX2000 (PDC20271) with 2 HDDs in RAID0 on the Promise card, and only a CD-ROM
on the onboard IDE channel.

It used to lock hard (vanilla 2.4.18 kernel) while running 'tar' with a USB
mouse attached, but I haven't had a single lockup since plugging in a PS/2 mouse :)

PS. Whilst 2.4 kernels run fine for me, I can't get any 2.5 kernel to run
yet.

I get a VFS kernel panic on bootup (can't mount root device).

I've installed Rusty's 2.5 modutils and tried compiling the 20271 driver
both into the kernel and as a module.

I read in Dave Jones' post-halloween notes that the Promise drivers are
broken:

<quote>
- The hptraid/promise RAID drivers are currently non functional, and
will probably be converted to use device-mapper.
</quote>

Is this still true?

Best Regards,

Chris Newland


2003-03-20 11:55:49

by Wolfram Schlich

Subject: Re: Hardlocks with 2.4.21-pre5, pdc202xx_new (PDC20269) and shared IRQs

* Chris Newland <[email protected]> [2003-03-20 11:25]:
> Hi Wolfram,

Hi!

> I had the same hard-lock problem with dual Athlons on an MSI K7D Master, a Promise
> TX2000 (PDC20271) with 2 HDDs in RAID0 on the Promise card, and only a CD-ROM
> on the onboard IDE channel.
>
> It used to lock hard (vanilla 2.4.18 kernel) while running 'tar' with a USB
> mouse attached, but I haven't had a single lockup since plugging in a PS/2 mouse :)
>
> PS. Whilst 2.4 kernels run fine for me, I can't get any 2.5 kernel to run
> yet.
>
> I get a VFS kernel panic on bootup (can't mount root device).
>
> I've installed Rusty's 2.5 modutils and tried compiling the 20271 driver
> both into the kernel and as a module.
>
> I read in Dave Jones' post-halloween notes that the Promise drivers are
> broken:
>
> <quote>
> - The hptraid/promise RAID drivers are currently non functional, and
> will probably be converted to use device-mapper.
> </quote>
>
> Is this still true?

I have no idea... I'm not using:
a) A Promise RAID controller (just the dumb ones)
b) Kernel 2.5
:-)

2003-03-20 11:57:30

by Wolfram Schlich

Subject: Re: Hardlocks with 2.4.21-pre5, pdc202xx_new (PDC20269) and shared IRQs

* Stephan von Krawczynski <[email protected]> [2003-03-20 09:53]:
> On Thu, 20 Mar 2003 08:22:59 +0100
> Wolfram Schlich <[email protected]> wrote:
>
> > [IDE dying with shared interrupts]
>
> I don't know if it is related, but I experienced the same thing with a PDC
> sharing an IRQ with a 3Com GBit (Broadcom) NIC, and it was indeed solved by the
> latest version of the tg3 driver from Jeff. Maybe there are analogies between
> the two cases concerning the NIC drivers, too.

Interesting. Well, I have the problems with both the 3c59x (100Mb) and
the ns83820 (1000Mb) drivers...

2003-03-20 13:35:33

by Alan

Subject: Re: Hardlocks with 2.4.21-pre5, pdc202xx_new (PDC20269) and shared IRQs

On Thu, 2003-03-20 at 07:22, Wolfram Schlich wrote:
> Well, now I have trashed my array :-)
> -> http://marc.theaimsgroup.com/?l=linux-raid&m=104811878405765&w=2
>
> Btw., it spits out *lots* of messages when IRQ sharing is *disabled*
> in the kernel config and just dies quietly when it's *enabled*
> (at least when it just died before, it didn't mess up my array... ;)).

I'll take a look. I have no Promise docs, however, so there is little that
can be done for Promise-specific bugs if it looks that way.

> ?! I'm using the onboard IDE for two CDROM drives and one smaller
> hard disk which I use rarely... and I didn't use any of these devices
> in the cases in which I had the described problems... Anyway, why should I
> connect a PS/2 mouse to the machine? Is it gonna solve all my
> problems at once? ;-)

Probably not, but it will avoid a lockup with IDE DMA in a specific case.

2003-03-20 14:24:39

by Wolfram Schlich

Subject: Re: Hardlocks with 2.4.21-pre5, pdc202xx_new (PDC20269) and shared IRQs

* Alan Cox <[email protected]> [2003-03-20 14:51]:
> On Thu, 2003-03-20 at 07:22, Wolfram Schlich wrote:
> > Well, now I have trashed my array :-)
> > -> http://marc.theaimsgroup.com/?l=linux-raid&m=104811878405765&w=2
> >
> > Btw., it spits out *lots* of messages when IRQ sharing is *disabled*
> > in the kernel config and just dies quietly when it's *enabled*
> > (at least when it just died before, it didn't mess up my array... ;)).
>
> I'll take a look. I have no Promise docs, however, so there is little that
> can be done for Promise-specific bugs if it looks that way.

Should I contact someone at Promise about this issue?

> > ?! I'm using the onboard IDE for two CDROM drives and one smaller
> > hard disk which I use rarely... and I didn't use any of these devices
> > in the cases in which I had the described problems... Anyway, why should I
> > connect a PS/2 mouse to the machine? Is it gonna solve all my
> > problems at once? ;-)
>
> Probably not, but it will avoid a lockup with IDE DMA in a specific case.

Does this only affect onboard IDE usage?
Argh, I'm starting to hate this AMD-MP stuff.

Btw., I get these messages from time to time (not often):
--8<--
APIC error on CPU1: 00(02)
APIC error on CPU0: 00(02)
--8<--
Should I boot with "noapic" or "disableapic"? But I guess this is
another issue...

2003-03-25 18:21:40

by Jan Kasprzak

Subject: Re: Hardlocks with 2.4.21-pre5, pdc202xx_new (PDC20269) and shared IRQs

Wolfram Schlich wrote:
: > However, there is another known problem that does cause deadlocks with
: > the AMD76x, especially if the onboard IDE is used. If you don't already
: > have one, shove a PS/2 mouse in the box, reboot and retest.
:
: ?! I'm using the onboard IDE for two CDROM drives and one smaller
: hard disk which I use rarely... and I didn't use any of these devices
: in the cases in which I had the described problems... Anyway, why should I
: connect a PS/2 mouse to the machine? Is it gonna solve all my
: problems at once? ;-)

I had a similar problem, which was solved by plugging
in a PS/2 mouse. So far I've received about 10 reports from people for whom
the PS/2 mouse solved the problem. It seems to be limited to
revision 04 of the AMD 768 southbridge, and especially to MSI K7D-Master
boards. My lspci output looks like this:

00:00.0 Host bridge: Advanced Micro Devices [AMD] AMD-760 MP [IGD4-2P] System Controller (rev 11)
00:01.0 PCI bridge: Advanced Micro Devices [AMD] AMD-760 MP [IGD4-2P] AGP Bridge
00:07.0 ISA bridge: Advanced Micro Devices [AMD] AMD-768 [Opus] ISA (rev 04)
00:07.1 IDE interface: Advanced Micro Devices [AMD] AMD-768 [Opus] IDE (rev 04)
[...]
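Going by these reports, the boards to check are those carrying revision 04 of the AMD-768. A minimal Python sketch that scans `lspci` text for that revision, using the lines quoted above as sample input; the function name `amd768_rev04_functions` is hypothetical, and a match only identifies the chip revision, not whether a given box will actually lock up.

```python
LSPCI = """\
00:00.0 Host bridge: Advanced Micro Devices [AMD] AMD-760 MP [IGD4-2P] System Controller (rev 11)
00:01.0 PCI bridge: Advanced Micro Devices [AMD] AMD-760 MP [IGD4-2P] AGP Bridge
00:07.0 ISA bridge: Advanced Micro Devices [AMD] AMD-768 [Opus] ISA (rev 04)
00:07.1 IDE interface: Advanced Micro Devices [AMD] AMD-768 [Opus] IDE (rev 04)
"""


def amd768_rev04_functions(lspci_text):
    """Return the PCI slots (bus:device.function) of AMD-768 functions at rev 04."""
    hits = []
    for line in lspci_text.splitlines():
        line = line.strip()
        # lspci appends "(rev NN)" to the line when the device reports a revision
        if "AMD-768" in line and line.endswith("(rev 04)"):
            hits.append(line.split()[0])
    return hits


if __name__ == "__main__":
    print(amd768_rev04_functions(LSPCI))  # ['00:07.0', '00:07.1']
```

Live output can be fed in with `subprocess.run(["lspci"], capture_output=True, text=True).stdout`.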

It is even somewhat documented as an official AMD erratum.

This is not a deadlock per se, but rather a hard lock-up
of the box (the system is totally locked up; even pressing NumLock does
not light the NumLock LED on the keyboard).

However, I also have occasional (fewer than one per week) deadlocks
on this box, probably related to NFS, ext3 or NFS lockd: the system itself
is OK, but all nfsd and lockd processes are stuck in the "D" state, and
sometimes there is also an "exportfs -a" process in the "D" state
(my /etc/exports is generated from a database, and I run exportfs
every two hours or so). I think this is SMP-related, not necessarily
AMD-related. These deadlocks are more frequent in 2.4.21-pre kernels
than in vanilla 2.4.20. See also my previous posts to LKML on this topic.
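The processes stuck in the "D" (uninterruptible sleep) state can be listed from /proc. A minimal Python sketch, assuming the Linux /proc/<pid>/stat layout, in which the command name is enclosed in parentheses and the one-letter state is the first field after the last closing parenthesis; `parse_stat` and `d_state_processes` are hypothetical helper names, and the `proc` argument exists only so the scan can be pointed at a test directory.

```python
import os


def parse_stat(line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line.

    comm may itself contain spaces or ')', so split on the LAST ") ".
    """
    pid_part, _, rest = line.partition(" (")
    comm, _, after = rest.rpartition(") ")
    return int(pid_part), comm, after.split()[0]


def d_state_processes(proc="/proc"):
    """List (pid, comm) for processes in uninterruptible sleep ("D")."""
    stuck = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue                 # only numeric entries are processes
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue                 # process exited while we were scanning
        if state == "D":
            stuck.append((pid, comm))
    return stuck
```

For example, `print(d_state_processes())` on the affected box while nfsd and lockd are hung.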

-Yenya

--
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| GPG: ID 1024/D3498839 Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
| http://www.fi.muni.cz/~kas/ Czech Linux Homepage: http://www.linux.cz/ |
|-- If you start doing things because you hate others and want to screw --|
|-- them over the end result is bad. --Linus Torvalds to the BBC News --|