Subject: Problem on Alpha with "convert to generic irq framework"

Hello Ivan,

A few weeks ago, I stopped using the latest git snapshot on my Alpha (an
EV56 on an EB164/LX164 motherboard [1]) because I was experiencing nasty
crashes when doing fairly heavy I/O (like checksumming ISO files) over an
MD RAID 0 device built from 3 SCSI disks connected to a PCI Adaptec
(2940U2W [2]) SCSI card.

I can trigger the bug almost instantly and I get, more or less randomly,
the following panic messages (without traces):

Aiee, killing interrupt handler!
Attempted to kill the idle task!
Unable to handle kernel paging request at virtual address


Using git, I started bisecting using 2.6.15 as the "good" release
(because the box is rock solid with 2.6.15) and after countless hours
(boy that thing isn't the fastest box around) and countless kernel
compiles, I ended up with this first bad commit:


0595bf3bca9d9932a05b06dd438f40f01d27cd33 is first bad commit
diff-tree 0595bf3bca9d9932a05b06dd438f40f01d27cd33 (from eee45269b0f5979c70bc151c6c2f4e5f4f5ababe)
Author: Ivan Kokshaysky <[email protected]>
Date: Fri Jan 6 00:12:22 2006 -0800

[PATCH] Alpha: convert to generic irq framework (alpha part)

Kconfig tweaks and tons of deletions.

Signed-off-by: Ivan Kokshaysky <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Richard Henderson <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

:040000 040000 ac127f16325bb65941bd38208325ab7821877f52 15d7d4d17a7c8cfb8fe53c29ded31ff9cf287534 M arch
:040000 040000 287f73cdf371b2b33cc48f1d876005aab29ff3de 29263093ae33ceccd6346b987870367bc8329f0a M include


I've scanned the patch quickly but I didn't see anything obvious... ;-(

Reverting this commit using:
git revert 0595bf3bca9d9932a05b06dd438f40f01d27cd33

against c499ec24c31edf270e777a868ffd0daddcfe7ebd (the latest HEAD as of
right now) made my system usable again.
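
For anyone who wants to repeat this kind of hunt, here is a rough sketch
of the bisect-then-revert procedure described above, run against a
throwaway repository rather than the kernel tree. The repository layout,
the "known-good" tag, the "status" file and the test command are all made
up for illustration; on the real box each step meant building and booting
a kernel and running the I/O load by hand.

```shell
#!/bin/sh
# Toy model of the bisect-then-revert workflow in a throwaway repository.
# All file names, tags and commit messages here are hypothetical.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Bisect Demo"
git config user.email "demo@example.invalid"

# Eight commits; the fifth one introduces the "regression".
echo pass > status
i=1
while [ "$i" -le 8 ]; do
    echo "change $i" > "change-$i"
    if [ "$i" -eq 5 ]; then
        echo fail > status
    fi
    git add .
    git commit -q -m "commit $i"
    if [ "$i" -eq 1 ]; then
        git tag known-good              # plays the role of 2.6.15
    fi
    if [ "$i" -eq 5 ]; then
        culprit=$(git rev-parse HEAD)   # remembered only to check the result
    fi
    i=$((i + 1))
done

# Mark the endpoints, then let git drive the search. The test command
# exits 0 for a good revision and non-zero for a bad one.
git bisect start HEAD known-good >/dev/null
git bisect run sh -c 'test "$(cat status)" = pass' >/dev/null
found=$(git rev-parse refs/bisect/bad)  # the first bad commit
git bisect reset >/dev/null 2>&1

# Revert the offending commit on top of the current HEAD.
git revert --no-edit "$found" >/dev/null
echo "first bad commit: $found"
echo "status after revert: $(cat status)"
```

On a real tree the search would more likely be driven by hand with
"git bisect good" and "git bisect bad" after each boot, since a crash
like this one cannot easily be scripted.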

As this code will make it into mainline 2.6.16, I wonder if I'm the
only one experiencing this problem (any Alpha owners/testers
around?)...

I'll go over the patch again to learn more about the irq framework and
I'm available to try any patches you can throw at me, the goal being to
make 2.6.16 (or a later stable version) usable (at least for me).

Cheers,


[1]
/proc/cpuinfo:
cpu : Alpha
cpu model : EV56
cpu variation : 0
cpu revision : 0
cpu serial number : Linux_is_Great!
system type : EB164
system variation : LX164
system revision : 0
system serial number : MILO-2.2-18
cycle frequency [Hz] : 533333333
timer frequency [Hz] : 1024.00
page size [bytes] : 8192
phys. address bits : 40
max. addr. space # : 127
BogoMIPS : 1059.80
kernel unaligned acc : 0 (pc=0,va=0)
user unaligned acc : 0 (pc=0,va=0)
platform string : N/A
cpus detected : 0
L1 Icache : 8K, 1-way, 32b line
L1 Dcache : 8K, 1-way, 32b line
L2 cache : 96K, 3-way, 64b line
L3 cache : 1024K, 1-way, 64b line

[2]
01:01.0 SCSI storage controller: Adaptec AHA-2940U2/U2W

--
Mathieu Chouquet-Stringer


2006-03-06 12:51:20

by Ivan Kokshaysky

Subject: Re: Problem on Alpha with "convert to generic irq framework"

On Sat, Mar 04, 2006 at 12:12:19PM +0100, Mathieu Chouquet-Stringer wrote:
> I can trigger the bug almost instantly and I get, more or less randomly,
> the following panic messages (without traces):
>
> Aiee, killing interrupt handler!
> Attempted to kill the idle task!
> Unable to handle kernel paging request at virtual address

I cannot reproduce that, but all my machines use SRM, so interrupt
handling is quite different from AlphaBIOS systems.

> system type : EB164
> system variation : LX164
> system revision : 0
> system serial number : MILO-2.2-18

I'll try to install AlphaBIOS/MILO on my lx164 to see what happens.

Ivan.

2006-03-06 13:00:52

by Al Viro

Subject: Re: Problem on Alpha with "convert to generic irq framework"

On Mon, Mar 06, 2006 at 03:51:14PM +0300, Ivan Kokshaysky wrote:
> On Sat, Mar 04, 2006 at 12:12:19PM +0100, Mathieu Chouquet-Stringer wrote:
> > I can trigger the bug almost instantly and I get, more or less randomly,
> > the following panic messages (without traces):
> >
> > Aiee, killing interrupt handler!
> > Attempted to kill the idle task!
> > Unable to handle kernel paging request at virtual address
>
> I cannot reproduce that, but all my machines use SRM, so interrupt
> handling is quite different from AlphaBIOS systems.
>
> > system type : EB164
> > system variation : LX164
> > system revision : 0
> > system serial number : MILO-2.2-18
>
> I'll try to install AlphaBIOS/MILO on my lx164 to see what happens.

FWIW, works here on DS10 and alphastation (both with SRM)...

Subject: Re: Problem on Alpha with "convert to generic irq framework"

On Mon, Mar 06, 2006 at 01:00:50PM +0000, Al Viro wrote:
> FWIW, works here on DS10 and alphastation (both with SRM)...

Thanks for chiming in!!!

--
Mathieu Chouquet-Stringer

Subject: Re: Problem on Alpha with "convert to generic irq framework"

On Mon, Mar 06, 2006 at 03:51:14PM +0300, Ivan Kokshaysky wrote:
> I cannot reproduce that, but all my machines use SRM, so interrupt
> handling is quite different from AlphaBIOS systems.
> [...]
> I'll try to install AlphaBIOS/MILO on my lx164 to see what happens.

Too bad my alpha doesn't support SRM (it's really a modified LX164
board).

Is there anything I can do to help debug the problem?
--
Mathieu Chouquet-Stringer [email protected]
"Le disparu, si l'on vénère sa mémoire, est plus présent et
plus puissant que le vivant".
-- Antoine de Saint-Exupéry, Citadelle --

2006-03-06 16:13:28

by Ivan Kokshaysky

Subject: Re: Problem on Alpha with "convert to generic irq framework"

On Mon, Mar 06, 2006 at 02:54:34PM +0100, Mathieu Chouquet-Stringer wrote:
> Too bad my alpha doesn't support SRM (it's really a modified LX164
> board).
>
> Is there anything I can do to help debug the problem?

No, thanks. I've finally found the AlphaBIOS ROM image for lx164,
that was the most difficult part of the work. ;-)
Now I'm able to kill the box in just one second with 'ping -f'...
Will look into this tomorrow.

Ivan.

Subject: Re: Problem on Alpha with "convert to generic irq framework"

On Mon, Mar 06, 2006 at 07:13:24PM +0300, Ivan Kokshaysky wrote:
> No, thanks. I've finally found the AlphaBIOS ROM image for lx164,
> that was the most difficult part of the work. ;-)

lol, I wish I could switch to srm but it just doesn't work...

> Now I'm able to kill the box in just one second with 'ping -f'...

Awesome (well in a way)...

> Will look into this tomorrow.

Thanks Ivan!!!

--
Mathieu Chouquet-Stringer [email protected]
"Le disparu, si l'on vénère sa mémoire, est plus présent et
plus puissant que le vivant".
-- Antoine de Saint-Exupéry, Citadelle --

2006-03-08 11:28:59

by Ivan Kokshaysky

Subject: Re: Problem on Alpha with "convert to generic irq framework"

Well, the problem with the new interrupt code is that it does
local_irq_enable() before return from interrupt.

I don't know exactly why it breaks with the MILO PALcode. I'd guess
that if an interrupt occurs during 'call_pal rti' execution, some
critical PALcode data gets corrupted.

Fixed thus.

Ivan.

--- 2.6.16-rc5/arch/alpha/kernel/irq.c Mon Mar 6 11:57:58 2006
+++ linux/arch/alpha/kernel/irq.c Wed Mar 8 14:10:51 2006
@@ -153,6 +153,5 @@ handle_irq(int irq, struct pt_regs * reg
 	irq_enter();
 	local_irq_disable();
 	__do_IRQ(irq, regs);
-	local_irq_enable();
 	irq_exit();
 }

Subject: Re: Problem on Alpha with "convert to generic irq framework"

[email protected] (Ivan Kokshaysky) writes:
> Well, the problem with the new interrupt code is that it does
> local_irq_enable() before return from interrupt.
> [...]

Thanks, I'm going to give it a shot...

--
Mathieu Chouquet-Stringer

2006-03-08 17:57:07

by Richard Henderson

Subject: Re: Problem on Alpha with "convert to generic irq framework"

On Wed, Mar 08, 2006 at 02:28:57PM +0300, Ivan Kokshaysky wrote:
> irq_enter();
> local_irq_disable();
> __do_IRQ(irq, regs);
> - local_irq_enable();
> irq_exit();

This will need commenting if it's to go in.


r~

2006-03-08 21:51:05

by Ivan Kokshaysky

Subject: Re: Problem on Alpha with "convert to generic irq framework"

On Wed, Mar 08, 2006 at 09:56:52AM -0800, Richard Henderson wrote:
> This will need commenting if it's to go in.

Agreed. What about this?

Ivan.

--- 2.6.16-rc5/arch/alpha/kernel/irq.c Mon Mar 6 11:57:58 2006
+++ linux/arch/alpha/kernel/irq.c Thu Mar 9 00:38:53 2006
@@ -151,8 +151,13 @@ handle_irq(int irq, struct pt_regs * reg
 	}
 
 	irq_enter();
+	/*
+	 * __do_IRQ() must be called with IPL_MAX. Note that we do not
+	 * explicitly enable interrupts afterwards - some MILO PALcode
+	 * (namely LX164 one) seems to have severe problems with RTI
+	 * at IPL 0.
+	 */
 	local_irq_disable();
 	__do_IRQ(irq, regs);
-	local_irq_enable();
 	irq_exit();
 }

Subject: Re: Problem on Alpha with "convert to generic irq framework"

[email protected] (Ivan Kokshaysky) writes:
> Well, the problem with the new interrupt code is that it does
> local_irq_enable() before return from interrupt.
> [...]

Your patch works wonders, thanks Ivan.

--
Mathieu Chouquet-Stringer