2002-07-16 21:28:48

by Filip Van Raemdonck

Subject: [PATCH] aha152x fix

Hi,

I upgraded from 2.4.19-pre7 to -rc1 and this resulted in my aha152x card not
working anymore. (The error was "trying software interrupt, lost")

Below is a patch which makes it work again. Note that this is just reverting
a minimal part of the last applied patch to aha152x.c; so this may only be
fixing the symptom and not the problem.

Can somebody confirm if this is correct or not, and give some more insight
into this behaviour?


Regards,

Filip

--- aha152x.c.orig Tue Jul 16 22:20:57 2002
+++ aha152x.c Tue Jul 16 21:43:51 2002
@@ -1366,11 +1366,13 @@
}
HOSTDATA(shpnt)->swint = 0;

printk(KERN_INFO "aha152x%d: trying software interrupt, ", HOSTNO);
SETPORT(DMACNTRL0, SWINT|INTEN);
+ spin_unlock_irq (&io_request_lock);
mdelay(1000);
+ spin_lock_irq (&io_request_lock);
free_irq(shpnt->irq, shpnt);

if (!HOSTDATA(shpnt)->swint) {
if (TESTHI(DMASTAT, INTSTAT)) {
printk("lost.\n");

--
"Microsoft shouldn't be broken up. It should be shut down."
-- Phil Agre on the ILOVEYOU virus.


2002-07-16 21:47:28

by Alan

Subject: Re: [PATCH] aha152x fix

On Tue, 2002-07-16 at 22:10, Filip Van Raemdonck wrote:
> Hi,
>
> I upgraded from 2.4.19-pre7 to -rc1 and this resulted in my aha152x card not
> working anymore. (The error was "trying software interrupt, lost")
>
> Below is a patch which makes it work again. Note that this is just reverting
> a minimal part of the last applied patch to aha152x.c; so this may only be
> fixing the symptom and not the problem.
>
> Can somebody confirm if this is correct or not, and give some more insight
> into this behaviour?

I've seen reports but not figured out what is going on yet. Are you
using an AHA152x or the PCMCIA version ?

2002-07-17 07:06:19

by Filip Van Raemdonck

Subject: Re: [PATCH] aha152x fix

Hello,

On Wed, Jul 17, 2002 at 12:00:30AM +0100, Alan Cox wrote:
> On Tue, 2002-07-16 at 22:10, Filip Van Raemdonck wrote:
> >
> > I upgraded from 2.4.19-pre7 to -rc1 and this resulted in my aha152x card not
> > working anymore. (The error was "trying software interrupt, lost")
> >
> > Below is a patch which makes it work again. Note that this is just reverting
> > a minimal part of the last applied patch to aha152x.c; so this may only be
> > fixing the symptom and not the problem.
>
> I've seen reports but not figured out what is going on yet. Are you
> using an AHA152x or the PCMCIA version ?

An ISA, non-PnP aha1515 if I'm not mistaken (I can't check the exact type
right now, but I'm quite sure I'm right). Definitely not PCMCIA.


Regards,

Filip

--
<liiwi> http://www.benefon.fi is running Microsoft-IIS/4.0 on Solaris
<netgod> neat trick
<liiwi> hmms. how come I think that netcraft is on crack

2002-07-17 07:56:58

by Martin Diehl

Subject: Re: [PATCH] aha152x fix

On 17 Jul 2002, Alan Cox wrote:

> On Tue, 2002-07-16 at 22:10, Filip Van Raemdonck wrote:
> > Hi,
> >
> > I upgraded from 2.4.19-pre7 to -rc1 and this resulted in my aha152x card not
> > working anymore. (The error was "trying software interrupt, lost")
> >
> > Below is a patch which makes it work again. Note that this is just reverting
> > a minimal part of the last applied patch to aha152x.c; so this may only be
> > fixing the symptom and not the problem.
> >
> > Can somebody confirm if this is correct or not, and give some more insight
> > into this behaviour?
>
> I've seen reports but not figured out what is going on yet. Are you

AFAICT, the patch that went into 2.4.19-pre10 (it looks like a 2.5 backport)
removed the release and re-acquisition of io_request_lock (with interrupts
off) around the mdelay(1000) during which the scsi_host->detect method
probes for interrupts. The identical code (i.e. with the unlock/lock
removed) works with 2.5.

Apparently the io_request_lock policy was changed in 2.5, and
aha152x_detect() is called there without io_request_lock taken, in contrast
to 2.4. However, the aha152x_detect() strategy depends on a status change
driven by interrupt completion, which doesn't happen while the caller still
holds the lock with interrupts off; hence the interrupt appears to be lost.

Well, I'm definitely not the one to judge what's really correct here, but
my impression is that if detect() is called with io_request_lock held and
interrupts off, it shouldn't be allowed to release the lock down there.
OTOH, the unlock/lock was there before and the driver used to work this
way, in contrast to 2.4.19-pre10 and later 2.4 kernels, where it doesn't
work at all.
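To make the mechanism described above concrete, here is a userspace toy model, a sketch under stated assumptions rather than driver code. It assumes a single CPU on which a raised interrupt is serviced only while interrupts are enabled; model_lock_irq(), model_unlock_irq(), model_mdelay(), raise_swint() and the detect_2_4()/detect_2_5() wrappers are all hypothetical stand-ins for spin_lock_irq(), spin_unlock_irq(), mdelay(), the SWINT port write and the two kernels' calling conventions.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy single-CPU model of the aha152x software-interrupt probe.
 * Every name here is a hypothetical stand-in, not a kernel API:
 *   model_lock_irq()   ~ spin_lock_irq(&io_request_lock)
 *   model_unlock_irq() ~ spin_unlock_irq(&io_request_lock)
 *   model_mdelay()     ~ mdelay(1000)
 *   raise_swint()      ~ SETPORT(DMACNTRL0, SWINT|INTEN)
 */
static bool irqs_enabled = true; /* local interrupt flag */
static bool lock_held = false;   /* stands in for io_request_lock */
static bool irq_pending = false; /* raised but not yet serviced */
static int swint = 0;            /* stands in for HOSTDATA(shpnt)->swint */

/* A pending interrupt is serviced only while interrupts are enabled. */
static void service_pending(void)
{
    if (irqs_enabled && irq_pending) {
        irq_pending = false;
        swint = 1; /* what the probe's interrupt handler records */
    }
}

static void model_lock_irq(void)
{
    assert(!lock_held); /* re-taking a held spinlock would deadlock */
    irqs_enabled = false;
    lock_held = true;
}

static void model_unlock_irq(void)
{
    assert(lock_held);
    lock_held = false;
    irqs_enabled = true; /* spin_unlock_irq() re-enables interrupts */
    service_pending();
}

/* Busy-waiting only helps if interrupts are on while we wait. */
static void model_mdelay(void) { service_pending(); }

static void raise_swint(void)
{
    irq_pending = true;
    service_pending();
}

/* Returns true for "ok", false for "lost". drop_lock selects the
 * unlock/lock around the delay from the two-line patch. */
static bool probe_swint(bool drop_lock)
{
    swint = 0;
    raise_swint();
    if (drop_lock)
        model_unlock_irq();
    model_mdelay();
    if (drop_lock)
        model_lock_irq();
    return swint != 0;
}

/* 2.4 calling convention: detect() runs with the lock held, irqs off. */
static bool detect_2_4(bool drop_lock)
{
    bool ok;

    irq_pending = false;
    model_lock_irq();
    ok = probe_swint(drop_lock);
    model_unlock_irq(); /* a still-pending interrupt is serviced here, after the check */
    return ok;
}

/* 2.5 calling convention: detect() runs without the lock. */
static bool detect_2_5(void)
{
    irq_pending = false;
    return probe_swint(false);
}
```

In this model detect_2_4(false) reports "lost", detect_2_4(true) reports "ok", and detect_2_5() reports "ok" without any lock drop, which matches the observations above. The model deliberately ignores SMP interrupt routing.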

> using an AHA152x or the PCMCIA version ?

I can confirm Filip's patch putting back in place the old unlock/lock
makes things work again. Tested with an AVA1505AE (ISA, configured for
non-PnP fixed IRQ/IO) on a UP box running an SMP kernel.

Therefore if you ask me, my vote would be to put this back in for
2.4.19-final until we have a better solution.

Martin

2002-07-17 21:32:08

by Filip Van Raemdonck

Subject: Re: [PATCH] aha152x fix

On Wed, Jul 17, 2002 at 10:02:39AM +0200, Martin Diehl wrote:
>
> > On Tue, 2002-07-16 at 22:10, Filip Van Raemdonck wrote:
> > >
> > >
> > > Below is a patch which makes it work again. Note that this is just reverting
> > > a minimal part of the last applied patch to aha152x.c; so this may only be
> > > fixing the symptom and not the problem.
>
> I can confirm Filip's patch putting back in place the old unlock/lock
> makes things work again. Tested with an AVA1505AE (ISA, configured for
> non-PnP fixed IRQ/IO) on a UP box running an SMP kernel.

Actually, I'm not sure it works. I hadn't tried anything more than loading
the card module yesterday since it was getting late, but I am now getting
oopses and almost immediate hard hangs whenever I try to access the hard
drive on that card.
Or rather, whenever I load a driver that hooks into that device. It happened
when I mounted sda1 (thus loading sd_mod), after rebooting when I tried to
use sg0 (the same disk as sda, which loads the sg module), and, once I got
suspicious, even when I merely modprobed sd_mod without doing anything on
the drive yet.

Actually, hold on...
I just rmmodded the aha152x module and then modprobed sd_mod, and all
was (and still is) fine, except that obviously I can't get to my hard drive
now; sd_mod is just an unused driver.

I'm pasting an oops below.

I'm not certain this problem is related to the other one; maybe it is
caused by some unrelated change that went into the last aha152x patch. But
that's hard to figure out if I need my two-line fix before I can even load
the driver :-<


Regards,

Filip


Jul 17 23:12:10 lucretia kernel: Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Jul 17 23:12:10 lucretia kernel: SCSI device sda: 2154176 512-byte hdwr sectors (1103 MB)
Jul 17 23:12:10 lucretia kernel: sda:<1>Unable to handle kernel NULL pointer dereference at virtual address 0000001b
Jul 17 23:12:10 lucretia kernel: printing eip:
Jul 17 23:12:10 lucretia kernel: c88f01b7
Jul 17 23:12:10 lucretia kernel: *pde = 00000000
Jul 17 23:12:10 lucretia kernel: Oops: 0000
Jul 17 23:12:10 lucretia kernel: CPU: 0
Jul 17 23:12:10 lucretia kernel: EIP: 0010:[rtc:__insmod_rtc_O/lib/modules/2.4.19-rc1/kernel/drivers/char/r+-73289/96] Not tainted
Jul 17 23:12:10 lucretia kernel: EFLAGS: 00010002
Jul 17 23:12:10 lucretia kernel: eax: 00000000 ebx: c0cdb600 ecx: c0366000 edx: c2fb8dfc
Jul 17 23:12:10 lucretia kernel: esi: c3aab000 edi: c0cdb600 ebp: c0851c84 esp: c0851c78
Jul 17 23:12:10 lucretia kernel: ds: 0018 es: 0018 ss: 0018
Jul 17 23:12:10 lucretia kernel: Process modprobe (pid: 457, stackpage=c0851000)
Jul 17 23:12:10 lucretia kernel: Stack: 00000293 c380f3f8 c0cdb600 c0851ca0 c88f02c2 c0cdb600 00000000 00000000
Jul 17 23:12:10 lucretia kernel: 00000000 c88e1a94 c0851cc4 c88e154c c0cdb600 c88e1a94 c0cdb600 c380f3f8
Jul 17 23:12:10 lucretia kernel: c0cdb70c 00000000 c3aab000 c0851cf4 c88e7bed c0cdb600 c0cdb600 00000296
Jul 17 23:12:10 lucretia kernel: Call Trace: [rtc:__insmod_rtc_O/lib/modules/2.4.19-rc1/kernel/drivers/char/r+-73022/96] [rtc:__insmod_rtc_O/lib/modules/2.4.19-rc1/kernel/drivers/char/r+-132460/96] [rtc:__insmod_rtc_O/lib/modules/2.4.19-rc1/kernel/drivers/char/r+-133812/96] [rtc:__insmod_rtc_O/lib/modules/2.4.19-rc1/kernel/drivers/char/r+-132460/96] [rtc:__insmod_rtc_O/lib/modules/2.4.19-rc1/kernel/drivers/char/r+-107539/96]
Jul 17 23:12:10 lucretia kernel: [scsi_mod:scsi_hosts_Rfba6a71c+131632/65472080] [generic_unplug_device+34/52] [__run_task_queue+75/92] [block_sync_page+25/32] [__lock_page+145/192] [lock_page+23/28]
Jul 17 23:12:10 lucretia kernel: [read_cache_page+198/288] [read_dev_sector+49/172] [blkdev_readpage+0/24] [handle_ide_mess+41/392] [msdos_partition+126/732] [get_empty_inode+137/156]
Jul 17 23:12:10 lucretia kernel: [check_partition+265/388] [grok_partitions+193/260] [scsi_mod:scsi_hosts_Rfba6a71c+131632/65472080] [register_disk+37/44] [scsi_mod:scsi_hosts_Rfba6a71c+128360/65475352] [scsi_mod:scsi_hosts_Rfba6a71c+131704/65472008]
Jul 17 23:12:10 lucretia kernel: [scsi_mod:scsi_hosts_Rfba6a71c+131632/65472080] [rtc:__insmod_rtc_O/lib/modules/2.4.19-rc1/kernel/drivers/char/r+-127803/96] [scsi_mod:scsi_hosts_Rfba6a71c+131632/65472080] [rtc:__insmod_rtc_O/lib/modules/2.4.19-rc1/kernel/drivers/char/r+-127540/96] [scsi_mod:scsi_hosts_Rfba6a71c+131632/65472080] [scsi_mod:scsi_hosts_Rfba6a71c+129673/65474039]
Jul 17 23:12:10 lucretia kernel: [scsi_mod:scsi_hosts_Rfba6a71c+131632/65472080] [sys_init_module+1291/1444] [scsi_mod:scsi_hosts_Rfba6a71c+121840/65481872] [system_call+51/56]
Jul 17 23:12:10 lucretia kernel:
Jul 17 23:12:10 lucretia kernel: Code: 0f b6 50 1b 8b 14 95 bc 04 32 c0 2b 82 a0 00 00 00 69 c0 a3


--
Please don't send proprietary format documents, I can't (and don't want to)
open them. Appreciated are open-source formats like .txt or .rtf. Dvi, ps or
tex files are welcome.

2002-07-18 07:38:51

by Filip Van Raemdonck

Subject: Re: [PATCH] aha152x fix

On Thu, Jul 18, 2002 at 07:52:04AM +0200, Juergen E. Fischer wrote:
> [I seem to have had a problem with my provider's smart host. It didn't
> deliver to everyone. Hopefully that's resolved now. Sorry
> if you have already seen this in whole or in part.]

I have received your previous message, but only saw it after I sent mine.
(I read lkml from a different address than the one I send from, and only
check the mail received at the latter once a day.)
And it appears your mail doesn't reach lkml anyway. ECN perhaps?

> On Wed, Jul 17, 2002 at 23:35:43 +0200, Filip Van Raemdonck wrote:
> > Actually, hold on...
> > I just rmmodded the aha152x module and then modprobed sd_mod, and all
> > was (and still is) fine, except that obviously I can't get to my hard drive
> > now; sd_mod is just an unused driver.
> >
> > I'm pasting an oops below.
>
> Please run it through ksymoops (Documentation/oops-tracing.txt).

Can that be done with the ones already processed by klogd? Note that when I
talked about almost immediate hard crashes, I really meant that. Within a
second (or two or three at most), a number of new oopses are generated,
ending in a final one with the "killing interrupt handler" message. And
these don't even show up in the logs, because the machine doesn't sync
anymore at that point.
I also don't have a serial cable handy.
But I'll see what I can do.

> +#if LINUX_VERSION_CODE < KERNEL_VERSION(2,5,0)
> + spin_unlock_irq(&io_request_lock);
> +#endif

Note that this is not really the issue (anymore). I did this already, and
basically just left out the #if. But now I'm getting oopses when I actually
try to use a drive attached to it, which is probably caused by some other
part of the last driver change rather than by this detect issue.
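For reference, the guard in the quoted snippet compares version codes built by the kernel's KERNEL_VERSION() macro, which packs (major, minor, patch) into one integer. A minimal userspace sketch of that comparison follows; needs_lock_drop() is a hypothetical helper, and guarding the matching spin_lock_irq() the same way is an assumption, since only the unlock half is quoted above.

```c
#include <assert.h>
#include <stdbool.h>

/* Same packing as the KERNEL_VERSION() macro in <linux/version.h>. */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* 2.4.19 packs to 0x02 0x04 0x13. */
static_assert(KERNEL_VERSION(2, 4, 19) == 0x020413, "2.4.19 == 0x020413");

/*
 * Hypothetical helper: true when a kernel with version code `code` would
 * compile in the guarded lock drop. In the driver the pattern would be
 * (assumption: the re-lock is guarded like the unlock):
 *
 *   #if LINUX_VERSION_CODE < KERNEL_VERSION(2,5,0)
 *       spin_unlock_irq(&io_request_lock);
 *   #endif
 *       mdelay(1000);
 *   #if LINUX_VERSION_CODE < KERNEL_VERSION(2,5,0)
 *       spin_lock_irq(&io_request_lock);
 *   #endif
 */
static bool needs_lock_drop(int code)
{
    return code < KERNEL_VERSION(2, 5, 0);
}
```

Leaving out the #if, as described above, simply makes the lock drop unconditional, which is equivalent on any 2.4 kernel.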


Regards,

Filip

--
<broonie> Why do all the idiots on debian-user insist on trying sendmail.
<Myth> because sendmail is "industry standard"
<broonie> Mind you, I suppose the industry standard is to be a fscking moron.
<Thing> broonie: tell them to fuck off and use M$ Exchange -- that's that
market leader, surely?

2002-07-18 12:29:07

by Martin Diehl

Subject: Re: [PATCH] aha152x fix

On Thu, 18 Jul 2002, Juergen E. Fischer wrote:

> I tested the driver using a cdrom and can smoothly load and remove the
> modules here.

Yes, this was my impression with your/Filip's patch too. I have a scanner
connected there; on insmod the bus scan succeeded and the scanner was
detected, fine. Unfortunately, I hadn't tried to go any further :-(

However, when I try to scan an image using xscanimage, the scanner operates
for a few seconds (probably initialisation and the move to its starting
position) and then the box locks up solid (no reaction to SysRq; both the
box and the scanner need a power cycle to recover). With 2.5.25 this does
not happen, i.e. I do get real scans there without problems.

I'm getting an Oops due to a NULL pointer dereference; see below. This was
taken from the console with aha152x running with debugging enabled. And
yes, this should be completely unrelated to the detection problem.

As we are close to -final, what about reverting -pre10 (the 2.5 backport)?

Martin

-----------------------------------

aha152x: BIOS test: passed, detected 1 controller(s)
aha152x0: vital data: rev=3, io=0x140 (0x140/0x140), irq=10, scsiid=0,
reconnect=disabled, parity=disabled, synchronous=disabled, delay=0,
extended translation=disabled
aha152x0: trying software interrupt, ok.
scsi0 : Adaptec 152x SCSI driver; $Revision: 2.5 $
Vendor: Model: scanner V636A4 Rev: 1.10
Type: Scanner ANSI SCSI revision: 02
Attached scsi generic sg0 at scsi0, channel 0, id 6, lun 0, type 6

>>> start of scan (debug=0x3883)

(scsi0:6:0) queue: cmd_len=10 pieces=0 size=9 cmnd=Read (10) 00 81 00 00 00 00 00 09 00
(scsi0:6:0) inbound status 00 Good
(scsi0:6:0) calling scsi_done(cb7f5000)
(scsi0:6:0) scsi_done(cb7f5000) returned
(scsi0:6:0) queue: cmd_len=10 pieces=0 size=9 cmnd=Read (10) 00 81 00 00 00 00 00 09 00
(scsi0:6:0) inbound status 00 Good
(scsi0:6:0) calling scsi_done(cb7f5000)
(scsi0:6:0) scsi_done(cb7f5000) returned
(scsi0:6:0) queue: cmd_len=10 pieces=0 size=9 cmnd=Write (10) 00 81 00 00 00 00 00 09 00
(scsi0:6:0) inbound status 00 Good
(scsi0:6:0) calling scsi_done(cb7f5000)
(scsi0:6:0) scsi_done(cb7f5000) returned
(scsi0:6:0) queue: cmd_len=10 pieces=0 size=24576 cmnd=Write (10) 00 03 00 00 61 00 60 00 00
(scsi0:6:0) inbound status 00 Good
(scsi0:6:0) calling scsi_done(cb7f5000)
(scsi0:6:0) scsi_done(cb7f5000) returned
(scsi0:6:0) queue: cmd_len=10 pieces=0 size=69 cmnd=Define window parameters 00 00 00 00 00 00 00 45 00
(scsi0:6:0) inbound status 00 Good
(scsi0:6:0) calling scsi_done(cb7f5000)
(scsi0:6:0) scsi_done(cb7f5000) returned
(scsi0:6:0) queue: cmd_len=10 pieces=0 size=16 cmnd=Read (10) 00 80 00 00 00 00 00 10 00
(scsi0:6:0) inbound status 00 Good
(scsi0:6:0) calling scsi_done(cb7f5000)
(scsi0:6:0) scsi_done(cb7f5000) returned
(scsi0:6:0) queue: cmd_len=10 pieces=0 size=0 cmnd=Read (10) 00 83 00 00 00 00 00 01 00
(scsi0:6:0) inbound status 08 Busy
(scsi0:6:0) calling scsi_done(cb7f5000)
(scsi0:6:0) scsi_done(cb7f5000) returned
(scsi0:6:0) queue: cmd_len=10 pieces=2 size=49632 cmnd=Read (10) 00 00 00 00 00 00 c1 e0 00

>>> complete lockup due to:

Unable to handle kernel NULL pointer dereference at virtual address 0000001b
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<cd0ae4b7>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010002
eax: 00000000 ebx: c44b9400 ecx: cba06000 edx: c2a501a4
esi: cb7f5000 edi: 00000020 ebp: ca8bfda4 esp: ca8bfd98
ds: 0018 es: 0018 ss: 0018
Process xscanimage (pid: 1942, stackpage=ca8bf000)
Stack: 00000293 cbba1c10 cb7f5000 ca8bfdc0 cd0ae854 cb7f5000 00000000 00000000
00000000 cd082240 ca8bfde4 cd08185c cb7f5000 cd082240 00000000 c44b9400
cb7f5000 cbba1c10 cb7f50b8 ca8bfe14 cd08ad0c cb7f5000 cb7f5000 cbba1c10
Call Trace: [<cd0ae854>] [<cd082240>] [<cd08185c>] [<cd082240>] [<cd08ad0c>]
[<cd089f1b>] [<cd089faa>] [<cd081e0e>] [<cd09d510>] [<cd09c0ce>] [<cd09d510>]
[<cd09be55>] [<cd09ba68>] [<c0142a09>] [<c0109393>]
Code: 0f b6 50 1b 8b 14 95 a4 f1 2f c0 2b 82 a4 00 00 00 69 c0 a3
Error (Oops_bfd_perror): set_section_contents Section has no contents

>>EIP; cd0ae4b7 <[aha152x]aha152x_internal_queue+147/4d0> <=====
Trace; cd0ae854 <[aha152x]aha152x_queue+14/20>
Trace; cd082240 <[scsi_mod]scsi_done+0/120>
Trace; cd08185c <[scsi_mod]scsi_dispatch_cmd+1cc/560>
Trace; cd082240 <[scsi_mod]scsi_done+0/120>
Trace; cd08ad0c <[scsi_mod]scsi_request_fn+3ac/420>
Trace; cd089f1b <[scsi_mod]__scsi_insert_special+9b/e0>
Trace; cd089faa <[scsi_mod]scsi_insert_special_req+1a/30>
Trace; cd081e0e <[scsi_mod]scsi_do_req+16e/1b0>
Trace; cd09d510 <[sg]sg_cmd_done_bh+0/3d0>
Trace; cd09c0ce <[sg]sg_common_write+24e/270>
Trace; cd09d510 <[sg]sg_cmd_done_bh+0/3d0>
Trace; cd09be55 <[sg]sg_new_write+215/240>
Trace; cd09ba68 <[sg]sg_write+f8/2d0>
Trace; c0142a09 <sys_write+99/190>
Trace; c0109393 <system_call+33/40>
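Both oopses in this thread fault at virtual address 0000001b, both register dumps show EAX = 00000000, and both Code: lines start with the bytes 0f b6 50 1b. On i386 those bytes decode to movzbl 0x1b(%eax),%edx, a one-byte load from EAX + 0x1b, so the faulting address is exactly 0 + 0x1b. That points to a field read through a NULL pointer, which in the decoded trace above happens inside aha152x_internal_queue(). The sketch below reproduces that arithmetic from the dumped registers; decode_movzbl_disp8() is a hypothetical helper that understands only this one encoding, not a general disassembler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* i386 ModRM register numbers. */
enum { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI };

/*
 * Hypothetical helper: decode only "0f b6 /r" (movzbl) with mod == 01,
 * i.e. an 8-bit displacement and no SIB byte, and compute the address it
 * reads from using the register dump. Returns 0 on success, -1 otherwise.
 */
static int decode_movzbl_disp8(const uint8_t code[4], const uint32_t regs[8],
                               uint32_t *addr, int *dst, int *base)
{
    uint8_t modrm;

    assert(code != NULL && regs != NULL && addr != NULL);
    assert(dst != NULL && base != NULL);
    if (code[0] != 0x0f || code[1] != 0xb6)
        return -1;                 /* not movzbl */
    modrm = code[2];
    if ((modrm >> 6) != 1 || (modrm & 7) == 4)
        return -1;                 /* need mod=01 and no SIB byte */
    *dst = (modrm >> 3) & 7;       /* destination register */
    *base = modrm & 7;             /* base register */
    /* disp8 is sign-extended, then added modulo 2^32 */
    *addr = regs[*base] + (uint32_t)(int32_t)(int8_t)code[3];
    return 0;
}
```

Feeding it the Code: bytes and the register values from Martin's dump (in ModRM order: eax, ecx, edx, ebx, esp, ebp, esi, edi) yields destination EDX, base EAX and address 0x1b.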

2002-07-19 17:03:27

by Martin Diehl

Subject: Re: [PATCH] aha152x fix

On Fri, 19 Jul 2002, Juergen E. Fischer wrote:

> > I'm getting an Oops due to a NULL pointer dereference; see below. This was
> > taken from the console with aha152x running with debugging enabled. And
> > yes, this should be completely unrelated to the detection problem.
>
> Following patch against 2.4.19-rc2 fixes that problem. The patch also
> includes a fix for a timeout problem with a tape drive (reported by Julian
> Bradfield).

Works for me now. Device detection is OK, and I did some test scanning
without any problem. Since the Oops was 100% reproducible, I'd say it's
really fixed now. Thanks.

Hopefully we can still get this into -final; otherwise aha152x would be
unusable with 2.4.19 (that's why I still haven't shortened the CC list).

Martin