2006-11-03 18:03:01

by Alberto Alonso

[permalink] [raw]
Subject: qstor driver -> irq 193: nobody cared

I have a Pacific Digital qstor card on irq 193. I am using kernel
2.6.17.13 SMP

The error happens every now and then. I have not been able to
figure out any triggers and I can not reproduce it on demand. Today
it happened 3 times within a 40 minutes period.

All disks connected to the card are disabled and I can't do anything
other than a reboot to get them back.

It is reported as follows:

irq 193: nobody cared (try booting with the "irqpoll" option)
<c013e19a> __report_bad_irq+0x2a/0xa0 <c013d970> handle_IRQ_event
+0x30/0x70
<c013e2b0> note_interrupt+0x80/0xf0 <c013da8c> __do_IRQ+0xdc/0xf0
<c0105799> do_IRQ+0x19/0x30 <c010391a> common_interrupt+0x1a/0x20
<c0100d91> default_idle+0x41/0x70 <c0100e60> cpu_idle+0x80/0x90
<c046699d> start_kernel+0x18d/0x1d0 <c0466330> unknown_bootoption
+0x0/0x1d0
handlers:
[<c0301300>] (qs_intr+0x0/0x220)
Disabling IRQ #193

The PCI info for the card is:

0000:01:03.0 0106: Pacific Digital Corp: Unknown device 2068 (rev 01)
Subsystem: Pacific Digital Corp: Unknown device 2068
Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 193
I/O ports at eff0 [size=8]
I/O ports at efe0 [size=8]
I/O ports at efa8 [size=8]
I/O ports at efa0 [size=8]
Memory at febf0000 (64-bit, non-prefetchable) [size=64K]
Expansion ROM at febe0000 [disabled] [size=64K]
Capabilities: [40] Message Signalled Interrupts: 64bit+
Queue=0/0 Enable-
Capabilities: [50] Power Management version 2
Capabilities: [58] PCI-X non-bridge device.


This is really creating problems for me now. If anybody knows
of a dumb (ie. non-raid) PCI SATA card with 8 ports with good
Linux support for a couple hundred backs I would appreciate
it.

If there is any other info that I should provide to help
troubleshoot please let me know.

Thanks,

Alberto


--
Alberto Alonso Global Gate Systems LLC.
(512) 351-7233 http://www.ggsys.net
Hardware, consulting, sysadmin, monitoring and remote backups


2006-11-03 19:00:43

by Sergey Vlasov

[permalink] [raw]
Subject: Re: qstor driver -> irq 193: nobody cared

On Fri, 03 Nov 2006 12:02:53 -0600 Alberto Alonso wrote:

> I have a Pacific Digital qstor card on irq 193. I am using kernel
> 2.6.17.13 SMP
>
> The error happens every now and then. I have not been able to
> figure out any triggers and I can not reproduce it on demand. Today
> it happened 3 times within a 40 minutes period.
>
> All disks connected to the card are disabled and I can't do anything
> other than a reboot to get them back.
>
> It is reported as follows:
>
> irq 193: nobody cared (try booting with the "irqpoll" option)

Did you try this option? It may decrease performance, but in some cases
IRQ routing is so screwed that only irqpoll helps.

[...]
> handlers:
> [<c0301300>] (qs_intr+0x0/0x220)
> Disabling IRQ #193
[..]
> If there is any other info that I should provide to help
> troubleshoot please let me know.

The "nobody cared" error is often caused by some other device which
shares the same interrupt, but Linux does not know about it (either due
to broken IRQ routing tables in BIOS, or because the driver for that
device is not loaded, but the device really is active and asserts its
IRQ line - sometimes this also happens due to a broken BIOS).

Please post complete /proc/interrupts and lspci -v output, and also
information about the motherboard model and BIOS version.

If your motherboard has a VIA chipset, you may also try the patch from
http://lkml.org/lkml/2006/9/7/235 which attempts to fix IRQ routing on
these chipsets.


Attachments:
(No filename) (1.44 kB)
(No filename) (189.00 B)
Download all attachments

2006-11-03 20:22:12

by Alberto Alonso

[permalink] [raw]
Subject: Re: qstor driver -> irq 193: nobody cared

On Fri, 2006-11-03 at 22:00 +0300, Sergey Vlasov wrote:
> On Fri, 03 Nov 2006 12:02:53 -0600 Alberto Alonso wrote:
>
> > I have a Pacific Digital qstor card on irq 193. I am using kernel
> > 2.6.17.13 SMP
> >
> >
> > irq 193: nobody cared (try booting with the "irqpoll" option)
>
> Did you try this option? It may decrease performance, but in some cases
> IRQ routing is so screwed that only irqpoll helps.

I have now switched to using that option. Will kick in after I reboot.

> [...]
> > handlers:
> > [<c0301300>] (qs_intr+0x0/0x220)
> > Disabling IRQ #193
> [..]
> > If there is any other info that I should provide to help
> > troubleshoot please let me know.
>
> The "nobody cared" error is often caused by some other device which
> shares the same interrupt, but Linux does not know about it (either due
> to broken IRQ routing tables in BIOS, or because the driver for that
> device is not loaded, but the device really is active and asserts its
> IRQ line - sometimes this also happens due to a broken BIOS).
>
> Please post complete /proc/interrupts and lspci -v output, and also
> information about the motherboard model and BIOS version.
>

The system is an old Gateway 6400 server w/ dual P3 1 GHz PCUs. It is
my NFS/samba file server.

Here is the rest of the requested info:

w100:/usr/src/linux-2.6.17.13# cat /proc/interrupts
CPU0 CPU1
0: 1398199 1412518 IO-APIC-edge timer
1: 1 8 IO-APIC-edge i8042
2: 0 0 XT-PIC cascade
8: 3 1 IO-APIC-edge rtc
14: 8163 8710 IO-APIC-edge ide0
15: 5 7 IO-APIC-edge ide1
145: 0 0 IO-APIC-level ivtv0
153: 94715 121999 IO-APIC-level ide2, ide3, ide4, ide5
161: 6909 18301 IO-APIC-level ide8, ide9
169: 8323 17288 IO-APIC-level ide6, ide7
185: 1216892 92 IO-APIC-level eth0
193: 2281 1954 IO-APIC-level libata
NMI: 0 0
LOC: 2810842 2810841
ERR: 0
MIS: 0
w100:/usr/src/linux-2.6.17.13# lspci -v
0000:00:00.0 Host bridge: ServerWorks CNB20LE Host Bridge (rev 06)
Flags: bus master, medium devsel, latency 32

0000:00:00.1 Host bridge: ServerWorks CNB20LE Host Bridge (rev 06)
Flags: bus master, medium devsel, latency 16

0000:00:02.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 08)
Subsystem: Intel Corp. 82559 Fast Ethernet LAN on Motherboard
Flags: bus master, medium devsel, latency 64, IRQ 177
Memory at feaff000 (32-bit, non-prefetchable) [size=4K]
I/O ports at b000 [size=64]
Memory at fe800000 (32-bit, non-prefetchable) [size=1M]
Expansion ROM at fe700000 [disabled] [size=1M]
Capabilities: [dc] Power Management version 2

0000:00:03.0 Multimedia video controller: Internext Compression Inc iTVC16 (CX23416) MPEG-2 Encoder (rev 01)
Subsystem: Hauppauge computer works Inc.: Unknown device 0001
Flags: bus master, medium devsel, latency 64, IRQ 145
Memory at f8000000 (32-bit, prefetchable) [size=64M]
Capabilities: [44] Power Management version 2

0000:00:04.0 RAID bus controller: Triones Technologies, Inc. HPT374 (rev 07)
Subsystem: Triones Technologies, Inc.: Unknown device 0001
Flags: bus master, 66MHz, medium devsel, latency 120, IRQ 153
I/O ports at bf60 [size=8]
I/O ports at bf7c [size=4]
I/O ports at bf58 [size=8]
I/O ports at bf54 [size=4]
I/O ports at b400 [size=256]
Expansion ROM at fe960000 [disabled] [size=128K]
Capabilities: [60] Power Management version 2

0000:00:04.1 RAID bus controller: Triones Technologies, Inc. HPT374 (rev 07)
Subsystem: Triones Technologies, Inc.: Unknown device 0001
Flags: bus master, 66MHz, medium devsel, latency 120, IRQ 153
I/O ports at bf88 [size=8]
I/O ports at bf84 [size=4]
I/O ports at bf68 [size=8]
I/O ports at bf80 [size=4]
I/O ports at b800 [size=256]
Capabilities: [60] Power Management version 2

0000:00:05.0 Unknown mass storage controller: Triones Technologies, Inc. HPT366/368/370/370A/372 (rev 03)
Subsystem: Triones Technologies, Inc. HPT370 UDMA100
Flags: bus master, medium devsel, latency 120, IRQ 161
I/O ports at bfa0 [size=8]
I/O ports at bf9c [size=4]
I/O ports at bf90 [size=8]
I/O ports at bf98 [size=4]
I/O ports at c000 [size=256]
Expansion ROM at fe980000 [disabled] [size=128K]

0000:00:06.0 Unknown mass storage controller: Triones Technologies, Inc. HPT366/368/370/370A/372 (rev 03)
Subsystem: Triones Technologies, Inc. HPT370 UDMA100
Flags: bus master, medium devsel, latency 120, IRQ 169
I/O ports at bff0 [size=8]
I/O ports at bfe4 [size=4]
I/O ports at bfa8 [size=8]
I/O ports at bfe0 [size=4]
I/O ports at c400 [size=256]
Expansion ROM at fe9a0000 [disabled] [size=128K]

0000:00:07.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27) (prog-if 00 [VGA])
Subsystem: Gateway 2000: Unknown device 6400
Flags: bus master, stepping, medium devsel, latency 64
Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
I/O ports at c800 [size=256]
Memory at fe9ff000 (32-bit, non-prefetchable) [size=4K]
Expansion ROM at fe9c0000 [disabled] [size=128K]
Capabilities: [5c] Power Management version 2

0000:00:0f.0 ISA bridge: ServerWorks OSB4 South Bridge (rev 50)
Subsystem: ServerWorks OSB4 South Bridge
Flags: bus master, medium devsel, latency 0

0000:00:0f.1 IDE interface: ServerWorks OSB4 IDE Controller (prog-if 8a [Master SecP PriP])
Flags: bus master, medium devsel, latency 64
I/O ports at ffa0 [size=16]

0000:00:0f.2 USB Controller: ServerWorks OSB4/CSB5 OHCI USB Controller (rev 04) (prog-if 10 [OHCI])
Subsystem: ServerWorks OSB4/CSB5 OHCI USB Controller
Flags: bus master, medium devsel, latency 64, IRQ 10
Memory at fe9fe000 (32-bit, non-prefetchable) [size=4K]

0000:01:02.0 Ethernet controller: Intel Corp. 82543GC Gigabit Ethernet Controller (Copper) (rev 02)
Subsystem: Intel Corp. PRO/1000 T Server Adapter
Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 185
Memory at febc0000 (32-bit, non-prefetchable) [size=128K]
Memory at febb0000 (32-bit, non-prefetchable) [size=64K]
Expansion ROM at feba0000 [disabled] [size=64K]
Capabilities: [dc] Power Management version 2

0000:01:03.0 0106: Pacific Digital Corp: Unknown device 2068 (rev 01)
Subsystem: Pacific Digital Corp: Unknown device 2068
Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 193
I/O ports at eff0 [size=8]
I/O ports at efe0 [size=8]
I/O ports at efa8 [size=8]
I/O ports at efa0 [size=8]
Memory at febf0000 (64-bit, non-prefetchable) [size=64K]
Expansion ROM at febe0000 [disabled] [size=64K]
Capabilities: [40] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
Capabilities: [50] Power Management version 2
Capabilities: [58] PCI-X non-bridge device.

0000:01:05.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1010 Ultra3 SCSI Adapter (rev 01)
Subsystem: Gateway 2000: Unknown device 1040
Flags: medium devsel
I/O ports at 1000 [disabled] [size=256]
Memory at <ignored> (64-bit, non-prefetchable) [disabled]
Memory at <ignored> (64-bit, non-prefetchable) [disabled]
Capabilities: [40] Power Management version 2

0000:01:05.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1010 Ultra3 SCSI Adapter (rev 01)
Subsystem: Gateway 2000: Unknown device 1040
Flags: medium devsel
I/O ports at 1400 [disabled] [size=256]
Memory at <ignored> (64-bit, non-prefetchable) [disabled]
Memory at <ignored> (64-bit, non-prefetchable) [disabled]
Capabilities: [40] Power Management version 2


Thanks,

Alberto

--
Alberto Alonso Global Gate Systems LLC.
(512) 351-7233 http://www.ggsys.net
Hardware, consulting, sysadmin, monitoring and remote backups

2006-11-03 20:52:44

by Sergey Vlasov

[permalink] [raw]
Subject: Re: qstor driver -> irq 193: nobody cared

On Fri, Nov 03, 2006 at 02:22:07PM -0600, Alberto Alonso wrote:
> On Fri, 2006-11-03 at 22:00 +0300, Sergey Vlasov wrote:
> > On Fri, 03 Nov 2006 12:02:53 -0600 Alberto Alonso wrote:
> >
> > > I have a Pacific Digital qstor card on irq 193. I am using kernel
> > > 2.6.17.13 SMP
> > >
> > >
> > > irq 193: nobody cared (try booting with the "irqpoll" option)
> >
> > Did you try this option? It may decrease performance, but in some cases
> > IRQ routing is so screwed that only irqpoll helps.
>
> I have now switched to using that option. Will kick in after I reboot.

Actually, it might be better to wait with this - it is the last resort
option, if there is nothing else to try.

> 0000:00:0f.2 USB Controller: ServerWorks OSB4/CSB5 OHCI USB Controller (rev 04) (prog-if 10 [OHCI])
> Subsystem: ServerWorks OSB4/CSB5 OHCI USB Controller
> Flags: bus master, medium devsel, latency 64, IRQ 10
> Memory at fe9fe000 (32-bit, non-prefetchable) [size=4K]

Can you try with ohci-hcd loaded (and without irqpoll)? USB
controllers might cause such problems, because BIOSen often do weird
things with them for legacy emulation; however, loading a proper
driver should make BIOS stop messing with the hardware behind the back
of the OS.

After loading ohci-hcd look which IRQ does the USB controller use - if
it is the same IRQ as the qstor card, you should keep ohci-hcd loaded
to avoid problems. (If it is on some other IRQ, loading ohci-hcd
might still help - the code for enabling the PCI device calls ACPI
BIOS, which might change interrupt routing setup in the chipset.)

Also look for something like "USB Keyboard Support" and maybe "USB
Mouse Support" in BIOS setup - some BIOS versions have severe problems
with this emulation, so it may be better to disable it unless you
really need it.

Disabling USB in the BIOS setup if you really don't need might be
another option.

> 0000:01:05.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1010 Ultra3 SCSI Adapter (rev 01)
> Subsystem: Gateway 2000: Unknown device 1040
> Flags: medium devsel
> I/O ports at 1000 [disabled] [size=256]
> Memory at <ignored> (64-bit, non-prefetchable) [disabled]
> Memory at <ignored> (64-bit, non-prefetchable) [disabled]
> Capabilities: [40] Power Management version 2
>
> 0000:01:05.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1010 Ultra3 SCSI Adapter (rev 01)
> Subsystem: Gateway 2000: Unknown device 1040
> Flags: medium devsel
> I/O ports at 1400 [disabled] [size=256]
> Memory at <ignored> (64-bit, non-prefetchable) [disabled]
> Memory at <ignored> (64-bit, non-prefetchable) [disabled]
> Capabilities: [40] Power Management version 2

These devices seem to be disabled, but if loading ohci-hcd does not
help, you can try to load the sym53c8xx driver and look whether it
ends up on the same IRQ as qstor.


Attachments:
(No filename) (2.87 kB)
signature.asc (189.00 B)
Digital signature
Download all attachments

2006-11-03 23:22:01

by Alberto Alonso

[permalink] [raw]
Subject: Re: qstor driver -> irq 193: nobody cared

On Fri, 2006-11-03 at 23:52 +0300, Sergey Vlasov wrote:
> On Fri, Nov 03, 2006 at 02:22:07PM -0600, Alberto Alonso wrote:
> > On Fri, 2006-11-03 at 22:00 +0300, Sergey Vlasov wrote:
> > > On Fri, 03 Nov 2006 12:02:53 -0600 Alberto Alonso wrote:
> > >
> > > > I have a Pacific Digital qstor card on irq 193. I am using kernel
> > > > 2.6.17.13 SMP
> > > >
> > > >
> > > > irq 193: nobody cared (try booting with the "irqpoll" option)
> > >
> > > Did you try this option? It may decrease performance, but in some cases
> > > IRQ routing is so screwed that only irqpoll helps.
> >
> > I have now switched to using that option. Will kick in after I reboot.
>
> Actually, it might be better to wait with this - it is the last resort
> option, if there is nothing else to try.


I'll wait then. This is a file server with 6 active HD controllers and
14 drives. For what I can figure out reading LKML irqpoll would really
have a bad effect.

>
> > 0000:00:0f.2 USB Controller: ServerWorks OSB4/CSB5 OHCI USB Controller (rev 04) (prog-if 10 [OHCI])
> > Subsystem: ServerWorks OSB4/CSB5 OHCI USB Controller
> > Flags: bus master, medium devsel, latency 64, IRQ 10
> > Memory at fe9fe000 (32-bit, non-prefetchable) [size=4K]
>
> Can you try with ohci-hcd loaded (and without irqpoll)? USB
> controllers might cause such problems, because BIOSen often do weird
> things with them for legacy emulation; however, loading a proper
> driver should make BIOS stop messing with the hardware behind the back
> of the OS.
>
> After loading ohci-hcd look which IRQ does the USB controller use - if
> it is the same IRQ as the qstor card, you should keep ohci-hcd loaded
> to avoid problems. (If it is on some other IRQ, loading ohci-hcd
> might still help - the code for enabling the PCI device calls ACPI
> BIOS, which might change interrupt routing setup in the chipset.)

I have now compiled and enabled the ohci-hcd module. It ended up
in IRQ10

>
> Also look for something like "USB Keyboard Support" and maybe "USB
> Mouse Support" in BIOS setup - some BIOS versions have severe problems
> with this emulation, so it may be better to disable it unless you
> really need it.
>
> Disabling USB in the BIOS setup if you really don't need might be
> another option.
>
> > 0000:01:05.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1010 Ultra3 SCSI Adapter (rev 01)
> > Subsystem: Gateway 2000: Unknown device 1040
> > Flags: medium devsel
> > I/O ports at 1000 [disabled] [size=256]
> > Memory at <ignored> (64-bit, non-prefetchable) [disabled]
> > Memory at <ignored> (64-bit, non-prefetchable) [disabled]
> > Capabilities: [40] Power Management version 2
> >
> > 0000:01:05.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1010 Ultra3 SCSI Adapter (rev 01)
> > Subsystem: Gateway 2000: Unknown device 1040
> > Flags: medium devsel
> > I/O ports at 1400 [disabled] [size=256]
> > Memory at <ignored> (64-bit, non-prefetchable) [disabled]
> > Memory at <ignored> (64-bit, non-prefetchable) [disabled]
> > Capabilities: [40] Power Management version 2
>
> These devices seem to be disabled, but if loading ohci-hcd does not
> help, you can try to load the sym53c8xx driver and look whether it
> ends up on the same IRQ as qstor.


I will go in tonight and disable all USB and symbios SCSI controllers
on the MB via the BIOS. I don't use SCSI disks on that server anymore,
and I can get by without the USB ports.

The really bad thing is that I can't reproduce it on purpose. Some times
I've gone for 6 weeks without any problems. This morning got really bad,
but it hasn't happened again since.

Thanks for all your help,

Alberto


--
Alberto Alonso Global Gate Systems LLC.
(512) 351-7233 http://www.ggsys.net
Hardware, consulting, sysadmin, monitoring and remote backups

2006-11-04 18:39:45

by Mark Lord

[permalink] [raw]
Subject: Re: qstor driver -> irq 193: nobody cared

Alberto Alonso wrote:
> I have a Pacific Digital qstor card on irq 193. I am using kernel
> 2.6.17.13 SMP
>
> The error happens every now and then. I have not been able to
> figure out any triggers and I can not reproduce it on demand. Today
> it happened 3 times within a 40 minutes period.
>
> All disks connected to the card are disabled and I can't do anything
> other than a reboot to get them back.
>
> It is reported as follows:
>
> irq 193: nobody cared (try booting with the "irqpoll" option)
> <c013e19a> __report_bad_irq+0x2a/0xa0 <c013d970> handle_IRQ_event
> +0x30/0x70
> <c013e2b0> note_interrupt+0x80/0xf0 <c013da8c> __do_IRQ+0xdc/0xf0
> <c0105799> do_IRQ+0x19/0x30 <c010391a> common_interrupt+0x1a/0x20
> <c0100d91> default_idle+0x41/0x70 <c0100e60> cpu_idle+0x80/0x90
> <c046699d> start_kernel+0x18d/0x1d0 <c0466330> unknown_bootoption
> +0x0/0x1d0
> handlers:
> [<c0301300>] (qs_intr+0x0/0x220)
> Disabling IRQ #193

What other devices are routed to that same interrupt line?

The sata_qstor driver is very rigorous in acknowledging *only* it's own
interrupts, to prevent other devices sharing the same IRQ from losing theirs.

Mmm.. We could apply a bit of fuzzy tolerance for the odd glitch.
Try this patch (attached) and report back.

Thanks.



Attachments:
qstor_spurious.patch (833.00 B)

2006-11-06 04:04:39

by Alberto Alonso

[permalink] [raw]
Subject: Re: qstor driver -> irq 193: nobody cared

I have patched the system. I'll let you know if I get the
same problem. Since I can't reproduce it on demand I may not see
the problem for months. I don't know what happened the other day
when it just started to happen repeatedly, I haven't had the issue
since.

Thanks,

Alberto


On Sat, 2006-11-04 at 13:39 -0500, Mark Lord wrote:
> Alberto Alonso wrote:
> > I have a Pacific Digital qstor card on irq 193. I am using kernel
> > 2.6.17.13 SMP
> >
> > The error happens every now and then. I have not been able to
> > figure out any triggers and I can not reproduce it on demand. Today
> > it happened 3 times within a 40 minutes period.
> >
> > All disks connected to the card are disabled and I can't do anything
> > other than a reboot to get them back.
> >
> > It is reported as follows:
> >
> > irq 193: nobody cared (try booting with the "irqpoll" option)
> > <c013e19a> __report_bad_irq+0x2a/0xa0 <c013d970> handle_IRQ_event
> > +0x30/0x70
> > <c013e2b0> note_interrupt+0x80/0xf0 <c013da8c> __do_IRQ+0xdc/0xf0
> > <c0105799> do_IRQ+0x19/0x30 <c010391a> common_interrupt+0x1a/0x20
> > <c0100d91> default_idle+0x41/0x70 <c0100e60> cpu_idle+0x80/0x90
> > <c046699d> start_kernel+0x18d/0x1d0 <c0466330> unknown_bootoption
> > +0x0/0x1d0
> > handlers:
> > [<c0301300>] (qs_intr+0x0/0x220)
> > Disabling IRQ #193
>
> What other devices are routed to that same interrupt line?
>
> The sata_qstor driver is very rigorous in acknowledging *only* it's own
> interrupts, to prevent other devices sharing the same IRQ from losing theirs.
>
> Mmm.. We could apply a bit of fuzzy tolerance for the odd glitch.
> Try this patch (attached) and report back.
>
> Thanks.
>
>
--
Alberto Alonso Global Gate Systems LLC.
(512) 351-7233 http://www.ggsys.net
Hardware, consulting, sysadmin, monitoring and remote backups

2006-11-06 14:51:59

by Mark Lord

[permalink] [raw]
Subject: Re: qstor driver -> irq 193: nobody cared

Alberto Alonso wrote:
> I have patched the system. I'll let you know if I get the
> same problem. Since I can't reproduce it on demand I may not see
> the problem for months. I don't know what happened the other day
> when it just started to happen repeatedly, I haven't had the issue
> since.

Ahh. Well, in that case please add a printk() line so that we'll know
if it *ever* does anything! Updated patch attached.

Cheers


Attachments:
qstor_spurious2.patch (877.00 B)

2006-11-10 18:10:31

by Alberto Alonso

[permalink] [raw]
Subject: Re: qstor driver -> irq 193: nobody cared

The saga continues. It happened again this morning even with the
patch:

irq 193: nobody cared (try booting with the "irqpoll" option)
<c013e19a> __report_bad_irq+0x2a/0xa0 <c013d970> handle_IRQ_event
+0x30/0x70
<c013e2b0> note_interrupt+0x80/0xf0 <c013da8c> __do_IRQ+0xdc/0xf0
<c0105799> do_IRQ+0x19/0x30 <c010391a> common_interrupt+0x1a/0x20
<c0100d91> default_idle+0x41/0x70 <c0100e60> cpu_idle+0x80/0x90
<c046699d> start_kernel+0x18d/0x1d0 <c0466330> unknown_bootoption
+0x0/0x1d0
handlers:
[<c0301300>] (qs_intr+0x0/0x240)
Disabling IRQ #193
ata4: command 0xea timeout, stat 0xff host_stat 0x0
ata1: command 0xea timeout, stat 0xff host_stat 0x0

[...]

The drives on this controller went offline.

This is my interrupts table:

cat /proc/interrupts
CPU0 CPU1
0: 48405419 48520371 IO-APIC-edge timer
1: 60 146 IO-APIC-edge i8042
2: 0 0 XT-PIC cascade
8: 1 3 IO-APIC-edge rtc
10: 13 13 IO-APIC-level ohci_hcd:usb1
14: 228219 237351 IO-APIC-edge ide0
15: 3 9 IO-APIC-edge ide1
145: 0 0 IO-APIC-level ivtv0
153: 2272523 2393822 IO-APIC-level ide2, ide3, ide4, ide5
161: 5416 23455 IO-APIC-level ide8, ide9
169: 7148 22129 IO-APIC-level ide6, ide7
185: 3680365 86 IO-APIC-level eth0
193: 337154 162846 IO-APIC-level libata
NMI: 0 0
LOC: 96924026 96924027
ERR: 0
MIS: 0

lspci -v shows this for the controller. No other device (at least
pci device) is using IRQ 193.


0000:01:03.0 0106: Pacific Digital Corp: Unknown device 2068 (rev 01)
Subsystem: Pacific Digital Corp: Unknown device 2068
Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 193
I/O ports at eff0 [size=8]
I/O ports at efe0 [size=8]
I/O ports at efa8 [size=8]
I/O ports at efa0 [size=8]
Memory at febf0000 (64-bit, non-prefetchable) [size=64K]
Expansion ROM at febe0000 [disabled] [size=64K]
Capabilities: <available only to root>


Any other ideas I can try to look for? Is there any other type
of device that may be using the IRQ but not listed with the above
commands?

Thanks,

Alberto


On Sat, 2006-11-04 at 13:39 -0500, Mark Lord wrote:
> Alberto Alonso wrote:
> > I have a Pacific Digital qstor card on irq 193. I am using kernel
> > 2.6.17.13 SMP
> >
> > The error happens every now and then. I have not been able to
> > figure out any triggers and I can not reproduce it on demand. Today
> > it happened 3 times within a 40 minutes period.
> >
> > All disks connected to the card are disabled and I can't do anything
> > other than a reboot to get them back.
> >
> > It is reported as follows:
> >
> > irq 193: nobody cared (try booting with the "irqpoll" option)
> > <c013e19a> __report_bad_irq+0x2a/0xa0 <c013d970> handle_IRQ_event
> > +0x30/0x70
> > <c013e2b0> note_interrupt+0x80/0xf0 <c013da8c> __do_IRQ+0xdc/0xf0
> > <c0105799> do_IRQ+0x19/0x30 <c010391a> common_interrupt+0x1a/0x20
> > <c0100d91> default_idle+0x41/0x70 <c0100e60> cpu_idle+0x80/0x90
> > <c046699d> start_kernel+0x18d/0x1d0 <c0466330> unknown_bootoption
> > +0x0/0x1d0
> > handlers:
> > [<c0301300>] (qs_intr+0x0/0x220)
> > Disabling IRQ #193
>
> What other devices are routed to that same interrupt line?
>
> The sata_qstor driver is very rigorous in acknowledging *only* it's own
> interrupts, to prevent other devices sharing the same IRQ from losing theirs.
>
> Mmm.. We could apply a bit of fuzzy tolerance for the odd glitch.
> Try this patch (attached) and report back.
>
> Thanks.
>
>
--
Alberto Alonso Global Gate Systems LLC.
(512) 351-7233 http://www.ggsys.net
Hardware, consulting, sysadmin, monitoring and remote backups

2006-11-12 05:09:12

by Mark Lord

[permalink] [raw]
Subject: Re: qstor driver -> irq 193: nobody cared

Alberto Alonso wrote:
> The saga continues. It happened again this morning even with the
> patch:
..
>> Mmm.. We could apply a bit of fuzzy tolerance for the odd glitch.
>> Try this patch (attached) and report back.

Did you add the printk() to the patch, as suggested?
If so, did it log anything ?

Thanks

2006-11-12 05:18:04

by Alberto Alonso

[permalink] [raw]
Subject: Re: qstor driver -> irq 193: nobody cared

Sorry, I missed that post. I will add the printk line right
now. I'll let you know the results as soon as it happens
again.

Thanks,


Alberto

On Sun, 2006-11-12 at 00:09 -0500, Mark Lord wrote:
> Alberto Alonso wrote:
> > The saga continues. It happened again this morning even with the
> > patch:
> ..
> >> Mmm.. We could apply a bit of fuzzy tolerance for the odd glitch.
> >> Try this patch (attached) and report back.
>
> Did you add the printk() to the patch, as suggested?
> If so, did it log anything ?
>
> Thanks
--
Alberto Alonso Global Gate Systems LLC.
(512) 351-7233 http://www.ggsys.net
Hardware, consulting, sysadmin, monitoring and remote backups

2006-11-12 20:31:24

by Alberto Alonso

[permalink] [raw]
Subject: Re: qstor driver -> irq 193: nobody cared

OK, after adding the printk line I can start seeing
results.

I guess it has been close to 10 on quite a few
occasions.

Thanks,

Alberto

# grep qstor /var/log/messages
Nov 12 07:00:53 w100 kernel: sata_qstor: spurious=0
Nov 12 07:00:53 w100 kernel: sata_qstor: spurious=1
Nov 12 07:00:53 w100 kernel: sata_qstor: spurious=0
Nov 12 07:00:56 w100 kernel: sata_qstor: spurious=1
Nov 12 07:00:56 w100 kernel: sata_qstor: spurious=2
Nov 12 07:00:56 w100 kernel: sata_qstor: spurious=0
Nov 12 07:03:35 w100 kernel: sata_qstor: spurious=0
Nov 12 07:34:44 w100 kernel: sata_qstor: spurious=0
Nov 12 07:35:08 w100 kernel: sata_qstor: spurious=1
Nov 12 07:35:08 w100 kernel: sata_qstor: spurious=1
Nov 12 07:35:08 w100 kernel: sata_qstor: spurious=0
Nov 12 09:11:17 w100 kernel: sata_qstor: spurious=0
Nov 12 09:11:17 w100 kernel: sata_qstor: spurious=1
Nov 12 09:11:17 w100 kernel: sata_qstor: spurious=0
Nov 12 09:11:17 w100 kernel: sata_qstor: spurious=1
Nov 12 09:11:18 w100 kernel: sata_qstor: spurious=0
Nov 12 09:36:34 w100 kernel: sata_qstor: spurious=0
Nov 12 09:38:26 w100 kernel: sata_qstor: spurious=0
Nov 12 09:38:26 w100 kernel: sata_qstor: spurious=0
Nov 12 09:53:06 w100 kernel: sata_qstor: spurious=0
Nov 12 09:53:06 w100 kernel: sata_qstor: spurious=1
Nov 12 09:53:06 w100 kernel: sata_qstor: spurious=2
Nov 12 09:53:06 w100 kernel: sata_qstor: spurious=0
Nov 12 09:53:09 w100 kernel: sata_qstor: spurious=1
Nov 12 10:09:03 w100 kernel: sata_qstor: spurious=0
Nov 12 10:09:04 w100 kernel: sata_qstor: spurious=1
Nov 12 10:09:05 w100 kernel: sata_qstor: spurious=0
Nov 12 10:09:05 w100 kernel: sata_qstor: spurious=0
Nov 12 10:11:52 w100 kernel: sata_qstor: spurious=0
Nov 12 10:11:53 w100 kernel: sata_qstor: spurious=1
Nov 12 10:11:53 w100 kernel: sata_qstor: spurious=2
Nov 12 10:11:53 w100 kernel: sata_qstor: spurious=3
Nov 12 10:11:54 w100 kernel: sata_qstor: spurious=0
Nov 12 10:11:55 w100 kernel: sata_qstor: spurious=1
Nov 12 10:11:55 w100 kernel: sata_qstor: spurious=0
Nov 12 10:12:35 w100 kernel: sata_qstor: spurious=1
Nov 12 10:12:37 w100 kernel: sata_qstor: spurious=0
Nov 12 10:12:37 w100 kernel: sata_qstor: spurious=1
Nov 12 10:12:37 w100 kernel: sata_qstor: spurious=0
Nov 12 10:30:51 w100 kernel: sata_qstor: spurious=0
Nov 12 10:30:51 w100 kernel: sata_qstor: spurious=1
Nov 12 10:30:51 w100 kernel: sata_qstor: spurious=2
Nov 12 10:30:52 w100 kernel: sata_qstor: spurious=0
Nov 12 10:30:52 w100 kernel: sata_qstor: spurious=0
Nov 12 10:40:08 w100 kernel: sata_qstor: spurious=0
Nov 12 10:40:08 w100 kernel: sata_qstor: spurious=1
Nov 12 10:57:34 w100 kernel: sata_qstor: spurious=0
Nov 12 10:57:34 w100 kernel: sata_qstor: spurious=1
Nov 12 10:57:34 w100 kernel: sata_qstor: spurious=1
Nov 12 10:57:34 w100 kernel: sata_qstor: spurious=0
Nov 12 10:57:37 w100 kernel: sata_qstor: spurious=0
Nov 12 10:57:37 w100 kernel: sata_qstor: spurious=1
Nov 12 10:57:37 w100 kernel: sata_qstor: spurious=2
Nov 12 10:57:37 w100 kernel: sata_qstor: spurious=3
Nov 12 10:57:37 w100 kernel: sata_qstor: spurious=4
Nov 12 10:57:37 w100 kernel: sata_qstor: spurious=5
Nov 12 10:57:37 w100 kernel: sata_qstor: spurious=3
Nov 12 11:01:18 w100 kernel: sata_qstor: spurious=0
Nov 12 11:01:19 w100 kernel: sata_qstor: spurious=1
Nov 12 11:18:35 w100 kernel: sata_qstor: spurious=0
Nov 12 11:18:39 w100 kernel: sata_qstor: spurious=1
Nov 12 11:18:40 w100 kernel: sata_qstor: spurious=0
Nov 12 11:18:40 w100 kernel: sata_qstor: spurious=0
Nov 12 11:21:42 w100 kernel: sata_qstor: spurious=0
Nov 12 11:35:11 w100 kernel: sata_qstor: spurious=0
Nov 12 11:35:11 w100 kernel: sata_qstor: spurious=1
Nov 12 11:35:11 w100 kernel: sata_qstor: spurious=0
Nov 12 11:35:12 w100 kernel: sata_qstor: spurious=0
Nov 12 11:35:12 w100 kernel: sata_qstor: spurious=1
Nov 12 11:35:12 w100 kernel: sata_qstor: spurious=2
Nov 12 11:35:12 w100 kernel: sata_qstor: spurious=3
Nov 12 11:35:12 w100 kernel: sata_qstor: spurious=4
Nov 12 11:35:12 w100 kernel: sata_qstor: spurious=2
Nov 12 11:38:01 w100 kernel: sata_qstor: spurious=0
Nov 12 11:38:01 w100 kernel: sata_qstor: spurious=1
Nov 12 11:38:01 w100 kernel: sata_qstor: spurious=0
Nov 12 11:42:53 w100 kernel: sata_qstor: spurious=0
Nov 12 11:50:26 w100 kernel: sata_qstor: spurious=0
Nov 12 11:50:26 w100 kernel: sata_qstor: spurious=1
Nov 12 11:50:26 w100 kernel: sata_qstor: spurious=2
Nov 12 11:50:26 w100 kernel: sata_qstor: spurious=0
Nov 12 12:07:03 w100 kernel: sata_qstor: spurious=0
Nov 12 12:07:03 w100 kernel: sata_qstor: spurious=0
Nov 12 12:07:03 w100 kernel: sata_qstor: spurious=1
Nov 12 12:07:03 w100 kernel: sata_qstor: spurious=2
Nov 12 12:07:11 w100 kernel: sata_qstor: spurious=0
Nov 12 12:07:11 w100 kernel: sata_qstor: spurious=1
Nov 12 12:07:11 w100 kernel: sata_qstor: spurious=0
Nov 12 12:15:08 w100 kernel: sata_qstor: spurious=0
Nov 12 12:15:31 w100 kernel: sata_qstor: spurious=1
Nov 12 12:15:31 w100 kernel: sata_qstor: spurious=0
Nov 12 12:15:31 w100 kernel: sata_qstor: spurious=1
Nov 12 12:15:31 w100 kernel: sata_qstor: spurious=2
Nov 12 12:41:11 w100 kernel: sata_qstor: spurious=0
Nov 12 12:41:28 w100 kernel: sata_qstor: spurious=1
Nov 12 12:41:28 w100 kernel: sata_qstor: spurious=2
Nov 12 12:41:28 w100 kernel: sata_qstor: spurious=3
Nov 12 12:41:28 w100 kernel: sata_qstor: spurious=4
Nov 12 12:41:28 w100 kernel: sata_qstor: spurious=5
Nov 12 12:41:28 w100 kernel: sata_qstor: spurious=3
Nov 12 13:05:42 w100 kernel: sata_qstor: spurious=0
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=1
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=2
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=3
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=4
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=5
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=6
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=7
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=7
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=7
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=7
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=7
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=6
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=7
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=7
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=0
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=1
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=2
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=3
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=4
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=5
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=6
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=7
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=7
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=6
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=7
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=7
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=7
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=0
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=1
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=2
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=3
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=4
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=5
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=6
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=7
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=8
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=9
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=8
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=9
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=8
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=9
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=8
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=9
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=6
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=0
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=1
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=0
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=1
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=2
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=3
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=4
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=5
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=6
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=7
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=8
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=9
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=9
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=8
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=9
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=8
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=9
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=9
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=7
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=8
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=9
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=4
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=5
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=6
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=7
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=8
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=9
Nov 12 13:20:15 w100 kernel: sata_qstor: spurious=0
Nov 12 13:20:15 w100 kernel: sata_qstor: spurious=1
Nov 12 13:20:15 w100 kernel: sata_qstor: spurious=2
Nov 12 13:20:17 w100 kernel: sata_qstor: spurious=0
Nov 12 13:20:17 w100 kernel: sata_qstor: spurious=1
Nov 12 13:20:17 w100 kernel: sata_qstor: spurious=0
Nov 12 13:23:30 w100 kernel: sata_qstor: spurious=0
Nov 12 13:23:30 w100 kernel: sata_qstor: spurious=1
Nov 12 13:23:30 w100 kernel: sata_qstor: spurious=2
Nov 12 13:23:30 w100 kernel: sata_qstor: spurious=3
Nov 12 13:23:36 w100 kernel: sata_qstor: spurious=0
Nov 12 13:23:36 w100 kernel: sata_qstor: spurious=0
Nov 12 13:23:36 w100 kernel: sata_qstor: spurious=1
Nov 12 13:23:36 w100 kernel: sata_qstor: spurious=0
Nov 12 13:23:36 w100 kernel: sata_qstor: spurious=0
Nov 12 13:23:36 w100 kernel: sata_qstor: spurious=1
Nov 12 13:23:36 w100 kernel: sata_qstor: spurious=1
Nov 12 13:23:36 w100 kernel: sata_qstor: spurious=2
Nov 12 13:23:36 w100 kernel: sata_qstor: spurious=0
Nov 12 13:23:36 w100 kernel: sata_qstor: spurious=1
Nov 12 13:23:36 w100 kernel: sata_qstor: spurious=2
Nov 12 13:24:08 w100 kernel: sata_qstor: spurious=0
Nov 12 13:24:08 w100 kernel: sata_qstor: spurious=1
Nov 12 13:24:08 w100 kernel: sata_qstor: spurious=1
Nov 12 13:24:08 w100 kernel: sata_qstor: spurious=0
Nov 12 13:26:36 w100 kernel: sata_qstor: spurious=0
Nov 12 13:26:36 w100 kernel: sata_qstor: spurious=0
Nov 12 13:26:36 w100 kernel: sata_qstor: spurious=1
Nov 12 13:26:36 w100 kernel: sata_qstor: spurious=2
Nov 12 13:31:30 w100 kernel: sata_qstor: spurious=0
Nov 12 13:31:33 w100 kernel: sata_qstor: spurious=1
Nov 12 13:31:33 w100 kernel: sata_qstor: spurious=1
Nov 12 13:32:18 w100 kernel: sata_qstor: spurious=0
Nov 12 13:32:43 w100 kernel: sata_qstor: spurious=1
Nov 12 13:32:43 w100 kernel: sata_qstor: spurious=2
Nov 12 13:32:43 w100 kernel: sata_qstor: spurious=3
Nov 12 13:32:43 w100 kernel: sata_qstor: spurious=3
Nov 12 13:33:07 w100 kernel: sata_qstor: spurious=0
Nov 12 13:33:07 w100 kernel: sata_qstor: spurious=1
Nov 12 13:33:07 w100 kernel: sata_qstor: spurious=2
Nov 12 13:33:07 w100 kernel: sata_qstor: spurious=3
Nov 12 13:33:07 w100 kernel: sata_qstor: spurious=4
Nov 12 13:33:07 w100 kernel: sata_qstor: spurious=2
Nov 12 13:33:08 w100 kernel: sata_qstor: spurious=0
Nov 12 13:33:39 w100 kernel: sata_qstor: spurious=0
Nov 12 13:34:53 w100 kernel: sata_qstor: spurious=0
Nov 12 13:34:53 w100 kernel: sata_qstor: spurious=1
Nov 12 13:34:53 w100 kernel: sata_qstor: spurious=0
Nov 12 13:41:15 w100 kernel: sata_qstor: spurious=0
Nov 12 13:41:15 w100 kernel: sata_qstor: spurious=1
Nov 12 13:41:19 w100 kernel: sata_qstor: spurious=0
Nov 12 13:41:19 w100 kernel: sata_qstor: spurious=1
Nov 12 13:41:19 w100 kernel: sata_qstor: spurious=2
Nov 12 13:41:19 w100 kernel: sata_qstor: spurious=3
Nov 12 13:41:19 w100 kernel: sata_qstor: spurious=4
Nov 12 13:41:19 w100 kernel: sata_qstor: spurious=5
Nov 12 13:41:19 w100 kernel: sata_qstor: spurious=3
Nov 12 13:49:43 w100 kernel: sata_qstor: spurious=0
Nov 12 13:49:43 w100 kernel: sata_qstor: spurious=0
Nov 12 13:49:43 w100 kernel: sata_qstor: spurious=1
Nov 12 13:49:43 w100 kernel: sata_qstor: spurious=2
Nov 12 13:49:49 w100 kernel: sata_qstor: spurious=0
Nov 12 13:49:49 w100 kernel: sata_qstor: spurious=0
Nov 12 13:49:49 w100 kernel: sata_qstor: spurious=1
Nov 12 13:49:49 w100 kernel: sata_qstor: spurious=2
Nov 12 13:53:04 w100 kernel: sata_qstor: spurious=0
Nov 12 13:53:32 w100 kernel: sata_qstor: spurious=0
Nov 12 13:53:32 w100 kernel: sata_qstor: spurious=1
Nov 12 13:53:32 w100 kernel: sata_qstor: spurious=0
Nov 12 13:56:11 w100 kernel: sata_qstor: spurious=0
Nov 12 13:56:11 w100 kernel: sata_qstor: spurious=0
Nov 12 13:56:11 w100 kernel: sata_qstor: spurious=1
Nov 12 13:56:11 w100 kernel: sata_qstor: spurious=2
Nov 12 13:56:11 w100 kernel: sata_qstor: spurious=0
Nov 12 13:56:11 w100 kernel: sata_qstor: spurious=0
Nov 12 13:56:11 w100 kernel: sata_qstor: spurious=1
Nov 12 13:56:11 w100 kernel: sata_qstor: spurious=2
Nov 12 13:56:13 w100 kernel: sata_qstor: spurious=0
Nov 12 13:56:13 w100 kernel: sata_qstor: spurious=0
Nov 12 14:10:49 w100 kernel: sata_qstor: spurious=0
Nov 12 14:10:49 w100 kernel: sata_qstor: spurious=1
Nov 12 14:10:49 w100 kernel: sata_qstor: spurious=0
Nov 12 14:10:50 w100 kernel: sata_qstor: spurious=1
Nov 12 14:10:50 w100 kernel: sata_qstor: spurious=1
Nov 12 14:10:50 w100 kernel: sata_qstor: spurious=0
Nov 12 14:11:40 w100 kernel: sata_qstor: spurious=0
Nov 12 14:11:40 w100 kernel: sata_qstor: spurious=1
Nov 12 14:11:40 w100 kernel: sata_qstor: spurious=2
Nov 12 14:11:40 w100 kernel: sata_qstor: spurious=3
Nov 12 14:14:32 w100 kernel: sata_qstor: spurious=0
Nov 12 14:14:32 w100 kernel: sata_qstor: spurious=0
Nov 12 14:14:32 w100 kernel: sata_qstor: spurious=1
Nov 12 14:14:32 w100 kernel: sata_qstor: spurious=1
Nov 12 14:14:32 w100 kernel: sata_qstor: spurious=0
Nov 12 14:22:18 w100 kernel: sata_qstor: spurious=0
Nov 12 14:22:22 w100 kernel: sata_qstor: spurious=1
Nov 12 14:22:22 w100 kernel: sata_qstor: spurious=2
Nov 12 14:22:22 w100 kernel: sata_qstor: spurious=3
Nov 12 14:26:51 w100 kernel: sata_qstor: spurious=0
Nov 12 14:26:51 w100 kernel: sata_qstor: spurious=1
Nov 12 14:26:51 w100 kernel: sata_qstor: spurious=0



On Sun, 2006-11-12 at 00:09 -0500, Mark Lord wrote:
> Alberto Alonso wrote:
> > The saga continues. It happened again this morning even with the
> > patch:
> ..
> >> Mmm.. We could apply a bit of fuzzy tolerance for the odd glitch.
> >> Try this patch (attached) and report back.
>
> Did you add the printk() to the patch, as suggested?
> If so, did it log anything ?
>
> Thanks
--
Alberto Alonso Global Gate Systems LLC.
(512) 351-7233 http://www.ggsys.net
Hardware, consulting, sysadmin, monitoring and remote backups

2006-11-13 14:29:10

by Mark Lord

[permalink] [raw]
Subject: Re: qstor driver -> irq 193: nobody cared

Alberto Alonso wrote:
> OK, after adding the printk line I can start seeing
> results.
>
> I guess it has been close to 10 on quite a few
> occasions.
..
> # grep qstor /var/log/messages
> Nov 12 07:00:53 w100 kernel: sata_qstor: spurious=0
> Nov 12 07:00:53 w100 kernel: sata_qstor: spurious=1
> Nov 12 07:00:53 w100 kernel: sata_qstor: spurious=0
> Nov 12 07:00:56 w100 kernel: sata_qstor: spurious=1
> Nov 12 07:00:56 w100 kernel: sata_qstor: spurious=2
..
> On Sun, 2006-11-12 at 00:09 -0500, Mark Lord wrote:
>> Alberto Alonso wrote:
>>> The saga continues. It happened again this morning even with the
>>> patch:
>> ..
>>>> Mmm.. We could apply a bit of fuzzy tolerance for the odd glitch.
>>>> Try this patch (attached) and report back.
>> Did you add the printk() to the patch, as suggested?
..

Excellent!

So, either we have a very noisy bit of hardware in there,
or something is wrong with sata_qstor.c.

The device has two methods for dealing with commands.
Regular R/W uses the driver's host queue "packet" interface,
and all other commands pass through the legacy MMIO mechanism.

I'm betting on some bug/interaction with the latter.

Try this patch and see what happens, on top of the printk patch.

Thanks




Attachments:
qstor_check_status.patch (703.00 B)

2006-11-14 04:51:03

by Alberto Alonso

[permalink] [raw]
Subject: Re: qstor driver -> irq 193: nobody cared

Things are improving, after the latest patch I can still
see spurious messages, but the count stays at 0.

Sorry for including all the lines, but I think it may help
to see if a time pattern is visible. Here is the output:

Nov 13 14:01:08 w100 kernel: scsi0 : sata_qstor
Nov 13 14:01:08 w100 kernel: scsi1 : sata_qstor
Nov 13 14:01:08 w100 kernel: scsi2 : sata_qstor
Nov 13 14:01:08 w100 kernel: scsi3 : sata_qstor
Nov 13 14:05:15 w100 kernel: sata_qstor: spurious=0
Nov 13 14:08:29 w100 kernel: sata_qstor: spurious=0
Nov 13 14:11:01 w100 kernel: sata_qstor: spurious=0
Nov 13 14:11:06 w100 kernel: sata_qstor: spurious=0
Nov 13 14:15:03 w100 kernel: sata_qstor: spurious=0
Nov 13 14:20:28 w100 kernel: sata_qstor: spurious=0
Nov 13 14:28:44 w100 kernel: sata_qstor: spurious=0
Nov 13 14:40:40 w100 kernel: sata_qstor: spurious=0
Nov 13 14:46:53 w100 kernel: sata_qstor: spurious=0
Nov 13 15:00:15 w100 kernel: sata_qstor: spurious=0
Nov 13 15:01:52 w100 kernel: sata_qstor: spurious=0
Nov 13 15:13:18 w100 kernel: sata_qstor: spurious=0
Nov 13 15:18:25 w100 kernel: sata_qstor: spurious=0
Nov 13 15:19:57 w100 kernel: sata_qstor: spurious=0
Nov 13 15:24:56 w100 kernel: sata_qstor: spurious=0
Nov 13 15:32:52 w100 kernel: sata_qstor: spurious=0
Nov 13 15:32:53 w100 kernel: sata_qstor: spurious=0
Nov 13 15:37:47 w100 kernel: sata_qstor: spurious=0
Nov 13 15:42:30 w100 kernel: sata_qstor: spurious=0
Nov 13 15:42:33 w100 kernel: sata_qstor: spurious=0
Nov 13 15:45:50 w100 kernel: sata_qstor: spurious=0
Nov 13 15:54:53 w100 kernel: sata_qstor: spurious=0
Nov 13 15:59:32 w100 kernel: sata_qstor: spurious=0
Nov 13 16:04:07 w100 kernel: sata_qstor: spurious=0
Nov 13 16:06:01 w100 kernel: sata_qstor: spurious=0
Nov 13 16:07:24 w100 kernel: sata_qstor: spurious=0
Nov 13 16:19:54 w100 kernel: sata_qstor: spurious=0
Nov 13 16:19:55 w100 kernel: sata_qstor: spurious=0
Nov 13 16:27:34 w100 kernel: sata_qstor: spurious=0
Nov 13 16:33:21 w100 kernel: sata_qstor: spurious=0
Nov 13 16:39:57 w100 kernel: sata_qstor: spurious=0
Nov 13 16:42:38 w100 kernel: sata_qstor: spurious=0
Nov 13 16:43:40 w100 kernel: sata_qstor: spurious=0
Nov 13 16:50:36 w100 kernel: sata_qstor: spurious=0
Nov 13 16:52:11 w100 kernel: sata_qstor: spurious=0
Nov 13 16:53:41 w100 kernel: sata_qstor: spurious=0
Nov 13 16:58:30 w100 kernel: sata_qstor: spurious=0
Nov 13 17:18:40 w100 kernel: sata_qstor: spurious=0
Nov 13 17:20:21 w100 kernel: sata_qstor: spurious=0
Nov 13 17:20:46 w100 kernel: sata_qstor: spurious=0
Nov 13 17:24:02 w100 kernel: sata_qstor: spurious=0
Nov 13 17:27:26 w100 kernel: sata_qstor: spurious=0
Nov 13 17:35:40 w100 kernel: sata_qstor: spurious=0
Nov 13 17:39:01 w100 kernel: sata_qstor: spurious=0
Nov 13 17:40:23 w100 kernel: sata_qstor: spurious=0
Nov 13 18:03:57 w100 kernel: sata_qstor: spurious=0
Nov 13 18:08:16 w100 kernel: sata_qstor: spurious=0
Nov 13 18:11:20 w100 kernel: sata_qstor: spurious=0
Nov 13 18:16:32 w100 kernel: sata_qstor: spurious=0
Nov 13 18:16:32 w100 kernel: sata_qstor: spurious=0
Nov 13 18:21:28 w100 kernel: sata_qstor: spurious=0
Nov 13 18:22:57 w100 kernel: sata_qstor: spurious=0
Nov 13 18:26:36 w100 kernel: sata_qstor: spurious=0
Nov 13 18:29:21 w100 kernel: sata_qstor: spurious=0
Nov 13 18:32:34 w100 kernel: sata_qstor: spurious=0
Nov 13 18:34:39 w100 kernel: sata_qstor: spurious=0
Nov 13 18:36:17 w100 kernel: sata_qstor: spurious=0
Nov 13 18:38:07 w100 kernel: sata_qstor: spurious=0
Nov 13 18:42:58 w100 kernel: sata_qstor: spurious=0
Nov 13 18:45:19 w100 kernel: sata_qstor: spurious=0
Nov 13 18:47:14 w100 kernel: sata_qstor: spurious=0
Nov 13 18:49:42 w100 kernel: sata_qstor: spurious=0
Nov 13 18:52:22 w100 kernel: sata_qstor: spurious=0
Nov 13 18:55:28 w100 kernel: sata_qstor: spurious=0
Nov 13 18:55:42 w100 kernel: sata_qstor: spurious=0
Nov 13 19:00:12 w100 kernel: sata_qstor: spurious=0
Nov 13 19:04:15 w100 kernel: sata_qstor: spurious=0
Nov 13 19:10:41 w100 kernel: sata_qstor: spurious=0
Nov 13 19:12:11 w100 kernel: sata_qstor: spurious=0
Nov 13 19:15:51 w100 kernel: sata_qstor: spurious=0
Nov 13 19:20:12 w100 kernel: sata_qstor: spurious=0
Nov 13 19:24:27 w100 kernel: sata_qstor: spurious=0
Nov 13 19:26:40 w100 kernel: sata_qstor: spurious=0
Nov 13 19:30:41 w100 kernel: sata_qstor: spurious=0
Nov 13 19:38:02 w100 kernel: sata_qstor: spurious=0
Nov 13 19:41:07 w100 kernel: sata_qstor: spurious=0
Nov 13 19:42:42 w100 kernel: sata_qstor: spurious=0
Nov 13 19:48:06 w100 kernel: sata_qstor: spurious=0
Nov 13 19:55:05 w100 kernel: sata_qstor: spurious=0
Nov 13 19:55:07 w100 kernel: sata_qstor: spurious=0
Nov 13 19:58:00 w100 kernel: sata_qstor: spurious=0
Nov 13 19:58:07 w100 kernel: sata_qstor: spurious=0
Nov 13 20:05:51 w100 kernel: sata_qstor: spurious=0
Nov 13 20:08:32 w100 kernel: sata_qstor: spurious=0
Nov 13 20:09:39 w100 kernel: sata_qstor: spurious=0
Nov 13 20:11:54 w100 kernel: sata_qstor: spurious=0
Nov 13 20:17:19 w100 kernel: sata_qstor: spurious=0
Nov 13 20:17:57 w100 kernel: sata_qstor: spurious=0
Nov 13 20:23:48 w100 kernel: sata_qstor: spurious=0
Nov 13 20:25:14 w100 kernel: sata_qstor: spurious=0
Nov 13 20:25:16 w100 kernel: sata_qstor: spurious=0
Nov 13 20:28:56 w100 kernel: sata_qstor: spurious=0
Nov 13 20:29:00 w100 kernel: sata_qstor: spurious=0
Nov 13 20:32:39 w100 kernel: sata_qstor: spurious=0
Nov 13 20:35:06 w100 kernel: sata_qstor: spurious=0
Nov 13 20:38:20 w100 kernel: sata_qstor: spurious=0
Nov 13 20:41:30 w100 kernel: sata_qstor: spurious=0
Nov 13 20:41:33 w100 kernel: sata_qstor: spurious=0
Nov 13 20:45:00 w100 kernel: sata_qstor: spurious=0
Nov 13 21:08:22 w100 kernel: sata_qstor: spurious=0
Nov 13 21:10:28 w100 kernel: sata_qstor: spurious=0
Nov 13 21:13:26 w100 kernel: sata_qstor: spurious=0
Nov 13 21:15:22 w100 kernel: sata_qstor: spurious=0
Nov 13 21:23:32 w100 kernel: sata_qstor: spurious=0
Nov 13 21:24:11 w100 kernel: sata_qstor: spurious=0
Nov 13 21:26:40 w100 kernel: sata_qstor: spurious=0
Nov 13 21:33:14 w100 kernel: sata_qstor: spurious=0
Nov 13 21:45:55 w100 kernel: sata_qstor: spurious=0
Nov 13 21:45:57 w100 kernel: sata_qstor: spurious=0
Nov 13 21:56:52 w100 kernel: sata_qstor: spurious=0
Nov 13 21:56:56 w100 kernel: sata_qstor: spurious=0
Nov 13 22:01:53 w100 kernel: sata_qstor: spurious=0
Nov 13 22:22:46 w100 kernel: sata_qstor: spurious=0
Nov 13 22:30:44 w100 kernel: sata_qstor: spurious=0
Nov 13 22:38:52 w100 kernel: sata_qstor: spurious=0
Nov 13 22:41:48 w100 kernel: sata_qstor: spurious=0
Nov 13 22:43:20 w100 kernel: sata_qstor: spurious=0
Nov 13 22:44:58 w100 kernel: sata_qstor: spurious=0


Thanks,

Alberto

On Mon, 2006-11-13 at 09:29 -0500, Mark Lord wrote:
> Alberto Alonso wrote:
> > OK, after adding the printk line I can start seeing
> > results.
> >
> > I guess it has been close to 10 on quite a few
> > occasions.
> ..
> > # grep qstor /var/log/messages
> > Nov 12 07:00:53 w100 kernel: sata_qstor: spurious=0
> > Nov 12 07:00:53 w100 kernel: sata_qstor: spurious=1
> > Nov 12 07:00:53 w100 kernel: sata_qstor: spurious=0
> > Nov 12 07:00:56 w100 kernel: sata_qstor: spurious=1
> > Nov 12 07:00:56 w100 kernel: sata_qstor: spurious=2
> ..
> > On Sun, 2006-11-12 at 00:09 -0500, Mark Lord wrote:
> >> Alberto Alonso wrote:
> >>> The saga continues. It happened again this morning even with the
> >>> patch:
> >> ..
> >>>> Mmm.. We could apply a bit of fuzzy tolerance for the odd glitch.
> >>>> Try this patch (attached) and report back.
> >> Did you add the printk() to the patch, as suggested?
> ..
>
> Excellent!
>
> So, either we have a very noisy bit of hardware in there,
> or something is wrong with sata_qstor.c.
>
> The device has two methods for dealing with commands.
> Regular R/W uses the driver's host queue "packet" interface,
> and all other commands pass through the legacy MMIO mechanism.
>
> I'm betting on some bug/interaction with the latter.
>
> Try this patch and see what happens, on top of the printk patch.
>
> Thanks
>
>
>
--
Alberto Alonso Global Gate Systems LLC.
(512) 351-7233 http://www.ggsys.net
Hardware, consulting, sysadmin, monitoring and remote backups

2006-11-14 16:46:41

by Mark Lord

[permalink] [raw]
Subject: Re: qstor driver -> irq 193: nobody cared

Alberto Alonso wrote:
> Things are improving, after the latest patch I can still
> see spurious messages, but the count stays at 0.

Mmm.. Okay, so we have a kludge fix (just get rid of the printk's we added).

But I would like to find out more about what is going on.
We seem to be getting lots of "leftover interrupts".
I'll look through my full qstor block driver (the high-performance
queuing driver, out-of-tree), and see if we missed an IRQ-mask bit
someplace in the simple sata_qstor.c re-implementation.

Cheers

2006-11-14 18:17:43

by Alberto Alonso

[permalink] [raw]
Subject: Re: qstor driver -> irq 193: nobody cared

Sounds good, let me know if you need me to do any
testing or help on the programming. I've been wanting
to get into kernel drivers, but never did find the
right time/place to do it.

Thanks,

Alberto

On Tue, 2006-11-14 at 11:46 -0500, Mark Lord wrote:
> Alberto Alonso wrote:
> > Things are improving, after the latest patch I can still
> > see spurious messages, but the count stays at 0.
>
> Mmm.. Okay, so we have a kludge fix (just get rid of the printk's we added).
>
> But I would like to find out more about what is going on.
> We seem to be getting lots of "leftover interrupts".
> I'll look through my full qstor block driver (the high-performance
> queuing driver, out-of-tree), and see if we missed an IRQ-mask bit
> someplace in the simple sata_qstor.c re-implementation.
>
> Cheers
--
Alberto Alonso Global Gate Systems LLC.
(512) 351-7233 http://www.ggsys.net
Hardware, consulting, sysadmin, monitoring and remote backups

2006-11-14 18:23:36

by Mark Lord

[permalink] [raw]
Subject: Re: qstor driver -> irq 193: nobody cared

Alberto Alonso wrote:
> Sounds good, let me know if you need me to do any
> testing or help on the programming.

Okay, try this (untested) and see if it cleans things up.
I'm also including the old "printk" patch, so that we can
tell whether the "fix" is actually working or not.

Thanks


Attachments:
qstor_spurious2.patch (877.00 B)
qstor_nien_fix.patch (994.00 B)
Download all attachments

2006-11-16 06:36:00

by Alberto Alonso

[permalink] [raw]
Subject: Re: qstor driver -> irq 193: nobody cared

Sorry for the long delay, I've been called on too
many issues at work this week.

Anyway, the patch basically made the drives not usable.
Here is the log from the boot process:


libata version 1.20 loaded.
sata_qstor 0000:01:03.0: version 0.05
ata1: SATA max UDMA/133 cmd 0xF8820400 ctl 0xF8820440 bmdma 0x0 irq 185
ata2: SATA max UDMA/133 cmd 0xF8824400 ctl 0xF8824440 bmdma 0x0 irq 185
ata3: SATA max UDMA/133 cmd 0xF8828400 ctl 0xF8828440 bmdma 0x0 irq 185
ata4: SATA max UDMA/133 cmd 0xF882C400 ctl 0xF882C440 bmdma 0x0 irq 185
ata1: SATA link up 3.0 Gbps (SStatus 123)
ata1: dev 0 cfg 49:2f00 82:346b 83:7d01 84:4023 85:3469 86:3c01 87:4023
88:407f
ata1: dev 0 ATA-7, max UDMA/133, 976773168 sectors: LBA48
ata1: dev 0 configured for UDMA/133
scsi0 : sata_qstor
ata2: SATA link up 3.0 Gbps (SStatus 123)
ata2: dev 0 cfg 49:2f00 82:346b 83:7d01 84:4023 85:3469 86:3c01 87:4023
88:407f
ata2: dev 0 ATA-7, max UDMA/133, 976773168 sectors: LBA48
ata2: dev 0 configured for UDMA/133
scsi1 : sata_qstor
ata3: SATA link up 3.0 Gbps (SStatus 123)
ata3: dev 0 cfg 49:2f00 82:346b 83:7d01 84:4023 85:3469 86:3c01 87:4023
88:407f
ata3: dev 0 ATA-7, max UDMA/133, 976771055 sectors: LBA48
ata3: dev 0 configured for UDMA/133
scsi2 : sata_qstor
ata4: SATA link up 3.0 Gbps (SStatus 123)
ata4: dev 0 cfg 49:2f00 82:346b 83:7d01 84:4023 85:3469 86:3c01 87:4023
88:407f
ata4: dev 0 ATA-7, max UDMA/133, 976773168 sectors: LBA48
ata4: dev 0 configured for UDMA/133
scsi3 : sata_qstor
Vendor: ATA Model: ST3500641AS Rev: 3.AA
Type: Direct-Access ANSI SCSI revision: 05
Vendor: ATA Model: ST3500641AS Rev: 3.AA
Type: Direct-Access ANSI SCSI revision: 05
Vendor: ATA Model: ST3500641AS Rev: 3.AA
Type: Direct-Access ANSI SCSI revision: 05
Vendor: ATA Model: ST3500641AS Rev: 3.AA
Type: Direct-Access ANSI SCSI revision: 05
SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
sda: unknown partition table
sd 0:0:0:0: Attached scsi disk sda
SCSI device sdb: 976773168 512-byte hdwr sectors (500108 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: drive cache: write back
SCSI device sdb: 976773168 512-byte hdwr sectors (500108 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: drive cache: write back
sdb: unknown partition table
sd 1:0:0:0: Attached scsi disk sdb
SCSI device sdc: 976771055 512-byte hdwr sectors (500107 MB)
sdc: Write Protect is off
sdc: Mode Sense: 00 3a 00 00
SCSI device sdc: drive cache: write back
SCSI device sdc: 976771055 512-byte hdwr sectors (500107 MB)
sdc: Write Protect is off
sdc: Mode Sense: 00 3a 00 00
SCSI device sdc: drive cache: write back
sdc: sdc1
sd 2:0:0:0: Attached scsi disk sdc
SCSI device sdd: 976773168 512-byte hdwr sectors (500108 MB)
sdd: Write Protect is off
sdd: Mode Sense: 00 3a 00 00
SCSI device sdd: drive cache: write back
SCSI device sdd: 976773168 512-byte hdwr sectors (500108 MB)
sdd: Write Protect is off
sdd: Mode Sense: 00 3a 00 00
SCSI device sdd: drive cache: write back
sdd: sdd1
sd 3:0:0:0: Attached scsi disk sdd
sd 0:0:0:0: Attached scsi generic sg0 type 0
sd 1:0:0:0: Attached scsi generic sg1 type 0
sd 2:0:0:0: Attached scsi generic sg2 type 0
sd 3:0:0:0: Attached scsi generic sg3 type 0

[...]

md: md3 stopped.
ata1: command 0x25 timeout, stat 0xff host_stat 0x0
ata1: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata1: status=0xff { Busy }
sd 0:0:0:0: SCSI error: return code = 0x8000002
sda: Current: sense key=0xb
ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sda, sector 976772992
Buffer I/O error on device sda, logical block 122096624
ata1: command 0x25 timeout, stat 0xff host_stat 0x0
ata1: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata1: status=0xff { Busy }
sd 0:0:0:0: SCSI error: return code = 0x8000002
sda: Current: sense key=0xb
ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sda, sector 976772992
Buffer I/O error on device sda, logical block 122096624
ata2: command 0x25 timeout, stat 0xff host_stat 0x0
ata2: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata2: status=0xff { Busy }
sd 1:0:0:0: SCSI error: return code = 0x8000002
sdb: Current: sense key=0xb
ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sdb, sector 976772992
Buffer I/O error on device sdb, logical block 122096624
ata2: command 0x25 timeout, stat 0xff host_stat 0x0
ata2: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata2: status=0xff { Busy }
sd 1:0:0:0: SCSI error: return code = 0x8000002
sdb: Current: sense key=0xb
ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sdb, sector 976772992
Buffer I/O error on device sdb, logical block 122096624
ata3: command 0x25 timeout, stat 0xff host_stat 0x0
ata3: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata3: status=0xff { Busy }
sd 2:0:0:0: SCSI error: return code = 0x8000002
sdc: Current: sense key=0xb
ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sdc, sector 976770816
Buffer I/O error on device sdc, logical block 976770816
ata3: command 0x25 timeout, stat 0xff host_stat 0x0
ata3: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata3: status=0xff { Busy }
sd 2:0:0:0: SCSI error: return code = 0x8000002
sdc: Current: sense key=0xb
ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sdc, sector 976770817
Buffer I/O error on device sdc, logical block 976770817
ata3: command 0x25 timeout, stat 0xff host_stat 0x0
ata3: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata3: status=0xff { Busy }
sd 2:0:0:0: SCSI error: return code = 0x8000002
sdc: Current: sense key=0xb
ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sdc, sector 976770818
Buffer I/O error on device sdc, logical block 976770818
ata3: command 0x25 timeout, stat 0xff host_stat 0x0
ata3: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata3: status=0xff { Busy }
sd 2:0:0:0: SCSI error: return code = 0x8000002
sdc: Current: sense key=0xb
ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sdc, sector 976770819
Buffer I/O error on device sdc, logical block 976770819
ata3: command 0x25 timeout, stat 0xff host_stat 0x0
ata3: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata3: status=0xff { Busy }
sd 2:0:0:0: SCSI error: return code = 0x8000002
sdc: Current: sense key=0xb
ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sdc, sector 976770820
Buffer I/O error on device sdc, logical block 976770820
ata3: command 0x25 timeout, stat 0xff host_stat 0x0
ata3: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata3: status=0xff { Busy }
sd 2:0:0:0: SCSI error: return code = 0x8000002
sdc: Current: sense key=0xb
ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sdc, sector 976770821
Buffer I/O error on device sdc, logical block 976770821
ata3: command 0x25 timeout, stat 0xff host_stat 0x0
ata3: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata3: status=0xff { Busy }
sd 2:0:0:0: SCSI error: return code = 0x8000002
sdc: Current: sense key=0xb
ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sdc, sector 976770822
Buffer I/O error on device sdc, logical block 976770822
ata3: command 0x25 timeout, stat 0xff host_stat 0x0
ata3: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata3: status=0xff { Busy }
sd 2:0:0:0: SCSI error: return code = 0x8000002
sdc: Current: sense key=0xb
ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sdc, sector 976770823
Buffer I/O error on device sdc, logical block 976770823
ata3: command 0x25 timeout, stat 0xff host_stat 0x0
ata3: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata3: status=0xff { Busy }
sd 2:0:0:0: SCSI error: return code = 0x8000002
sdc: Current: sense key=0xb
ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sdc, sector 976770816
Buffer I/O error on device sdc, logical block 976770816
ata3: command 0x25 timeout, stat 0xff host_stat 0x0
ata3: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata3: status=0xff { Busy }
sd 2:0:0:0: SCSI error: return code = 0x8000002
sdc: Current: sense key=0xb
ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sdc, sector 976770817
Buffer I/O error on device sdc, logical block 976770817
ata3: command 0x25 timeout, stat 0xff host_stat 0x0
ata3: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata3: status=0xff { Busy }
sd 2:0:0:0: SCSI error: return code = 0x8000002
sdc: Current: sense key=0xb
ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sdc, sector 976770818
Buffer I/O error on device sdc, logical block 976770818
ata3: command 0x25 timeout, stat 0xff host_stat 0x0
ata3: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata3: status=0xff { Busy }
sd 2:0:0:0: SCSI error: return code = 0x8000002
sdc: Current: sense key=0xb
ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sdc, sector 976770819
Buffer I/O error on device sdc, logical block 976770819
ata3: command 0x25 timeout, stat 0xff host_stat 0x0
ata3: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata3: status=0xff { Busy }
sd 2:0:0:0: SCSI error: return code = 0x8000002
sdc: Current: sense key=0xb
ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sdc, sector 976770820
Buffer I/O error on device sdc, logical block 976770820
ata3: command 0x25 timeout, stat 0xff host_stat 0x0
ata3: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata3: status=0xff { Busy }
sd 2:0:0:0: SCSI error: return code = 0x8000002
sdc: Current: sense key=0xb
ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sdc, sector 976770821
Buffer I/O error on device sdc, logical block 976770821
ata3: command 0x25 timeout, stat 0xff host_stat 0x0
ata3: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata3: status=0xff { Busy }
sd 2:0:0:0: SCSI error: return code = 0x8000002
sdc: Current: sense key=0xb
ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sdc, sector 976770822
Buffer I/O error on device sdc, logical block 976770822
ata3: command 0x25 timeout, stat 0xff host_stat 0x0
ata3: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata3: status=0xff { Busy }
sd 2:0:0:0: SCSI error: return code = 0x8000002
sdc: Current: sense key=0xb
ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sdc, sector 976770823
Buffer I/O error on device sdc, logical block 976770823
ata4: command 0x25 timeout, stat 0xff host_stat 0x0
ata4: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata4: status=0xff { Busy }
sd 3:0:0:0: SCSI error: return code = 0x8000002
sdd: Current: sense key=0xb
ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sdd, sector 976772992
Buffer I/O error on device sdd, logical block 122096624
ata4: command 0x25 timeout, stat 0xff host_stat 0x0
ata4: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata4: status=0xff { Busy }
sd 3:0:0:0: SCSI error: return code = 0x8000002
sdd: Current: sense key=0xb
ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sdd, sector 976772992
Buffer I/O error on device sdd, logical block 122096624

[...]

md3 is software RAID 5 composed of sda through sdd, which are
the disks attached to the qstor.

I went back to the old kernel to be able to use this
array.

Any ideas?


Thanks,

Alberto


On Tue, 2006-11-14 at 13:23 -0500, Mark Lord wrote:
> Alberto Alonso wrote:
> > Sounds good, let me know if you need me to do any
> > testing or help on the programming.
>
> Okay, try this (untested) and see if it cleans things up.
> I'm also including the old "printk" patch, so that we can
> tell whether the "fix" is actually working or not.
>
> Thanks
--
Alberto Alonso Global Gate Systems LLC.
(512) 351-7233 http://www.ggsys.net
Hardware, consulting, sysadmin, monitoring and remote backups

2006-11-16 13:53:15

by Mark Lord

[permalink] [raw]
Subject: Re: qstor driver -> irq 193: nobody cared

Alberto Alonso wrote:
> Sorry for the long delay, I've been called on too
> many issues at work this week.
>
> Anyway, the patch basically made the drives not usable.

Mmm.. Okay, thanks for helping track this down.

It appears that this got broken when the ATA_TFLAG_POLLING
got introduced into libata, replacing previous checks of ATA_NIEN.
Or maybe even before that. Not many of us have qstor cards!

Speaking of which, I'll dig my own qstor card out of mothballs soon,
and work out a proper fix for it soon-ish.

In the meanwhile, could you take a clean kernel, and apply the first
attached patch (qstor_spurious_1.patch), and see if it fixes things.

If not, then you can instead apply the second patch (qstor_spurious_kludge.patch)
and your problems should disappear. But I cannot actually push that rubbish
upstream, so a "proper" fix will have to come later.

Cheers


Attachments:
qstor_spurious_1.patch (458.00 B)
qstor_spurious_kludge.patch (934.00 B)
Download all attachments

2006-11-17 21:42:30

by Alberto Alonso

[permalink] [raw]
Subject: Re: qstor driver -> irq 193: nobody cared

FYI: trying with just the first patch didn't work. I got
the usual:

irq 185: nobody cared (try booting with the "irqpoll" option)
<c013e19a> __report_bad_irq+0x2a/0xa0 <c013d970> handle_IRQ_event
+0x30/0x70
<c013e2b0> note_interrupt+0x80/0xf0 <c013da8c> __do_IRQ+0xdc/0xf0
<c0105799> do_IRQ+0x19/0x30 <c010391a> common_interrupt+0x1a/0x20
<c0100d91> default_idle+0x41/0x70 <c0100e60> cpu_idle+0x80/0x90
handlers:
[<c0301300>] (qs_intr+0x0/0x230)
Disabling IRQ #185

Interestingly enough this is the first time I see it outside
of irq 193, not that it matter though.

I am going to try now with the second patch instead.

Alberto


On Thu, 2006-11-16 at 08:53 -0500, Mark Lord wrote:
> Alberto Alonso wrote:
> > Sorry for the long delay, I've been called on too
> > many issues at work this week.
> >
> > Anyway, the patch basically made the drives not usable.
>
> Mmm.. Okay, thanks for helping track this down.
>
> It appears that this got broken when the ATA_TFLAG_POLLING
> got introduced into libata, replacing previous checks of ATA_NIEN.
> Or maybe even before that. Not many of us have qstor cards!
>
> Speaking of which, I'll dig my own qstor card out of mothballs soon,
> and work out a proper fix for it soon-ish.
>
> In the meanwhile, could you take a clean kernel, and apply the first
> attached patch (qstor_spurious_1.patch), and see if it fixes things.
>
> If not, then you can instead apply the second patch (qstor_spurious_kludge.patch)
> and your problems should disappear. But I cannot actually push that rubbish
> upstream, so a "proper" fix will have to come later.
>
> Cheers
--
Alberto Alonso Global Gate Systems LLC.
(512) 351-7233 http://www.ggsys.net
Hardware, consulting, sysadmin, monitoring and remote backups