2007-11-22 01:43:18

by Al Niessner

Subject: Where is the interrupt going?


Quickly stated, I have a piece of hardware on the PCI bus that is
generating an interrupt (can watch it with a scope) but my handler is
not being called (no printk in /var/log/messages). So, where has the
interrupt gone?

Obligatory information:
1) I have done the google search and mailing list search finding lots of
ancillary information but not what I needed.
2) Yes, it is my fault, but I need some help from people more directly
involved in the kernel than myself to point out what I am doing wrong.
3) Thanks for any and all help in advance.

On with the detailed technical information. I developed a kernel module
for a PCI card back in 2.4, moved it to 2.6.3, then 2.6.11 or so and
now I am trying to move it to 2.6.22. When I began the move to
2.6.22, I changed all of the deprecated calls for finding the card on
the PCI bus, modified the interrupt handler prototype, and changed my
readv/writev to aio_read/aio_write following
http://lwn.net/Articles/202449/. So initialization looks like this:

p8620 = pci_get_device (APC8620_VENDOR_ID, APC8620_DEVICE_ID, p8620);
<... fail if p8620 is 0 ...>
apcsi[i].ret_val = register_chrdev (MAJOR_NUM,
                                    DEVICE_NAME,
                                    &apc8620_ops);
<... fail if ret_val < 0 ...>
apcsi[i].board_irq = p8620->irq;
status = request_irq (apcsi[i].board_irq,
                      apc8620_handler,
                      IRQF_DISABLED,
                      DEVICE_NAME,
                      (void*)&apcsi[i]);
<... fail if status != 0 ...>

I do check all of the return values to verify the call happened
successfully. There are some memory mapping calls that I have left out
since they are working while the interrupt is not.

Things seem to work for the most part because I can read/write data
through a memory map and verify the IndustryPack modules on the carrier
through their header. The memory map is still working sufficiently well
that I can program up one of the IndustryPack modules to generate an
interrupt every 2 seconds or so. Prior to my changes for 2.6.22 this
worked quite well. Since it is the interrupt portion of this game that
is giving me grief, let's stick with just that. apc8620_handler is:

static irqreturn_t apc8620_handler (int irq, void *did)
{
    printk (KERN_NOTICE "apc8620: did (0x%lx)\n", (unsigned long)did);
    <... other irrelevant steps ...>
    return IRQ_HANDLED;
}

I would then expect that every two seconds or so I would see a message
from apc8620_handler pop up. Instead I see nothing. Poking around I see
that the kernel module is loaded and attached to my devices and set for
IRQ 10:

lsmod: -> acromag8620 4207556 0
cat /proc/devices -> 46 apc8620
cat /proc/interrupts -> 10: 0 IO-APIC-edge apc8620

With /proc/interrupts, LOC keeps growing at a rate faster than what my
hardware is generating and I have no idea what LOC means, but ERR and
MIS (I take it to mean error and missed respectively) are both 0 and
remain 0 indefinitely.
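
For watching one line of /proc/interrupts over time, a small user-space helper can pull out the counter for a single handler. The following is only a sketch: the function name irq_count_for is made up for this example, and it assumes the file contents have already been read into a string and that no line has been wrapped by a mail client.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sum the per-CPU counters on the first /proc/interrupts line whose
 * device list (the part after "NN:") mentions `name`.
 * `text` is the full contents of /proc/interrupts.
 * Returns -1 when no line mentions `name`.
 */
static long irq_count_for(const char *text, const char *name)
{
    const char *line = text;
    size_t nlen;

    assert(text != NULL && name != NULL);
    nlen = strlen(name);

    while (line && *line) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        const char *stop = line + len;
        const char *colon = memchr(line, ':', len);
        int found = 0;

        /* Look for the handler name only after the "NN:" label. */
        if (colon && nlen) {
            const char *p;
            for (p = colon + 1; p + nlen <= stop; p++) {
                if (memcmp(p, name, nlen) == 0) {
                    found = 1;
                    break;
                }
            }
        }

        if (found) {
            long total = 0;
            const char *p = colon + 1;

            /* One numeric column per CPU; stop at the controller name. */
            for (;;) {
                char *next;

                while (p < stop && (*p == ' ' || *p == '\t'))
                    p++;
                if (p >= stop || !isdigit((unsigned char)*p))
                    break;
                total += strtol(p, &next, 10);
                p = next;
            }
            return total;
        }
        line = end ? end + 1 : NULL;
    }
    return -1;
}
```

Calling irq_count_for(contents, "apc8620") twice a few seconds apart and comparing the results shows whether the kernel has counted any interrupt for the handler at all, independent of whether the printk output is being logged.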

In /var/log/messages, I do not see any missing interrupt messages or any
other report indicating that there is some trouble.

Assuming no one sees the error I am making right off the bat and would
like me to probe the interrupt system a little bit more, please give me
a suggestion as to where to poke. There is lots of code there and I
would prefer a guided poke over a random one.

Anyway, I read through linux/interrupt.h looking for some bit, flag, or
call that I have omitted but found nothing. I understand why the
interrupt handlers have changed, but the changes made should not be
causing this problem.

Again, any and all help in finding my lost interrupt is much
appreciated.

Lastly, I would be happy to give out the entire module to anyone who
requests it, but it is about 550 lines so I did not want to attach it to
this already long post.

--
Al Niessner
818.354.0859

All opinions stated above are mine and do not necessarily reflect those
of JPL or NASA.

--------
| dS | >= 0
--------



2007-11-22 01:58:49

by Alan

Subject: Re: Where is the interrupt going?

> status = request_irq (apcsi[i].board_irq,
> apc8620_handler,
> IRQF_DISABLED,

You set IRQF_DISABLED

Do you then enable the interrupt anywhere later on ?

Alan

2007-11-22 02:14:32

by Kyle McMartin

Subject: Re: Where is the interrupt going?

On Thu, Nov 22, 2007 at 01:56:25AM +0000, Alan Cox wrote:
> > status = request_irq (apcsi[i].board_irq,
> > apc8620_handler,
> > IRQF_DISABLED,
>
> You set IRQF_DISABLED
>
> Do you then enable the interrupt anywhere later on ?
>

IRQF_DISABLED just means that the handler is atomic wrt other local
interrupts. Shouldn't be the cause of this.

cheers,
Kyle

2007-11-22 02:16:57

by Jesper Juhl

Subject: Re: Where is the interrupt going?

On 22/11/2007, Al Niessner <[email protected]> wrote:
>
> Quickly stated, I have a piece of hardware on the PCI bus that is
> generating an interrupt (can watch it with a scope) but my handler is
> not being called (no printk in /var/log/messages). So, where has the
> interrupt gone?
>
Just to rule out the trivial causes. Could it be that you've simply
not configured your system to log messages at the loglevel that your
printk() is using?

--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2007-11-22 02:20:18

by Kyle McMartin

Subject: Re: Where is the interrupt going?

On Wed, Nov 21, 2007 at 05:08:30PM -0800, Al Niessner wrote:
> On with the detailed technical information. I developed a kernel module
> for a PCI card back in 2.4, moved it to 2.6.3, then 2.6.11 or so and
> now I am trying to move it to 2.6.22. When I began the move to
> 2.6.22, I changed all of the deprecated calls for finding the card on
> the PCI bus, modified the interrupt handler prototype, and changed my
> readv/writev to aio_read/aio_write following
> http://lwn.net/Articles/202449/. So initialization looks like this:
>

Hi Al,

From the sounds of it, you might have an interrupt routing problem. Can
you describe the machine you have this plugged into? Possibly attaching
a copy of "dmesg" and "/proc/interrupts"?

Feel free to attach the driver source to your email if the size is
reasonable (which it sounds like it is.)

As a "big hammer" in case it is an APIC problem, please try booting the
kernel with the "noapic" parameter.

cheers,
Kyle

2007-11-22 02:50:35

by Arjan van de Ven

Subject: Re: Where is the interrupt going?

On Wed, 21 Nov 2007 17:08:30 -0800
Al Niessner <[email protected]> wrote:
>
> Lastly, I would be happy to give out the entire module to anyone who
> requests it, but it is about 550 lines so I did not want to attach it
> to this already long post.
>

Can you send it to me, or even better, post it somewhere online?
I have something I'd like to check to see if you do it correctly, but I
can't without the code...


--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2007-11-23 00:43:17

by niessner

Subject: Re: Where is the interrupt going?


I do not think so. I have printk (KERN_NOTICE ...) scattered
throughout to make sure the ioctl() is succeeding and to print out
registers on the hardware. Those are showing up in /var/log/messages
without a hitch. If there is a separate setting for printk inside
interrupt handlers, then maybe, because I would not know which macro to
look for in the configuration.

Quoting Jesper Juhl <[email protected]>, on Wed 21 Nov 2007
06:16:45 PM PST:

> On 22/11/2007, Al Niessner <[email protected]> wrote:
>>
>> Quickly stated, I have a piece of hardware on the PCI bus that is
>> generating an interrupt (can watch it with a scope) but my handler is
>> not being called (no printk in /var/log/messages). So, where has the
>> interrupt gone?
>>
> Just to rule out the trivial causes. Could it be that you've simply
> not configured your system to log messages at the loglevel that your
> printk() is using?
>
> --
> Jesper Juhl <[email protected]>
> Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
> Plain text mails only, please http://www.expita.com/nomime.html
>


2007-11-23 00:49:17

by niessner

Subject: Re: Where is the interrupt going?


I tried the hammer and the problem persists.
observer@bbb:~$ cat /proc/cmdline
root=UUID=8b3c3666-22c3-4c04-b399-ece266f2ef30 ro noapic quiet splash

However, I reserve the right to try the hammer again in the future.
When I look at /proc/interrupts without the APIC:
observer@bbb:~$ cat /proc/interrupts
CPU0
0: 144 XT-PIC-XT timer
1: 10 XT-PIC-XT i8042
2: 0 XT-PIC-XT cascade
5: 100000 XT-PIC-XT ohci_hcd:usb5, mxser
6: 5 XT-PIC-XT floppy
7: 1 XT-PIC-XT parport0
8: 3 XT-PIC-XT rtc
9: 1 XT-PIC-XT acpi, uhci_hcd:usb2
10: 100000 XT-PIC-XT ohci_hcd:usb4, ehci_hcd:usb6, r128@pci:0000:01:00.0
11: 2231 XT-PIC-XT uhci_hcd:usb1, ohci_hcd:usb3, eth0
12: 130 XT-PIC-XT i8042
14: 4362 XT-PIC-XT libata
15: 15315 XT-PIC-XT libata
NMI: 0
LOC: 130125
ERR: 0
MIS: 0

I do not even see the device that I registered unless it is that
r128... line. However the code printed out in /var/log/messages:
Nov 22 16:05:27 bbb kernel: [ 104.712473] apc8620: VID = 0x10B5
Nov 22 16:05:27 bbb kernel: [ 104.712486] apc8620: mapped addr = e0bd4000
Nov 22 16:05:27 bbb kernel: [ 104.713022] apc8620: registered carrier 0
Nov 22 16:05:27 bbb kernel: [ 104.713028] apc8620: interrupt data (0xe1083e40) on irq (10) and status (0x10)

which indicates it successfully registered without being shared. When
I have more time, I will change the code to be a shared IRQ and try
the noapic again.

However, without the noapic /proc/interrupts looks like:
observer@bbb:~$ cat /proc/interrupts
CPU0
0: 154 IO-APIC-edge timer
1: 10 IO-APIC-edge i8042
6: 5 IO-APIC-edge floppy
7: 0 IO-APIC-edge parport0
8: 3 IO-APIC-edge rtc
9: 1 IO-APIC-fasteoi acpi
10: 0 IO-APIC-edge apc8620
12: 130 IO-APIC-edge i8042
14: 2861 IO-APIC-edge libata
15: 1049 IO-APIC-edge libata
16: 100001 IO-APIC-fasteoi ohci_hcd:usb5, mxser
17: 0 IO-APIC-fasteoi uhci_hcd:usb1, ohci_hcd:usb3
18: 0 IO-APIC-fasteoi uhci_hcd:usb2
19: 187 IO-APIC-fasteoi eth0
20: 0 IO-APIC-fasteoi ohci_hcd:usb4, r128@pci:0000:01:00.0
21: 0 IO-APIC-fasteoi ehci_hcd:usb6
NMI: 0
LOC: 8820
ERR: 0
MIS: 0


I have attached the kernel module. The apc8620 is an IndustryPack
carrier card. I can therefore open up N (in this specific case 5) sub
memory windows in the memory mapped PCI address. The kernel module
keeps track of the slot offsets from the memory mapped address so that
the user can simply use read and write instead of a zillion ugly ioctl
calls. Because the kernel module tracks the slot offsets, I place acp
state into the private data of the file pointer. There can also be
multiple carriers on the bus. So, the array in the kernel module keeps
track of the card-specific details, while the file pointer holds the
slot-specific information. Both are the same structure (bad on my part I
know but I never intended to show my dirty underwear). To get data
from interrupts (asynchronous IO) I was using readv. Now I am using
aio_read and had to make some minor changes that you will see comments
about to accommodate the change.

Just noticed that r128 is not the carrier card...

Thanks for all of the help so far and I hope this information is helpful.

I almost forgot. I also attached the dmesg output and will try the
irqpoll option as it suggests. It is just that IRQ 16 is not the one I
am looking for, but is probably related to my mxser problems that I will
get to later.

Quoting Kyle McMartin <[email protected]>, on Wed 21 Nov 2007 06:20:04 PM PST:

> On Wed, Nov 21, 2007 at 05:08:30PM -0800, Al Niessner wrote:
>> On with the detailed technical information. I developed a kernel module
>> for a PCI card back in 2.4, moved it to 2.6.3, then 2.6.11 or so and
>> now I am trying to move it to 2.6.22. When I began the move to
>> 2.6.22, I changed all of the deprecated calls for finding the card on
>> the PCI bus, modified the interrupt handler prototype, and changed my
>> readv/writev to aio_read/aio_write following
>> http://lwn.net/Articles/202449/. So initialization looks like this:
>>
>
> Hi Al,
>
> From the sounds of it, you might have an interrupt routing problem. Can
> you describe the machine you have this plugged into? Possibly attaching
> a copy of "dmesg" and "/proc/interrupts"?
>
> Feel free to attach the driver source to your email if the size is
> reasonable (which it sounds like it is.)
>
> As a "big hammer" in case it is an APIC problem, please try booting the
> kernel with the "noapic" parameter.
>
> cheers,
> Kyle
>



Attachments:
(No filename) (4.90 kB)
apc8620.h (5.42 kB)
apc8620.c (17.72 kB)
dmesg.out (23.42 kB)

2007-11-23 01:20:57

by Robert Hancock

Subject: Re: Where is the interrupt going?

[email protected] wrote:
>
> I tried the hammer and the problem persists.
> observer@bbb:~$ cat /proc/cmdline
> root=UUID=8b3c3666-22c3-4c04-b399-ece266f2ef30 ro noapic quiet splash
>
> However, I reserve the right to try the hammer again in the future. When
> I look at /proc/interrupts without the APIC:
> observer@bbb:~$ cat /proc/interrupts
> CPU0
> 0: 144 XT-PIC-XT timer
> 1: 10 XT-PIC-XT i8042
> 2: 0 XT-PIC-XT cascade
> 5: 100000 XT-PIC-XT ohci_hcd:usb5, mxser
> 6: 5 XT-PIC-XT floppy
> 7: 1 XT-PIC-XT parport0
> 8: 3 XT-PIC-XT rtc
> 9: 1 XT-PIC-XT acpi, uhci_hcd:usb2
> 10: 100000 XT-PIC-XT ohci_hcd:usb4, ehci_hcd:usb6,
> r128@pci:0000:01:00.0
> 11: 2231 XT-PIC-XT uhci_hcd:usb1, ohci_hcd:usb3, eth0
> 12: 130 XT-PIC-XT i8042
> 14: 4362 XT-PIC-XT libata
> 15: 15315 XT-PIC-XT libata
> NMI: 0
> LOC: 130125
> ERR: 0
> MIS: 0
>
> I do not even see the device that I registered unless it is that r128...
> line. However the code printed out in /var/log/messages:
> Nov 22 16:05:27 bbb kernel: [ 104.712473] apc8620: VID = 0x10B5
> Nov 22 16:05:27 bbb kernel: [ 104.712486] apc8620: mapped addr = e0bd4000
> Nov 22 16:05:27 bbb kernel: [ 104.713022] apc8620: registered carrier 0
> Nov 22 16:05:27 bbb kernel: [ 104.713028] apc8620: interrupt data
> (0xe1083e40) on irq (10) and status (0x10)
>
> which indicates it successfully registered without being shared. When I
> have more time, I will change the code to be a shared IRQ and try the
> noapic again.

You're not calling pci_enable_device anywhere. Unless you do this before
requesting the IRQ, the IRQ routing may not be set up properly for your
device and it may not even give you the right IRQ number. You should see
a line like this somewhere in dmesg for the IRQ your card is on:

ACPI: PCI Interrupt 0000:00:1f.2[D] -> GSI 19 (level, low) -> IRQ 17

I think this behavior changed in the somewhat recent past..
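
The ordering described above can be sketched as follows. This is a user-space model, not kernel code: struct pci_dev and pci_enable_device() here are minimal stand-ins written for this example so that it compiles outside the kernel, routed_irq is a hypothetical field that models the value the platform assigns during enable, and apc8620_pick_irq is a hypothetical helper name. In a real driver the same sequence would use the kernel's own definitions from linux/pci.h.

```c
#include <assert.h>
#include <stddef.h>

/* ---- User-space stand-ins, NOT the kernel's definitions ---- */
struct pci_dev {
    unsigned int irq;        /* the field a driver reads */
    unsigned int routed_irq; /* hypothetical: IRQ assigned when routing is set up */
    int enabled;
};

/* Models the effect discussed above: routing is only finalised on enable. */
static int pci_enable_device(struct pci_dev *dev)
{
    dev->irq = dev->routed_irq;
    dev->enabled = 1;
    return 0; /* the real call returns 0 or a negative errno */
}

/* ---- The ordering a driver would follow ---- */
static int apc8620_pick_irq(struct pci_dev *dev, unsigned int *irq_out)
{
    int err;

    assert(dev != NULL && irq_out != NULL);

    err = pci_enable_device(dev); /* 1. enable the device first */
    if (err)
        return err;

    *irq_out = dev->irq;          /* 2. only now sample dev->irq */
    return 0;                     /* 3. caller hands *irq_out to request_irq() */
}
```

The point of the model is only the order of steps: reading dev->irq before the enable call can yield a stale number, which is consistent with the handler being registered on IRQ 10 while the card's interrupts arrive elsewhere.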

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2007-11-23 01:28:14

by Alan

Subject: Re: Where is the interrupt going?

On Thu, 22 Nov 2007 16:48:53 -0800
[email protected] wrote:

>
> I tried the hammer and the problem persists.

See my earlier email - your driver registers the irq with IRQF_DISABLED
then never enables it.

by Bartlomiej Zolnierkiewicz

Subject: Re: Where is the interrupt going?

On Friday 23 November 2007, Alan Cox wrote:
> On Thu, 22 Nov 2007 16:48:53 -0800
> [email protected] wrote:
>
> >
> > I tried the hammer and the problem persists.
>
> See my earlier email - your driver registers the irq with IRQF_DISABLED
> then never enables it.

As already explained by Kyle IRQF_DISABLED shouldn't matter here.

[ Nowadays IRQF_DISABLED only tells kernel/irq/handle.c::handle_IRQ_event()
to not enable local interrupts before calling your IRQ handler.

I've recently removed IRQF_DISABLED from IDE after noticing this. ]
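
A toy user-space model of the behaviour described in the bracketed note above may help. Everything in it is a stand-in written for this example: irqaction_model, model_handle_irq_event, record_irq_state and the local_irqs_on variable are hypothetical names, and the IRQF_DISABLED value is only illustrative. The model is not the kernel's handle_IRQ_event(); it only mirrors the one decision being discussed.

```c
#include <assert.h>
#include <stddef.h>

#define IRQF_DISABLED 0x00000020u /* illustrative value for this model */

/* Models the CPU's local interrupt-enable state (1 = enabled). */
static int local_irqs_on;

typedef int (*irq_handler_model_t)(int irq, void *dev_id);

struct irqaction_model {
    irq_handler_model_t handler;
    unsigned int flags;
    void *dev_id;
};

/*
 * The handler is called in both cases. The flag only decides whether
 * local interrupts are re-enabled before the call.
 */
static int model_handle_irq_event(int irq, struct irqaction_model *action)
{
    int ret;

    assert(action != NULL && action->handler != NULL);

    local_irqs_on = 0;                 /* entered with local interrupts off */
    if (!(action->flags & IRQF_DISABLED))
        local_irqs_on = 1;             /* re-enable unless the flag is set */

    ret = action->handler(irq, action->dev_id);

    local_irqs_on = 0;                 /* off again on the way out */
    return ret;
}

/* Example handler: records the interrupt state it observed. */
static int record_irq_state(int irq, void *dev_id)
{
    (void)irq;
    *(int *)dev_id = local_irqs_on;
    return 1;
}
```

With the flag set the example handler observes local interrupts off, and without it observes them on. In neither case is the handler skipped, which is why the flag cannot explain a handler that is never called.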

Bart

2007-11-23 03:16:22

by Marin Mitov

Subject: Re: Where is the interrupt going?

Hi,

On Friday 23 November 2007 02:48:53 am you wrote:
> I tried the hammer and the problem persists.
> observer@bbb:~$ cat /proc/cmdline
> root=UUID=8b3c3666-22c3-4c04-b399-ece266f2ef30 ro noapic quiet splash
>
> However, I reserve the right to try the hammer again in the future.
> When I look at /proc/interrupts without the APIC:
> observer@bbb:~$ cat /proc/interrupts
> CPU0
> 0: 144 XT-PIC-XT timer
> 1: 10 XT-PIC-XT i8042
> 2: 0 XT-PIC-XT cascade
> 5: 100000 XT-PIC-XT ohci_hcd:usb5, mxser
> 6: 5 XT-PIC-XT floppy
> 7: 1 XT-PIC-XT parport0
> 8: 3 XT-PIC-XT rtc
> 9: 1 XT-PIC-XT acpi, uhci_hcd:usb2
> 10: 100000 XT-PIC-XT ohci_hcd:usb4, ehci_hcd:usb6,
> r128@pci:0000:01:00.0
> 11: 2231 XT-PIC-XT uhci_hcd:usb1, ohci_hcd:usb3, eth0
> 12: 130 XT-PIC-XT i8042
> 14: 4362 XT-PIC-XT libata
> 15: 15315 XT-PIC-XT libata
> NMI: 0
> LOC: 130125
> ERR: 0
> MIS: 0
>
> I do not even see the device that I registered unless it is that
> r128... line. However the code printed out in /var/log/messages:

No, this is your ATI Rage 128 board (on AGP I suppose). Could be integrated
on the mobo if it is a server mobo.

> Nov 22 16:05:27 bbb kernel: [ 104.712473] apc8620: VID = 0x10B5
> Nov 22 16:05:27 bbb kernel: [ 104.712486] apc8620: mapped addr = e0bd4000
> Nov 22 16:05:27 bbb kernel: [ 104.713022] apc8620: registered carrier 0
> Nov 22 16:05:27 bbb kernel: [ 104.713028] apc8620: interrupt data
> (0xe1083e40) on irq (10) and status (0x10)

Here is the problem (I suppose):
if status (0x10 hex or 16 decimal) is the value returned by request_irq:
status = request_irq (apcsi[i].board_irq,
apc8620_handler,
IRQF_DISABLED,
DEVICE_NAME,
(void*)&apcsi[i]);
(from your first post), that means the irq is NOT registered, because
according to the LDD v.3 book:
<cite>
The value returned from request_irq to the requesting function is either 0
to indicate success or a negative error code, as usual. It’s not uncommon
for the function to return -EBUSY to signal that another driver is already
using the requested interrupt line.
</cite>
If you grep the kernel's include directory for EBUSY you will find:
#define EBUSY 16 /* Device or resource busy */
in include/asm-generic/errno-base.h

So I think your mobo has an irq line shared with other devices on the
PCI/PCIe slot you use for your hardware, and these other devices have
already registered shared irq handlers for the same irq (10), so the
attempt to register a nonshared irq fails.

Either try to register the irq as shared, or put the hardware in
another slot whose irq line is not shared with other devices
(if such a slot exists). This info should be available from the mobo
manual.
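
If the shared route is taken, the handler has to work out whether its own card raised the interrupt, because it will also be called for the other devices on the line. A minimal user-space sketch of that check follows. The irqreturn_t definition is a stand-in for the kernel's, and struct apc8620_state, the status field and the APC8620_IRQ_PENDING bit are hypothetical: the real card's register layout is not given in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's return type and values. */
typedef enum { IRQ_NONE = 0, IRQ_HANDLED = 1 } irqreturn_t;

#define APC8620_IRQ_PENDING 0x1u /* hypothetical "interrupt pending" bit */

/* Hypothetical per-card state; `status` stands in for a mapped register. */
struct apc8620_state {
    unsigned int status;
    unsigned long handled;
};

/*
 * Handler for a shared line: claim the interrupt only when this card
 * has one pending, otherwise report IRQ_NONE so the kernel can try the
 * other handlers registered on the same line.
 */
static irqreturn_t apc8620_shared_handler(int irq, void *dev_id)
{
    struct apc8620_state *st = dev_id;

    (void)irq;
    assert(st != NULL); /* a shared registration needs a non-NULL dev_id */

    if (!(st->status & APC8620_IRQ_PENDING))
        return IRQ_NONE;

    st->status &= ~APC8620_IRQ_PENDING; /* acknowledge (modelled) */
    st->handled++;
    return IRQ_HANDLED;
}
```

In the driver itself this would go together with passing IRQF_SHARED to request_irq() and keeping the per-card pointer as the dev_id argument, which the original call already does with &apcsi[i].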
>
> which indicates it successfully registered without being shared.

No, as I already explained.
The only problem :-) in my explanation is that
request_irq returns EBUSY (not -EBUSY as it should).

Marin Mitov



2007-11-23 04:32:16

by niessner

Subject: Re: Where is the interrupt going?


Quite right. I read it too quickly and thought it had succeeded when
it had failed. I will modify the module to do the shared IRQ and then
try the noapic test again. Exactly why I reserved the right to do it
again.

This is good because it means the hammer may work after all.

Thank you very much and I will post to let you know the outcome.

Quoting Marin Mitov <[email protected]>, on Thu 22 Nov 2007 07:18:01 PM PST:

> Hi,
>
> On Friday 23 November 2007 02:48:53 am you wrote:
>> I tried the hammer and the problem persists.
>> observer@bbb:~$ cat /proc/cmdline
>> root=UUID=8b3c3666-22c3-4c04-b399-ece266f2ef30 ro noapic quiet splash
>>
>> However, I reserve the right to try the hammer again in the future.
>> When I look at /proc/interrupts without the APIC:
>> observer@bbb:~$ cat /proc/interrupts
>> CPU0
>> 0: 144 XT-PIC-XT timer
>> 1: 10 XT-PIC-XT i8042
>> 2: 0 XT-PIC-XT cascade
>> 5: 100000 XT-PIC-XT ohci_hcd:usb5, mxser
>> 6: 5 XT-PIC-XT floppy
>> 7: 1 XT-PIC-XT parport0
>> 8: 3 XT-PIC-XT rtc
>> 9: 1 XT-PIC-XT acpi, uhci_hcd:usb2
>> 10: 100000 XT-PIC-XT ohci_hcd:usb4, ehci_hcd:usb6,
>> r128@pci:0000:01:00.0
>> 11: 2231 XT-PIC-XT uhci_hcd:usb1, ohci_hcd:usb3, eth0
>> 12: 130 XT-PIC-XT i8042
>> 14: 4362 XT-PIC-XT libata
>> 15: 15315 XT-PIC-XT libata
>> NMI: 0
>> LOC: 130125
>> ERR: 0
>> MIS: 0
>>
>> I do not even see the device that I registered unless it is that
>> r128... line. However the code printed out in /var/log/messages:
>
> No, this is your ATI Rage 128 board (on AGP I suppose). Could be integrated
> on the mobo if it is a server mobo.
>
>> Nov 22 16:05:27 bbb kernel: [ 104.712473] apc8620: VID = 0x10B5
>> Nov 22 16:05:27 bbb kernel: [ 104.712486] apc8620: mapped addr = e0bd4000
>> Nov 22 16:05:27 bbb kernel: [ 104.713022] apc8620: registered carrier 0
>> Nov 22 16:05:27 bbb kernel: [ 104.713028] apc8620: interrupt data
>> (0xe1083e40) on irq (10) and status (0x10)
>
> Here is the problem (I suppose):
> if status (0x10 hex or 16 decimal) is the value returned by request_irq:
> status = request_irq (apcsi[i].board_irq,
> apc8620_handler,
> IRQF_DISABLED,
> DEVICE_NAME,
> (void*)&apcsi[i]);
> (from your first post), that means the irq is NOT registered, because
> according to the LDD v.3 book:
> <cite>
> The value returned from request_irq to the requesting function is either 0
> to indicate success or a negative error code, as usual. It’s not uncommon
> for the function to return -EBUSY to signal that another driver is already
> using the requested interrupt line.
> </cite>
> If you grep the kernel's include directory for EBUSY you will find:
> #define EBUSY 16 /* Device or resource busy */
> in include/asm-generic/errno-base.h
>
> So I think your mobo has an irq line shared with other devices on the
> PCI/PCIe slot you use for your hardware, and these other devices have
> already registered shared irq handlers for the same irq (10), so the
> attempt to register a nonshared irq fails.
>
> Either try to register the irq as shared, or put the hardware in
> another slot whose irq line is not shared with other devices
> (if such a slot exists). This info should be available from the mobo
> manual.
>>
>> which indicates it successfully registered without being shared.
>
> No, as I already explained.
> The only problem :-) in my explanation is:
> request_irq returns EBUSY (not -EBUSY as should be)
>
> Marin Mitov
>


2007-11-23 08:17:17

by Jiri Slaby

Subject: Re: Where is the interrupt going?

On 11/23/2007 04:18 AM, Marin Mitov wrote:
> request_irq returns EBUSY (not -EBUSY as should be)

Because he writes -status to the output.
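
Put together with the EBUSY lookup earlier in the thread, the logged value can be decoded mechanically. The helper names status_from_logged and describe_status below are made up for this example, and the sketch runs in user space, relying on the C library's errno numbers matching the kernel's for this value, as they do on Linux.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * The driver logged -status, so the log shows a non-negative magnitude.
 * Flip the sign back to recover what request_irq() returned.
 */
static int status_from_logged(int logged)
{
    assert(logged >= 0);
    return -logged;
}

/* Describe a kernel-style return value: 0 is success, negative is -errno. */
static const char *describe_status(int status)
{
    if (status == 0)
        return "success";
    if (status < 0)
        return strerror(-status);
    return "positive value, not a kernel-style return code";
}
```

For the line in the log, describe_status(status_from_logged(0x10)) yields the C library's text for EBUSY, so the request_irq() call failed rather than succeeded.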

2007-11-23 10:39:35

by Alan

Subject: Re: Where is the interrupt going?

On Fri, 23 Nov 2007 02:58:55 +0100
Bartlomiej Zolnierkiewicz <[email protected]> wrote:

> On Friday 23 November 2007, Alan Cox wrote:
> > On Thu, 22 Nov 2007 16:48:53 -0800
> > [email protected] wrote:
> >
> > >
> > > I tried the hammer and the problem persists.
> >
> > See my earlier email - your driver registers the irq with IRQF_DISABLED
> > then never enables it.
>
> As already explained by Kyle IRQF_DISABLED shouldn't matter here.
>
> [ Nowadays IRQF_DISABLED only tells kernel/irq/handle.c::handle_IRQ_event()
> to not enable local interrupts before calling your IRQ handler.
>
> I've recently removed IRQF_DISABLED from IDE after noticing this. ]

Bartlomiej is of course correct. That's what you get for replying late at
night in a hurry.

I'm not sure IDE can work without it because you need to lock out the
timer events (old IDE doesn't handle this at all on SMP though so it
wants fixing properly anyway).

Alan

2007-11-23 21:54:18

by Benjamin Herrenschmidt

Subject: Re: Where is the interrupt going?


On Wed, 2007-11-21 at 17:08 -0800, Al Niessner wrote:
>
> p8620 = pci_get_device (APC8620_VENDOR_ID, APC8620_DEVICE_ID, p8620);
> <... fail if p8620 is 0 ...>
> apcsi[i].ret_val = register_chrdev (MAJOR_NUM,
>
> DEVICE_NAME,
>
> &apc8620_ops);
> <... fail if ret_val < 0 ...>
> apcsi[i].board_irq = p8620->irq;
> status = request_irq (apcsi[i].board_irq,
> apc8620_handler,
> IRQF_DISABLED,
> DEVICE_NAME,
> (void*)&apcsi[i]);

First, that's obviously not the proper way to do a PCI driver but I
suppose you know that :-)

Then, make sure you call pci_enable_device() at some point. Don't some
platforms perform the actual IRQ routing that late? (And don't sample
pdev->irq before the pci_enable_device(); sample it afterward.)

Cheers,
Ben.


2007-11-26 22:50:15

by Al Niessner

Subject: Re: Where is the interrupt going?


Yes, as also pointed out by Arjan van de Ven, I was missing the
pci_enable_device() call. This seems related to the deprecation of
pci_find_device (or something like that) in favor of pci_get_device.
After adding the pci_enable_device() call, it all works well.

On Sat, 2007-11-24 at 08:53 +1100, Benjamin Herrenschmidt wrote:
> On Wed, 2007-11-21 at 17:08 -0800, Al Niessner wrote:
> >
> > p8620 = pci_get_device (APC8620_VENDOR_ID, APC8620_DEVICE_ID, p8620);
> > <... fail if p8620 is 0 ...>
> > apcsi[i].ret_val = register_chrdev (MAJOR_NUM,
> >
> > DEVICE_NAME,
> >
> > &apc8620_ops);
> > <... fail if ret_val < 0 ...>
> > apcsi[i].board_irq = p8620->irq;
> > status = request_irq (apcsi[i].board_irq,
> > apc8620_handler,
> > IRQF_DISABLED,
> > DEVICE_NAME,
> > (void*)&apcsi[i]);
>
> First, that's obviously not the proper way to do a PCI driver but I
> suppose you know that :-)
>
> Then, make sure you call pci_enable_device() at some point. Don't some
> platforms perform the actual IRQ routing that late? (And don't sample
> pdev->irq before the pci_enable_device(); sample it afterward.)
>
> Cheers,
> Ben.
>
>
--
Al Niessner
818.354.0859

All opinions stated above are mine and do not necessarily reflect those
of JPL or NASA.

--------
| dS | >= 0
--------