2007-02-28 13:54:49

by Eric W. Biederman

[permalink] [raw]

Subject: Re: [RFC] killing the NR_IRQS arrays.

Arnd Bergmann <[email protected]> writes:

>
> I have to admit I still don't really understand how this works
> at all. Can a driver that uses msi-x have different handlers
> for each of those interrupts registered simultaneously?

Yes, and the irqs can be routed at different cpus independently.
However not all hardware supports all 4K irqs. 4K is the implementation
defined maximu. Although infiniband HCA's are rumored to support a
lot of irqs and it isn't uncommon for simpler nics to support 4 or so.

Conceptually think of it as having an irq controller embedded in your
pci device.

The MSI messages are writes to special addresses that are then
converted into CPU interrupts.

> I would expect that instead there should be only one 'struct irq'
> for the device, with the handler getting a 12 bit number argument.

No. That would be unnecessary coalescing. Even if that was what the
hardware layer gave us the (and it doesn't) the generic layers should
demux these things as much as is reasonable.

>> > s390: got rid of irq numbers already
>>
>> Yes. I should really look at that more and see if I could bring
>> s390 into the generic irq code with my planned changes.
>
> I don't think there is much point in changing the s390 code, but
> the way it is solved there may be interesting for other buses
> as well. The interrupt handler there is not being registered
> explicitly, but is part of the driver (in case of subchannel)
> or of the device (in case of ccw_device) data structure.
>
> Similarly, in a pci device, one could imagine that the
> struct pci_driver contains a irq_handler_t member that
> is registered from the pci_device_probe() function
> if present.

Yes. There is some potential there. Although we would have to go
through an extra hoop to make it a pci specific handler type.

>> > Note that we can even start converting device drivers first, before
>> > moving away from irq numbers. A typical PCI driver should get
>> > somewhat simpler by the conversion, and when they are all converted,
>> > we can replace pci_dev->irq with a struct irq* under the covers.
>>
>> Reasonable if it is easy and straight forward.
>> Something like pci_request_irq(dev,....) and the helper looks at
>> dev->irq under the covers and calls request_irq or whatever makes
>> sense. Is this what you are thinking. Examples would help me here.
>
> Ok, I had an example in on of my previous posts, but based on the
> discussion since then, it has become significantly simpler, basically
> reducing the work to
>
> struct irq *pci_irq_request(struct pci_device *dev,
> irq_handler_t handler)
> {
> if (!dev->irq)
> return -ENODEV;
>
> return irq_request(irq, handler, IRQF_SHARED,
> &dev->driver->name, dev);
> }
> int pci_irq_free(struct pci_device *dev)
> {
> return irq_free(dev->irq, dev);
> }
>
> The most significant change of this to the current code
> would be that we can pass arguments down to irq_request
> automatically, e.g. the irq handler can always get the
> pci_device as its dev_id.

Yes. Mostly. Since dev_id is what is passed back to the irq handler,
it makes sense to pass the device when the irq is registered.
Passing the driver a pointer to the driver specific structure (not
the pci device) make a lot more of sense from an efficiency
standpoint. Now it may make sense to remove the irq parameter
from irq_handler_t, and require drivers to look at their dev_id to
see which irq they really are processing.

There is a danger here of making things so generic you don't have good
performance, or the code becomes unnecessarily complex.

>> For talking to user space I expect we will have numbers for a long time
>> to come yet.
>
> I was wondering about that. Do you only mean /proc/interrupts or
> are there other user interfaces we need to worry about?

Yes. There are other interfaces like /proc/irq/XXX/smp_affinity,
for irq migration. There are also device specific ioctls. There is
lspci. I don't know what all else, and given the current state of the
kernel it is hard to grep for.

> For /proc/interrupts, what could break if we have interrupt numbers
> only local to each controller and potentially duplicate numbers
> in the list? It's good to be paranoid about changes to proc files,
> but I can definitely see value in having meaningful interrupt
> numbers in there instead of making up a more or less random mapping
> to a flat number space.

Well I can have meaning full flat numbers and on i386 and x86_64
except for msi I have that. The problem is that for the numbers
to have meaning I get a very sparse usage of the numbers, because
very frequently the hardware interrupt controllers has pins that
are not connected up, so I have about an order of magnitude more
numbers then I have actually irqs in use. That is before I start
reserving irq numbers for MSI.

For MSI (since they cannot be shared) it would actually make a lot of
sense to make the numbers domain,bus,device,func,(Nth device irq) but
I can't because bus,device,func is 16bits worth of number. Add in
12 more bits for the worst case device assignment and I am up to
28 bits, and the rest of the 32bits can be used for domain or
something like that. So while the current unsigned int irq works fine
for that backing that with a fixed size array allocated at compile
time is just hopeless.

Eric