LinuxLists.cc - [PATCH] pci: change msi-x vector to 32bit

[permalink] [raw]

Subject: Re: [PATCH] pci: change msi-x vector to 32bit

On Fri, Aug 15, 2008 at 8:26 PM, H. Peter Anvin <[email protected]> wrote:
> The 28 bits aren't enough, are they: we need domain as well (and surely we can have more than 16 domains?)

current code
static unsigned int build_irq_for_pci_dev(struct pci_dev *dev)
{
unsigned int irq;

irq = dev->bus->number;
irq <<= 8;
irq |= dev->devfn;
irq <<= 12;

return irq;
}

int arch_setup_msi_irq(struct pci_dev *dev, struct msi_desc *desc)
{
unsigned int irq;
int ret;
unsigned int irq_want;

irq_want = build_irq_for_pci_dev(dev) + 0x100;

irq = create_irq(irq_want);

domain is not used yet.

need to make vecter_irq to vector_domain_irq

irq_desc(irq) change to domain_irq_desc(domain, irq)

...

YH

YH

2008-08-16 08:23:56

[permalink] [raw]

Subject: Re: [PATCH] pci: change msi-x vector to 32bit

"H. Peter Anvin" <[email protected]> writes:

> The 28 bits aren't enough, are they: we need domain as well (and surely we can
> have more than 16 domains?)

The justification is questionable. The fact that the irq number
reported for msi-x vectors is only 16bits is a bug. Everywhere else
in the kernel an irq number is stored in a 32bit field.

We also have a few other bits of craziness in the msix interface.
The entry field in msix_entry is unnecessary.

In fact now that we have a linked list of irq entries we stop passing
the structure entirely.

Eric

2008-08-16 09:01:16

[permalink] [raw]

Subject: Re: [PATCH] pci: change msi-x vector to 32bit

On Sat, Aug 16, 2008 at 1:17 AM, Eric W. Biederman
<[email protected]> wrote:
> "H. Peter Anvin" <[email protected]> writes:
>
>> The 28 bits aren't enough, are they: we need domain as well (and surely we can
>> have more than 16 domains?)
>
> The justification is questionable. The fact that the irq number
> reported for msi-x vectors is only 16bits is a bug. Everywhere else
> in the kernel an irq number is stored in a 32bit field.

they assume irq < 65536?

if use NR_IRQS = NR_VEVTORS * NR_CPUS...
256*1024 ==> .... PROBLEM...

>
> We also have a few other bits of craziness in the msix interface.
> The entry field in msix_entry is unnecessary.

Agreed. qla will have another copy again.

>
> In fact now that we have a linked list of irq entries we stop passing
> the structure entirely.

msi_list in dev?

YH

2008-08-16 14:50:55

[permalink] [raw]

Subject: Re: [PATCH] pci: change msi-x vector to 32bit

On Fri, 2008-08-15 at 23:42 -0700, Yinghai Lu wrote:
> On Fri, Aug 15, 2008 at 8:26 PM, H. Peter Anvin <[email protected]> wrote:
> > The 28 bits aren't enough, are they: we need domain as well (and surely we can have more than 16 domains?)
>
> current code
> static unsigned int build_irq_for_pci_dev(struct pci_dev *dev)
> {
> unsigned int irq;
>
> irq = dev->bus->number;
> irq <<= 8;
> irq |= dev->devfn;
> irq <<= 12;
>
> return irq;
> }

Where exactly is this code in the kernel? Most arches assume the irq is
an index to a compact table bounded by NR_IRQS, so something like this
would violate that assumption.

This is also the reason why it doesn't matter to assign a u32 irq to a
u16 vector. The u32 irq is a historical thing; it used to be u16 but
then sparc used a vector mapping scheme for the irq number, which I
don't believe it does any more.

> int arch_setup_msi_irq(struct pci_dev *dev, struct msi_desc *desc)
> {
> unsigned int irq;
> int ret;
> unsigned int irq_want;
>
> irq_want = build_irq_for_pci_dev(dev) + 0x100;
>
> irq = create_irq(irq_want);
>
> domain is not used yet.
>
> need to make vecter_irq to vector_domain_irq
>
> irq_desc(irq) change to domain_irq_desc(domain, irq)

So the assumption underlying this is that the same bus/dev/function on
different domains would share irq space ... that sounds possible but a
bit strange.

James

2008-08-16 15:57:59

by Alan

[permalink] [raw]

Subject: Re: [PATCH] pci: change msi-x vector to 32bit

> Where exactly is this code in the kernel? Most arches assume the irq is
> an index to a compact table bounded by NR_IRQS, so something like this
> would violate that assumption.

Yes, which is no bad thing for some platforms. There are some driver
assumptions like that but those have also been stomped.

Alan

2008-08-16 16:13:46

[permalink] [raw]

Subject: Re: [PATCH] pci: change msi-x vector to 32bit

On Sat, 2008-08-16 at 16:39 +0100, Alan Cox wrote:
> > Where exactly is this code in the kernel? Most arches assume the irq is
> > an index to a compact table bounded by NR_IRQS, so something like this
> > would violate that assumption.
>
> Yes, which is no bad thing for some platforms. There are some driver
> assumptions like that but those have also been stomped.

I'm not saying we couldn't do this, or even that we shouldn't; I'm just
asking why would we want to?

All arches currently seem to have show_interrupts() which loop over
0..NR_IRQS where the interrupt is printed as %d. In this encoded scheme
they would show up with rather nastily large numbers that have no
visible meaning unless we switch to hex for displaying them.

What I'm really saying is that irq as the interrupt number is really the
*user's* handle for the interrupt not the machine's, so it needs to be
something the user is comfortable with. We could overcome this
objection by encoding the number to something meaningful for the
user ... I'm just asking if there's any benefit to doing this?

James

2008-08-16 18:57:13

[permalink] [raw]

Subject: Re: [PATCH] pci: change msi-x vector to 32bit

On Sat, Aug 16, 2008 at 9:13 AM, James Bottomley
<[email protected]> wrote:
> On Sat, 2008-08-16 at 16:39 +0100, Alan Cox wrote:
>> > Where exactly is this code in the kernel? Most arches assume the irq is
>> > an index to a compact table bounded by NR_IRQS, so something like this
>> > would violate that assumption.
>>
>> Yes, which is no bad thing for some platforms. There are some driver
>> assumptions like that but those have also been stomped.
>
> I'm not saying we couldn't do this, or even that we shouldn't; I'm just
> asking why would we want to?
>
> All arches currently seem to have show_interrupts() which loop over
> 0..NR_IRQS where the interrupt is printed as %d. In this encoded scheme
> they would show up with rather nastily large numbers that have no
> visible meaning unless we switch to hex for displaying them.
>
> What I'm really saying is that irq as the interrupt number is really the
> *user's* handle for the interrupt not the machine's, so it needs to be
> something the user is comfortable with. We could overcome this
> objection by encoding the number to something meaningful for the
> user ... I'm just asking if there's any benefit to doing this?
>
the code is tip/irq/sparseirq or tip/master

story:
1. for x86_64: first we have NR_IRQS = NR_CPUS * NR_VECTORS, because
it already supports per_cpu vector
2. SGI want MAX_SMP support: NR_CPUS=4096, so everything is broken.
3. Mike spent some time to make every array [NR_CPUS] to per_cpu
define as possible.
4. Mike or someone else reduce NR_IRQS to 224, because NR=256*4096,
will make kstat_irqs[NR_CPUS][NR_VECTORS*NR_VECTORS] too big, and it
could be complied.
5. IBM guys report their one server is broken, that system GSI > 256,
so some irq can not work.
6. Yinghai tried one patch change NR_IRQS=32*NR_CPUS., but sgi said it
still broke their system. --- for 2.6.27
7. Eric provide one patch NR_IRQS = min(32*NR_CPUS, NR_VECTORS *
MAX_IO_APICS) --- for 2.6.27
8. For 2.6.28 later, Yinghai add code dyn_array, and probe nr_irqs, so
NR_IRQS related will be dynamically allocated after nr_irqs is probed.
9. Eric said using dyn_array still waste ram, because a lot of
irq_desc is not used. when MSI-X is involved, some card could use 256
vectors or 4096 in theory.
10. Eric said he had one dyn irq_desc, with 90% done. but didn't have
time to work it out left 10%
11. Yinghai add sparese_irq support. those array will be increased by
32, and be claimed one by one.
12. according to Eric, we could have irq spread out [0, -1U), irq =
bus/dev/fn + entry_of_msix
13. with sparseirq, /proc/interrupts will have irq_number in hex.

but msix current cached irq number, and it only use 16bit to store
unsigned int irq., and later cards will call request_irq with
truncated irq_number...card will fallback to MSI or INTa

only two places need to be changed about that.

BTW, any reason qlogic card need to cache that irq number second times?

YH

system with qlogic and lpfc

LBSuse:~ # cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5
CPU6 CPU7 CPU8 CPU9 CPU10 CPU11
CPU12 CPU13 CPU14 CPU15
0x0: 111 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 IO-APIC-edge timer
0x4: 450 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 IO-APIC-edge serial
0x7: 1 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 IO-APIC-edge
0x8: 1 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 IO-APIC-edge rtc0
0x9: 0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 IO-APIC-fasteoi acpi
0x17: 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 IO-APIC-fasteoi sata_nv
0x16: 140 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 IO-APIC-fasteoi
ohci_hcd:usb2, sata_nv
0x15: 384 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 IO-APIC-fasteoi
ehci_hcd:usb1
0x14: 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 IO-APIC-fasteoi sata_nv
0x10: 1083 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 IO-APIC-fasteoi aacraid
0x2e: 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 IO-APIC-fasteoi sata_nv
0x2d: 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 IO-APIC-fasteoi sata_nv
0x2c: 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 IO-APIC-fasteoi sata_nv
0x50100: 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 PCI-MSI-edge aerdrv
0x70100: 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 PCI-MSI-edge aerdrv
0x78100: 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 PCI-MSI-edge aerdrv
0x8058100: 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 PCI-MSI-edge
aerdrv
0x8070100: 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 PCI-MSI-edge
aerdrv
0x8078100: 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 PCI-MSI-edge
aerdrv
0x8300100: 41 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 PCI-MSI-edge
qla2xxx (default)
0x83000ff: 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 PCI-MSI-edge
qla2xxx (rsp_q)
0x8301100: 41 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 PCI-MSI-edge
qla2xxx (default)
0x83010ff: 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 PCI-MSI-edge
qla2xxx (rsp_q)
0x300100: 2 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 PCI-MSI-edge lpfc
0x301100: 2 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 PCI-MSI-edge lpfc
0x40100: 326 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 none-edge
0x48100: 328 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 none-edge
0x8040100: 2222 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 PCI-MSI-edge eth2
0x8048100: 326 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 none-edge
NMI: 0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 Non-maskable interrupts
LOC: 8782 5209 3029 3222 4556 3328
2862 2782 2730 3218 2742 2655
3664 3099 3146 3356 Local timer interrupts
RES: 904 2930 98 65 1083 3723
158 84 46 1899 157 60
2476 971 114 97 Rescheduling interrupts
CAL: 12 89 71 65 65 142
77 66 65 118 77 67
66 106 72 67 function call interrupts
TLB: 7 90 18 5 3 115
16 10 3 123 19 5
2 157 18 3 TLB shootdowns
TRM: 0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 Thermal event interrupts
THR: 0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 Threshold APIC interrupts
SPU: 0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 Spurious interrupts
ERR: 1

system with neptune:
LBSuse:~ # cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5
CPU6 CPU7
0x0: 92 0 0 0 0 0
0 1 IO-APIC-edge timer
0x4: 0 0 0 0 0 0
1 532 IO-APIC-edge serial
0x7: 1 0 0 0 0 0
0 0 IO-APIC-edge
0x8: 0 0 0 0 0 0
0 1 IO-APIC-edge rtc0
0x9: 0 0 0 0 0 0
0 0 IO-APIC-fasteoi acpi
0x17: 0 0 0 0 0
0 0 0 IO-APIC-fasteoi sata_nv
0x16: 0 0 0 0 0
0 2 105 IO-APIC-fasteoi ohci_hcd:usb2
0x15: 0 0 0 0 0
0 0 1014 IO-APIC-fasteoi ehci_hcd:usb1
0x14: 0 0 0 0 0
0 0 1 IO-APIC-fasteoi sata_nv, sata_nv
0x2e: 0 0 0 0 0
0 0 0 IO-APIC-fasteoi sata_nv
0x2d: 0 0 0 0 0
0 0 0 IO-APIC-fasteoi sata_nv
0x2c: 0 0 0 0 0
0 0 0 IO-APIC-fasteoi sata_nv
0x50100: 0 0 0 0 0
0 0 0 PCI-MSI-edge aerdrv
0x70100: 0 0 0 0 0
0 0 0 PCI-MSI-edge aerdrv
0x78100: 0 0 0 0 0
0 0 0 PCI-MSI-edge aerdrv
0x8058100: 0 0 0 0 0
0 0 0 PCI-MSI-edge aerdrv
0x8070100: 0 0 0 0 0
0 0 0 PCI-MSI-edge aerdrv
0x8078100: 0 0 0 0 0
0 0 0 PCI-MSI-edge aerdrv
0x8301100: 0 0 0 0 0
0 0 0 PCI-MSI-edge eth5
0x83010ff: 0 0 0 0 0
0 0 0 PCI-MSI-edge eth5
0x83010fe: 0 0 0 0 0
0 0 0 PCI-MSI-edge eth5
0x83010fd: 0 0 0 0 0
0 0 0 PCI-MSI-edge eth5
0x83010fc: 0 0 0 0 0
0 0 0 PCI-MSI-edge eth5
0x83010fb: 0 0 0 0 0
0 0 0 PCI-MSI-edge eth5
0x83010fa: 0 0 0 0 0
0 0 0 PCI-MSI-edge eth5
0x83010f9: 0 0 0 0 0
0 0 0 PCI-MSI-edge eth5
0x83010f8: 0 0 0 0 0
0 0 0 PCI-MSI-edge eth5
0x83010f7: 0 0 0 0 0
0 0 0 PCI-MSI-edge eth5
0x83010f6: 0 0 0 0 0
0 0 0 PCI-MSI-edge eth5
0x83010f5: 0 0 0 0 0
0 0 0 PCI-MSI-edge eth5
0x83010f4: 0 0 0 0 0
0 0 0 PCI-MSI-edge eth5
0x83010f3: 0 0 0 0 0
0 0 0 PCI-MSI-edge eth5
0x83010f2: 0 0 0 0 0
0 0 0 PCI-MSI-edge eth5
0x83010f1: 0 0 0 0 0
0 0 0 PCI-MSI-edge eth5
0x83010f0: 0 0 0 0 0
0 0 0 PCI-MSI-edge eth5
0x83010ef: 0 0 0 0 0
0 0 0 PCI-MSI-edge eth5
0x83010ee: 0 0 0 0 0
0 0 0 PCI-MSI-edge eth5
0x83010ed: 0 0 0 0 0
0 0 0 PCI-MSI-edge eth5
0x83010ec: 0 0 0 0 0
0 0 0 PCI-MSI-edge eth5
0x40100: 0 0 0 0 0
0 9 5352 PCI-MSI-edge eth0
0x48100: 0 0 0 0 0
0 4 148 none-edge
0x8040100: 0 0 0 154 0
0 0 0 none-edge
0x8048100: 0 0 0 154 0
0 0 0 none-edge
NMI: 0 0 0 0 0 0
0 0 Non-maskable interrupts
LOC: 4780 4021 2441 2831 3978 3672
2576 4601 Local timer interrupts
RES: 647 4295 485 282 1324 3561
620 1902 Rescheduling interrupts
CAL: 18 92 53 44 33 53
47 39 function call interrupts
TLB: 23 176 65 41 48 274
95 62 TLB shootdowns
TRM: 0 0 0 0 0 0
0 0 Thermal event interrupts
THR: 0 0 0 0 0 0
0 0 Threshold APIC interrupts
SPU: 0 0 0 0 0 0
0 0 Spurious interrupts
ERR: 1

2008-08-16 20:10:35

by Andrew Vasquez

[permalink] [raw]

Subject: Re: [PATCH] pci: change msi-x vector to 32bit

On Sat, 16 Aug 2008, Yinghai Lu wrote:

> On Sat, Aug 16, 2008 at 9:13 AM, James Bottomley
> <[email protected]> wrote:
> > On Sat, 2008-08-16 at 16:39 +0100, Alan Cox wrote:
> >> > Where exactly is this code in the kernel? Most arches assume the irq is
> >> > an index to a compact table bounded by NR_IRQS, so something like this
> >> > would violate that assumption.
> >>
> >> Yes, which is no bad thing for some platforms. There are some driver
> >> assumptions like that but those have also been stomped.
> >
> > I'm not saying we couldn't do this, or even that we shouldn't; I'm just
> > asking why would we want to?
> >
> > All arches currently seem to have show_interrupts() which loop over
> > 0..NR_IRQS where the interrupt is printed as %d. In this encoded scheme
> > they would show up with rather nastily large numbers that have no
> > visible meaning unless we switch to hex for displaying them.
> >
> > What I'm really saying is that irq as the interrupt number is really the
> > *user's* handle for the interrupt not the machine's, so it needs to be
> > something the user is comfortable with. We could overcome this
> > objection by encoding the number to something meaningful for the
> > user ... I'm just asking if there's any benefit to doing this?
> >
> the code is tip/irq/sparseirq or tip/master
>
> story:
> 1. for x86_64: first we have NR_IRQS = NR_CPUS * NR_VECTORS, because
> it already supports per_cpu vector
> 2. SGI want MAX_SMP support: NR_CPUS=4096, so everything is broken.
> 3. Mike spent some time to make every array [NR_CPUS] to per_cpu
> define as possible.
> 4. Mike or someone else reduce NR_IRQS to 224, because NR=256*4096,
> will make kstat_irqs[NR_CPUS][NR_VECTORS*NR_VECTORS] too big, and it
> could be complied.
> 5. IBM guys report their one server is broken, that system GSI > 256,
> so some irq can not work.
> 6. Yinghai tried one patch change NR_IRQS=32*NR_CPUS., but sgi said it
> still broke their system. --- for 2.6.27
> 7. Eric provide one patch NR_IRQS = min(32*NR_CPUS, NR_VECTORS *
> MAX_IO_APICS) --- for 2.6.27
> 8. For 2.6.28 later, Yinghai add code dyn_array, and probe nr_irqs, so
> NR_IRQS related will be dynamically allocated after nr_irqs is probed.
> 9. Eric said using dyn_array still waste ram, because a lot of
> irq_desc is not used. when MSI-X is involved, some card could use 256
> vectors or 4096 in theory.
> 10. Eric said he had one dyn irq_desc, with 90% done. but didn't have
> time to work it out left 10%
> 11. Yinghai add sparese_irq support. those array will be increased by
> 32, and be claimed one by one.
> 12. according to Eric, we could have irq spread out [0, -1U), irq =
> bus/dev/fn + entry_of_msix
> 13. with sparseirq, /proc/interrupts will have irq_number in hex.
>
> but msix current cached irq number, and it only use 16bit to store
> unsigned int irq., and later cards will call request_irq with
> truncated irq_number...card will fallback to MSI or INTa
>
> only two places need to be changed about that.
>
> BTW, any reason qlogic card need to cache that irq number second times?

So that the driver can release the two request_irq() allocated
handlers during tear-down (via qla24xx_disable_msix()->free_irq()).
Beyond caching (vector/irq) what's returned during pci_enable_msix(),
is there some other mechanism a driver can use to get the IRQ number?

2008-08-16 20:26:03

[permalink] [raw]

Subject: Re: [PATCH] pci: change msi-x vector to 32bit

On Sat, 2008-08-16 at 11:56 -0700, Yinghai Lu wrote:
> On Sat, Aug 16, 2008 at 9:13 AM, James Bottomley
> <[email protected]> wrote:
> > On Sat, 2008-08-16 at 16:39 +0100, Alan Cox wrote:
> >> > Where exactly is this code in the kernel? Most arches assume the irq is
> >> > an index to a compact table bounded by NR_IRQS, so something like this
> >> > would violate that assumption.
> >>
> >> Yes, which is no bad thing for some platforms. There are some driver
> >> assumptions like that but those have also been stomped.
> >
> > I'm not saying we couldn't do this, or even that we shouldn't; I'm just
> > asking why would we want to?
> >
> > All arches currently seem to have show_interrupts() which loop over
> > 0..NR_IRQS where the interrupt is printed as %d. In this encoded scheme
> > they would show up with rather nastily large numbers that have no
> > visible meaning unless we switch to hex for displaying them.
> >
> > What I'm really saying is that irq as the interrupt number is really the
> > *user's* handle for the interrupt not the machine's, so it needs to be
> > something the user is comfortable with. We could overcome this
> > objection by encoding the number to something meaningful for the
> > user ... I'm just asking if there's any benefit to doing this?
> >
> the code is tip/irq/sparseirq or tip/master

OK, that's either a quilt or a specifier for a git head ...
unfortunately linux-next doesn't give you those, so I'd need either a
commit id or a pointer to the base tree or quilt for that to make sense.

> story:
> 1. for x86_64: first we have NR_IRQS = NR_CPUS * NR_VECTORS, because
> it already supports per_cpu vector

Hmm ... the first thing that springs to mind is are you sure? We have
architectures (like voyager and parisc) that always had these per cpu
vector type interrupts. On each of them we actually factored the CPU
affinity out of the irq number for sound reasons (although the per CPU
vectors still exist): The user understands better that irq line 50 is
currently going to CPU1 and that they could change it to CPU2 (or just
use irqbalance). Combining the affinity into the irq number looks like
a bad idea because users won't be able to parse it correctly.

> 2. SGI want MAX_SMP support: NR_CPUS=4096, so everything is broken.
> 3. Mike spent some time to make every array [NR_CPUS] to per_cpu
> define as possible.
> 4. Mike or someone else reduce NR_IRQS to 224, because NR=256*4096,
> will make kstat_irqs[NR_CPUS][NR_VECTORS*NR_VECTORS] too big, and it
> could be complied.
> 5. IBM guys report their one server is broken, that system GSI > 256,
> so some irq can not work.
> 6. Yinghai tried one patch change NR_IRQS=32*NR_CPUS., but sgi said it
> still broke their system. --- for 2.6.27
> 7. Eric provide one patch NR_IRQS = min(32*NR_CPUS, NR_VECTORS *
> MAX_IO_APICS) --- for 2.6.27
> 8. For 2.6.28 later, Yinghai add code dyn_array, and probe nr_irqs, so
> NR_IRQS related will be dynamically allocated after nr_irqs is probed.
> 9. Eric said using dyn_array still waste ram, because a lot of
> irq_desc is not used. when MSI-X is involved, some card could use 256
> vectors or 4096 in theory.
> 10. Eric said he had one dyn irq_desc, with 90% done. but didn't have
> time to work it out left 10%
> 11. Yinghai add sparese_irq support. those array will be increased by
> 32, and be claimed one by one.
> 12. according to Eric, we could have irq spread out [0, -1U), irq =
> bus/dev/fn + entry_of_msix
> 13. with sparseirq, /proc/interrupts will have irq_number in hex.
>
> but msix current cached irq number, and it only use 16bit to store
> unsigned int irq., and later cards will call request_irq with
> truncated irq_number...card will fallback to MSI or INTa

OK, sorry, I get that there's a bug in the msix_entry ... if it's going
to assign an irq to it, it should at least be the same type as irq.

What I still don't quite get is the benefit of large IRQ spaces ...
particularly if you encode things the system doesn't really need to know
in them.

> only two places need to be changed about that.
>
> BTW, any reason qlogic card need to cache that irq number second times?
>
> YH
>
>
> system with qlogic and lpfc

Yes, but if these are all single CPU bound, the matrix display doesn't
really make sense any more, does it?

James

> LBSuse:~ # cat /proc/interrupts
> CPU0 CPU1 CPU2 CPU3 CPU4 CPU5
> CPU6 CPU7 CPU8 CPU9 CPU10 CPU11
> CPU12 CPU13 CPU14 CPU15
> 0x0: 111 0 0 0 0 0
> 0 0 0 0 0 0
> 0 0 0 0 IO-APIC-edge timer
> 0x4: 450 0 0 0 0 0
> 0 0 0 0 0 0
> 0 0 0 0 IO-APIC-edge serial
> 0x7: 1 0 0 0 0 0
> 0 0 0 0 0 0
> 0 0 0 0 IO-APIC-edge
> 0x8: 1 0 0 0 0 0
> 0 0 0 0 0 0
> 0 0 0 0 IO-APIC-edge rtc0
> 0x9: 0 0 0 0 0 0
> 0 0 0 0 0 0
> 0 0 0 0 IO-APIC-fasteoi acpi
> 0x17: 0 0 0 0 0
> 0 0 0 0 0 0 0
> 0 0 0 0 IO-APIC-fasteoi sata_nv
> 0x16: 140 0 0 0 0
> 0 0 0 0 0 0 0
> 0 0 0 0 IO-APIC-fasteoi
> ohci_hcd:usb2, sata_nv
> 0x15: 384 0 0 0 0
> 0 0 0 0 0 0 0
> 0 0 0 0 IO-APIC-fasteoi
> ehci_hcd:usb1
> 0x14: 0 0 0 0 0
> 0 0 0 0 0 0 0
> 0 0 0 0 IO-APIC-fasteoi sata_nv
> 0x10: 1083 0 0 0 0
> 0 0 0 0 0 0 0
> 0 0 0 0 IO-APIC-fasteoi aacraid
> 0x2e: 0 0 0 0 0
> 0 0 0 0 0 0 0
> 0 0 0 0 IO-APIC-fasteoi sata_nv
> 0x2d: 0 0 0 0 0
> 0 0 0 0 0 0 0
> 0 0 0 0 IO-APIC-fasteoi sata_nv
> 0x2c: 0 0 0 0 0
> 0 0 0 0 0 0 0
> 0 0 0 0 IO-APIC-fasteoi sata_nv
> 0x50100: 0 0 0 0 0
> 0 0 0 0 0 0 0
> 0 0 0 0 PCI-MSI-edge aerdrv
> 0x70100: 0 0 0 0 0
> 0 0 0 0 0 0 0
> 0 0 0 0 PCI-MSI-edge aerdrv
> 0x78100: 0 0 0 0 0
> 0 0 0 0 0 0 0
> 0 0 0 0 PCI-MSI-edge aerdrv
> 0x8058100: 0 0 0 0 0
> 0 0 0 0 0 0
> 0 0 0 0 0 PCI-MSI-edge
> aerdrv
> 0x8070100: 0 0 0 0 0
> 0 0 0 0 0 0
> 0 0 0 0 0 PCI-MSI-edge
> aerdrv
> 0x8078100: 0 0 0 0 0
> 0 0 0 0 0 0
> 0 0 0 0 0 PCI-MSI-edge
> aerdrv
> 0x8300100: 41 0 0 0 0
> 0 0 0 0 0 0
> 0 0 0 0 0 PCI-MSI-edge
> qla2xxx (default)
> 0x83000ff: 0 0 0 0 0
> 0 0 0 0 0 0
> 0 0 0 0 0 PCI-MSI-edge
> qla2xxx (rsp_q)
> 0x8301100: 41 0 0 0 0
> 0 0 0 0 0 0
> 0 0 0 0 0 PCI-MSI-edge
> qla2xxx (default)
> 0x83010ff: 0 0 0 0 0
> 0 0 0 0 0 0
> 0 0 0 0 0 PCI-MSI-edge
> qla2xxx (rsp_q)
> 0x300100: 2 0 0 0 0
> 0 0 0 0 0 0 0
> 0 0 0 0 PCI-MSI-edge lpfc
> 0x301100: 2 0 0 0 0
> 0 0 0 0 0 0 0
> 0 0 0 0 PCI-MSI-edge lpfc
> 0x40100: 326 0 0 0 0
> 0 0 0 0 0 0 0
> 0 0 0 0 none-edge
> 0x48100: 328 0 0 0 0
> 0 0 0 0 0 0 0
> 0 0 0 0 none-edge
> 0x8040100: 2222 0 0 0 0
> 0 0 0 0 0 0
> 0 0 0 0 0 PCI-MSI-edge eth2
> 0x8048100: 326 0 0 0 0
> 0 0 0 0 0 0
> 0 0 0 0 0 none-edge
> NMI: 0 0 0 0 0 0
> 0 0 0 0 0 0
> 0 0 0 0 Non-maskable interrupts
> LOC: 8782 5209 3029 3222 4556 3328
> 2862 2782 2730 3218 2742 2655
> 3664 3099 3146 3356 Local timer interrupts
> RES: 904 2930 98 65 1083 3723
> 158 84 46 1899 157 60
> 2476 971 114 97 Rescheduling interrupts
> CAL: 12 89 71 65 65 142
> 77 66 65 118 77 67
> 66 106 72 67 function call interrupts
> TLB: 7 90 18 5 3 115
> 16 10 3 123 19 5
> 2 157 18 3 TLB shootdowns
> TRM: 0 0 0 0 0 0
> 0 0 0 0 0 0
> 0 0 0 0 Thermal event interrupts
> THR: 0 0 0 0 0 0
> 0 0 0 0 0 0
> 0 0 0 0 Threshold APIC interrupts
> SPU: 0 0 0 0 0 0
> 0 0 0 0 0 0
> 0 0 0 0 Spurious interrupts
> ERR: 1
>
> system with neptune:
> LBSuse:~ # cat /proc/interrupts
> CPU0 CPU1 CPU2 CPU3 CPU4 CPU5
> CPU6 CPU7
> 0x0: 92 0 0 0 0 0
> 0 1 IO-APIC-edge timer
> 0x4: 0 0 0 0 0 0
> 1 532 IO-APIC-edge serial
> 0x7: 1 0 0 0 0 0
> 0 0 IO-APIC-edge
> 0x8: 0 0 0 0 0 0
> 0 1 IO-APIC-edge rtc0
> 0x9: 0 0 0 0 0 0
> 0 0 IO-APIC-fasteoi acpi
> 0x17: 0 0 0 0 0
> 0 0 0 IO-APIC-fasteoi sata_nv
> 0x16: 0 0 0 0 0
> 0 2 105 IO-APIC-fasteoi ohci_hcd:usb2
> 0x15: 0 0 0 0 0
> 0 0 1014 IO-APIC-fasteoi ehci_hcd:usb1
> 0x14: 0 0 0 0 0
> 0 0 1 IO-APIC-fasteoi sata_nv, sata_nv
> 0x2e: 0 0 0 0 0
> 0 0 0 IO-APIC-fasteoi sata_nv
> 0x2d: 0 0 0 0 0
> 0 0 0 IO-APIC-fasteoi sata_nv
> 0x2c: 0 0 0 0 0
> 0 0 0 IO-APIC-fasteoi sata_nv
> 0x50100: 0 0 0 0 0
> 0 0 0 PCI-MSI-edge aerdrv
> 0x70100: 0 0 0 0 0
> 0 0 0 PCI-MSI-edge aerdrv
> 0x78100: 0 0 0 0 0
> 0 0 0 PCI-MSI-edge aerdrv
> 0x8058100: 0 0 0 0 0
> 0 0 0 PCI-MSI-edge aerdrv
> 0x8070100: 0 0 0 0 0
> 0 0 0 PCI-MSI-edge aerdrv
> 0x8078100: 0 0 0 0 0
> 0 0 0 PCI-MSI-edge aerdrv
> 0x8301100: 0 0 0 0 0
> 0 0 0 PCI-MSI-edge eth5
> 0x83010ff: 0 0 0 0 0
> 0 0 0 PCI-MSI-edge eth5
> 0x83010fe: 0 0 0 0 0
> 0 0 0 PCI-MSI-edge eth5
> 0x83010fd: 0 0 0 0 0
> 0 0 0 PCI-MSI-edge eth5
> 0x83010fc: 0 0 0 0 0
> 0 0 0 PCI-MSI-edge eth5
> 0x83010fb: 0 0 0 0 0
> 0 0 0 PCI-MSI-edge eth5
> 0x83010fa: 0 0 0 0 0
> 0 0 0 PCI-MSI-edge eth5
> 0x83010f9: 0 0 0 0 0
> 0 0 0 PCI-MSI-edge eth5
> 0x83010f8: 0 0 0 0 0
> 0 0 0 PCI-MSI-edge eth5
> 0x83010f7: 0 0 0 0 0
> 0 0 0 PCI-MSI-edge eth5
> 0x83010f6: 0 0 0 0 0
> 0 0 0 PCI-MSI-edge eth5
> 0x83010f5: 0 0 0 0 0
> 0 0 0 PCI-MSI-edge eth5
> 0x83010f4: 0 0 0 0 0
> 0 0 0 PCI-MSI-edge eth5
> 0x83010f3: 0 0 0 0 0
> 0 0 0 PCI-MSI-edge eth5
> 0x83010f2: 0 0 0 0 0
> 0 0 0 PCI-MSI-edge eth5
> 0x83010f1: 0 0 0 0 0
> 0 0 0 PCI-MSI-edge eth5
> 0x83010f0: 0 0 0 0 0
> 0 0 0 PCI-MSI-edge eth5
> 0x83010ef: 0 0 0 0 0
> 0 0 0 PCI-MSI-edge eth5
> 0x83010ee: 0 0 0 0 0
> 0 0 0 PCI-MSI-edge eth5
> 0x83010ed: 0 0 0 0 0
> 0 0 0 PCI-MSI-edge eth5
> 0x83010ec: 0 0 0 0 0
> 0 0 0 PCI-MSI-edge eth5
> 0x40100: 0 0 0 0 0
> 0 9 5352 PCI-MSI-edge eth0
> 0x48100: 0 0 0 0 0
> 0 4 148 none-edge
> 0x8040100: 0 0 0 154 0
> 0 0 0 none-edge
> 0x8048100: 0 0 0 154 0
> 0 0 0 none-edge
> NMI: 0 0 0 0 0 0
> 0 0 Non-maskable interrupts
> LOC: 4780 4021 2441 2831 3978 3672
> 2576 4601 Local timer interrupts
> RES: 647 4295 485 282 1324 3561
> 620 1902 Rescheduling interrupts
> CAL: 18 92 53 44 33 53
> 47 39 function call interrupts
> TLB: 23 176 65 41 48 274
> 95 62 TLB shootdowns
> TRM: 0 0 0 0 0 0
> 0 0 Thermal event interrupts
> THR: 0 0 0 0 0 0
> 0 0 Threshold APIC interrupts
> SPU: 0 0 0 0 0 0
> 0 0 Spurious interrupts
> ERR: 1

2008-08-16 20:48:01

[permalink] [raw]

Subject: Re: [PATCH] pci: change msi-x vector to 32bit

On Sat, Aug 16, 2008 at 1:25 PM, James Bottomley
<[email protected]> wrote:
> On Sat, 2008-08-16 at 11:56 -0700, Yinghai Lu wrote:
>> On Sat, Aug 16, 2008 at 9:13 AM, James Bottomley
>> <[email protected]> wrote:
>> > On Sat, 2008-08-16 at 16:39 +0100, Alan Cox wrote:
>> >> > Where exactly is this code in the kernel? Most arches assume the irq is
>> >> > an index to a compact table bounded by NR_IRQS, so something like this
>> >> > would violate that assumption.
>> >>
>> >> Yes, which is no bad thing for some platforms. There are some driver
>> >> assumptions like that but those have also been stomped.
>> >
>> > I'm not saying we couldn't do this, or even that we shouldn't; I'm just
>> > asking why would we want to?
>> >
>> > All arches currently seem to have show_interrupts() which loop over
>> > 0..NR_IRQS where the interrupt is printed as %d. In this encoded scheme
>> > they would show up with rather nastily large numbers that have no
>> > visible meaning unless we switch to hex for displaying them.
>> >
>> > What I'm really saying is that irq as the interrupt number is really the
>> > *user's* handle for the interrupt not the machine's, so it needs to be
>> > something the user is comfortable with. We could overcome this
>> > objection by encoding the number to something meaningful for the
>> > user ... I'm just asking if there's any benefit to doing this?
>> >
>> the code is tip/irq/sparseirq or tip/master
>
> OK, that's either a quilt or a specifier for a git head ...
> unfortunately linux-next doesn't give you those, so I'd need either a
> commit id or a pointer to the base tree or quilt for that to make sense.
>
>> story:
>> 1. for x86_64: first we have NR_IRQS = NR_CPUS * NR_VECTORS, because
>> it already supports per_cpu vector
>
> Hmm ... the first thing that springs to mind is are you sure? We have
> architectures (like voyager and parisc) that always had these per cpu
> vector type interrupts. On each of them we actually factored the CPU
> affinity out of the irq number for sound reasons (although the per CPU
> vectors still exist): The user understands better that irq line 50 is
> currently going to CPU1 and that they could change it to CPU2 (or just
> use irqbalance). Combining the affinity into the irq number looks like
> a bad idea because users won't be able to parse it correctly.
>
>> 2. SGI want MAX_SMP support: NR_CPUS=4096, so everything is broken.
>> 3. Mike spent some time to make every array [NR_CPUS] to per_cpu
>> define as possible.
>> 4. Mike or someone else reduce NR_IRQS to 224, because NR=256*4096,
>> will make kstat_irqs[NR_CPUS][NR_VECTORS*NR_VECTORS] too big, and it
>> could be complied.
>> 5. IBM guys report their one server is broken, that system GSI > 256,
>> so some irq can not work.
>> 6. Yinghai tried one patch change NR_IRQS=32*NR_CPUS., but sgi said it
>> still broke their system. --- for 2.6.27
>> 7. Eric provide one patch NR_IRQS = min(32*NR_CPUS, NR_VECTORS *
>> MAX_IO_APICS) --- for 2.6.27
>> 8. For 2.6.28 later, Yinghai add code dyn_array, and probe nr_irqs, so
>> NR_IRQS related will be dynamically allocated after nr_irqs is probed.
>> 9. Eric said using dyn_array still waste ram, because a lot of
>> irq_desc is not used. when MSI-X is involved, some card could use 256
>> vectors or 4096 in theory.
>> 10. Eric said he had one dyn irq_desc, with 90% done. but didn't have
>> time to work it out left 10%
>> 11. Yinghai add sparese_irq support. those array will be increased by
>> 32, and be claimed one by one.
>> 12. according to Eric, we could have irq spread out [0, -1U), irq =
>> bus/dev/fn + entry_of_msix
>> 13. with sparseirq, /proc/interrupts will have irq_number in hex.
>>
>> but msix current cached irq number, and it only use 16bit to store
>> unsigned int irq., and later cards will call request_irq with
>> truncated irq_number...card will fallback to MSI or INTa
>
> OK, sorry, I get that there's a bug in the msix_entry ... if it's going
> to assign an irq to it, it should at least be the same type as irq.

good. for 2.6.27?

>
> What I still don't quite get is the benefit of large IRQ spaces ...
> particularly if you encode things the system doesn't really need to know
> in them.

then set nr_irqs = nr_cpu_ids * NR_VECTORS))
and count down for msi/msi-x?

YH

2008-08-16 20:49:10

[permalink] [raw]

Subject: Re: [PATCH] pci: change msi-x vector to 32bit

On Sat, 2008-08-16 at 13:34 -0700, Yinghai Lu wrote:
> On Sat, Aug 16, 2008 at 1:25 PM, James Bottomley
> <[email protected]> wrote:
> >> but msix current cached irq number, and it only use 16bit to store
> >> unsigned int irq., and later cards will call request_irq with
> >> truncated irq_number...card will fallback to MSI or INTa
> >
> > OK, sorry, I get that there's a bug in the msix_entry ... if it's going
> > to assign an irq to it, it should at least be the same type as irq.
>
> good. for 2.6.27?

Well, given that 2.6.27 uses a compact irq space, probably not ...
unless there's actually something in 2.6.27 that trips it?

> >
> > What I still don't quite get is the benefit of large IRQ spaces ...
> > particularly if you encode things the system doesn't really need to know
> > in them.
>
> then set nr_irqs = nr_cpu_ids * NR_VECTORS))
> and count down for msi/msi-x?

No, what I mean is that msis can trip directly to CPUs, so this is an
affinity thing (that MSI is directly bound to that CPU now), so in the
matrixed way we display this in show_interrupts() with the CPU along the
top and the IRQ down the side, it doesn't make sense to me to encode IRQ
affinity in the irq number again. So it makes more sense to assign the
vectors based on both the irq number and the CPU affinity so that if the
PCI MSI for qla is assigned to CPU4 you can reassign it to CPU5 and so
on.

James

2008-08-16 22:17:45

[permalink] [raw]

Subject: Re: [PATCH] pci: change msi-x vector to 32bit

On Sat, Aug 16, 2008 at 1:45 PM, James Bottomley
<[email protected]> wrote:
>> > What I still don't quite get is the benefit of large IRQ spaces ...
>> > particularly if you encode things the system doesn't really need to know
>> > in them.
>>
>> then set nr_irqs = nr_cpu_ids * NR_VECTORS))
>> and count down for msi/msi-x?
>
> No, what I mean is that msis can trip directly to CPUs, so this is an
> affinity thing (that MSI is directly bound to that CPU now), so in the
> matrixed way we display this in show_interrupts() with the CPU along the
> top and the IRQ down the side, it doesn't make sense to me to encode IRQ
> affinity in the irq number again. So it makes more sense to assign the
> vectors based on both the irq number and the CPU affinity so that if the
> PCI MSI for qla is assigned to CPU4 you can reassign it to CPU5 and so
> on.

msi-x entry index, cpu_vector, irq number...

you want to different cpus have same vector?

YH

2008-08-16 23:09:38

[permalink] [raw]

Subject: Re: [PATCH] pci: change msi-x vector to 32bit

On Sat, 2008-08-16 at 15:17 -0700, Yinghai Lu wrote:
> On Sat, Aug 16, 2008 at 1:45 PM, James Bottomley
> <[email protected]> wrote:
> >> > What I still don't quite get is the benefit of large IRQ spaces ...
> >> > particularly if you encode things the system doesn't really need to know
> >> > in them.
> >>
> >> then set nr_irqs = nr_cpu_ids * NR_VECTORS))
> >> and count down for msi/msi-x?
> >
> > No, what I mean is that msis can trip directly to CPUs, so this is an
> > affinity thing (that MSI is directly bound to that CPU now), so in the
> > matrixed way we display this in show_interrupts() with the CPU along the
> > top and the IRQ down the side, it doesn't make sense to me to encode IRQ
> > affinity in the irq number again. So it makes more sense to assign the
> > vectors based on both the irq number and the CPU affinity so that if the
> > PCI MSI for qla is assigned to CPU4 you can reassign it to CPU5 and so
> > on.
>
> msi-x entry index, cpu_vector, irq number...
>
> you want to different cpus have same vector?

Obviously I'm not communicating very well. Your apparent assumption is
that irq number == vector. What I'm saying is that's not what we've
done for individually vectored CPU interrupts in other architectures.
In those we did (cpu no, irq) == vector. i.e. the affinity and the irq
number identify the vector. For non-numa systems, this is effectively
what you're interested in doing anyway. For numa systems, it just
becomes a sparse matrix.

James

2008-08-16 23:21:50

[permalink] [raw]

Subject: Re: [PATCH] pci: change msi-x vector to 32bit

On Sat, Aug 16, 2008 at 4:09 PM, James Bottomley
<[email protected]> wrote:
> On Sat, 2008-08-16 at 15:17 -0700, Yinghai Lu wrote:
>> On Sat, Aug 16, 2008 at 1:45 PM, James Bottomley
>> <[email protected]> wrote:
>> >> > What I still don't quite get is the benefit of large IRQ spaces ...
>> >> > particularly if you encode things the system doesn't really need to know
>> >> > in them.
>> >>
>> >> then set nr_irqs = nr_cpu_ids * NR_VECTORS))
>> >> and count down for msi/msi-x?
>> >
>> > No, what I mean is that msis can trip directly to CPUs, so this is an
>> > affinity thing (that MSI is directly bound to that CPU now), so in the
>> > matrixed way we display this in show_interrupts() with the CPU along the
>> > top and the IRQ down the side, it doesn't make sense to me to encode IRQ
>> > affinity in the irq number again. So it makes more sense to assign the
>> > vectors based on both the irq number and the CPU affinity so that if the
>> > PCI MSI for qla is assigned to CPU4 you can reassign it to CPU5 and so
>> > on.
>>
>> msi-x entry index, cpu_vector, irq number...
>>
>> you want to different cpus have same vector?
>
> Obviously I'm not communicating very well. Your apparent assumption is
> that irq number == vector. What I'm saying is that's not what we've
> done for individually vectored CPU interrupts in other architectures.
> In those we did (cpu no, irq) == vector. i.e. the affinity and the irq
> number identify the vector. For non-numa systems, this is effectively
> what you're interested in doing anyway. For numa systems, it just
> becomes a sparse matrix.
>
msix_entry.vector == irq.

and cpu have per_cpu(vector_irq), and use that cpu get irq number.

when irq migratition, different cpu could use different vector for same irq.
or say phys_flat mode, same vector in different cpus could server different irq.
it seems x86 (64 bit) and ia64 already did that.

YH

2008-08-18 20:04:06

[permalink] [raw]

Subject: Re: [PATCH] pci: change msi-x vector to 32bit

James Bottomley <[email protected]> writes:

> On Sat, 2008-08-16 at 15:17 -0700, Yinghai Lu wrote:
>> On Sat, Aug 16, 2008 at 1:45 PM, James Bottomley
>> <[email protected]> wrote:
>> >> > What I still don't quite get is the benefit of large IRQ spaces ...
>> >> > particularly if you encode things the system doesn't really need to know
>> >> > in them.
>> >>
>> >> then set nr_irqs = nr_cpu_ids * NR_VECTORS))
>> >> and count down for msi/msi-x?
>> >
>> > No, what I mean is that msis can trip directly to CPUs, so this is an
>> > affinity thing (that MSI is directly bound to that CPU now), so in the
>> > matrixed way we display this in show_interrupts() with the CPU along the
>> > top and the IRQ down the side, it doesn't make sense to me to encode IRQ
>> > affinity in the irq number again. So it makes more sense to assign the
>> > vectors based on both the irq number and the CPU affinity so that if the
>> > PCI MSI for qla is assigned to CPU4 you can reassign it to CPU5 and so
>> > on.
>>
>> msi-x entry index, cpu_vector, irq number...
>>
>> you want to different cpus have same vector?
>
> Obviously I'm not communicating very well. Your apparent assumption is
> that irq number == vector.

Careful. There are two entities termed vector in this conversation.
There is the MSI-X vector which can hold up to 4096 entries per device.
There is the idt vector which has 256 entries per cpu.

> What I'm saying is that's not what we've
> done for individually vectored CPU interrupts in other architectures.
> In those we did (cpu no, irq) == vector. i.e. the affinity and the irq
> number identify the vector. For non-numa systems, this is effectively
> what you're interested in doing anyway. For numa systems, it just
> becomes a sparse matrix.

I believe assign_irq_vector on x86_64 and soon on x86_32 does this already.

The number that was being changed was the irq number of for the
msi-x ``vectors'' from some random free irq number to roughly
bus(8 bits):device+function(8 bits):msix-vector(12 bits) so that we
could have a stable irq number for msi irqs.

Once pci domain is considered it is hard to claim we have enough bits.
I expect we need at least pci domains to have one per NUMA node, in
the general case.

The big motivation for killing NR_IRQS sized arrays comes from 2 directions.
msi-x which allows up to 4096 irqs per device and nic vendors starting
to produce cards with 256 queues, and from large SGI systems that don't do
I/O and want to be supported with the same kernel build as smaller systems.
A kernel built to handle 4096*32 irqs which is more or less reasonable if
the system was I/O heavy is a ridiculously sized array on smaller machines.

So a static irq_desc is out. And since with the combination of msi-x hotplug
we can not tell how many irq sources and thus irq numbers the machine is going
to have we can not reasonably even have a dynamic array at boot time. Further
we also want to allocate the irq_desc entries in node-local memory on NUMA
machines for better performance. Which means we need to dynamically allocate
irq_desc entries and have some lookup mechanism from irq# to irq_desc entry.

So once we have all of that. It becomes possible to look at assigning a static
irq number to each pci (bus:device:function:msi-x vector) pair so the system
is more reproducible.

Eric

2008-08-18 20:59:33

[permalink] [raw]

Subject: Re: [PATCH] pci: change msi-x vector to 32bit

On Mon, 2008-08-18 at 12:59 -0700, Eric W. Biederman wrote:
> James Bottomley <[email protected]> writes:
>
> > On Sat, 2008-08-16 at 15:17 -0700, Yinghai Lu wrote:
> >> On Sat, Aug 16, 2008 at 1:45 PM, James Bottomley
> >> <[email protected]> wrote:
> >> >> > What I still don't quite get is the benefit of large IRQ spaces ...
> >> >> > particularly if you encode things the system doesn't really need to know
> >> >> > in them.
> >> >>
> >> >> then set nr_irqs = nr_cpu_ids * NR_VECTORS))
> >> >> and count down for msi/msi-x?
> >> >
> >> > No, what I mean is that msis can trip directly to CPUs, so this is an
> >> > affinity thing (that MSI is directly bound to that CPU now), so in the
> >> > matrixed way we display this in show_interrupts() with the CPU along the
> >> > top and the IRQ down the side, it doesn't make sense to me to encode IRQ
> >> > affinity in the irq number again. So it makes more sense to assign the
> >> > vectors based on both the irq number and the CPU affinity so that if the
> >> > PCI MSI for qla is assigned to CPU4 you can reassign it to CPU5 and so
> >> > on.
> >>
> >> msi-x entry index, cpu_vector, irq number...
> >>
> >> you want to different cpus have same vector?
> >
> > Obviously I'm not communicating very well. Your apparent assumption is
> > that irq number == vector.
>
> Careful. There are two entities termed vector in this conversation.
> There is the MSI-X vector which can hold up to 4096 entries per device.
> There is the idt vector which has 256 entries per cpu.

Yes ... I was trying to discriminate, but not very successfully.

Where I'm coming from is parisc CPUs have 64 CPU vectors (what you'd
call the idt in the local apic) and the bus hardware has a variable
number per bridge (These aren't MSI they're bridge triggered: The way
PSI works is that bridge<->CPU is done as MSI).

Anyway, the argument about vector or matrix representation goes to the
core of what irq number represents below.

> > What I'm saying is that's not what we've
> > done for individually vectored CPU interrupts in other architectures.
> > In those we did (cpu no, irq) == vector. i.e. the affinity and the irq
> > number identify the vector. For non-numa systems, this is effectively
> > what you're interested in doing anyway. For numa systems, it just
> > becomes a sparse matrix.
>
> I believe assign_irq_vector on x86_64 and soon on x86_32 does this already.
>
> The number that was being changed was the irq number of for the
> msi-x ``vectors'' from some random free irq number to roughly
> bus(8 bits):device+function(8 bits):msix-vector(12 bits) so that we
> could have a stable irq number for msi irqs.
>
> Once pci domain is considered it is hard to claim we have enough bits.
> I expect we need at least pci domains to have one per NUMA node, in
> the general case.

Right ... I think the idea of trying to give some meaningful number to
irq is a lost cause. Since IRQ is just a handle, I'd actually use
something like dev->bus_id and individual card functions rather than a
number (i.e. no longer exposing the raw number to user space). Then
what irq number means and how it's used becomes an internal matter.

The only real concern I had was not wanting the per-cpu idt vectors to
be mangled into the irq number because they're essentially columns in
the sparse matrix with CPU across the top.

> The big motivation for killing NR_IRQS sized arrays comes from 2 directions.
> msi-x which allows up to 4096 irqs per device and nic vendors starting
> to produce cards with 256 queues, and from large SGI systems that don't do
> I/O and want to be supported with the same kernel build as smaller systems.
> A kernel built to handle 4096*32 irqs which is more or less reasonable if
> the system was I/O heavy is a ridiculously sized array on smaller machines.
>
> So a static irq_desc is out. And since with the combination of msi-x hotplug
> we can not tell how many irq sources and thus irq numbers the machine is going
> to have we can not reasonably even have a dynamic array at boot time. Further
> we also want to allocate the irq_desc entries in node-local memory on NUMA
> machines for better performance. Which means we need to dynamically allocate
> irq_desc entries and have some lookup mechanism from irq# to irq_desc entry.
>
> So once we have all of that. It becomes possible to look at assigning a static
> irq number to each pci (bus:device:function:msi-x vector) pair so the system
> is more reproducible.

Yes, agree with the above except the static irq number ... although once
that's hidden from the user, I suppose I really don't care any more.

James

2008-08-18 21:25:54

by H. Peter Anvin

[permalink] [raw]

Subject: Re: [PATCH] pci: change msi-x vector to 32bit

Eric W. Biederman wrote:
>
> I believe assign_irq_vector on x86_64 and soon on x86_32 does this already.
>
> The number that was being changed was the irq number of for the
> msi-x ``vectors'' from some random free irq number to roughly
> bus(8 bits):device+function(8 bits):msix-vector(12 bits) so that we
> could have a stable irq number for msi irqs.
>
> Once pci domain is considered it is hard to claim we have enough bits.
> I expect we need at least pci domains to have one per NUMA node, in
> the general case.
>
[...]
>
> So once we have all of that. It becomes possible to look at assigning a static
> irq number to each pci (bus:device:function:msi-x vector) pair so the system
> is more reproducible.
>

Domain, again?

-hpa

2008-08-18 21:54:19

[permalink] [raw]

Subject: Re: [PATCH] pci: change msi-x vector to 32bit

James Bottomley <[email protected]> writes:

> On Mon, 2008-08-18 at 12:59 -0700, Eric W. Biederman wrote:
>> James Bottomley <[email protected]> writes:
>>
>> > On Sat, 2008-08-16 at 15:17 -0700, Yinghai Lu wrote:
>> >> On Sat, Aug 16, 2008 at 1:45 PM, James Bottomley
>> >> <[email protected]> wrote:
>> >> >> > What I still don't quite get is the benefit of large IRQ spaces ...
>> >> >> > particularly if you encode things the system doesn't really need to
> know
>> >> >> > in them.
>> >> >>
>> >> >> then set nr_irqs = nr_cpu_ids * NR_VECTORS))
>> >> >> and count down for msi/msi-x?
>> >> >
>> >> > No, what I mean is that msis can trip directly to CPUs, so this is an
>> >> > affinity thing (that MSI is directly bound to that CPU now), so in the
>> >> > matrixed way we display this in show_interrupts() with the CPU along the
>> >> > top and the IRQ down the side, it doesn't make sense to me to encode IRQ
>> >> > affinity in the irq number again. So it makes more sense to assign the
>> >> > vectors based on both the irq number and the CPU affinity so that if the
>> >> > PCI MSI for qla is assigned to CPU4 you can reassign it to CPU5 and so
>> >> > on.
>> >>
>> >> msi-x entry index, cpu_vector, irq number...
>> >>
>> >> you want to different cpus have same vector?
>> >
>> > Obviously I'm not communicating very well. Your apparent assumption is
>> > that irq number == vector.
>>
>> Careful. There are two entities termed vector in this conversation.
>> There is the MSI-X vector which can hold up to 4096 entries per device.
>> There is the idt vector which has 256 entries per cpu.
>
> Yes ... I was trying to discriminate, but not very successfully.
>
> Where I'm coming from is parisc CPUs have 64 CPU vectors (what you'd
> call the idt in the local apic) and the bus hardware has a variable
> number per bridge (These aren't MSI they're bridge triggered: The way
> PSI works is that bridge<->CPU is done as MSI).
>
> Anyway, the argument about vector or matrix representation goes to the
> core of what irq number represents below.
>
>> > What I'm saying is that's not what we've
>> > done for individually vectored CPU interrupts in other architectures.
>> > In those we did (cpu no, irq) == vector. i.e. the affinity and the irq
>> > number identify the vector. For non-numa systems, this is effectively
>> > what you're interested in doing anyway. For numa systems, it just
>> > becomes a sparse matrix.
>>
>> I believe assign_irq_vector on x86_64 and soon on x86_32 does this already.
>>
>> The number that was being changed was the irq number of for the
>> msi-x ``vectors'' from some random free irq number to roughly
>> bus(8 bits):device+function(8 bits):msix-vector(12 bits) so that we
>> could have a stable irq number for msi irqs.
>>
>> Once pci domain is considered it is hard to claim we have enough bits.
>> I expect we need at least pci domains to have one per NUMA node, in
>> the general case.
>
> Right ... I think the idea of trying to give some meaningful number to
> irq is a lost cause.

Possibly. So far except for msi the number is irq-controller-base +
interrupt source number. For debugging interrupt problems that
appears to work fairly well, as it means we can have BIOS and the
kernel giving the same names to the interrupts which facilitates
things working well together. When we don't do that we had some very
hideous hard to deal with acpi interoperability issues. I don't know
how well that will work for MSI but I think the conversation is very
well worth having as the benefits can be huge.

So my experience is that stability and consistency in the naming seems
to help.

> Since IRQ is just a handle, I'd actually use
> something like dev->bus_id and individual card functions rather than a
> number (i.e. no longer exposing the raw number to user space). Then
> what irq number means and how it's used becomes an internal matter.

I agree that long term a nice ascii string for the irq looks desirable.

> The only real concern I had was not wanting the per-cpu idt vectors to
> be mangled into the irq number because they're essentially columns in
> the sparse matrix with CPU across the top.

Yes. The per-cpu idt vectors are a completely hidden implementation detail
at this point.

>> The big motivation for killing NR_IRQS sized arrays comes from 2 directions.
>> msi-x which allows up to 4096 irqs per device and nic vendors starting
>> to produce cards with 256 queues, and from large SGI systems that don't do
>> I/O and want to be supported with the same kernel build as smaller systems.
>> A kernel built to handle 4096*32 irqs which is more or less reasonable if
>> the system was I/O heavy is a ridiculously sized array on smaller machines.
>>
>> So a static irq_desc is out. And since with the combination of msi-x hotplug
>> we can not tell how many irq sources and thus irq numbers the machine is going
>> to have we can not reasonably even have a dynamic array at boot time. Further
>> we also want to allocate the irq_desc entries in node-local memory on NUMA
>> machines for better performance. Which means we need to dynamically allocate
>> irq_desc entries and have some lookup mechanism from irq# to irq_desc entry.
>>
>> So once we have all of that. It becomes possible to look at assigning a
> static
>> irq number to each pci (bus:device:function:msi-x vector) pair so the system
>> is more reproducible.
>
> Yes, agree with the above except the static irq number ... although once
> that's hidden from the user, I suppose I really don't care any more.

There is no point at all in having an irq number if it is not visible
to user space.

Reaching a point where we don't export irq numbers to user space is
a hard problem. We have the irq balancer in user space (gag) and as
well as interfaces like /proc/interrupts, and for old ISA devices
there is no way to autodetect the configuration. So it requires a lot
of work.

The obvious first step is still to remove the architecture dependence
on irq number, and introduce another way of talking about irqs. Then
we can proceed to change the driver and the user space interfaces.

I completely agree that irq number 99.9% of the time should be a completely
abstract token.

Eric

2008-08-18 22:04:40

[permalink] [raw]

Subject: Re: [PATCH] pci: change msi-x vector to 32bit

On Mon, 2008-08-18 at 14:45 -0700, Eric W. Biederman wrote:
> > Yes, agree with the above except the static irq number ... although once
> > that's hidden from the user, I suppose I really don't care any more.
>
> There is no point at all in having an irq number if it is not visible
> to user space.
>
> Reaching a point where we don't export irq numbers to user space is
> a hard problem. We have the irq balancer in user space (gag) and as
> well as interfaces like /proc/interrupts, and for old ISA devices
> there is no way to autodetect the configuration. So it requires a lot
> of work.

Sure, but you have 16 (or whatever) legacy interrupts. You still call
them 1-16 (or ISA-1 through ISA-16). By the time we reach this stage,
we're essentially doing string table lookups for the interrupts, so
there's no need to pre-allocate them (except as a possible arch
implementation detail).

> The obvious first step is still to remove the architecture dependence
> on irq number, and introduce another way of talking about irqs. Then
> we can proceed to change the driver and the user space interfaces.
>
> I completely agree that irq number 99.9% of the time should be a completely
> abstract token.

Sure, although one nice reason for doing the abstraction first is that
it stops people imposing fragile numbering schemes on irq ...

James

2008-08-18 22:09:19

by Alan

[permalink] [raw]

Subject: Re: [PATCH] pci: change msi-x vector to 32bit

> > I completely agree that irq number 99.9% of the time should be a completely
> > abstract token.
>
> Sure, although one nice reason for doing the abstraction first is that
> it stops people imposing fragile numbering schemes on irq ...

On a lot of embedded devices IRQ numbers are not abstract and not
fragile. I'm all for abstracting out interrupts nicely but it isn't just
the legacy PC cases to consider - a lot of embedded is at least as
defined, rigid and meaningfully numbered as ISA.

Alan

2008-08-18 22:14:34

by H. Peter Anvin

[permalink] [raw]

Subject: Re: [PATCH] pci: change msi-x vector to 32bit

Alan Cox wrote:
>>> I completely agree that irq number 99.9% of the time should be a completely
>>> abstract token.
>> Sure, although one nice reason for doing the abstraction first is that
>> it stops people imposing fragile numbering schemes on irq ...
>
> On a lot of embedded devices IRQ numbers are not abstract and not
> fragile. I'm all for abstracting out interrupts nicely but it isn't just
> the legacy PC cases to consider - a lot of embedded is at least as
> defined, rigid and meaningfully numbered as ISA.
>

Note that James said:

> Sure, but you have 16 (or whatever) legacy interrupts. You still call
> them 1-16 (or ISA-1 through ISA-16). By the time we reach this stage,
> we're essentially doing string table lookups for the interrupts, so
> there's no need to pre-allocate them (except as a possible arch
> implementation detail).

I think the point is that if we're going to have a meaningful name, it
should be a string, so we can impose whatever naming scheme makes sense
for the platform. Even on embedded platforms it may mean that the flat
number scheme isn't what makes sense.

-hpa

2008-08-18 22:27:21

[permalink] [raw]

Subject: Re: [PATCH] pci: change msi-x vector to 32bit

On Mon, 2008-08-18 at 22:51 +0100, Alan Cox wrote:
> > > I completely agree that irq number 99.9% of the time should be a completely
> > > abstract token.
> >
> > Sure, although one nice reason for doing the abstraction first is that
> > it stops people imposing fragile numbering schemes on irq ...
>
> On a lot of embedded devices IRQ numbers are not abstract and not
> fragile. I'm all for abstracting out interrupts nicely but it isn't just
> the legacy PC cases to consider - a lot of embedded is at least as
> defined, rigid and meaningfully numbered as ISA.

So for these, the abstraction transformation is the identity ... meaning
they behave as they always have.

James

2008-08-21 20:33:30

[permalink] [raw]

Subject: Re: [PATCH] pci: change msi-x vector to 32bit

On Friday, August 15, 2008 7:36 pm Yinghai Lu wrote:
> we are using 28bit pci (bus/dev/fn + 12 bits) as irq number, so the
> cache for irq number should be 32 bit too.
>
> Signed-off-by: Yinghai Lu <[email protected]>
> Cc: Andrew Vasquez <[email protected]>
>
> ---
> drivers/scsi/qla2xxx/qla_def.h | 2 +-
> include/linux/pci.h | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> Index: linux-2.6/drivers/scsi/qla2xxx/qla_def.h
> ===================================================================
> --- linux-2.6.orig/drivers/scsi/qla2xxx/qla_def.h
> +++ linux-2.6/drivers/scsi/qla2xxx/qla_def.h
> @@ -2109,7 +2109,7 @@ struct scsi_qla_host;
>
> struct qla_msix_entry {
> int have_irq;
> - uint16_t msix_vector;
> + uint32_t msix_vector;
> uint16_t msix_entry;
> };
>
> Index: linux-2.6/include/linux/pci.h
> ===================================================================
> --- linux-2.6.orig/include/linux/pci.h
> +++ linux-2.6/include/linux/pci.h
> @@ -730,7 +730,7 @@ enum pci_dma_burst_strategy {
> };
>
> struct msix_entry {
> - u16 vector; /* kernel uses to write allocated vector */
> + u32 vector; /* kernel uses to write allocated vector */
> u16 entry; /* driver uses to specify entry, OS writes */
> };

I see a lot of smoke about this, but not a clear consensus. It sounds like no
one objects to making the MSI vector field 32 bits, but there were a lot of
concerns about IRQ vector naming/numbering in general? And I assume this
particular patch isn't 2.6.27 material...

Thanks,
--
Jesse Barnes, Intel Open Source Technology Center

2008-08-21 20:54:26

[permalink] [raw]

Subject: Re: [PATCH] pci: change msi-x vector to 32bit

Jesse Barnes <[email protected]> writes:

> On Friday, August 15, 2008 7:36 pm Yinghai Lu wrote:
>> we are using 28bit pci (bus/dev/fn + 12 bits) as irq number, so the
>> cache for irq number should be 32 bit too.
>>
>> Signed-off-by: Yinghai Lu <[email protected]>
>> Cc: Andrew Vasquez <[email protected]>
>>
>> ---
>> drivers/scsi/qla2xxx/qla_def.h | 2 +-
>> include/linux/pci.h | 2 +-
>> 2 files changed, 2 insertions(+), 2 deletions(-)
>>
>> Index: linux-2.6/drivers/scsi/qla2xxx/qla_def.h
>> ===================================================================
>> --- linux-2.6.orig/drivers/scsi/qla2xxx/qla_def.h
>> +++ linux-2.6/drivers/scsi/qla2xxx/qla_def.h
>> @@ -2109,7 +2109,7 @@ struct scsi_qla_host;
>>
>> struct qla_msix_entry {
>> int have_irq;
>> - uint16_t msix_vector;
>> + uint32_t msix_vector;
>> uint16_t msix_entry;
>> };
>>
>> Index: linux-2.6/include/linux/pci.h
>> ===================================================================
>> --- linux-2.6.orig/include/linux/pci.h
>> +++ linux-2.6/include/linux/pci.h
>> @@ -730,7 +730,7 @@ enum pci_dma_burst_strategy {
>> };
>>
>> struct msix_entry {
>> - u16 vector; /* kernel uses to write allocated vector */
>> + u32 vector; /* kernel uses to write allocated vector */
>> u16 entry; /* driver uses to specify entry, OS writes */
>> };
>
> I see a lot of smoke about this, but not a clear consensus. It sounds like no
> one objects to making the MSI vector field 32 bits, but there were a lot of
> concerns about IRQ vector naming/numbering in general? And I assume this
> particular patch isn't 2.6.27 material...

There is no need for this particular patch in 2.6.27.
It is a trivial bug fix so it can go in whenever.

Mostly this is a discussion about code that should land in the 2.6.28
timeframe assuming all is well.

Eric

2008-08-21 23:07:37

[permalink] [raw]

Subject: Re: [PATCH] pci: change msi-x vector to 32bit

On Thursday, August 21, 2008 1:47 pm Eric W. Biederman wrote:
> Jesse Barnes <[email protected]> writes:
> > On Friday, August 15, 2008 7:36 pm Yinghai Lu wrote:
> >> we are using 28bit pci (bus/dev/fn + 12 bits) as irq number, so the
> >> cache for irq number should be 32 bit too.
> >>
> >> Signed-off-by: Yinghai Lu <[email protected]>
> >> Cc: Andrew Vasquez <[email protected]>
> >>
> >> ---
> >> drivers/scsi/qla2xxx/qla_def.h | 2 +-
> >> include/linux/pci.h | 2 +-
> >> 2 files changed, 2 insertions(+), 2 deletions(-)
> >>
> >> Index: linux-2.6/drivers/scsi/qla2xxx/qla_def.h
> >> ===================================================================
> >> --- linux-2.6.orig/drivers/scsi/qla2xxx/qla_def.h
> >> +++ linux-2.6/drivers/scsi/qla2xxx/qla_def.h
> >> @@ -2109,7 +2109,7 @@ struct scsi_qla_host;
> >>
> >> struct qla_msix_entry {
> >> int have_irq;
> >> - uint16_t msix_vector;
> >> + uint32_t msix_vector;
> >> uint16_t msix_entry;
> >> };
> >>
> >> Index: linux-2.6/include/linux/pci.h
> >> ===================================================================
> >> --- linux-2.6.orig/include/linux/pci.h
> >> +++ linux-2.6/include/linux/pci.h
> >> @@ -730,7 +730,7 @@ enum pci_dma_burst_strategy {
> >> };
> >>
> >> struct msix_entry {
> >> - u16 vector; /* kernel uses to write allocated vector */
> >> + u32 vector; /* kernel uses to write allocated vector */
> >> u16 entry; /* driver uses to specify entry, OS writes */
> >> };
> >
> > I see a lot of smoke about this, but not a clear consensus. It sounds
> > like no one objects to making the MSI vector field 32 bits, but there
> > were a lot of concerns about IRQ vector naming/numbering in general? And
> > I assume this particular patch isn't 2.6.27 material...
>
> There is no need for this particular patch in 2.6.27.
> It is a trivial bug fix so it can go in whenever.
>
> Mostly this is a discussion about code that should land in the 2.6.28
> timeframe assuming all is well.

Ok, thanks Eric. I'll queue it up for linux-next after the next 2.6.27 pull.

Thanks,
--
Jesse Barnes, Intel Open Source Technology Center

2008-08-22 00:14:36

[permalink] [raw]

Subject: Re: [PATCH] pci: change msi-x vector to 32bit

Jesse Barnes <[email protected]> writes:

> On Thursday, August 21, 2008 1:47 pm Eric W. Biederman wrote:
>> Jesse Barnes <[email protected]> writes:
>> > On Friday, August 15, 2008 7:36 pm Yinghai Lu wrote:
>> >> we are using 28bit pci (bus/dev/fn + 12 bits) as irq number, so the
>> >> cache for irq number should be 32 bit too.
>> >>
>> >> Signed-off-by: Yinghai Lu <[email protected]>
>> >> Cc: Andrew Vasquez <[email protected]>
>> >>
>> >> ---
>> >> drivers/scsi/qla2xxx/qla_def.h | 2 +-
>> >> include/linux/pci.h | 2 +-
>> >> 2 files changed, 2 insertions(+), 2 deletions(-)
>> >>
>> >> Index: linux-2.6/drivers/scsi/qla2xxx/qla_def.h
>> >> ===================================================================
>> >> --- linux-2.6.orig/drivers/scsi/qla2xxx/qla_def.h
>> >> +++ linux-2.6/drivers/scsi/qla2xxx/qla_def.h
>> >> @@ -2109,7 +2109,7 @@ struct scsi_qla_host;
>> >>
>> >> struct qla_msix_entry {
>> >> int have_irq;
>> >> - uint16_t msix_vector;
>> >> + uint32_t msix_vector;
>> >> uint16_t msix_entry;
>> >> };
>> >>
>> >> Index: linux-2.6/include/linux/pci.h
>> >> ===================================================================
>> >> --- linux-2.6.orig/include/linux/pci.h
>> >> +++ linux-2.6/include/linux/pci.h
>> >> @@ -730,7 +730,7 @@ enum pci_dma_burst_strategy {
>> >> };
>> >>
>> >> struct msix_entry {
>> >> - u16 vector; /* kernel uses to write allocated vector */
>> >> + u32 vector; /* kernel uses to write allocated vector */
>> >> u16 entry; /* driver uses to specify entry, OS writes */
>> >> };
>> >
>> > I see a lot of smoke about this, but not a clear consensus. It sounds
>> > like no one objects to making the MSI vector field 32 bits, but there
>> > were a lot of concerns about IRQ vector naming/numbering in general? And
>> > I assume this particular patch isn't 2.6.27 material...
>>
>> There is no need for this particular patch in 2.6.27.
>> It is a trivial bug fix so it can go in whenever.
>>
>> Mostly this is a discussion about code that should land in the 2.6.28
>> timeframe assuming all is well.
>
> Ok, thanks Eric. I'll queue it up for linux-next after the next 2.6.27 pull.

Are you maintaining the pci tree now? I get lost who is doing what some days.

Eric

2008-08-22 00:35:55

[permalink] [raw]

Subject: Re: [PATCH] pci: change msi-x vector to 32bit

On Thursday, August 21, 2008 5:11 pm Eric W. Biederman wrote:
> Jesse Barnes <[email protected]> writes:
> > On Thursday, August 21, 2008 1:47 pm Eric W. Biederman wrote:
> >> Jesse Barnes <[email protected]> writes:
> >> > On Friday, August 15, 2008 7:36 pm Yinghai Lu wrote:
> >> >> we are using 28bit pci (bus/dev/fn + 12 bits) as irq number, so the
> >> >> cache for irq number should be 32 bit too.
> >> >>
> >> >> Signed-off-by: Yinghai Lu <[email protected]>
> >> >> Cc: Andrew Vasquez <[email protected]>
> >> >>
> >> >> ---
> >> >> drivers/scsi/qla2xxx/qla_def.h | 2 +-
> >> >> include/linux/pci.h | 2 +-
> >> >> 2 files changed, 2 insertions(+), 2 deletions(-)
> >> >>
> >> >> Index: linux-2.6/drivers/scsi/qla2xxx/qla_def.h
> >> >> ===================================================================
> >> >> --- linux-2.6.orig/drivers/scsi/qla2xxx/qla_def.h
> >> >> +++ linux-2.6/drivers/scsi/qla2xxx/qla_def.h
> >> >> @@ -2109,7 +2109,7 @@ struct scsi_qla_host;
> >> >>
> >> >> struct qla_msix_entry {
> >> >> int have_irq;
> >> >> - uint16_t msix_vector;
> >> >> + uint32_t msix_vector;
> >> >> uint16_t msix_entry;
> >> >> };
> >> >>
> >> >> Index: linux-2.6/include/linux/pci.h
> >> >> ===================================================================
> >> >> --- linux-2.6.orig/include/linux/pci.h
> >> >> +++ linux-2.6/include/linux/pci.h
> >> >> @@ -730,7 +730,7 @@ enum pci_dma_burst_strategy {
> >> >> };
> >> >>
> >> >> struct msix_entry {
> >> >> - u16 vector; /* kernel uses to write allocated vector */
> >> >> + u32 vector; /* kernel uses to write allocated vector */
> >> >> u16 entry; /* driver uses to specify entry, OS writes */
> >> >> };
> >> >
> >> > I see a lot of smoke about this, but not a clear consensus. It sounds
> >> > like no one objects to making the MSI vector field 32 bits, but there
> >> > were a lot of concerns about IRQ vector naming/numbering in general?
> >> > And I assume this particular patch isn't 2.6.27 material...
> >>
> >> There is no need for this particular patch in 2.6.27.
> >> It is a trivial bug fix so it can go in whenever.
> >>
> >> Mostly this is a discussion about code that should land in the 2.6.28
> >> timeframe assuming all is well.
> >
> > Ok, thanks Eric. I'll queue it up for linux-next after the next 2.6.27
> > pull.
>
> Are you maintaining the pci tree now? I get lost who is doing what some
> days.

Yep, since af40b485ea2d957ae2f237ab0e33539ae8f29562.

--
Jesse Barnes, Intel Open Source Technology Center

2008-08-27 23:35:31