Hi,
I've established a regression in the MSI vector/irq allocation routine for both
i386 and x86_64. Our test labs repeatedly modprobe/rmmod the e1000 driver for
serveral minutes which allocates msi vectors and frees them. These tests have
been running fine until 2.6.19.
git-bisecting I've established that in between commit
04b9267b15206fc902a18de1f78de6c82ca47716 "Eric W. Biederman -- genirq: x86_64
irq: Remove the msi assumption that irq == vector" and commit
f29bd1ba68c8c6a0f50bd678bbd5a26674018f7c "Ingo Molnar -- genirq: convert the
x86_64 architecture to irq-chips" the behaviour broke.
The revisions in between seem to be dependent and give all sorts of other
issues, so it's rather hard for me to bisect that and give trustworthy results.
the e1000 driver hits the 256-mark cycle (I think - it consistently refuses to
do 500 msi irq/vector allocations which is my test case) and throws:
e1000: eth4: e1000_request_irq: Unable to allocate MSI interrupt Error: -16
which is caused by a `if ((err = pci_enable_msi(adapter->pdev))) {` call from
the e1000 driver. It's rather easy to hit this mark with the new 4-port e1000
adapters :).
as for the e1000 code, I can say that even our oldest msi-enabled e1000 driver
works fine with 2.6.18 and under. All kernels from 2.6.19 fail consistently.
I mostly suspect commit 7bd007e480672c99d8656c7b7b12ef0549432c37 at the moment.
Perhaps Eric Biederman can help?
Cheers,
Auke
---
PS After the msi enable fails, things go horribly wrong (of course...) if we
still try to disable+enable more new msi interrupts, but this may just be the
e1000 driver cleanup missing something. perhaps it's relevant:
e1000: 0000:04:00.1: e1000_probe: (PCI Express:2.5Gb/s:32-bit) 00:15:17:0c:0c:7f
e1000: eth5: e1000_probe: Intel(R) PRO/1000 Network Connection
ADDRCONF(NETDEV_UP): eth1: link is not ready
ADDRCONF(NETDEV_UP): eth2: link is not ready
ADDRCONF(NETDEV_UP): eth3: link is not ready
e1000: eth4: e1000_request_irq: Unable to allocate MSI interrupt Error: -16
ADDRCONF(NETDEV_UP): eth4: link is not ready
ADDRCONF(NETDEV_UP): eth5: link is not ready
ACPI: PCI interrupt for device 0000:04:00.1 disabled
ACPI: PCI interrupt for device 0000:04:00.0 disabled
ACPI: PCI interrupt for device 0000:03:00.1 disabled
ACPI: PCI interrupt for device 0000:03:00.0 disabled
ACPI: PCI interrupt for device 0000:00:19.0 disabled
Intel(R) PRO/1000 Network Driver - version 7.2.9-k4-NAPI
Copyright (c) 1999-2006 Intel Corporation.
ACPI: PCI Interrupt 0000:00:19.0[A] -> GSI 20 (level, low) -> IRQ 23
PCI: Setting latency timer of device 0000:00:19.0 to 64
e1000: 0000:00:19.0: e1000_probe: (PCI Express:2.5Gb/s:Width x1) 88:88:88:88:87:
88
e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection
ACPI: PCI Interrupt 0000:03:00.0[A] -> GSI 16 (level, low) -> IRQ 16
PCI: Setting latency timer of device 0000:03:00.0 to 64
e1000: 0000:03:00.0: e1000_probe: (PCI Express:2.5Gb/s:32-bit) 00:15:17:0c:0c:7c
e1000: eth2: e1000_probe: Intel(R) PRO/1000 Network Connection
ACPI: PCI Interrupt 0000:03:00.1[B] -> GSI 17 (level, low) -> IRQ 17
PCI: Setting latency timer of device 0000:03:00.1 to 64
e1000: 0000:03:00.1: e1000_probe: (PCI Express:2.5Gb/s:32-bit) 00:15:17:0c:0c:7d
e1000: eth3: e1000_probe: Intel(R) PRO/1000 Network Connection
ACPI: PCI Interrupt 0000:04:00.0[A] -> GSI 17 (level, low) -> IRQ 17
PCI: Setting latency timer of device 0000:04:00.0 to 64
e1000: 0000:04:00.0: e1000_probe: (PCI Express:2.5Gb/s:32-bit) 00:15:17:0c:0c:7e
e1000: eth4: e1000_probe: Intel(R) PRO/1000 Network Connection
ACPI: PCI Interrupt 0000:04:00.1[B] -> GSI 18 (level, low) -> IRQ 18
PCI: Setting latency timer of device 0000:04:00.1 to 64
e1000: 0000:04:00.1: e1000_probe: (PCI Express:2.5Gb/s:32-bit) 00:15:17:0c:0c:7f
e1000: eth5: e1000_probe: Intel(R) PRO/1000 Network Connection
ADDRCONF(NETDEV_UP): eth1: link is not ready
ADDRCONF(NETDEV_UP): eth2: link is not ready
ADDRCONF(NETDEV_UP): eth3: link is not ready
ADDRCONF(NETDEV_UP): eth4: link is not ready
ADDRCONF(NETDEV_UP): eth5: link is not ready
irq 214: nobody cared (try booting with the "irqpoll" option)
[<c015b25a>] __report_bad_irq+0x2a/0x90
[<c015b38f>] note_interrupt+0xaf/0xe0
[<c015bcf3>] handle_edge_irq+0xd3/0x160
[<c0105dd8>] do_IRQ+0x68/0xd0
[<c0103e0a>] common_interrupt+0x1a/0x20
[<c015007b>] snapshot_write_next+0x5b/0x220
[<c012e862>] __do_softirq+0x62/0x100
[<c012e935>] do_softirq+0x35/0x40
[<c012e985>] irq_exit+0x45/0x50
[<c0105ddd>] do_IRQ+0x6d/0xd0
[<c0103e0a>] common_interrupt+0x1a/0x20
=======================
handlers:
[<f89df160>] (e1000_intr+0x0/0x110 [e1000])
Disabling IRQ #214
ACPI: PCI interrupt for device 0000:04:00.1 disabled
ACPI: PCI interrupt for device 0000:04:00.0 disabled
ACPI: PCI interrupt for device 0000:03:00.1 disabled
irq 214, desc: c05cc120, depth: 1, count: 0, unhandled: 0
->handle_irq(): c015a050, handle_bad_irq+0x0/0x380
->chip(): c05453c0, 0xc05453c0
->action(): 00000000
IRQ_DISABLED set
IRQ_PENDING set
IRQ_MASKED set
unexpected IRQ trap at vector d6
irq 214, desc: c05cc120, depth: 1, count: 0, unhandled: 0
->handle_irq(): c015a050, handle_bad_irq+0x0/0x380
->chip(): c05453c0, 0xc05453c0
->action(): 00000000
IRQ_DISABLED set
IRQ_PENDING set
IRQ_MASKED set
which repeats ad infinitum... perhaps this rings some bells for someone.
Auke Kok wrote:
> Hi,
>
> I've established a regression in the MSI vector/irq allocation routine for both
> i386 and x86_64. Our test labs repeatedly modprobe/rmmod the e1000 driver for
> serveral minutes which allocates msi vectors and frees them. These tests have
> been running fine until 2.6.19.
>
> git-bisecting I've established that in between commit
> 04b9267b15206fc902a18de1f78de6c82ca47716 "Eric W. Biederman -- genirq: x86_64
> irq: Remove the msi assumption that irq == vector" and commit
> f29bd1ba68c8c6a0f50bd678bbd5a26674018f7c "Ingo Molnar -- genirq: convert the
> x86_64 architecture to irq-chips" the behaviour broke.
>
> The revisions in between seem to be dependent and give all sorts of other
> issues, so it's rather hard for me to bisect that and give trustworthy results.
>
> the e1000 driver hits the 256-mark cycle (I think - it consistently refuses to
> do 500 msi irq/vector allocations which is my test case) and throws:
>
> e1000: eth4: e1000_request_irq: Unable to allocate MSI interrupt Error: -16
>
> which is caused by a `if ((err = pci_enable_msi(adapter->pdev))) {` call from
> the e1000 driver. It's rather easy to hit this mark with the new 4-port e1000
> adapters :).
>
> as for the e1000 code, I can say that even our oldest msi-enabled e1000 driver
> works fine with 2.6.18 and under. All kernels from 2.6.19 fail consistently.
>
> I mostly suspect commit 7bd007e480672c99d8656c7b7b12ef0549432c37 at the moment.
> Perhaps Eric Biederman can help?
PS PS
The exact same problem exists when doing "for n in `seq 1 300` ; do modprobe
snd_hda_intel enable_msi ; rmmod snd_hda_intel ; done".
I'm sure it will show up for other msi enabled devices.
Auke Kok <[email protected]> writes:
> Hi,
>
> I've established a regression in the MSI vector/irq allocation routine for both
> i386 and x86_64. Our test labs repeatedly modprobe/rmmod the e1000 driver for
> serveral minutes which allocates msi vectors and frees them. These tests have
> been running fine until 2.6.19.
>
> git-bisecting I've established that in between commit
> 04b9267b15206fc902a18de1f78de6c82ca47716 "Eric W. Biederman -- genirq: x86_64
> irq: Remove the msi assumption that irq == vector" and commit
> f29bd1ba68c8c6a0f50bd678bbd5a26674018f7c "Ingo Molnar -- genirq: convert the
> x86_64 architecture to irq-chips" the behaviour broke.
>
> The revisions in between seem to be dependent and give all sorts of other
> issues, so it's rather hard for me to bisect that and give trustworthy results.
>
> the e1000 driver hits the 256-mark cycle (I think - it consistently refuses to
> do 500 msi irq/vector allocations which is my test case) and throws:
>
> e1000: eth4: e1000_request_irq: Unable to allocate MSI interrupt Error: -16
>
> which is caused by a `if ((err = pci_enable_msi(adapter->pdev))) {` call from
> the e1000 driver. It's rather easy to hit this mark with the new 4-port e1000
> adapters :).
>
> as for the e1000 code, I can say that even our oldest msi-enabled e1000 driver
> works fine with 2.6.18 and under. All kernels from 2.6.19 fail consistently.
>
> I mostly suspect commit 7bd007e480672c99d8656c7b7b12ef0549432c37 at the
> moment. Perhaps Eric Biederman can help?
>
> Cheers,
>
> Auke
>
Does this patch fix it for you? It looks like i386 vector allocate
did not have logic to look through the set of vectors more than once.
The code in this patch is a simplified version of what we have
on x86_64.
Eric
diff --git a/arch/i386/kernel/io_apic.c b/arch/i386/kernel/io_apic.c
index 2424cc9..6a3875f 100644
--- a/arch/i386/kernel/io_apic.c
+++ b/arch/i386/kernel/io_apic.c
@@ -1227,26 +1227,32 @@ static u8 irq_vector[NR_IRQ_VECTORS] __read_mostly = { FIRST_DEVICE_VECTOR , 0 }
static int __assign_irq_vector(int irq)
{
- static int current_vector = FIRST_DEVICE_VECTOR, offset = 0;
- int vector;
+ static int current_vector = FIRST_DEVICE_VECTOR, current_offset = 0;
+ int vector, offset, i;
BUG_ON((unsigned)irq >= NR_IRQ_VECTORS);
if (irq_vector[irq] > 0)
return irq_vector[irq];
- current_vector += 8;
- if (current_vector == SYSCALL_VECTOR)
- current_vector += 8;
-
- if (current_vector >= FIRST_SYSTEM_VECTOR) {
- offset++;
- if (!(offset % 8))
- return -ENOSPC;
- current_vector = FIRST_DEVICE_VECTOR + offset;
- }
-
vector = current_vector;
+ offset = current_offset;
+next:
+ vector += 8;
+ if (vector >= FIRST_SYSTEM_VECTOR) {
+ offset = (offset + 1) % 8;
+ vector = FIRST_DEVICE_VECTOR + offset;
+ }
+ if (vector == current_vector)
+ return -ENOSPC;
+ if (vector == SYSCALL_VECTOR)
+ goto next;
+ for (i = 0; i < NR_IRQ_VECTORS; i++)
+ if (irq_vector[i] == vector)
+ goto next;
+
+ current_vector = vector;
+ current_offset = offset;
irq_vector[irq] = vector;
return vector;
Eric W. Biederman wrote:
> Auke Kok <[email protected]> writes:
>
>> Hi,
>>
>> I've established a regression in the MSI vector/irq allocation routine for both
>> i386 and x86_64. Our test labs repeatedly modprobe/rmmod the e1000 driver for
>> serveral minutes which allocates msi vectors and frees them. These tests have
>> been running fine until 2.6.19.
>>
>> git-bisecting I've established that in between commit
>> 04b9267b15206fc902a18de1f78de6c82ca47716 "Eric W. Biederman -- genirq: x86_64
>> irq: Remove the msi assumption that irq == vector" and commit
>> f29bd1ba68c8c6a0f50bd678bbd5a26674018f7c "Ingo Molnar -- genirq: convert the
>> x86_64 architecture to irq-chips" the behaviour broke.
>>
>> The revisions in between seem to be dependent and give all sorts of other
>> issues, so it's rather hard for me to bisect that and give trustworthy results.
>>
>> the e1000 driver hits the 256-mark cycle (I think - it consistently refuses to
>> do 500 msi irq/vector allocations which is my test case) and throws:
>>
>> e1000: eth4: e1000_request_irq: Unable to allocate MSI interrupt Error: -16
>>
>> which is caused by a `if ((err = pci_enable_msi(adapter->pdev))) {` call from
>> the e1000 driver. It's rather easy to hit this mark with the new 4-port e1000
>> adapters :).
>>
>> as for the e1000 code, I can say that even our oldest msi-enabled e1000 driver
>> works fine with 2.6.18 and under. All kernels from 2.6.19 fail consistently.
>>
>> I mostly suspect commit 7bd007e480672c99d8656c7b7b12ef0549432c37 at the
>> moment. Perhaps Eric Biederman can help?
>
> Does this patch fix it for you? It looks like i386 vector allocate
> did not have logic to look through the set of vectors more than once.
>
> The code in this patch is a simplified version of what we have
> on x86_64.
I highly doubt it - I've seen the problem even on this weeks git on x86_64.
Moreover, I'm at home for the weekend and testing resources are limited :). I'll
see what I can do
Auke
Auke Kok <[email protected]> writes:
> I highly doubt it - I've seen the problem even on this weeks git on
> x86_64. Moreover, I'm at home for the weekend and testing resources are limited
> :). I'll see what I can do
Thanks. There may be more to it than what I suspect, but I could not
reproduce it on x86_64.
Now I may have missed something as I optimized my tested based on the fact
that close and open are triggered when you up and down a network interface.
so I didn't do a complete rmmod, (since my network driver wasn't modular).
Since you have seen this on x86_64 I will look deeper.
Eric
Eric W. Biederman wrote:
> Auke Kok <[email protected]> writes:
>
>> I highly doubt it - I've seen the problem even on this weeks git on
>> x86_64. Moreover, I'm at home for the weekend and testing resources are limited
>> :). I'll see what I can do
>
> Thanks. There may be more to it than what I suspect, but I could not
> reproduce it on x86_64.
>
> Now I may have missed something as I optimized my tested based on the fact
> that close and open are triggered when you up and down a network interface.
> so I didn't do a complete rmmod, (since my network driver wasn't modular).
>
> Since you have seen this on x86_64 I will look deeper.
gah, strike that.
my only x86_64 system here survived the test with latest git tree.
my 386 system here has no msi devices and I can't reinstall my x86_64 system
since it's headless, so I can't test anything until monday. I'll give it a full
test again and see which 2.6.20rc kernels did fail, most likely a much older
tree (I suspect).
Auke
Eric W. Biederman wrote:
> Auke Kok <[email protected]> writes:
>
>> Hi,
>>
>> I've established a regression in the MSI vector/irq allocation routine for both
>> i386 and x86_64. Our test labs repeatedly modprobe/rmmod the e1000 driver for
>> serveral minutes which allocates msi vectors and frees them. These tests have
>> been running fine until 2.6.19.
[snip]
>> I mostly suspect commit 7bd007e480672c99d8656c7b7b12ef0549432c37 at the
>> moment. Perhaps Eric Biederman can help?
>
> Does this patch fix it for you? It looks like i386 vector allocate
> did not have logic to look through the set of vectors more than once.
Yes. A few hundred cycles of loading/unloading snd_hda_intel with enable_msi=1
didn't break it on i386.
I sure hope this can get into 2.6.20!
Auke
> The code in this patch is a simplified version of what we have
> on x86_64.
>
> Eric
>
>
> diff --git a/arch/i386/kernel/io_apic.c b/arch/i386/kernel/io_apic.c
> index 2424cc9..6a3875f 100644
> --- a/arch/i386/kernel/io_apic.c
> +++ b/arch/i386/kernel/io_apic.c
> @@ -1227,26 +1227,32 @@ static u8 irq_vector[NR_IRQ_VECTORS] __read_mostly = { FIRST_DEVICE_VECTOR , 0 }
>
> static int __assign_irq_vector(int irq)
> {
> - static int current_vector = FIRST_DEVICE_VECTOR, offset = 0;
> - int vector;
> + static int current_vector = FIRST_DEVICE_VECTOR, current_offset = 0;
> + int vector, offset, i;
>
> BUG_ON((unsigned)irq >= NR_IRQ_VECTORS);
>
> if (irq_vector[irq] > 0)
> return irq_vector[irq];
>
> - current_vector += 8;
> - if (current_vector == SYSCALL_VECTOR)
> - current_vector += 8;
> -
> - if (current_vector >= FIRST_SYSTEM_VECTOR) {
> - offset++;
> - if (!(offset % 8))
> - return -ENOSPC;
> - current_vector = FIRST_DEVICE_VECTOR + offset;
> - }
> -
> vector = current_vector;
> + offset = current_offset;
> +next:
> + vector += 8;
> + if (vector >= FIRST_SYSTEM_VECTOR) {
> + offset = (offset + 1) % 8;
> + vector = FIRST_DEVICE_VECTOR + offset;
> + }
> + if (vector == current_vector)
> + return -ENOSPC;
> + if (vector == SYSCALL_VECTOR)
> + goto next;
> + for (i = 0; i < NR_IRQ_VECTORS; i++)
> + if (irq_vector[i] == vector)
> + goto next;
> +
> + current_vector = vector;
> + current_offset = offset;
> irq_vector[irq] = vector;
>
> return vector;
On Mon, 29 Jan 2007, Auke Kok wrote:
>
> Yes. A few hundred cycles of loading/unloading snd_hda_intel with enable_msi=1
> didn't break it on i386.
>
> I sure hope this can get into 2.6.20!
Eric, can you write an explanation, add your sign-off, Auke's ACK, and
send out the result? The patch looked ok to me, but I do want the more
deeper explanation too (and sign-off, of course).
Linus
Linus Torvalds <[email protected]> writes:
> On Mon, 29 Jan 2007, Auke Kok wrote:
>>
>> Yes. A few hundred cycles of loading/unloading snd_hda_intel with enable_msi=1
>> didn't break it on i386.
>>
>> I sure hope this can get into 2.6.20!
>
> Eric, can you write an explanation, add your sign-off, Auke's ACK, and
> send out the result? The patch looked ok to me, but I do want the more
> deeper explanation too (and sign-off, of course).
Sure. Give me a couple of minutes.
Eric
When the world was a simple and static place setting up irqs was easy.
It sufficed to allocate a linux irq number and a find a free cpu
vector we could receive that linux irq on. In those days it was
a safe assumption that any allocated vector was actually in use
so after one global pass through all of the vectors we would have
none left.
These days things are much more dynamic with interrupt controllers
(in the form of MSI or MSI-X) appearing on plug in cards and linux
irqs appearing and disappearing. As these irqs come and go vectors
are allocated and freed, invalidating the ancient assumption that all
allocated vectors stayed in use forever.
So this patch modifies the vector allocator to walk through every
possible vector before giving up, and to check to see if a vector
is in use before assigning it. With these changes we stop leaking
freed vectors and it becomes possible to allocate and free irq vectors
all day long.
This changed was modeled after the vector allocator on x86_64 where
this limitation has already been removed. In essence we don't update
the static variables that hold the position of the last vector we
allocated until have successfully allocated another vector. This
allows us to detect if we have completed one complete scan through
all of the possible vectors.
Acked-by: Auke Kok <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
---
arch/i386/kernel/io_apic.c | 32 +++++++++++++++++++-------------
1 files changed, 19 insertions(+), 13 deletions(-)
diff --git a/arch/i386/kernel/io_apic.c b/arch/i386/kernel/io_apic.c
index 2424cc9..6a3875f 100644
--- a/arch/i386/kernel/io_apic.c
+++ b/arch/i386/kernel/io_apic.c
@@ -1227,26 +1227,32 @@ static u8 irq_vector[NR_IRQ_VECTORS] __read_mostly = { FIRST_DEVICE_VECTOR , 0 }
static int __assign_irq_vector(int irq)
{
- static int current_vector = FIRST_DEVICE_VECTOR, offset = 0;
- int vector;
+ static int current_vector = FIRST_DEVICE_VECTOR, current_offset = 0;
+ int vector, offset, i;
BUG_ON((unsigned)irq >= NR_IRQ_VECTORS);
if (irq_vector[irq] > 0)
return irq_vector[irq];
- current_vector += 8;
- if (current_vector == SYSCALL_VECTOR)
- current_vector += 8;
-
- if (current_vector >= FIRST_SYSTEM_VECTOR) {
- offset++;
- if (!(offset % 8))
- return -ENOSPC;
- current_vector = FIRST_DEVICE_VECTOR + offset;
- }
-
vector = current_vector;
+ offset = current_offset;
+next:
+ vector += 8;
+ if (vector >= FIRST_SYSTEM_VECTOR) {
+ offset = (offset + 1) % 8;
+ vector = FIRST_DEVICE_VECTOR + offset;
+ }
+ if (vector == current_vector)
+ return -ENOSPC;
+ if (vector == SYSCALL_VECTOR)
+ goto next;
+ for (i = 0; i < NR_IRQ_VECTORS; i++)
+ if (irq_vector[i] == vector)
+ goto next;
+
+ current_vector = vector;
+ current_offset = offset;
irq_vector[irq] = vector;
return vector;
--
1.4.4.1.g278f