2008-11-19 22:58:29

by Jesper Dangaard Brouer

Subject: Re: NIU driver: Sun x8 Express Quad Gigabit Ethernet Adapter (perf + regression IRQs)


On Thu, 13 Nov 2008, David Miller wrote:

> From: Jesper Dangaard Brouer <[email protected]>
> Date: Thu, 13 Nov 2008 11:29:31 +0100
>
>> Although I'm not happy about the new perf numbers, as on an SMP
>> system I can now only route approx 290 kpps; remember I could route
>> 319 kpps using a single CPU nosmp kernel.
>
> That unfortunately (can be) the cost of SMP :-/

[Regression]

Well that was not the real cause of the performance loss. Because on
kernel 2.6.27 I get really good performance (900-1200kpps) compared to
2.6.28 (git net-2.6).

The cause of this problem (tracked down together with Robert Olsson) is
that on 2.6.28 I have far fewer IRQs available. The maximum seems to be
34 IRQs.

Due to the reduced number of IRQs, the NIU driver cannot get enough IRQs
for the interfaces and starts to use "IO-APIC" based IRQs.

On kernel 2.6.28:

My eth2 is using 10 IRQs all "PCI-MSI-edge".

BUT my eth3 is using a single IRQ using "IO-APIC-fasteoi" and shared
with the usb driver...

I think that must be my performance problem on 2.6.28.
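
The per-interface IRQ types described above can be checked by scanning
/proc/interrupts. Below is a minimal C sketch of such a check. The helper
name count_irq_lines is hypothetical, it works on text already read into
memory, and it only does plain substring matching, so treat it as an
illustration rather than a finished tool.

```c
#include <stddef.h>
#include <string.h>

/*
 * Count the lines of a /proc/interrupts-style text that mention both
 * `ifname` (e.g. "eth3") and `chip` (e.g. "IO-APIC" or "PCI-MSI-edge").
 * Hypothetical helper, not part of any kernel or driver API.
 */
static int count_irq_lines(const char *text, const char *ifname,
                           const char *chip)
{
    int count = 0;
    const char *line = text;

    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        char buf[512];

        if (len >= sizeof(buf))
            len = sizeof(buf) - 1;      /* truncate overlong lines */
        memcpy(buf, line, len);
        buf[len] = '\0';

        /* A line counts only if it names both the interface and the chip. */
        if (strstr(buf, ifname) != NULL && strstr(buf, chip) != NULL)
            count++;

        if (end == NULL)
            break;                      /* last line had no newline */
        line = end + 1;
    }
    return count;
}
```

Calling count_irq_lines(text, "eth3", "IO-APIC") on the contents of
/proc/interrupts would count the IO-APIC lines that mention eth3. Note
that substring matching is crude: "eth2" would also match "eth20".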


> With multi-flow tests, Robert Olsson is getting 4.2 mpps rates with
> NIU and pktgen. That's what this card is designed for, good
> multi-flow workload performance, rather than striving for maximum
> single-flow performance.

[Packet performance]

Yes, I know, I do use pktgen and multi-flows (rand dest IP+port).

For the two drivers, NIU and Sun's NXGE, my packets-per-second
performance on 2.6.27 (with backported NIU fixes) is now:

With NIU driver I can route 900 kpps.

With NXGE driver (and enqueue=NULL hack) I can route 1200 kpps.

Actually I think I can go higher, because I'm limited by my packet rate
generator. I use pktgen (with rand dst IP+port) and can only generate 1200
kpps.

(I have actually ordered some new hardware, so I can get a faster pktgen
machine and perhaps test it as a router too. I also ordered the hardware
because I want to test PCI Express v2.0. I have a prototype 12-port
gigabit NIC (from hotlava systems) that supports PCIe v2.0 and has 6x
82575 chips (4RX/4TX queues).)


Regards
Jesper Brouer

--
-------------------------------------------------------------------
MSc. Master of Computer Science
Dept. of Computer Science, University of Copenhagen
Author of http://www.adsl-optimizer.dk
-------------------------------------------------------------------


2008-11-19 23:11:55

by David Miller

Subject: Re: NIU driver: Sun x8 Express Quad Gigabit Ethernet Adapter (perf + regression IRQs)

From: Jesper Dangaard Brouer <[email protected]>
Date: Wed, 19 Nov 2008 23:58:12 +0100 (CET)

> Well that was not the real cause of the performance loss. Because
> on kernel 2.6.27 I get really good performance (900-1200kpps)
> compared to 2.6.28 (git net-2.6).
>
> The cause of this problem (tracked down together with Robert Olsson)
> is that on 2.6.28 I have a lot less IRQs available. It seems max 34
> IRQs.
>
> Due to the reduced number of IRQs the NIU driver cannot get enough IRQs
> to the interfaces, and starts to use "IO-APIC" based IRQs.

This is almost certainly related to the driver unload bug.

I know you ran into unbuildable/unbootable kernels during a bisect,
but you really need to track down this regression.

There were a lot of IRQ changes, especially on x86. The sequence is
something like:

1) dyn irqs
2) APIC/IO_APIC handling integration
3) by-hand REVERT of dyn irqs, it was done by hand in order to not
lose the #2 changes
4) interrupt remapping support

2008-11-20 19:49:15

by Jesper Dangaard Brouer

Subject: Regression: Bisected, IRQ and MSI allocations screwed without sparse irq

Hi Thomas Gleixner,

I have bisected a regression to your commit
3235e936c0cc3589309280b6f59e5096779adae3,
"x86: remove sparse irq from Kconfig".

It's actually not necessarily your fault, as your commit simply removes
the config option HAVE_SPARSE_IRQ. This reveals the bug / regression
I'm exposed to.

Guess I should bisect again to find the exact faulty commit, but I'm
rather sick of bisecting at the moment, and thought you might have a
better idea of what's going wrong. I would rather spend my time
performance tuning the multiqueue routing code...

[The regression]:

During my testing of the Sun Neptune based NICs, I get really good
performance on kernel 2.6.27 (900-1200kpps) compared to 2.6.28 (davem
git net-2.6).

The cause of this problem (tracked down together with Robert Olsson)
is that on 2.6.28 I have far fewer IRQs available. The maximum seems
to be 34 IRQs. Due to the reduced number of IRQs, the NIU driver cannot
get enough IRQs for the interfaces and starts to use "IO-APIC" based
IRQs.

On kernel 2.6.28: My eth2 is using 10 IRQs, all "PCI-MSI-edge". BUT
my eth3 is using a single IRQ of type "IO-APIC-fasteoi", shared with
the usb driver. That is my performance problem on 2.6.28.

[Other related bugs]:
A related bug is that unloading the "niu" driver gives a kernel BUG
during deallocation of MSI interrupts. (See dmesg output below if
interested.)

(I have attached full bisect history)

Cheers,
Jesper Brouer

--
-------------------------------------------------------------------
MSc. Master of Computer Science
Dept. of Computer Science, University of Copenhagen
Author of http://www.adsl-optimizer.dk
-------------------------------------------------------------------


On Wed, 19 Nov 2008, David Miller wrote:
> From: Jesper Dangaard Brouer <[email protected]>
> Date: Wed, 19 Nov 2008 23:58:12 +0100 (CET)
>
>> Well that was not the real cause of the performance loss. Because
>> on kernel 2.6.27 I get really good performance (900-1200kpps)
>> compared to 2.6.28 (git net-2.6).
>>
>> The cause of this problem (tracked down together with Robert Olsson)
>> is that on 2.6.28 I have a lot less IRQs available. It seems max 34
>> IRQs.
>>
>> Due to the reduced number of IRQs the NIU driver cannot get enough IRQs
>> to the interfaces, and starts to use "IO-APIC" based IRQs.
>
> This is almost certainly related to the driver unload bug.
>
> I know you ran into unbuildable/unbootable kernels during a bisect,
> but you really need to track down this regression.


------------[ cut here ]------------
kernel BUG at drivers/pci/msi.c:632!
invalid opcode: 0000 [#1] PREEMPT SMP
Modules linked in: ehci_hcd bnx2 uhci_hcd zlib_inflate serio_raw hpilo
niu(-)

Pid: 3036, comm: rmmod Not tainted (2.6.27-bisect #5) ProLiant DL380 G5
EIP: 0060:[<c021ecac>] EFLAGS: 00010286 CPU: 2
EIP is at msi_free_irqs+0xdc/0xe0
EAX: f6b8f860 EBX: 00000030 ECX: f7156ba8 EDX: c0420500
ESI: f7156800 EDI: f7156ba8 EBP: f6431eb4 ESP: f6431ea8
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process rmmod (pid: 3036, ti=f6430000 task=f70f9b20 task.ti=f6430000)
Stack:
f7156800 f670c400 f7156800 f6431ebc c021ecb8 f6431ec8 c021ef41 f670c000
f6431edc f809d3f8 f7156800 f80a1ed4 f80a1ed4 f6431ee8 c0219c29 f7156858
f6431ef8 c026b0d4 f7156858 f7156914 f6431f0c c026b197 f80a1ea0 f80a1ed4
Call Trace:
[<c021ecb8>] ? msix_free_all_irqs+0x8/0x10
[<c021ef41>] ? pci_disable_msix+0x31/0x40
[<f809d3f8>] ? niu_pci_remove_one+0x88/0x8a [niu]
[<c0219c29>] ? pci_device_remove+0x19/0x40
[<c026b0d4>] ? __device_release_driver+0x54/0x80
[<c026b197>] ? driver_detach+0x97/0xa0
[<c026a475>] ? bus_remove_driver+0x75/0xa0
[<c026b609>] ? driver_unregister+0x39/0x40
[<c0219e51>] ? pci_unregister_driver+0x21/0x80
[<f809a0ad>] ? niu_exit+0xd/0x10 [niu]
[<c0145d74>] ? sys_delete_module+0x114/0x1d0
[<c016810a>] ? remove_vma+0x3a/0x50
[<c0168c29>] ? do_munmap+0x189/0x1e0
[<c0103229>] ? sysenter_do_call+0x12/0x21
[<c0330000>] ? quirk_disable_msi+0x30/0x50
Code: b7 43 08 8b 53 1c c1 e0 04 01 d0 ba 01 00 00 00 83 c0 0c 89 10 3b 7b
14 75 aa 8b 43 1c e8 3d 92 ef ff eb a0 5b 31 c0 5e 5f 5d c3 <0f> 0b eb fe
55 89 e5 e8 18 ff ff ff 5d c3 8d b6 00 00 00 00 55
EIP: [<c021ecac>] msi_free_irqs+0xdc/0xe0 SS:ESP 0068:f6431ea8
---[ end trace f72de2e283920207 ]---


Attachments:
bisect_IO-APIC.txt (31.75 kB)

2008-11-21 00:35:09

by Thomas Gleixner

Subject: Re: Regression: Bisected, IRQ and MSI allocations screwed without sparse irq

Jesper,

On Thu, 20 Nov 2008, Jesper Dangaard Brouer wrote:
> I have bisected a regression to your commit
> 3235e936c0cc3589309280b6f59e5096779adae3,
> "x86: remove sparse irq from Kconfig".
>
> It's actually not necessarily your fault, as your commit simply removes
> the config option HAVE_SPARSE_IRQ. This reveals the bug / regression
> I'm exposed to.

Yup, the bisect result is pretty useless.

> The cause of this problem (tracked down together with Robert Olsson)
> is that on 2.6.28 I have a lot less IRQs available. It seems max 34
> IRQs. Due to the reduced number of IRQs the NIU driver cannot get
> enough IRQs to the interfaces, and starts to use "IO-APIC" based
> IRQs.

Can you please try the attached patch ?

Thanks,

tglx

-----
arch/x86/kernel/io_apic.c | 22 +---------------------
1 file changed, 1 insertion(+), 21 deletions(-)

Index: linux-2.6/arch/x86/kernel/io_apic.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/io_apic.c
+++ linux-2.6/arch/x86/kernel/io_apic.c
@@ -3594,27 +3594,7 @@ int __init io_apic_get_redir_entries (in

 int __init probe_nr_irqs(void)
 {
-	int idx;
-	int nr = 0;
-#ifndef CONFIG_XEN
-	int nr_min = 32;
-#else
-	int nr_min = NR_IRQS;
-#endif
-
-	for (idx = 0; idx < nr_ioapics; idx++)
-		nr += io_apic_get_redir_entries(idx) + 1;
-
-	/* double it for hotplug and msi and nmi */
-	nr <<= 1;
-
-	/* something wrong ? */
-	if (nr < nr_min)
-		nr = nr_min;
-	if (WARN_ON(nr > NR_IRQS))
-		nr = NR_IRQS;
-
-	return nr;
+	return NR_IRQS;
 }

/* --------------------------------------------------------------------------
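
For illustration, the sizing logic removed by the patch above can be
modelled in user space. This is only a sketch under stated assumptions:
SKETCH_NR_IRQS is a made-up stand-in for the kernel's compile-time
NR_IRQS, both function names are hypothetical, and the input array
stands in for the values io_apic_get_redir_entries() returns for each
IO-APIC. The CONFIG_XEN branch is left out.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's compile-time NR_IRQS limit. */
#define SKETCH_NR_IRQS 224

/*
 * User-space model of the removed probe_nr_irqs() body (non-Xen case).
 * redir_entries[i] plays the role of io_apic_get_redir_entries(i).
 */
static int probe_nr_irqs_old_sketch(const int *redir_entries,
                                    size_t nr_ioapics)
{
    const int nr_min = 32;
    int nr = 0;
    size_t idx;

    /* Sum the per-IO-APIC values, adding one each, as the old loop did. */
    for (idx = 0; idx < nr_ioapics; idx++)
        nr += redir_entries[idx] + 1;

    /* "double it for hotplug and msi and nmi" */
    nr <<= 1;

    /* Clamp to [nr_min, SKETCH_NR_IRQS]; the kernel also warned on overflow. */
    if (nr < nr_min)
        nr = nr_min;
    if (nr > SKETCH_NR_IRQS)
        nr = SKETCH_NR_IRQS;

    return nr;
}

/* Model of the patched function: always the compile-time limit. */
static int probe_nr_irqs_new_sketch(void)
{
    return SKETCH_NR_IRQS;
}
```

With a single IO-APIC whose value is 23, the old logic gives
(23 + 1) * 2 = 48, so the number of usable IRQs depended on the IO-APIC
layout. The patched version always returns the compile-time limit.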

2008-11-21 10:33:51

by Jesper Dangaard Brouer

Subject: Re: Regression: Bisected, IRQ and MSI allocations screwed without sparse irq

On Thu, 2008-11-20 at 16:34 -0800, Thomas Gleixner wrote:
> On Thu, 20 Nov 2008, Jesper Dangaard Brouer wrote:
> > I have bisected a regression to your commit
> > 3235e936c0cc3589309280b6f59e5096779adae3,
> > "x86: remove sparse irq from Kconfig".
> >
> > It's actually not necessarily your fault, as your commit simply removes
> > the config option HAVE_SPARSE_IRQ. This reveals the bug / regression
> > I'm exposed to.
>
> Yup, the bisect result is pretty useless.
>
> > The cause of this problem (tracked down together with Robert Olsson)
> > is that on 2.6.28 I have a lot less IRQs available. It seems max 34
> > IRQs. Due to the reduced number of IRQs the NIU driver cannot get
> > enough IRQs to the interfaces, and starts to use "IO-APIC" based
> > IRQs.
>
> Can you please try the attached patch ?

I have tried the patch and it solved the problem! :-)

I'll gladly test other patches from you. I guess this patch needs to be
brushed up before it is ready for mainline.

My hardware is a HP ProLiant DL380-G5.


> -----
> arch/x86/kernel/io_apic.c | 22 +---------------------
> 1 file changed, 1 insertion(+), 21 deletions(-)
>
> Index: linux-2.6/arch/x86/kernel/io_apic.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/io_apic.c
> +++ linux-2.6/arch/x86/kernel/io_apic.c
> @@ -3594,27 +3594,7 @@ int __init io_apic_get_redir_entries (in
>
>  int __init probe_nr_irqs(void)
>  {
> -	int idx;
> -	int nr = 0;
> -#ifndef CONFIG_XEN
> -	int nr_min = 32;
> -#else
> -	int nr_min = NR_IRQS;
> -#endif
> -
> -	for (idx = 0; idx < nr_ioapics; idx++)
> -		nr += io_apic_get_redir_entries(idx) + 1;
> -
> -	/* double it for hotplug and msi and nmi */
> -	nr <<= 1;
> -
> -	/* something wrong ? */
> -	if (nr < nr_min)
> -		nr = nr_min;
> -	if (WARN_ON(nr > NR_IRQS))
> -		nr = NR_IRQS;
> -
> -	return nr;
> +	return NR_IRQS;
>  }
>

--
Best regards
Jesper Brouer
ComX Networks A/S
Linux Network developer
Cand. Scient Datalog / MSc.
Author of http://adsl-optimizer.dk
LinkedIn: http://www.linkedin.com/in/brouer

2008-11-21 16:41:39

by Thomas Gleixner

Subject: Re: Regression: Bisected, IRQ and MSI allocations screwed without sparse irq

On Fri, 21 Nov 2008, Jesper Dangaard Brouer wrote:
> > Can you please try the attached patch ?
>
> I have tried the patch and it solved the problem! :-)
>
> I'll gladly test other patches from you. Guess this patch needs to be
> brushed up before a mainline patch is ready.

Ok, I queue it for mainline. This solves just the number of irqs
limitation, the rmmod problem still persists, right ?

Thanks,

tglx

2008-11-21 19:35:51

by Jesper Dangaard Brouer

Subject: Re: Regression: Bisected, IRQ and MSI allocations screwed without sparse irq

On Fri, 21 Nov 2008, Thomas Gleixner wrote:

> On Fri, 21 Nov 2008, Jesper Dangaard Brouer wrote:
>>> Can you please try the attached patch ?
>>
>> I have tried the patch and it solved the problem! :-)
>>
>> I'll gladly test other patches from you. Guess this patch needs to be
>> brushed up before a mainline patch is ready.
>
> Ok, I queue it for mainline. This solves just the number of irqs
> limitation, the rmmod problem still persists, right ?

It solves both the irq limit and the NIU driver unload bug.

We should give it a good description.
I have cooked up a patch with a description below, will you accept that?

Whose tree do you want it to go upstream via?
(You are listed as one of the X86 maintainers, but Ingo's tree seems more
up-to-date. My patch below is against DaveM's tree.)

Cheers,
Jesper Brouer

--
-------------------------------------------------------------------
MSc. Master of Computer Science
Dept. of Computer Science, University of Copenhagen
Author of http://www.adsl-optimizer.dk
-------------------------------------------------------------------

Fixing irq limit and NIU driver unload bug.

Removing the config option HAVE_SPARSE_IRQ (commit
3235e936c0cc3589309280b6f59e5096779adae3) revealed a regression that
limited the number of irqs on the system.

Besides limiting the number of IRQs, this also caused unloading of the
NIU driver to fail during msi_free_irqs(). The reduced number of IRQs
caused the NIU driver to use "IO-APIC" based IRQs instead of
"PCI-MSI-edge".

This patch changes probe_nr_irqs() to return NR_IRQS, which is
basically the same as the NOT CONFIG_X86_IO_APIC case and should thus
be fairly safe.

This solves both the irq limit and the NIU driver unload bug.

Tested-by: Jesper Dangaard Brouer <[email protected]>
Signed-off-by: Jesper Dangaard Brouer <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
---
arch/x86/kernel/io_apic.c | 22 +---------------------
1 files changed, 1 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kernel/io_apic.c b/arch/x86/kernel/io_apic.c
index c9513e1..1fec0f9 100644
--- a/arch/x86/kernel/io_apic.c
+++ b/arch/x86/kernel/io_apic.c
@@ -3608,27 +3608,7 @@ int __init io_apic_get_redir_entries (int ioapic)

 int __init probe_nr_irqs(void)
 {
-	int idx;
-	int nr = 0;
-#ifndef CONFIG_XEN
-	int nr_min = 32;
-#else
-	int nr_min = NR_IRQS;
-#endif
-
-	for (idx = 0; idx < nr_ioapics; idx++)
-		nr += io_apic_get_redir_entries(idx) + 1;
-
-	/* double it for hotplug and msi and nmi */
-	nr <<= 1;
-
-	/* something wrong ? */
-	if (nr < nr_min)
-		nr = nr_min;
-	if (WARN_ON(nr > NR_IRQS))
-		nr = NR_IRQS;
-
-	return nr;
+	return NR_IRQS;
 }

/* --------------------------------------------------------------------------
--
1.5.4.2

2008-11-21 21:12:25

by Thomas Gleixner

Subject: Re: Regression: Bisected, IRQ and MSI allocations screwed without sparse irq

On Fri, 21 Nov 2008, Jesper Dangaard Brouer wrote:
> > Ok, I queue it for mainline. This solves just the number of irqs
> > limitation, the rmmod problem still persists, right ?
>
> It solves both the irq limit and the NIU driver unload bug.

I don't believe that it solves it. It hides it at best.

> We should give it a good description.
> I have cooked up a patch with a description below, will you accept that?
>
> Whose tree do you want it to go upstream via?
> (You are listed as one of the X86 maintainers, but Ingo's tree seems more
> up-to-date. My patch below is against DaveM's tree.)

I queued it already with a description of the irq nr. problem.

The rmmod problem is something different and should be investigated
thoroughly instead of declaring it solved by magic.

Thanks,

tglx

2008-11-21 23:07:09

by David Miller

Subject: Re: Regression: Bisected, IRQ and MSI allocations screwed without sparse irq

From: Jesper Dangaard Brouer <[email protected]>
Date: Fri, 21 Nov 2008 20:35:32 +0100 (CET)

> On Fri, 21 Nov 2008, Thomas Gleixner wrote:
>
> > On Fri, 21 Nov 2008, Jesper Dangaard Brouer wrote:
> >>> Can you please try the attached patch ?
> >>
> >> I have tried the patch and it solved the problem! :-)
> >>
> >> I'll gladly test other patches from you. Guess this patch needs to be
> >> brushed up before a mainline patch is ready.
> >
> > Ok, I queue it for mainline. This solves just the number of irqs
> > limitation, the rmmod problem still persists, right ?
>
> It solves both the irq limit and the NIU driver unload bug.

I think it "solves" the unload BUG because the driver no longer has
to abort trying to use MSI-X and fall back to IO_APIC irqs.

Only the IRQ limit bug is fixed by Thomas's patch.