2003-01-04 15:12:48

by Steffen Persvold

Subject: NAPI and tg3

Hi guys,

I have access to 8 Dell 2650s with onboard Broadcom BCM5701 chips. They
are equipped with dual 2.4 GHz Xeon processors and 1 GB of RAM. I'm
running Red Hat 7.3, but with a stock 2.4.20 kernel.

As I understand it, the tg3 driver uses NAPI (dev->poll) in the 2.4.20
kernel. I've been seeing bad performance (low bandwidth) with cluster
applications, for example ones running under LAM, and the problem also
shows up if you run two bandwidth-hungry applications in parallel on
two machines (i.e. two processes on each machine, one per processor)
over GbE.

I disabled NAPI mode and went back to the old interrupt method, and
this works much better (i.e. the bandwidth is now evenly distributed
between the two applications).

What could be the cause of this problem? Is it NAPI itself (doing RX
under scheduler control) or something else (for example lock
contention)?

Any ideas?
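To make "doing RX under scheduler control" concrete, here is a toy Python model of the NAPI receive path under simplifying assumptions: the interrupt handler only masks further RX interrupts and puts the device on the poll list, and the poll routine, which runs from the network receive softirq, drains at most `budget` packets per call and re-enables interrupts only once the ring is empty. `NapiDevice`, `irq` and `poll` are hypothetical names for illustration; this is not the kernel's `dev->poll` interface.

```python
from collections import deque


class NapiDevice:
    """Toy model of a NAPI-style receive path (hypothetical, not kernel code)."""

    def __init__(self):
        self.rx_ring = deque()   # packets received by the NIC but not yet processed
        self.irq_enabled = True  # RX interrupts start out enabled
        self.scheduled = False   # True while the device sits on the poll list

    def irq(self, packets):
        """Packets arrive; only an enabled RX interrupt schedules polling."""
        self.rx_ring.extend(packets)
        if self.irq_enabled:
            self.irq_enabled = False  # mask further RX interrupts
            self.scheduled = True     # defer the real work to softirq context

    def poll(self, budget):
        """Process at most `budget` packets; return (processed, done)."""
        processed = []
        while self.rx_ring and len(processed) < budget:
            processed.append(self.rx_ring.popleft())
        done = not self.rx_ring
        if done:
            self.scheduled = False   # leave the poll list
            self.irq_enabled = True  # and fall back to interrupt mode
        return processed, done
```

Under sustained load `poll` keeps returning `done == False`, so the device stays in polling mode and the remaining work is carried by softirq processing, which is where ksoftirqd and its scheduling come into play.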

Thanks,
--
Steffen Persvold | Scali AS
mailto:[email protected] | http://www.scali.com
Tel: (+47) 2262 8950 | Olaf Helsets vei 6
Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY



2003-01-06 14:49:02

by Steffen Persvold

Subject: Re: NAPI and tg3

Hi again,

I discovered that if I renice the ksoftirqd processes to nice level 0,
performance was actually better with the NAPI-enabled driver than with
the one without (as NAPI intended, IIRC). With the default nice level
(19) on the ksoftirqd processes, the performance of multithreaded
programs was pretty lousy with the NAPI-enabled driver.

Any reason why ksoftirqd shouldn't be at nice level 0 by default? Is
this already fixed in the 2.4.21-pre series?
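Renicing is normally done as root with `renice`. As an illustration, here is a minimal Python sketch of the same operation, assuming a Linux `/proc` layout: it finds threads whose command name starts with `ksoftirqd` (2.4 names them `ksoftirqd_CPU0`, `ksoftirqd_CPU1`, ...; later kernels use `ksoftirqd/0`, ...) and sets their nice level with `os.setpriority`. The helper names are hypothetical, and changing the priority of another user's process requires root.

```python
import os


def comm_from_stat(stat_line):
    """Return the command name from a /proc/<pid>/stat line.

    The name is wrapped in parentheses and may itself contain spaces or
    parentheses, so take everything between the first '(' and the last ')'.
    """
    start = stat_line.index("(") + 1
    end = stat_line.rindex(")")
    return stat_line[start:end]


def is_ksoftirqd(comm):
    # Matches both "ksoftirqd_CPU0" (2.4 naming) and "ksoftirqd/0".
    return comm.startswith("ksoftirqd")


def ksoftirqd_pids(proc="/proc"):
    """Yield the PIDs of all ksoftirqd threads found under `proc`."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                line = f.read()
        except OSError:
            continue  # the process went away while we were scanning
        if is_ksoftirqd(comm_from_stat(line)):
            yield int(entry)


def renice_ksoftirqd(nice=0):
    """Set every ksoftirqd thread to the given nice level (needs root)."""
    for pid in ksoftirqd_pids():
        os.setpriority(os.PRIO_PROCESS, pid, nice)
```

Run as root, `renice_ksoftirqd(0)` applies the nice level tested above. The change does not survive a reboot, so it would have to be repeated from an init script.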

Regards,

2003-01-06 15:43:06

by Alan

Subject: Re: NAPI and tg3

On Mon, 2003-01-06 at 15:00, Steffen Persvold wrote:
> I discovered that if I renice the ksoftirqd processes to nice level 0,
> performance was actually better with the NAPI-enabled driver than with
> the one without (as NAPI intended, IIRC). With the default nice level
> (19) on the ksoftirqd processes, the performance of multithreaded
> programs was pretty lousy with the NAPI-enabled driver.
>
> Any reason why ksoftirqd shouldn't be at nice level 0 by default? Is
> this already fixed in the 2.4.21-pre series?

Hack the code to only fall back to ksoftirqd when there are, say, 10
pending events rather than 1, and it should perform even better but
still handle overload properly.
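One possible reading of this suggestion, as a toy model rather than the actual 2.4 `do_softirq()` code: after each pass over the pending softirqs, if more work has been raised in the meantime, keep handling it inline for up to `max_restart` passes, and only then wake ksoftirqd. The function and its parameters are hypothetical; `max_restart=1` stands in for eager fallback and a larger value for the proposed threshold.

```python
def run_softirqs(raised_per_pass, max_restart):
    """Simulate softirq processing triggered by one interrupt.

    raised_per_pass[i] is how many new events are raised while pass i runs
    (e.g. more packets arriving). Returns a tuple
    (handled_inline, left_for_ksoftirqd, woke_ksoftirqd).
    """
    pending = 1  # the interrupt that got us here raised one event
    handled_inline = 0
    passes = 0
    while pending:
        handled_inline += pending
        pending = raised_per_pass[passes] if passes < len(raised_per_pass) else 0
        passes += 1
        if pending and passes >= max_restart:
            # Work keeps arriving: hand the rest to ksoftirqd, which runs
            # under normal scheduler control (and its nice level).
            return handled_inline, pending, True
    return handled_inline, 0, False
```

With a threshold, a short burst is finished inline, while a sustained stream of new work still ends up in ksoftirqd, which is the "still handle overload properly" part.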

2003-01-06 16:01:08

by Steffen Persvold

Subject: Re: NAPI and tg3

On 6 Jan 2003, Alan Cox wrote:

> Hack the code to only fall back to ksoftirqd when there are, say, 10
> pending events rather than 1, and it should perform even better but
> still handle overload properly.
>

OK, I can try that, but what about the nice level of ksoftirqd? Any
specific reason for it being 19 (lowest priority) and not 0 (equal to
most other processes in the system)?

Regards,

2003-01-06 17:05:23

by Alan

Subject: Re: NAPI and tg3

> OK, I can try that, but what about the nice level of ksoftirqd? Any
> specific reason for it being 19 (lowest priority) and not 0 (equal to
> most other processes in the system)?

It's triggered (in theory, but not in practice) only when we are
overloaded, in which case we want to do other *useful* work first
rather than use all the CPU to process requests we can't fulfill.

2003-01-07 15:12:54

by Steffen Persvold

Subject: Re: NAPI and tg3

On 6 Jan 2003, Alan Cox wrote:

> It's triggered (in theory, but not in practice) only when we are
> overloaded, in which case we want to do other *useful* work first
> rather than use all the CPU to process requests we can't fulfill.
>

I've also tried the NAPI patch for the e1000, and it shows the same
performance problem with multithreaded apps. The "NAPI-HOWTO" doesn't
mention that this could be an issue at all. Do any of the NAPI authors
(Jeff?) have any comments?

Regards,


2003-01-07 18:34:34

by Robert Olsson

Subject: Re: NAPI and tg3


Steffen Persvold writes:
>
> I've also tried the NAPI patch for the e1000, and it shows the same
> performance problem with multithreaded apps. The "NAPI-HOWTO" doesn't
> mention that this could be an issue at all. Do any of the NAPI authors
> (Jeff?) have any comments?

Well, wasn't ksoftirqd the general solution for scheduling softirqs to
run before the next interrupt? By putting them under scheduler control,
consecutive softirqs are prevented from monopolizing the CPU.

Well, you're right, the doc could mention this...

Cheers.
--ro

2003-01-07 20:43:35

by Steffen Persvold

Subject: Re: NAPI and tg3

On Tue, 7 Jan 2003, Robert Olsson wrote:

> Well, wasn't ksoftirqd the general solution for scheduling softirqs to
> run before the next interrupt? By putting them under scheduler control,
> consecutive softirqs are prevented from monopolizing the CPU.
>
> Well, you're right, the doc could mention this...

True, but it doesn't say that if you have two applications loaded on
an SMP box, one of which is, for example, constantly receiving and
sending data from/to the network and doing computations on the data
(100% CPU), while the other is only doing computations (also 100% CPU),
then ksoftirqd, which should receive packets and refill the TX and RX
rings, will be put last in the queue because of its low priority
(nice 19). As a result, the network-dependent application gets much
lower performance than it could achieve with ksoftirqd at nice 0, or
even with the interrupt-based mechanism. A nice level of 0 on ksoftirqd
is still a heck of a lot better than interrupt context, isn't it?

One simple example would be to run a network throughput application
such as NetPIPE and simultaneously start something like McCalpin's
STREAM benchmark. With ksoftirqd at nice 19, you would notice that
NetPIPE gets very low throughput, while STREAM performs as well as it
possibly can. With ksoftirqd at nice 0, NetPIPE gets decent throughput
and STREAM still produces the same result.
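A back-of-the-envelope model of the effect described above. On 2.4, a task's timeslice is `NICE_TO_TICKS(nice) = TICK_SCALE(20 - nice) + 1`, and with HZ=100 (the 2.4 default on x86) `TICK_SCALE(x)` is `x >> 2`. Assuming, crudely, that two runnable tasks competing for one CPU split it in proportion to their timeslices, the sketch estimates ksoftirqd's share against a nice-0 job such as STREAM. It ignores the rest of the 2.4 `goodness()` calculation and wakeup effects, so the numbers are only illustrative; the function names are hypothetical.

```python
def nice_to_ticks(nice):
    """Timeslice in ticks per 2.4's NICE_TO_TICKS, assuming HZ=100,
    where TICK_SCALE(x) == x >> 2."""
    return ((20 - nice) >> 2) + 1


def cpu_share(nice, competitor_nice=0):
    """Fraction of one CPU a task gets against a single CPU-bound
    competitor, assuming CPU time is split in proportion to timeslices."""
    mine = nice_to_ticks(nice)
    other = nice_to_ticks(competitor_nice)
    return mine / (mine + other)
```

Under this model ksoftirqd at nice 19 gets 1 tick against 6, i.e. 1/7 (about 14%) of a CPU shared with a nice-0 job, against 50% at nice 0.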

Regards,

2003-01-07 22:04:16

by Robert Olsson

Subject: Re: NAPI and tg3


Steffen Persvold writes:

> True, but it doesn't say that if you have two applications loaded on
> an SMP box, one of which is, for example, constantly receiving and
> sending data from/to the network and doing computations on the data
> (100% CPU), while the other is only doing computations (also 100% CPU),
> then ksoftirqd, which should receive packets and refill the TX and RX
> rings, will be put last in the queue because of its low priority
> (nice 19). As a result, the network-dependent application gets much
> lower performance than it could achieve with ksoftirqd at nice 0, or
> even with the interrupt-based mechanism. A nice level of 0 on ksoftirqd
> is still a heck of a lot better than interrupt context, isn't it?


Yes, my test/production scripts have even been setting ksoftirqd to
nice -19 for exactly that reason, so I had almost forgotten this issue;
I'm happy you brought it up. dev->poll is not the only user of
ksoftirqd, but for heavy networking it gets pretty dominant. So we'll
add something to NAPI_HOWTO and pass the question about ksoftirqd's
default priority on to others.

From a GigE router in production:

USER       PID %CPU %MEM  SIZE   RSS TTY STAT START    TIME COMMAND
root         3  0.2  0.0     0     0 ?   RWN  Aug 15 602:00 (ksoftirqd_CPU0)
root       232  0.0  7.9 41400 40884 ?   S    Aug 15  74:12 gated

Cheers.
--ro

2003-01-07 23:55:30

by Steffen Persvold

Subject: Re: NAPI and tg3

On Tue, 7 Jan 2003, Robert Olsson wrote:

> Yes, my test/production scripts have even been setting ksoftirqd to
> nice -19 for exactly that reason, so I had almost forgotten this issue;
> I'm happy you brought it up. dev->poll is not the only user of
> ksoftirqd, but for heavy networking it gets pretty dominant. So we'll
> add something to NAPI_HOWTO and pass the question about ksoftirqd's
> default priority on to others.
>
> From a GigE router in production:
>
> USER       PID %CPU %MEM  SIZE   RSS TTY STAT START    TIME COMMAND
> root         3  0.2  0.0     0     0 ?   RWN  Aug 15 602:00 (ksoftirqd_CPU0)
> root       232  0.0  7.9 41400 40884 ?   S    Aug 15  74:12 gated
>

I'm happy that at least someone can agree on something these days.
Looking at the latest discussions about binary-only drivers and the GPL,
one could think that all kernel developers do is argue about who is
right (all right, most of the quarrelsome people aren't really kernel
developers) ;) So, who makes the decision regarding ksoftirqd's default
nice level, and when?


Best regards,

2003-01-09 17:03:57

by Robert Olsson

Subject: Re: NAPI and tg3


Before it gets forgotten...

Cheers.
--ro


--- NAPI_HOWTO.txt.orig 2002-12-24 06:20:31.000000000 +0100
+++ NAPI_HOWTO.txt 2003-01-09 13:25:30.000000000 +0100
@@ -721,6 +721,23 @@



+
+APPENDIX 3: Scheduling issues.
+==============================
+As seen, NAPI moves processing to softirq level. Linux uses ksoftirqd as the
+general solution for scheduling softirqs to run before the next interrupt, by
+putting them under scheduler control. This also prevents consecutive softirqs
+from monopolizing the CPU. It also means that the priority of ksoftirqd needs
+to be considered when running very CPU-intensive applications together with
+networking, in order to get the proper softirq/user balance. Raising ksoftirqd
+priority to nice 0 (or even higher) is reported to cure problems with low
+network performance at high CPU load.
+
+Most used processes in a GigE router:
+USER       PID %CPU %MEM  SIZE   RSS TTY STAT START    TIME COMMAND
+root         3  0.2  0.0     0     0 ?   RWN  Aug 15 602:00 (ksoftirqd_CPU0)
+root       232  0.0  7.9 41400 40884 ?   S    Aug 15  74:12 gated
+

2003-01-10 09:01:17

by David Miller

Subject: Re: NAPI and tg3

From: Robert Olsson <[email protected]>
Date: Thu, 9 Jan 2003 18:21:00 +0100

Before it gets forgotten...

Applied, thanks.