Hi Manfred,
Today I had an opportunity to perform some functional and performance
tests on a SunFire X2100, which is a PCI Express-based dual-core Opteron
equipped with a Broadcom gigabit LAN chip (tg3) and an nForce4 Pro
chipset offering a second LAN port (forcedeth).
With the forcedeth driver version 0.30 as shipped in 2.4.33-pre*, ping
was OK, but the driver hung after a few megabytes of Gigabit-speed
outgoing traffic, with some "NETDEV transmit time out" messages. It was
necessary to unload then reload it. So I decided it was time to give
your updates a try.
I started from the latest backport you sent in September (0.42) and
incrementally applied 2.6 updates. I stopped at 0.50 which provides
VLAN support, because after this one, there are some 2.4-incompatible
changes (64bit consistent memory allocation for rings, and MSI/MSIX
support).
It compiled and worked immediately, and now shows very high performance!
Right now, there's a test running at 925 Mbps and 400 kpps, but I could
reach 1.09 Mpps of input traffic and full gigabit speed above 400 bytes
per packet without any trouble. The test above has been running for 6
hours now, which represents 2.5 TB and 8.6 billion packets. Moreover,
the test only consumes 15% CPU after I set the poll_interval limit to 10
microseconds.
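The totals quoted above follow directly from the measured rates. A minimal
sketch of the arithmetic in C, assuming decimal units (1 TB = 10^12 bytes)
and a steady rate over the full 6 hours; the helper names are hypothetical:

```c
#include <stdint.h>

/* Hypothetical helpers, only meant to check the figures quoted in this mail. */

/* Bytes moved at a steady bit rate over a given duration. */
static uint64_t total_bytes(uint64_t bits_per_sec, uint64_t seconds)
{
    return bits_per_sec * seconds / 8;
}

/* Packets moved at a steady packet rate over a given duration. */
static uint64_t total_packets(uint64_t packets_per_sec, uint64_t seconds)
{
    return packets_per_sec * seconds;
}
```

With 925000000 bit/s, 400000 packets/s and 21600 s, this gives
2,497,500,000,000 bytes (about 2.5 TB) and 8,640,000,000 packets (about
8.6 billion).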
However, I had to increase the max_interrupt_work to 10, because at 5,
I would receive the following message almost every second (both in 0.30
and 0.50) : "too many iterations (6) in nv_nic_irq". At 10, it never
happened.
I guess we should raise it both in 2.4 and 2.6, but I did not do it in
the patch below because I wanted the code to be as similar as possible
between the two trees.
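For context, the message is consistent with an interrupt handler that loops
while the NIC reports pending events and gives up once the loop counter
exceeds max_interrupt_work. A simplified userspace model of that pattern,
not the driver's actual code (struct irq_model and bounded_irq_loop are
hypothetical names):

```c
#include <stdio.h>

/* Hypothetical stand-in for the hardware: the number of loop rounds for
 * which the event register would still report pending work. */
struct irq_model {
    int pending_rounds;
};

/* Process events until none are pending or the iteration cap is exceeded.
 * Returns 0 if all events were drained, otherwise the iteration number at
 * which the loop gave up. */
static int bounded_irq_loop(struct irq_model *nic, int max_interrupt_work)
{
    for (int i = 0; ; i++) {
        if (nic->pending_rounds == 0)
            return 0;               /* nothing left to do */
        nic->pending_rounds--;      /* handle one batch of events */
        if (i > max_interrupt_work) {
            printf("too many iterations (%d) in bounded_irq_loop\n", i);
            return i;               /* give up, leave the rest for later */
        }
    }
}
```

In this model a limit of 5 trips on iteration 6, which matches the count in
the message, while a limit of 10 lets a burst of 8 rounds drain without
hitting the cap.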
Given that the driver is barely usable in 0.30, I suspect that very
few people currently use it as-is with kernel 2.4. And since it shows
excellent performance in 0.50, and does not exhibit any instability, I
think that there would be far more benefits than risks in merging it.
How do you feel about this?
BTW, I have CCed John Linville who maintains network drivers for RHEL3
and who might be interested too.
Regards,
Willy
Hi Willy,
Willy Tarreau wrote:
>I started from the latest backport you sent in September (0.42) and
>incrementally applied 2.6 updates. I stopped at 0.50 which provides
>VLAN support, because after this one, there are some 2.4-incompatible
>changes (64bit consistent memory allocation for rings, and MSI/MSIX
>support).
>
>
>
I agree, 2.4 needs a backport. Either a full backport as you did, or a
minimal one-liner fix.
Right now, the driver is not usable due to an incorrect initialization.
Or to be more accurate:
# modprobe
# ifup
works.
But
# modprobe
# ifup
# ifdown
# ifup
causes a misconfiguration, and the nic hangs hard after a few MB. And
recent distros do the equivalent of ifup/ifdown/ifup somewhere in the
initialization.
Marcelo: Do you need a one-liner, or could you apply a large backport patch?
--
Manfred
On Wed, May 31, 2006 at 07:50:32AM +0200, Manfred Spraul wrote:
> Hi Willy,
>
> Willy Tarreau wrote:
>
> >I started from the latest backport you sent in September (0.42) and
> >incrementally applied 2.6 updates. I stopped at 0.50 which provides
> >VLAN support, because after this one, there are some 2.4-incompatible
> >changes (64bit consistent memory allocation for rings, and MSI/MSIX
> >support).
> >
> >
> >
> I agree, 2.4 needs a backport. Either a full backport as you did, or a
> minimal one-liner fix.
> Right now, the driver is not usable due to an incorrect initialization.
> Or to be more accurate:
> # modprobe
> # ifup
> works.
> But
> # modprobe
> # ifup
> # ifdown
> # ifup
> causes a misconfiguration, and the nic hangs hard after a few MB. And
> recent distros do the equivalent of ifup/ifdown/ifup somewhere in the
> initialization.
That's what I read in one of the changelogs, but I'm not sure at all that
it's what happened, because I had the problem after an ifup only. What I
was doing with this box was pure performance tests, which led me to compare
the Broadcom and nForce performance. My tests measured 3 criteria:
- number of HTTP/1.0 hits/s
- maximum data rate
- maximum packets/s
On tg3, I got around 45 khits/s, 949 Mbps (TCP, i.e. 1.0 Gbps on the wire) and
1.05 Mpps receive (I want to build a high speed load-balancer and a sniffer).
This was stable.
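The 949 Mbps figure is what one would expect from a saturated gigabit link
carrying full-sized frames. A quick check in C, assuming a 1500-byte MTU and
no IP or TCP options (both are assumptions, they are not stated in this
mail; the function name is hypothetical):

```c
#include <stdint.h>

/* Per-frame sizes in bytes. */
enum {
    WIRE_PREAMBLE_SFD = 8,   /* preamble + start-of-frame delimiter */
    WIRE_ETH_HEADER   = 14,  /* dst MAC, src MAC, EtherType */
    WIRE_ETH_FCS      = 4,   /* frame check sequence */
    WIRE_ETH_IFG      = 12,  /* minimum inter-frame gap */
    HDR_IPV4          = 20,  /* no options (assumption) */
    HDR_TCP           = 20   /* no options (assumption) */
};

/* TCP payload rate in bit/s on a link of link_bps using the given MTU. */
static uint64_t tcp_goodput_bps(uint64_t link_bps, uint64_t mtu)
{
    uint64_t payload = mtu - HDR_IPV4 - HDR_TCP;
    uint64_t on_wire = mtu + WIRE_PREAMBLE_SFD + WIRE_ETH_HEADER
                     + WIRE_ETH_FCS + WIRE_ETH_IFG;
    return link_bps * payload / on_wire;
}
```

For a 1 Gbps link and a 1500-byte MTU this yields about 949 Mbps of TCP
payload, i.e. 1460 payload bytes out of 1538 bytes on the wire per frame.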
On the nforce, I tried with the hits/s first because it's a good indication
of hardware-based and driver-based optimizations. It reached 18 khits/s with
a lot of difficulty and the machine was stuck at 100% of one CPU. But it ran
for a few minutes like this. Then I tried data rate (which is the same test
with 1MB objects), and it failed after about 2 seconds and a few megabytes (or
hundreds of megabytes) transferred.
I had to reboot to get it to work again. And I'm fairly sure that I did not
do down/up this time either, but the test came to the same end.
That's why I'm not sure at all that the one-liner will be enough.
Moreover, after the update, I reached the same performance as with the
broadcom, with a slight improvement on packet reception (1.09 Mpps), and
low CPU usage (15%). So basically, the upgrade took the driver from
barely usable for SSH to very fast.
> Marcelo: Do you need a one-liner, or could you apply a large backport
> patch?
I would really vote for the full backport, and I can break it into pieces
if needed (I have them at hand, just have to re-inject the changelogs).
However, I have separate changes from 0.42 to 0.50, because I started
with your 0.30-0.42 backport patch.
I have this machine till the end of the week, so I can perform other tests
if you're interested in trying specific things.
> --
> Manfred
Cheers,
Willy
On Wed, May 31, 2006 at 07:54:38AM +0200, Willy Tarreau wrote:
> On Wed, May 31, 2006 at 07:50:32AM +0200, Manfred Spraul wrote:
> > Hi Willy,
> >
> > Willy Tarreau wrote:
> >
> > >I started from the latest backport you sent in September (0.42) and
> > >incrementally applied 2.6 updates. I stopped at 0.50 which provides
> > >VLAN support, because after this one, there are some 2.4-incompatible
> > >changes (64bit consistent memory allocation for rings, and MSI/MSIX
> > >support).
> > >
> > >
> > >
> > I agree, 2.4 needs a backport. Either a full backport as you did, or a
> > minimal one-liner fix.
> > Right now, the driver is not usable due to an incorrect initialization.
> > Or to be more accurate:
> > # modprobe
> > # ifup
> > works.
> > But
> > # modprobe
> > # ifup
> > # ifdown
> > # ifup
> > causes a misconfiguration, and the nic hangs hard after a few MB. And
> > recent distros do the equivalent of ifup/ifdown/ifup somewhere in the
> > initialization.
>
> That's what I read in one of the changelogs, but I'm not sure at all that
> it's what happened, because I had the problem after an ifup only. What I
> was doing with this box was pure performance tests, which led me to compare
> the Broadcom and nForce performance. My tests measured 3 criteria:
>
> - number of HTTP/1.0 hits/s
> - maximum data rate
> - maximum packets/s
>
> On tg3, I got around 45 khits/s, 949 Mbps (TCP, i.e. 1.0 Gbps on the wire) and
> 1.05 Mpps receive (I want to build a high speed load-balancer and a sniffer).
> This was stable.
>
> On the nforce, I tried with the hits/s first because it's a good indication
> of hardware-based and driver-based optimizations. It reached 18 khits/s with
> a lot of difficulty and the machine was stuck at 100% of one CPU. But it ran
> for a few minutes like this. Then I tried data rate (which is the same test
> with 1MB objects), and it failed after about 2 seconds and a few megabytes (or
> hundreds of megabytes) transferred.
>
> I had to reboot to get it to work again. And I'm fairly sure that I did not
> do down/up this time either, but the test came to the same end.
>
> That's why I'm not sure at all that the one-liner will be enough.
>
> Moreover, after the update, I reached the same performance as with the
> broadcom, with a slight improvement on packet reception (1.09 Mpps), and
> low CPU usage (15%). So basically, the upgrade took the driver from
> barely usable for SSH to very fast.
>
> > Marcelo: Do you need a one-liner, or could you apply a large backport
> > patch?
>
> I would really vote for the full backport, and I can break it into pieces
> if needed (I have them at hand, just have to re-inject the changelogs).
> However, I have separate changes from 0.42 to 0.50, because I started
> with your 0.30-0.42 backport patch.
>
> I have this machine till the end of the week, so I can perform other tests
> if you're interested in trying specific things.
Since v2.4.33 should be out RSN, my opinion is that applying the one-liner
to fix the bringup problem for now is more prudent.
The full patch could go into v2.4.34.
--- 2.6/drivers/net/forcedeth.c	2005-08-14 11:17:03.000000000 +0200
+++ build-2.6/drivers/net/forcedeth.c	2005-08-14 11:16:53.000000000 +0200
@@ -2178,6 +2180,9 @@
 		writel(NVREG_MIISTAT_MASK, base + NvRegMIIStatus);
 		dprintk(KERN_INFO "startup: got 0x%08x.\n", miistat);
 	}
+	/* set linkspeed to invalid value, thus force nv_update_linkspeed
+	 * to init hw */
+	np->linkspeed = 0;
 	ret = nv_update_linkspeed(dev);
 	nv_start_rx(dev);
 	nv_start_tx(dev);
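The comment in the hunk suggests that nv_update_linkspeed only programs the
hardware when the negotiated speed differs from the cached np->linkspeed.
A simplified userspace model of why a stale cache would break the second
ifup, under that assumption (struct link_model, update_linkspeed, nic_open
and nic_close are hypothetical, not driver code):

```c
/* Hypothetical model: a software-cached link speed next to the value
 * actually programmed into the hardware. */
struct link_model {
    int cached_linkspeed;   /* plays the role of np->linkspeed */
    int hw_linkspeed;       /* stand-in for the hardware registers */
};

/* Program the hardware only when the cached value changes (assumption). */
static void update_linkspeed(struct link_model *nic, int negotiated)
{
    if (nic->cached_linkspeed != negotiated) {
        nic->cached_linkspeed = negotiated;
        nic->hw_linkspeed = negotiated;
    }
}

/* Bring the interface up; apply_fix mimics the added one-liner. */
static void nic_open(struct link_model *nic, int negotiated, int apply_fix)
{
    if (apply_fix)
        nic->cached_linkspeed = 0;  /* invalid value forces a hw init */
    update_linkspeed(nic, negotiated);
}

/* Bring the interface down: hardware state is lost, the cache is kept. */
static void nic_close(struct link_model *nic)
{
    nic->hw_linkspeed = 0;
}
```

In this model, open/close/open without the fix leaves hw_linkspeed at 0
although the cache says 1000, while the same sequence with the fix
reprograms the hardware on every open.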
On Wed, May 31, 2006 at 09:50:38PM +0200, Manfred Spraul wrote:
> Marcelo Tosatti wrote:
>
> >Since v2.4.33 should be out RSN, my opinion is that applying the
> >one-liner to fix the bringup problem for now is more prudent.
> >
> >
> >
> It's attached. Untested, but it should work. Just the relevant hunk from
> the 0.42 patch.
I will test it tomorrow morning. John might be interested in merging it too,
as I verified today that RHEL3 is affected by the same problem (rmmod
followed by modprobe).
> But I would disagree with waiting for 2.4.34 for a full backport:
> 0.30 basically doesn't work, thus the update to 0.50 would be a big step
> forward - it can't be worse than 0.30.
Seconded!
Manfred, if you have some corner cases in mind, are aware of anything which
might sometimes break, or have a few experimental patches to try, I'm happy
to run a few tests while I have the machine (it's SMP BTW).
> --
> Manfred
Cheers,
Willy