2004-01-19 03:35:38

by Mark Williams (MWP)

Subject: TG3: very high CPU usage

Greetings all,

Has the TG3 driver been well tested with the AC9100 and compatible gigabit NIC chipsets?

iperf, between a 2.6.0 box and a WinXP box (both running Netgear GA302Ts with the AC9100), shows max throughput of 35MB/sec.

However, when using Apache or any FTP client/daemon, the TG3 driver appears to be VERY slow, maxing out the CPU at 100% while transferring only around 12MB/sec.
This applies to both incoming and outgoing data.

2.6.1 behaves worse, using 100% CPU to sustain approx. 9MB/sec.

I've tried other NICs, etc., and confirmed that it is a problem with the TG3 driver.

Is this a known problem?

Thanks,
Mark Williams.


2004-01-19 11:34:56

by Andreas Hartmann

Subject: Re: TG3: very high CPU usage

Mark Williams (MWP) wrote:
[...]
> However, when using Apache or any FTP client/daemon, the TG3 driver appears to be VERY slow, maxing out the CPU at 100% while transferring only around 12MB/sec.
> This applies to both incoming and outgoing data.

[...]

> I've tried other NICs, etc., and confirmed that it is a problem with the TG3 driver.

I saw the same problem with the bcm driver (kernel 2.4.x) shipped with
SuSE 9 / SLES 8. The test case was the initial mirroring of a 10 GB partition
on a RAID5 ServeRAID / xSeries 235 (2-way) to the same hardware on the remote
machine via drbd, using the onboard NIC (Broadcom GBit Ethernet) on both
machines: 100% CPU usage, 12 MB/s, machine nearly dead.


Regards,
Andreas Hartmann

2004-01-20 03:54:54

by Mark Williams (MWP)

Subject: Re: TG3: very high CPU usage

> Mark Williams (MWP) wrote:
> [...]
> >However, when using Apache or any FTP client/daemon, the TG3 driver
> >appears to be VERY slow, maxing out the CPU at 100% while transferring
> >only around 12MB/sec.
> >This applies to both incoming and outgoing data.
>
> [...]
>
> >I've tried other NICs, etc., and confirmed that it is a problem with the TG3
> >driver.
>
> I saw the same problem with the bcm driver (kernel 2.4.x) shipped with
> SuSE 9 / SLES 8. The test case was the initial mirroring of a 10 GB partition
> on a RAID5 ServeRAID / xSeries 235 (2-way) to the same hardware on the remote
> machine via drbd, using the onboard NIC (Broadcom GBit Ethernet) on both
> machines: 100% CPU usage, 12 MB/s, machine nearly dead.

Well, I'm glad someone else has this problem too.

Do any of the TG3 maintainers have an idea what's causing it?

I'm handy with C, but nowhere near good enough to go hacking away at the driver.
I would be happy to help test new drivers if needed.

Thanks.

2004-01-20 09:20:19

by Andreas Hartmann

Subject: Re: TG3: very high CPU usage

Hi,

I searched for tg3 on lkml and found one more posting dealing with this
problem (subject):

bcm5705 with tg3 driver and high rx load -> bad system responsiveness

There really seems to be a problem. Ronald Wahl pointed out that the
driver from
http://www.broadcom.com/drivers/downloaddrivers.php does not have the
problem. Maybe we could both get the drivers from the hardware vendor
and test them? I will do it when I'm back at work in two weeks.


Regards,
Andreas Hartmann

2004-01-20 09:44:43

by Lincoln Dale

Subject: Re: TG3: very high CPU usage

[you may want to use [email protected] instead of linux-kernel; it's
possible that the tg3 folk lost your email in the flood]

At 08:17 PM 20/01/2004, Andreas Hartmann wrote:
>Hi,
>
>I searched for tg3 on lkml and found one more posting dealing with this
>problem (subject):
>
>bcm5705 with tg3 driver and high rx load -> bad system responsiveness
>
>There really seems to be a problem. Ronald Wahl pointed out that the
>driver from
>http://www.broadcom.com/drivers/downloaddrivers.php does not have the
>problem. Maybe we could both get the drivers from the hardware vendor
>and test them? I will do it when I'm back at work in two weeks.

how exactly are you "triggering" the high CPU load? i.e. what is the
server doing? file-sharing? NFS? CIFS? something else?

i have LOTS of IBM xSeries servers (IBM x335, x345, x440), all of which
have Broadcom BCM 5700 (tg3) NICs.
i drive them all at wire-rate gig-e with iSCSI.

i'm yet to see any 'excessive' CPU load associated with tg3 relative to
tigon2 (AceNIC2) and Intel e1000 NICs.


cheers,

lincoln.

2004-01-20 10:16:25

by Mark Williams (MWP)

Subject: Re: TG3: very high CPU usage

> [you may want to use [email protected] instead of linux-kernel; it's
> possible that the tg3 folk lost your email in the flood]
>
> At 08:17 PM 20/01/2004, Andreas Hartmann wrote:
> >Hi,
> >
> >I searched for tg3 on lkml and found one more posting dealing with this
> >problem (subject):
> >
> >bcm5705 with tg3 driver and high rx load -> bad system responsiveness
> >
> >There really seems to be a problem. Ronald Wahl pointed out that the
> >driver from
> >http://www.broadcom.com/drivers/downloaddrivers.php does not have the
> >problem. Maybe we could both get the drivers from the hardware vendor
> >and test them? I will do it when I'm back at work in two weeks.
>
> how exactly are you "triggering" the high CPU load? i.e. what is the
> server doing? file-sharing? NFS? CIFS? something else?

Any transfer (Apache, FTP, Samba) causes it.

> i have LOTS of IBM xSeries servers (IBM x335, x345, x440), all of which
> have Broadcom BCM 5700 (tg3) NICs.
> i drive them all at wire-rate gig-e with iSCSI.
>
> i'm yet to see any 'excessive' CPU load associated with tg3 relative to
> tigon2 (AceNIC2) and Intel e1000 NICs.

It might not affect those cards.
I think the TG3 driver was changed to support the card I'm trying to use (Netgear GA302T) and similar.

2004-01-20 12:09:44

by Lincoln Dale

Subject: Re: TG3: very high CPU usage

At 09:16 PM 20/01/2004, Mark Williams (MWP) wrote:
> > i have LOTS of IBM xSeries servers (IBM x335, x345, x440), all of which
> > have Broadcom BCM 5700 (tg3) NICs.
> > i drive them all at wire-rate gig-e with iSCSI.
> >
> > i'm yet to see any 'excessive' CPU load associated with tg3 relative to
> > tigon2 (AceNIC2) and Intel e1000 NICs.
>
>It might not affect those cards.
>I think the TG3 driver was changed to support the card I'm trying to use
>(Netgear GA302T) and similar.

curious.

i remember from the Tigon2 days, it didn't matter if you used a NetGear
card, an Alteon card, or an Alteon card ripped out of the inside of an
ACEDirector switch -- they were all the same reference design.

i don't believe that anyone using the bcm5700 would deviate significantly
from the reference design - there wouldn't be any reason to.

(the only variants are probably due to dual-port versions ... of course,
i'm sure the tg3 driver authors will now correct me on the differences.
<grin>).


cheers,

lincoln.

2004-01-20 12:34:18

by JG

Subject: Re: TG3: very high CPU usage

hi,

> iperf, between a 2.6.0 box and a WinXP box (both running Netgear GA302Ts with the AC9100), shows max throughput of 35MB/sec.

i also have two boxes (one with 2.6.0, the other with 2.6.1-mm2) equipped with netgear ga302t cards (x-over cable).
i don't see very high cpu usage, but since upgrading to 2.6.x kernels i sometimes have really weird speed issues. i often get transfer rates of only ~200-300 kilobytes/second...yes, and this over a gigabit interface, tested over ftp.
i'm also running an nfs server on the 2.6.1-mm2 box, with the 2.6.0 pc as the client, but again, sometimes it's *very* slow. if i reboot my 2.6.1-mm2 box (the other one is a server which can't be rebooted) it seems to be fine for some time.

i didn't have such problems with 2.4.19 kernels on either pc; there i got about 30-35MB/s over ftp without any problems, so i don't think it's hardware related.

lspci -v
2.6.1-mm2:
00:09.0 Ethernet controller: Altima (nee Broadcom) AC9100 Gigabit Ethernet (rev 15)
Subsystem: Netgear: Unknown device 302a
Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 16
Memory at cffe0000 (64-bit, non-prefetchable) [size=64K]
Capabilities: [40] PCI-X non-bridge device.
Capabilities: [48] Power Management version 2
Capabilities: [50] Vital Product Data
Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable-

2.6.0:
same as above, except for the interrupt

this is also something i don't know how to debug; it's from the 2.6.0 box, which has an uptime of 7 days.
ifconfig:
eth1 Link encap:Ethernet HWaddr 00:09:5B:1F:1F:BC
inet addr:192.168.0.2 Bcast:192.168.0.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:217871027 errors:2769019 dropped:0 overruns:0 frame:2771160
TX packets:150029615 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2016894721 (1923.4 Mb) TX bytes:1073040436 (1023.3 Mb)
Interrupt:11

how can i find out where these errors come from?
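One way to start answering that is to look at how the RX errors break down. A minimal sketch, assuming a kernel that exposes per-interface counters under /sys/class/net/<iface>/statistics/ (the file names are the standard netdevice statistics names; which of them tg3 actually increments is not established in this thread). The grouping into "link side" and "host side" is this sketch's own heuristic. In the ifconfig output above, the frame counter tracks the error counter closely while overruns stay at 0, which suggests damaged frames on the wire rather than a host that can't keep up.

```python
from pathlib import Path

# Counters that usually indicate damaged frames on the wire (cable, PHY, duplex).
LINK_SIDE = ("rx_crc_errors", "rx_frame_errors", "rx_length_errors")
# Counters that usually indicate the NIC or host not keeping up.
HOST_SIDE = ("rx_fifo_errors", "rx_over_errors", "rx_missed_errors", "rx_dropped")


def read_rx_counters(iface, sysfs_root="/sys/class/net"):
    """Return {name: value} for every rx_* counter of `iface` (empty if absent)."""
    stats_dir = Path(sysfs_root) / iface / "statistics"
    if not stats_dir.is_dir():
        return {}
    return {
        path.name: int(path.read_text().strip())
        for path in sorted(stats_dir.glob("rx_*"))
    }


def summarize(counters):
    """Return (error ratio, nonzero link-side counters, nonzero host-side counters)."""
    packets = counters.get("rx_packets", 0)
    errors = counters.get("rx_errors", 0)
    ratio = errors / packets if packets else 0.0
    link = {k: counters[k] for k in LINK_SIDE if counters.get(k)}
    host = {k: counters[k] for k in HOST_SIDE if counters.get(k)}
    return ratio, link, host
```

On the affected box, `summarize(read_rx_counters("eth1"))` would report the overall RX error ratio plus whichever breakdown counters are nonzero.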

thx,
JG


2004-01-20 23:14:44

by Lincoln Dale

Subject: Re: TG3: very high CPU usage

At 11:33 PM 20/01/2004, JG wrote:
>i also have two boxes (one with 2.6.0, the other with 2.6.1-mm2) equipped
>with netgear ga302t cards (x-over cable).
>i don't see very high cpu usage, but since upgrading to 2.6.x kernels i
>sometimes have really weird speed issues. i often get transfer rates
>of only ~200-300 kilobytes/second...yes, and this over a gigabit
>interface, tested over ftp.
>i'm also running an nfs server on the 2.6.1-mm2 box, with the 2.6.0 pc as
>the client, but again, sometimes it's *very* slow. if i reboot my 2.6.1-mm2
>box (the other one is a server which can't be rebooted) it seems to be
>fine for some time.
>
>i didn't have such problems with 2.4.19 kernels on either pc; there i got
>about 30-35MB/s over ftp without any problems, so i don't think it's
>hardware related.

IBM x335 server (dual P4 Xeons @ 2.4GHz), BCM 5702 onboard 2 x 10/100/1000,
connected via copper 1000baseT to a Cisco Catalyst 3750 ethernet switch.
running ttcp between two hosts shows wire-rate @ 17% CPU. gig-e is not
using jumbo frames:

[root@mel-stglab-host31 root]# ttcp -t -l 65536 -v -b 2097152 -s -D -n100000 10.67.16.91
ttcp-t: buflen=65536, nbuf=100000, align=16384/0, port=5001, sockbufsize=2097152 tcp -> 10.67.16.91
ttcp-t: socket
ttcp-t: sndbuf
ttcp-t: nodelay
ttcp-t: connect
ttcp-t: 6553600000 bytes in 58.42 real seconds = 109558.82 KB/sec +++
ttcp-t: 6553600000 bytes in 10.38 CPU seconds = 616723.50 KB/cpu sec
ttcp-t: 100000 I/O calls, msec/call = 0.60, calls/sec = 1711.86
ttcp-t: 0.0user 10.3sys 0:58real 17% 0i+0d 0maxrss 0+16pf 79360+131csw
ttcp-t: buffer address 0x8050000
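The ttcp figures above can be cross-checked by hand. A small sketch, assuming ttcp's "KB" means 1024 bytes (that is what reproduces its printed rate) and, for the theoretical ceiling, assuming standard 1500-byte frames with TCP timestamps (52 bytes of IP+TCP headers); neither assumption is stated in the thread.

```python
# Figures copied from the ttcp-t output above.
TOTAL_BYTES = 65536 * 100000   # -l 65536, -n100000 -> 6553600000 bytes
REAL_S = 58.42                 # wall-clock seconds
CPU_S = 10.38                  # CPU seconds (user + sys)

ETH_OVERHEAD = 14 + 4 + 8 + 12  # header + FCS + preamble/SFD + inter-frame gap


def kib_per_sec(nbytes, seconds):
    """Rate in 1024-byte units per second, as ttcp prints 'KB/sec'."""
    return nbytes / seconds / 1024


def mbit_per_sec(nbytes, seconds):
    return nbytes * 8 / seconds / 1e6


def tcp_goodput_ceiling_mbit(link_mbit=1000, mtu=1500, ip_tcp_headers=52):
    """Best-case TCP payload rate: payload per frame / bytes on the wire per frame."""
    return link_mbit * (mtu - ip_tcp_headers) / (mtu + ETH_OVERHEAD)


if __name__ == "__main__":
    print(f"{kib_per_sec(TOTAL_BYTES, REAL_S):.0f} KiB/s")
    print(f"{mbit_per_sec(TOTAL_BYTES, REAL_S):.1f} Mbit/s of payload")
    print(f"ceiling ~{tcp_goodput_ceiling_mbit():.1f} Mbit/s")
    print(f"CPU: {100 * CPU_S / REAL_S:.1f}% of one CPU")
```

That works out to roughly 897 Mbit/s of payload against a ceiling of about 941 Mbit/s (around 95%), at just under 18% of one CPU, which fits the "wire-rate @ 17% CPU" description.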

[root@mel-stglab-host31 root]# uname -a
Linux mel-stglab-host31 2.6.0-test9 #13 SMP Mon Nov 3 17:18:17 EST 2003 i686 i686 i386 GNU/Linux

[root@mel-stglab-host31 root]# lspci -v
[..]
03:01.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5703 Gigabit Ethernet (rev 02)
Subsystem: IBM: Unknown device 026f
Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 24
Memory at f87f0000 (64-bit, non-prefetchable) [size=64K]
Capabilities: [40] PCI-X non-bridge device.
Capabilities: [48] Power Management version 2
Capabilities: [50] Vital Product Data
Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable-

03:02.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5703 Gigabit Ethernet (rev 02)
Subsystem: IBM: Unknown device 026f
Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 25
Memory at f87e0000 (64-bit, non-prefetchable) [size=64K]
Capabilities: [40] PCI-X non-bridge device.
Capabilities: [48] Power Management version 2
Capabilities: [50] Vital Product Data
Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable-

[root@mel-stglab-host31 asm]# ethtool -g eth0
Ring parameters for eth0:
Pre-set maximums:
RX: 511
RX Mini: 0
RX Jumbo: 255
TX: 0
Current hardware settings:
RX: 200
RX Mini: 0
RX Jumbo: 100
TX: 511

[root@mel-stglab-host31 asm]# ethtool -i eth0
driver: tg3
version: 2.2
firmware-version:
bus-info: 0000:03:01.0
[root@mel-stglab-host31 asm]# ethtool eth0
Settings for eth0:
Supported ports: [ MII ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Half 1000baseT/Full
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Half 1000baseT/Full
Advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: g
Wake-on: d
Current message level: 0x000000ff (255)
Link detected: yes
[root@mel-stglab-host31 asm]#


the only thing unusual about this kernel that i'm running is that i don't
use HighMem; i fixup PAGE_OFFSET to 0x80000000 to avoid the performance
overhead of PAE mode.



cheers,

lincoln.

2004-01-21 03:19:50

by Tom Sightler

Subject: Re: TG3: very high CPU usage

On Tue, 2004-01-20 at 18:13, Lincoln Dale wrote:
> At 11:33 PM 20/01/2004, JG wrote:
> >i also have two boxes (one with 2.6.0, the other with 2.6.1-mm2) equipped
> >with netgear ga302t cards (x-over cable).
> >i don't see very high cpu usage, but since upgrading to 2.6.x kernels i
> >sometimes have really weird speed issues. i often get transfer rates
> >of only ~200-300 kilobytes/second...yes, and this over a gigabit
> >interface, tested over ftp.
> >i'm also running an nfs server on the 2.6.1-mm2 box, with the 2.6.0 pc as
> >the client, but again, sometimes it's *very* slow. if i reboot my 2.6.1-mm2
> >box (the other one is a server which can't be rebooted) it seems to be
> >fine for some time.
> >
> >i didn't have such problems with 2.4.19 kernels on either pc; there i got
> >about 30-35MB/s over ftp without any problems, so i don't think it's
> >hardware related.

I'm curious whether the people seeing this problem happen to have preempt
enabled in their config. I've noticed that my laptop, which also
happens to have a tg3-based 10/100/1000 card, uses tons of CPU during
transfers, but only when preempt is enabled.

After looking into this, I found that my Aironet wireless card has exactly
the same problem. When preempt is enabled, a simple scp transfer running at
approximately the maximum speed for 802.11b (7.5Mb/sec) uses almost 70% of
the CPU. The tg3 driver doing the same scp at 40Mb/sec (100Mb ethernet)
uses > 90% of the CPU.

However, with preempt turned off, my system runs at approximately the
same speed on wireless (7.5Mb/sec) but uses only about 5% CPU. With preempt
disabled, the tg3 driver lets the scp run at near wire speed
(95-100Mb/sec) using only a fraction of the CPU.

Just curious if this might be what others are seeing.

Later,
Tom


2004-01-22 03:57:41

by Lincoln Dale

Subject: Re: TG3: very high CPU usage

At 02:19 PM 21/01/2004, Tom Sightler wrote:
>I'm curious whether the people seeing this problem happen to have preempt
>enabled in their config. I've noticed that my laptop, which also
>happens to have a tg3-based 10/100/1000 card, uses tons of CPU during
>transfers, but only when preempt is enabled.

nope.
i didn't use PREEMPT=y in my previous test, but i have just done so now.

the difference in CPU utilization when pushing wire-rate gig-e ttcp on this
system (Dual P4 Xeon) with PREEMPT=y or PREEMPT=n is just noise.

you should run oprofile and see where your cpu time is spent.

with preempt enabled:
ttcp-t: 6553600000 bytes in 58.29 real seconds = 109797.21 KB/sec +++
ttcp-t: 6553600000 bytes in 18.47 CPU seconds = 346485.49 KB/cpu sec
ttcp-t: 100000 I/O calls, msec/call = 0.60, calls/sec = 1715.58
ttcp-t: 0.1user 18.3sys 0:58real 31% 0i+0d 0maxrss 0+16pf 8038+1csw

with preempt disabled:
ttcp-t: 6553600000 bytes in 58.42 real seconds = 109543.94 KB/sec +++
ttcp-t: 6553600000 bytes in 18.82 CPU seconds = 340115.47 KB/cpu sec
ttcp-t: 100000 I/O calls, msec/call = 0.60, calls/sec = 1711.62
ttcp-t: 0.0user 18.7sys 0:58real 32% 0i+0d 0maxrss 0+16pf 7985+2csw
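Putting the two runs side by side makes the size of the difference concrete. A quick sketch using only the real and CPU seconds printed above; with a single run per configuration it cannot measure run-to-run variation, only show that the gap between these two runs is small.

```python
# (real seconds, CPU seconds) from the two ttcp runs above.
RUNS = {
    "PREEMPT=y": (58.29, 18.47),
    "PREEMPT=n": (58.42, 18.82),
}


def cpu_utilization(real_s, cpu_s):
    """CPU seconds as a percentage of elapsed time, like the NN% field ttcp prints."""
    return 100 * cpu_s / real_s


def pct_diff(a, b):
    """Difference of a relative to b, in percent."""
    return 100 * (a - b) / b


if __name__ == "__main__":
    for name, (real_s, cpu_s) in RUNS.items():
        print(f"{name}: {cpu_utilization(real_s, cpu_s):.1f}% CPU")
    (real_y, cpu_y), (real_n, cpu_n) = RUNS["PREEMPT=y"], RUNS["PREEMPT=n"]
    print(f"elapsed: {pct_diff(real_y, real_n):+.2f}%  CPU seconds: {pct_diff(cpu_y, cpu_n):+.2f}%")
```

The CPU-seconds gap is under 2% and elapsed time differs by about 0.2%, in contrast to the several-fold change reported with preempt on the laptop.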


--
with PREEMPT=y:

[root@mel-stglab-host31 linux]# zcat /proc/config.gz |grep PREEM
CONFIG_PREEMPT=y
[root@mel-stglab-host31 linux]# sh -c 'opcontrol --start; opcontrol --reset; ttcp -t -l65536 -s -v -b2097152 -D -n100000 10.67.16.91; opcontrol --stop; opreport -l /usr/src/linux/vmlinux' 2>&1 | head -30
Profiler running.
Signalling daemon... done
ttcp-t: socket
ttcp-t: sndbuf
ttcp-t: nodelay
ttcp-t: connect
ttcp-t: buflen=65536, nbuf=100000, align=16384/0, port=5001, sockbufsize=2097152 tcp -> 10.67.16.91
ttcp-t: 6553600000 bytes in 58.29 real seconds = 109797.21 KB/sec +++
ttcp-t: 6553600000 bytes in 18.47 CPU seconds = 346485.49 KB/cpu sec
ttcp-t: 100000 I/O calls, msec/call = 0.60, calls/sec = 1715.58
ttcp-t: 0.1user 18.3sys 0:58real 31% 0i+0d 0maxrss 0+16pf 8038+1csw
ttcp-t: buffer address 0x8050000
Stopping profiling.
CPU: P4 / Xeon with 2 hyper-threads, speed 2393.64 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (count cycles when processor is active) count 100000
samples % symbol name
198646 14.4698 tg3_enable_ints
166885 12.1562 __copy_from_user_ll
93473 6.8088 tg3_interrupt
57435 4.1837 default_idle
50924 3.7094 skb_clone
47592 3.4667 tg3_rx
45568 3.3193 tcp_sendmsg
36887 2.6869 qdisc_restart
35256 2.5681 tg3_poll
31588 2.3009 ip_queue_xmit
29742 2.1665 skb_release_data
29258 2.1312 tcp_write_xmit
27799 2.0249 irq_entries_start
24281 1.7687 alloc_skb
--

with PREEMPT=n:

[root@mel-stglab-host31 linux]# zcat /proc/config.gz |grep PREEM
# CONFIG_PREEMPT is not set

[root@mel-stglab-host31 linux]# sh -c 'opcontrol --start; opcontrol --reset; ttcp -t -l65536 -s -v -b2097152 -D -n100000 10.67.16.91; opcontrol --stop; opreport -l /usr/src/linux/vmlinux' 2>&1 | head -30
Profiler running.
Signalling daemon... done
ttcp-t: socket
ttcp-t: sndbuf
ttcp-t: nodelay
ttcp-t: connect
ttcp-t: buflen=65536, nbuf=100000, align=16384/0, port=5001, sockbufsize=2097152 tcp -> 10.67.16.91
ttcp-t: 6553600000 bytes in 58.42 real seconds = 109543.94 KB/sec +++
ttcp-t: 6553600000 bytes in 18.82 CPU seconds = 340115.47 KB/cpu sec
ttcp-t: 100000 I/O calls, msec/call = 0.60, calls/sec = 1711.62
ttcp-t: 0.0user 18.7sys 0:58real 32% 0i+0d 0maxrss 0+16pf 7985+2csw
ttcp-t: buffer address 0x8050000
Stopping profiling.
CPU: P4 / Xeon with 2 hyper-threads, speed 2393.76 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (count cycles when processor is active) count 100000
samples % symbol name
225502 18.1062 tg3_enable_ints
152548 12.2485 __copy_from_user_ll
85683 6.8797 tg3_interrupt
58248 4.6769 skb_clone
52288 4.1984 tcp_sendmsg
37893 3.0425 default_idle
35365 2.8396 tg3_rx
32555 2.6139 ip_queue_xmit
31778 2.5515 qdisc_restart
30089 2.4159 tg3_poll
29935 2.4036 tcp_v4_rcv
24818 1.9927 tcp_write_xmit
23781 1.9094 tcp_transmit_skb
23329 1.8732 skb_release_data
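To see at a glance how much of a profile lands in the driver versus the TCP/IP stack, the opreport -l lines can be grouped by symbol prefix. A minimal sketch, assuming the single-image three-column layout shown above (samples, %, symbol); because the listings were cut off by head -30, per-prefix totals are lower bounds.

```python
from collections import defaultdict

# Lines of the PREEMPT=n profile above (truncated by head -30).
SAMPLE = """\
samples % symbol name
225502 18.1062 tg3_enable_ints
152548 12.2485 __copy_from_user_ll
85683 6.8797 tg3_interrupt
58248 4.6769 skb_clone
52288 4.1984 tcp_sendmsg
37893 3.0425 default_idle
35365 2.8396 tg3_rx
32555 2.6139 ip_queue_xmit
31778 2.5515 qdisc_restart
30089 2.4159 tg3_poll
29935 2.4036 tcp_v4_rcv
24818 1.9927 tcp_write_xmit
23781 1.9094 tcp_transmit_skb
23329 1.8732 skb_release_data
"""


def parse_opreport(text):
    """Yield (samples, percent, symbol) for each 'samples % symbol' body line."""
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0].isdigit():
            yield int(fields[0]), float(fields[1]), fields[2]


def share_by_prefix(text, prefixes):
    """Sum the % column per symbol prefix; each symbol counts toward its first match."""
    totals = defaultdict(float)
    for _samples, pct, symbol in parse_opreport(text):
        for prefix in prefixes:
            if symbol.startswith(prefix):
                totals[prefix] += pct
                break
    return dict(totals)


if __name__ == "__main__":
    print(share_by_prefix(SAMPLE, ("tg3_", "tcp_", "skb_")))
```

On the PREEMPT=n listing this puts roughly 30% of the sampled cycles in tg3_* symbols, about 18% of it in tg3_enable_ints alone.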


cheers,

lincoln.

2004-01-22 12:55:39

by JG

Subject: Re: TG3: very high CPU usage

hi,

> nope.
> i didn't use PREEMPT=y in my previous test, but i have just done so now.

i have preempt enabled on both machines. at the moment i don't have time to recompile my kernel, but i'm going to test 2.6.2-rc1-mm1 soon on one of my machines, with preempt disabled.

i'm also going to test my systems with ttcp, because at the moment i'm transferring my backup from the server to my machine at 105.48 kB/s over the gigabit line via ftp :( but cpu usage is normal on both machines.

JG


2004-01-22 16:06:58

by Tom Sightler

Subject: Re: TG3: very high CPU usage

On Wed, 2004-01-21 at 22:57, Lincoln Dale wrote:
> At 02:19 PM 21/01/2004, Tom Sightler wrote:
> >I'm curious whether the people seeing this problem happen to have preempt
> >enabled in their config. I've noticed that my laptop, which also
> >happens to have a tg3-based 10/100/1000 card, uses tons of CPU during
> >transfers, but only when preempt is enabled.
>
> nope.
> i didn't use PREEMPT=y in my previous test, but i have just done so now.
>
> the difference in CPU utilization when pushing wire-rate gig-e ttcp on this
> system (Dual P4 Xeon) with PREEMPT=y or PREEMPT=n is just noise.
>
> you should run oprofile and see where your cpu time is spent.

Well, it was just a thought. As it turns out, in my case the problem
seems to be related to ACPI and PREEMPT (I still don't understand exactly
how). Everything seems normal with ACPI but without PREEMPT, or with PREEMPT
but without ACPI, but if I enable both ACPI and PREEMPT I get a ton of
CPU usage. In Fedora Core's top it shows up as IRQ time.

I haven't run oprofile yet but it seems this problem is something to do
with ACPI and PREEMPT on my machine (perhaps something to do with IRQ
routing when ACPI is enabled). Sounds like that doesn't apply to any of
the systems you guys are talking about. Sorry for the noise.

PREEMPT with ACPI is showing some other problems on my machine as well
(for example, when PREEMPT is enabled my battery status applet fails
after several hours of uptime, or even sooner if I stress the
network). I can't reproduce this if I disable PREEMPT.

Anyway, good luck in finding a common issue for your problems.

Later,
Tom


2004-01-24 13:43:43

by JG

Subject: Re: TG3: very high CPU usage

hi,

> i'm also going to test my systems with ttcp, because at the moment i'm transferring my backup from the server to my machine at 105.48 kB/s over the gigabit line via ftp :( but cpu usage is normal on both machines.

i've now done some tests; here are the results.
box1 = 2.6.0 (tg3 driver v2.3, nov5/03)
box2 = 2.6.2-rc1-mm2 (tg3 v2.5, dec22/03)

box1 was sending, box2 receiving:

box1 # ttcp -t -l 65536 -v -b 2097152 -s -D -n100000 192.168.0.3
ttcp-t: buflen=65536, nbuf=100000, align=16384/0, port=5001, sockbufsize=2097152 tcp -> 192.168.0.3
ttcp-t: socket
ttcp-t: sndbuf
ttcp-t: nodelay
ttcp-t: connect
ttcp-t: -2036334592 bytes in 1247.57 real seconds = 1768.00 KB/sec +++
ttcp-t: -2036334592 bytes in 30.01 CPU seconds = 73492.73 KB/cpu sec
ttcp-t: 100000 I/O calls, msec/call = 12.78, calls/sec = 80.16
ttcp-t: 0.1user 29.8sys 20:47real 2% 0i+0d 0maxrss 1+16pf 67585+105csw
ttcp-t: buffer address 0x807c000

------------------------------------------
now the opposite, box2 was sending, box1 receiving:

box2 ttcp # ttcp -t -l 65536 -v -b 2097152 -s -D -n100000 192.168.0.2
ttcp-t: buflen=65536, nbuf=100000, align=16384/0, port=5001, sockbufsize=2097152 tcp -> 192.168.0.2
ttcp-t: socket
ttcp-t: sndbuf
ttcp-t: nodelay
ttcp-t: connect
ttcp-t: -2036334592 bytes in 153.82 real seconds = 14339.52 KB/sec +++
ttcp-t: -2036334592 bytes in 28.61 CPU seconds = 77085.45 KB/cpu sec
ttcp-t: 100000 I/O calls, msec/call = 1.58, calls/sec = 650.11
ttcp-t: 0.1user 28.4sys 2:33real 18% 0i+0d 0maxrss 0+17pf 63153+846csw
ttcp-t: buffer address 0x807c000
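The negative byte counts are worth decoding. With -l 65536 and -n100000, and assuming all 100000 writes completed, the total is 6,553,600,000 bytes, which does not fit in 32 bits; wrapped into a signed 32-bit integer it becomes exactly -2036334592. The KB/sec figures are reproduced by dividing the wrapped value read as unsigned (2,258,632,704) by the elapsed time, so under that assumption the true rates are about 2.9 times what this ttcp build printed (the ttcp build used earlier in the thread printed the same total correctly). A small sketch of the arithmetic:

```python
TOTAL_BYTES = 65536 * 100000        # -l 65536 x -n100000


def as_signed32(n):
    """Value a signed 32-bit counter holds after counting n."""
    n &= 0xFFFFFFFF
    return n - (1 << 32) if n & 0x80000000 else n


def unwrap32(printed, approx_total):
    """Recover the true count: add the multiple of 2**32 closest to a known estimate."""
    unsigned = printed & 0xFFFFFFFF
    wraps = round((approx_total - unsigned) / (1 << 32))
    return unsigned + wraps * (1 << 32)


def kib_per_sec(nbytes, seconds):
    return nbytes / seconds / 1024


if __name__ == "__main__":
    printed = -2036334592
    true_total = unwrap32(printed, TOTAL_BYTES)
    for direction, seconds in (("box1 -> box2", 1247.57), ("box2 -> box1", 153.82)):
        shown = kib_per_sec(printed & 0xFFFFFFFF, seconds)
        actual = kib_per_sec(true_total, seconds)
        print(f"{direction}: ttcp showed ~{shown:.0f} KB/s, actual ~{actual:.0f} KB/s")
```

Even corrected, box1 -> box2 at roughly 5 MB/s is far below gigabit rates, so the asymmetry between the two directions stands.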

i thought the cable could be defective because of these results, but i tested with another machine (windows xp, 100mbit card) and both upload and download speeds via ftp (from both boxes!) were about 8-9MB/s. so no problem with the cable, and apparently no problem at 100mbit either; but as soon as i connect the two tg3 cards together at 1000mbit, one direction is slow (the cable is gbit certified and worked with 2.4 kernels without any problem).

as i already mentioned in a previous email, the error counts on the tg3 cards are quite high, but only in RX:
box1:
RX packets:18585312 errors:102500 dropped:0 overruns:0 frame:102598
TX packets:12435471 errors:0 dropped:0 overruns:0 carrier:0
box2:
RX packets:6864695 errors:202162 dropped:0 overruns:0 frame:204652
TX packets:10049776 errors:0 dropped:0 overruns:0 carrier:0

cpu usage was also normal in every test (about 15-30%).

JG


2004-01-25 00:04:15

by Lincoln Dale

Subject: Re: TG3: very high CPU usage

Hi,

At 12:43 AM 25/01/2004, JG wrote:
>box1 was sending, box2 receiving:
>box1 # ttcp -t -l 65536 -v -b 2097152 -s -D -n100000 192.168.0.3
>ttcp-t: -2036334592 bytes in 1247.57 real seconds = 1768.00 KB/sec +++
>ttcp-t: -2036334592 bytes in 30.01 CPU seconds = 73492.73 KB/cpu sec

urgh, those are terrible numbers!

>now the opposite, box2 was sending, box1 receiving:
>box2 ttcp # ttcp -t -l 65536 -v -b 2097152 -s -D -n100000 192.168.0.2
>ttcp-t: -2036334592 bytes in 153.82 real seconds = 14339.52 KB/sec +++
>ttcp-t: -2036334592 bytes in 28.61 CPU seconds = 77085.45 KB/cpu sec

better, but still terrible.

even an old Pentium3 @ 500MHz here is capable of pushing GbE wire-rate (i
just tested this using a Tigon2).

>i thought the cable could be defective because of these results, but i
>tested with another machine (windows xp, 100mbit card) and both upload and
>download speeds via ftp (from both boxes!) were about 8-9MB/s. so no
>problem with the cable, and apparently no problem at 100mbit either; but as
>soon as i connect the two tg3 cards together at 1000mbit, one direction
>is slow (the cable is gbit certified and worked with 2.4 kernels without any
>problem).

actually, this isn't necessarily the case.

Fast Ethernet (100baseTX) only uses 2 of the 4 pairs, one for Tx and one for
Rx (4 wires), whereas copper GbE (1000baseT) uses all 4 pairs (8 wires),
with every pair carrying both Tx and Rx at once.
it may be the case that your cable has some bad connections on the pins
only used for 1000baseT.

>as i already mentioned in a previous email, the error counts on the tg3
>cards are quite high, but only in RX:
>box1:
>RX packets:18585312 errors:102500 dropped:0 overruns:0 frame:102598
>TX packets:12435471 errors:0 dropped:0 overruns:0 carrier:0
>box2:
>RX packets:6864695 errors:202162 dropped:0 overruns:0 frame:204652
>TX packets:10049776 errors:0 dropped:0 overruns:0 carrier:0

on an x-over cable, you should NEVER have any errors.
if this is indeed simply an x-over cable, then i'd replace it and try again.

(note that for 1000baseT you don't need to worry about whether the cable is
x-over or not; 1000baseT on most NICs/switches will auto-detect the pair
assignment (auto MDI/MDI-X) anyway.)

Broadcom have a tool on their web site called "BACS" which can take
advantage of some of the neat stuff in the PHY used on these boards. one
of the tests it can do is to check the quality of the cable and report any
problems it sees; it can run a signal/noise test on each pair.

FYI, doing a "Cable Analysis" on a single port of a BCM5703 here connected
to a switch (not x-over) with a ~1 metre patch cable shows:
Distance (m): ~1
Margin (dB): 5.132
Frequency Margin (MHz): 41.382


cheers,

lincoln.

2004-01-25 12:32:06

by JG

Subject: Re: TG3: very high CPU usage

hi,

> urgh, those are terrible numbers!

yes ;)

> even an old Pentium3 @ 500MHz here is capable of pushing GbE wire-rate (i
> just tested this using a Tigon2).

my machines are athlon xp 1700+ and 2400+ so they should be fast enough...

> Fast Ethernet (100baseTX) only uses 2 of the 4 pairs, one for Tx and one for
> Rx (4 wires), whereas copper GbE (1000baseT) uses all 4 pairs (8 wires),
> with every pair carrying both Tx and Rx at once.

oh, yes, i didn't think of that (my bad...) because i thought "it worked with 2.4 kernels".

> if this is indeed simply an x-over cable, then i'd replace it and try again.

yes, they are located in different rooms and connected via a 20m x-over cable through the wall (easier than affording a gbit switch ;))

> Broadcom have a tool on their web site called "BACS" which [...] check the quality of the cable and report any
> problems it sees; it can run a signal/noise test on each pair.

thank you for the info! i searched their site, but i only found a reference to BACS on their faq page, saying that this software should be on their driver cdrom (well, it is not on my netgear cdrom).
but i'll test my cable with a fluke networks cable tester tomorrow or on tuesday. i'll post the results if they are relevant.

i also tested box2, which i can reboot, with a knoppix cdrom (2.4.21 kernel and the v1.5 tg3 driver), but the problem was there too, so it really seems to be the cable...

thx,
JG


2004-01-31 09:16:04

by JG

Subject: Re: TG3: very high CPU usage

hi,

i'm replying to my email.

> thank you for the info! i searched their site, but i only found a reference to BACS on their faq page, saying that this software should be on their driver cdrom (well, it is not on my netgear cdrom).
> but i'll test my cable with a fluke networks cable tester tomorrow or on tuesday. i'll post the results if they are relevant.

well, i did a thorough cable test with a DSP-4100 fluke networks cable tester and got some bad values. i had been using 3 cables (24m) with adapters; each single cable tested fine, so the adapters seem to have caused the problem.
but i'm now using a longer x-over cable (30m), and i still get those speed problems. it is a *bit* better (about 1-2MB/s in both directions), but i'm still seeing a very high error rate over the x-over cable (~40-50 errors per second).

do you have this BACS software and is it possible to test the NIC itself with it? maybe one of my NICs is causing this.

thx,
JG


2004-02-01 00:21:07

by Lincoln Dale

Subject: Re: TG3: very high CPU usage

At 08:15 PM 31/01/2004, JG wrote:
>well, i did a thorough cable test with a DSP-4100 fluke networks cable
>tester and got some bad values. i had been using 3 cables (24m) with
>adapters; each single cable tested fine, so the adapters seem to have
>caused the problem.
>but i'm now using a longer x-over cable (30m), and i still get those speed
>problems. it is a *bit* better (about 1-2MB/s in both directions), but
>i'm still seeing a very high error rate over the x-over cable (~40-50
>errors per second).

if you get ANY errors, then it's bad; even 1 error per second basically
means "one lost packet per second", which will severely limit your TCP
throughput.

one thing you may want to do is drop the link to 100mbit/s rather than
gig-e; that will use fewer cable pairs and may avoid the problem.
100mbit/s without errors will likely be way way faster than 1000mbit/s with
50 errors/sec.
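That comparison can be made semi-quantitative with the TCP throughput approximation of Padhye et al. (the same equation underlies TFRC in RFC 5348), which adds retransmission-timeout stalls to the familiar 1/sqrt(p) term. A rough sketch; every input is an assumption, not a measurement from the thread: a 0.2 ms crossover RTT, Linux's 200 ms minimum RTO, delayed ACKs (b = 2), 1448-byte segments, and a loss rate estimated by treating each of ~45 errors/sec at ~1.5 MB/s as one lost segment.

```python
from math import sqrt

MSS = 1448          # bytes of payload per segment (assumed: 1500 MTU, TCP timestamps)
RTT = 0.0002        # seconds; assumed for a 30 m crossover link
MIN_RTO = 0.2       # seconds; Linux's lower bound on the retransmission timeout
B = 2               # segments acknowledged per ACK (delayed ACKs)


def pftk_bytes_per_sec(p, mss=MSS, rtt=RTT, rto=MIN_RTO, b=B):
    """Approximate steady-state TCP throughput for loss probability p (0 < p < 1)."""
    aimd_term = rtt * sqrt(2 * b * p / 3)
    timeout_term = rto * min(1.0, 3 * sqrt(3 * b * p / 8)) * p * (1 + 32 * p ** 2)
    return mss / (aimd_term + timeout_term)


def loss_rate(errors_per_sec, bytes_per_sec, mss=MSS):
    """Treat every error as one lost full-size segment."""
    return errors_per_sec / (bytes_per_sec / mss)


if __name__ == "__main__":
    p = loss_rate(errors_per_sec=45, bytes_per_sec=1.5e6)
    print(f"estimated loss rate: {100 * p:.1f}%")
    print(f"model, 1000 Mbit/s with errors: ~{pftk_bytes_per_sec(p) / 1e6:.2f} MB/s")
    clean_100 = 100e6 / 8 * MSS / (1500 + 38)
    print(f"error-free 100 Mbit/s ceiling: ~{clean_100 / 1e6:.1f} MB/s")
```

With these inputs the model lands at a few hundred KB/s, below the observed 1-2 MB/s but far below the ~11.8 MB/s an error-free 100 Mbit/s link can carry. The timeout term outweighs the RTT term by roughly 100x: on a LAN, every loss that ends in a 200 ms timeout costs about a thousand round trips.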

>do you have this BACS software and is it possible to test the NIC itself
>with it? maybe one of my NICs is causing this.

it seems there is only a Windows version of their diagnostics.
personally, i use IBM xSeries servers. their version of the BACS code is
at <http://www-306.ibm.com/pc/support/site.wss/document.do?lndocid=MIGR-43815>.
i've seen other servers (e.g. Compaq DL360?) that also use the BCM57xx;
their BACS tool is rebadged as an HP tool.


cheers,

lincoln.

