Hello, I am trying to find the reason for very, very poor network
performance with sustained data transfers on Linux 2.4.1. I found
a work-around, but don't think user-mode code should have to provide
such work-arounds.
In the following, with Linux 2.4.1, on a dedicated 100Base-T
link:
s = socket connected to DISCARD (null-sink) server.
while(len)
{
	stat = write(s, buf, min(len, MTU));
	/* Yes, I do check for an error */
	buf += stat;
	len -= stat;
}
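For reference, a self-contained version of the test program looks
roughly like the sketch below. The discard service is standard TCP
port 9; the server address and the CHUNK/TOTAL names are placeholders,
and "MTU" in the tables below means this per-write chunk size, not the
interface MTU.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define CHUNK 2048          /* per-write size; the "MTU" being varied */
#define TOTAL 0x00010000    /* 64 KB total, as in the measurements    */

int main(void)
{
	static char buf[TOTAL];
	struct sockaddr_in sa;
	char *p = buf;
	size_t len = TOTAL;
	ssize_t stat;
	int s;

	if ((s = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
		perror("socket");
		return 1;
	}
	memset(&sa, 0, sizeof(sa));
	sa.sin_family = AF_INET;
	sa.sin_port = htons(9);                        /* discard service */
	sa.sin_addr.s_addr = inet_addr("192.168.1.2"); /* server: assumed */
	if (connect(s, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
		perror("connect");
		return 1;
	}
	while (len) {
		stat = write(s, p, len < CHUNK ? len : CHUNK);
		if (stat < 0) {
			perror("write");
			return 1;
		}
		p += stat;
		len -= stat;
	}
	close(s);
	return 0;
}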
Data length is 0x00010000 bytes.
  MTU    Average trans rate    Fastest trans rate
-----    ------------------    ------------------
65536        0.468 Mb/s            0.902 Mb/s
32768        0.684 Mb/s            0.813 Mb/s
16384        2.989 Mb/s            3.121 Mb/s
 8192        5.211 Mb/s            6.160 Mb/s
 4094        8.212 Mb/s            9.101 Mb/s
 2048        8.561 Mb/s            9.280 Mb/s
 1024        7.250 Mb/s            7.500 Mb/s
  512        4.818 Mb/s            5.107 Mb/s
As you can see, there is a maximum write length that can be
handled with reasonable speed from a socket. Trying to find
out what that was, I discovered that the best "MTU" (per-write
size) was 3924 bytes. I don't know why. It shows:
  MTU    Average trans rate    Fastest trans rate
-----    ------------------    ------------------
 3924        8.920 Mb/s            9.310 Mb/s
If the user's data length is larger than this, there is a 1/100th
of a second (10 ms) wait between packets. The larger the user's data
length, the more the data gets chopped up into 1/100th-of-a-second
intervals. It looks as though user data that can't fit into two
Ethernet packets is queued until the next time-slice on a 100 Hz
system. This severely hurts sustained data performance. The
performance with a single 64k data buffer is abysmal; if it gets
chopped up into 2048-byte blocks in user-space, it's reasonable.
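To see the stalls directly, each write() can be timestamped; if the
queued-until-the-next-tick theory is right, the per-call times should
cluster around 10 ms once the chunk size crosses the threshold. A
minimal helper (the name is arbitrary) that drops into the loop above:

#include <stdio.h>
#include <unistd.h>
#include <sys/time.h>

/* Drop-in replacement for write() in the test loop: reports how long
 * each call took, in microseconds. */
static ssize_t timed_write(int fd, const void *buf, size_t n)
{
	struct timeval t0, t1;
	ssize_t stat;
	long us;

	gettimeofday(&t0, NULL);
	stat = write(fd, buf, n);
	gettimeofday(&t1, NULL);
	us = (t1.tv_sec - t0.tv_sec) * 1000000L
	   + (t1.tv_usec - t0.tv_usec);
	fprintf(stderr, "wrote %ld of %lu bytes in %ld us\n",
		(long)stat, (unsigned long)n, us);
	return stat;
}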
Both machines are dual Pentium 600 MHz machines with identical eepro100
Ethernet boards. I substituted LANCE (Hewlett-Packard) and 3Com (3c59x)
boards with essentially no change.
Does this point out a problem? Or should user-mode code be required
to chop up data lengths to something more "reasonable" for the kernel?
If so, how does the user know what "reasonable" is?
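About the only hint the kernel offers is the socket's send-buffer
size; chunking writes to some fraction of SO_SNDBUF is a heuristic,
not a documented contract. A sketch (the divisor and the fallback
value are arbitrary picks of mine):

#include <stddef.h>
#include <sys/socket.h>

/* Guess a "reasonable" per-write size for socket s from the kernel's
 * own send-buffer size.  The divisor and the 2048-byte fallback are
 * arbitrary choices, not anything the kernel promises. */
static size_t guess_chunk(int s)
{
	int sndbuf = 0;
	socklen_t optlen = sizeof(sndbuf);

	if (getsockopt(s, SOL_SOCKET, SO_SNDBUF, &sndbuf, &optlen) < 0 ||
	    sndbuf <= 0)
		return 2048;    /* the empirical sweet spot above */
	return (size_t)sndbuf / 4;
}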
Cheers,
Dick Johnson
Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips).
"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.
Hello,
The problem with awful socket performance on 2.4.1 has been discovered
and fixed by Manfred Spraul. Here is some info, and his patch:
On Fri, 23 Feb 2001, Manfred Spraul wrote:
> Could you post your results to linux-kernel?
> My mail from this morning wasn't accurate enough, you patched the wrong
> line. Sorry.
Yep. The patch you sent was a little broken. I tried to fix it, but
ended up patching the wrong line.
>
> I've attached the 2 patches that should cure your problems.
> patch-new is integrated into the -ac series, and it's a bugfix - simple
> unix socket sends eat into memory reserved for atomic allocs.
> patch-new2 is the other variant, it just deletes the fallback system.
--- linux/net/core/sock.c	Fri Dec 29 23:07:24 2000
+++ linux/net/core/sock.c.new	Fri Feb 23 15:02:46 2001
@@ -777,7 +777,7 @@
 		/* The buffer get won't block, or use the atomic queue.
 		 * It does produce annoying no free page messages still.
 		 */
-		skb = alloc_skb(size, GFP_BUFFER);
+		skb = alloc_skb(size, sk->allocation & (~__GFP_WAIT));
 		if (skb)
 			break;
 		try_size = fallback;
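The change is plain flag arithmetic: rather than the fixed GFP_BUFFER
policy, the fallback now derives a non-blocking variant of the
socket's own allocation mode by masking off the may-sleep bit.
Illustratively, with stand-in values rather than the real __GFP_*
constants from include/linux/mm.h:

#include <stdio.h>

#define X_GFP_WAIT 0x01    /* allocator may sleep waiting for memory */
#define X_GFP_IO   0x02    /* allocator may start I/O to free memory */

int main(void)
{
	unsigned int allocation = X_GFP_WAIT | X_GFP_IO;    /* like sk->allocation */
	unsigned int fallback = allocation & (~X_GFP_WAIT); /* same, minus sleeping */

	printf("allocation=%#x fallback=%#x\n", allocation, fallback);
	return 0;
}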
Cheers,
Dick Johnson
Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips).
"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.
Richard B. Johnson writes:
> > unix socket sends eat into memory reserved for atomic allocs.
OK. (Manfred is being quoted here, to be clear.)
I'm still talking with Alexey about how to fix this; I might just
prefer killing this fallback mechanism of sock_alloc_send_skb and
making AF_UNIX act just like everyone else.
This was always just a performance hack, and one which makes less
and less sense as time goes on.
Later,
David S. Miller
[email protected]
> I'm still talking with Alexey about how to fix this; I might just
> prefer killing this fallback mechanism of sock_alloc_send_skb and
> making AF_UNIX act just like everyone else.
>
> This was always just a performance hack, and one which makes less
> and less sense as time goes on.
When I first did the hack it was worth about 20% performance, but at
the time the fallback allocation and initial allocations didn't eat
into pools in a problematic way.
Hi!
> Hello, I am trying to find the reason for very, very poor network
> performance with sustained data transfers on Linux 2.4.1. I found
> a work-around, but don't think user-mode code should have to provide
> such work-arounds.
>
> In the following, with Linux 2.4.1, on a dedicated 100Base-T
> link:
>
> s = socket connected to DISCARD (null-sink) server.
>
> while(len)
> {
> 	stat = write(s, buf, min(len, MTU));
> 	/* Yes, I do check for an error */
> 	buf += stat;
> 	len -= stat;
> }
>
> Data length is 0x00010000 bytes.
>
>   MTU    Average trans rate    Fastest trans rate
> -----    ------------------    ------------------
> 65536        0.468 Mb/s            0.902 Mb/s
> 32768        0.684 Mb/s            0.813 Mb/s
> 16384        2.989 Mb/s            3.121 Mb/s
>  8192        5.211 Mb/s            6.160 Mb/s
>  4094        8.212 Mb/s            9.101 Mb/s
>  2048        8.561 Mb/s            9.280 Mb/s
>  1024        7.250 Mb/s            7.500 Mb/s
>   512        4.818 Mb/s            5.107 Mb/s
>
> As you can see, there is a maximum write length that can be
> handled with reasonable speed from a socket. Trying to find
> out what that was, I discovered that the best "MTU" (per-write
> size) was 3924 bytes. I don't know why. It shows:
Looks like that's page_size - epsilon.
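(For the record: 4096 - 3924 = 172 bytes. If the stack is trying to
fit each buffer plus its per-buffer bookkeeping into a single page,
those 172 bytes would be that overhead; that is a guess from the
arithmetic, not something verified in the allocator.)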
>   MTU    Average trans rate    Fastest trans rate
> -----    ------------------    ------------------
>  3924        8.920 Mb/s            9.310 Mb/s
But even this is *not* reasonable speed for 100 Mbit Ethernet!
> If the user's data length is larger than this, there is a 1/100th
> of a second (10 ms) wait between packets. The larger the user's data
> length, the more the data gets chopped up into 1/100th-of-a-second
> intervals.
>
> It looks as though user data that can't fit into two Ethernet packets
> is queued until the next time-slice on a 100 Hz system. This severely
> hurts sustained data performance. The performance with a single
> 64k data buffer is abysmal; if it gets chopped up into 2048-byte
> blocks in user-space, it's reasonable.
>
> Both machines are dual Pentium 600 MHz machines with identical eepro100
> Ethernet boards. I substituted LANCE (Hewlett-Packard) and 3Com (3c59x)
> boards with essentially no change.
Strange. Do you have interrupts working okay? [I'm able to get 4 Mbit
with an ne2000 hooked on the timer IRQ, so this is not a totally
stupid question.]
> Does this point out a problem? Or should user-mode code be required
Definitely a problem.
> to chop up data lengths to something more "reasonable" for the kernel?
> If so, how does the user know what "reasonable" is?
--
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.