2002-01-30 16:07:27

by Richard B. Johnson

Subject: TCP/IP Speed


When I ping two linux machines on a private link, I get 0.1 ms delay.
When I send large TCP/IP stream data between them, I get almost
10 megabytes per second on a 100-base link. Wonderful.

However, if I send 64 bytes from one machine and send it back, simple
TCP/IP stream connection, it takes 1 millisecond to get it back? There
seems to be some artificial delay somewhere. How do I turn this OFF?


Cheers,
Dick Johnson

Penguin : Linux version 2.4.1 on an i686 machine (797.90 BogoMips).

I was going to compile a list of innovations that could be
attributed to Microsoft. Once I realized that Ctrl-Alt-Del
was handled in the BIOS, I found that there aren't any.



2002-01-30 16:22:17

by Zwane Mwaikambo

Subject: Re: TCP/IP Speed

On Wed, 30 Jan 2002, Richard B. Johnson wrote:

>
> When I ping two linux machines on a private link, I get 0.1 ms delay.
> When I send large TCP/IP stream data between them, I get almost
> 10 megabytes per second on a 100-base link. Wonderful.
>
> However, if I send 64 bytes from one machine and send it back, simple
> TCP/IP stream connection, it takes 1 millisecond to get it back? There
> seems to be some artificial delay somewhere. How do I turn this OFF?

I would say it's all in the TCP connection initiation (socket(), create()
etc...)

Cheers,
Zwane Mwaikambo


2002-01-30 16:43:49

by Zwane Mwaikambo

Subject: Re: TCP/IP Speed

On Wed, 30 Jan 2002, Richard B. Johnson wrote:

> > I would say it's all in the TCP connection initiation (socket(), create()

I think it's getting late...

s/create/connect/


2002-01-30 16:47:56

by Richard B. Johnson

Subject: Re: TCP/IP Speed

On Wed, 30 Jan 2002, Zwane Mwaikambo wrote:

> On Wed, 30 Jan 2002, Richard B. Johnson wrote:
>
> > But it's already connected.
> >
> >
> > host:
> > for (;;) {
> > gettimeofday(...);
> > write(s, buf, 64);
> > read(s, buf, sizeof(buffer));
> > gettimeofday(...);
> > /* delay is 1.0 ms */
> > }
> > server is IPPORT_ECHO
>
> You didn't make that explicit in your previous email, and anyway what kind
> of resolution can you expect from gettimeofday...
>

The resolution is in microseconds. That's the specification. Not
all 'codes' are exercised of course, but the resolution is sufficient
to discern a 10 to 30 microsecond difference. I'm trying to measure
milliseconds, well within its capability. FYI, this is what `ping`
and `traceroute` use. It's fine.
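
A rough way to check what step size gettimeofday() actually delivers on a
given machine is to call it in a tight loop and keep the smallest non-zero
difference between consecutive readings. This is only a sketch; the names
now_us() and smallest_step_us() are made up for illustration and are not
taken from ping or traceroute.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Microseconds since the epoch, as reported by gettimeofday(). */
long long now_us(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (long long)tv.tv_sec * 1000000LL + tv.tv_usec;
}

/* Smallest non-zero step seen across `samples` observed clock changes.
 * The result is an upper bound on the usable resolution, in microseconds. */
long long smallest_step_us(int samples)
{
    long long best = -1;
    long long prev = now_us();

    assert(samples > 0);
    while (samples > 0) {
        long long cur = now_us();
        if (cur > prev) {                 /* the clock moved forward */
            long long step = cur - prev;
            if (best < 0 || step < best)
                best = step;
            samples--;
        }
        prev = cur;
    }
    return best;
}
```

If smallest_step_us(1000) comes back in the low microseconds, a 1 ms
interval is comfortably within what the clock can resolve.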


Cheers,
Dick Johnson



2002-01-30 16:52:29

by Zwane Mwaikambo

Subject: Re: TCP/IP Speed

On Wed, 30 Jan 2002, Richard B. Johnson wrote:

> The resolution is in microseconds. That's the specification. Not
> all 'codes' are exercised of course, but the resolution is sufficient
> to discern a 10 to 30 microsecond difference. I'm trying to measure
> milliseconds, well within its capability. FYI, this is what `ping`
> and `traceroute` use. It's fine.

Then you might want to check what's going on in read and write. Have you
tried this with UDP too?

Cheers,
Zwane Mwaikambo


2002-01-30 16:37:09

by Richard B. Johnson

Subject: Re: TCP/IP Speed

On Wed, 30 Jan 2002, Zwane Mwaikambo wrote:

> On Wed, 30 Jan 2002, Richard B. Johnson wrote:
>
> >
> > When I ping two linux machines on a private link, I get 0.1 ms delay.
> > When I send large TCP/IP stream data between them, I get almost
> > 10 megabytes per second on a 100-base link. Wonderful.
> >
> > However, if I send 64 bytes from one machine and send it back, simple
> > TCP/IP stream connection, it takes 1 millisecond to get it back? There
> > seems to be some artificial delay somewhere. How do I turn this OFF?
>
> I would say it's all in the TCP connection initiation (socket(), create()
> etc...)

But it's already connected.


host:
    for (;;) {
        gettimeofday(...);
        write(s, buf, 64);
        read(s, buf, sizeof(buffer));
        gettimeofday(...);
        /* delay is 1.0 ms */
    }
server is IPPORT_ECHO
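
For anyone wanting to reproduce the measurement, one iteration of a loop
like the one above might be written as follows. This is a sketch under
assumptions: pingpong_us() is a hypothetical helper, fd is an
already-connected stream socket whose peer echoes data back, and short
reads are retried because TCP is a byte stream and one read() may return
fewer bytes than were written.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Send len bytes on fd, wait until len bytes have come back, and return
 * the elapsed time in microseconds. Returns -1 on error or early EOF. */
long pingpong_us(int fd, char *buf, size_t len)
{
    struct timeval t0, t1;
    size_t done = 0;

    assert(buf != NULL && len > 0);
    gettimeofday(&t0, NULL);

    while (done < len) {                          /* send phase */
        ssize_t n = write(fd, buf + done, len - done);
        if (n <= 0)
            return -1;
        done += (size_t)n;
    }
    for (done = 0; done < len; ) {                /* receive phase */
        ssize_t n = read(fd, buf + done, len - done);
        if (n <= 0)
            return -1;
        done += (size_t)n;
    }

    gettimeofday(&t1, NULL);
    return (t1.tv_sec - t0.tv_sec) * 1000000L + (t1.tv_usec - t0.tv_usec);
}
```

Calling this repeatedly and averaging the result gives the turn-around
figure being discussed in this thread.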



Cheers,
Dick Johnson



2002-01-30 17:36:48

by Zwane Mwaikambo

Subject: Re: TCP/IP Speed

On Wed, 30 Jan 2002, Richard B. Johnson wrote:

> But it's already connected.
>
>
> host:
> for (;;) {
> gettimeofday(...);
> write(s, buf, 64);
> read(s, buf, sizeof(buffer));
> gettimeofday(...);
> /* delay is 1.0 ms */
> }
> server is IPPORT_ECHO

You didn't make that explicit in your previous email, and anyway what kind
of resolution can you expect from gettimeofday...

Cheers,
Zwane Mwaikambo


2002-01-30 18:09:46

by Corey Minyard

Subject: Re: TCP/IP Speed

Zwane Mwaikambo wrote:

>On Wed, 30 Jan 2002, Richard B. Johnson wrote:
>
>>But it's already connected.
>>
>>
>> host:
>> for (;;) {
>> gettimeofday(...);
>> write(s, buf, 64);
>> read(s, buf, sizeof(buffer));
>> gettimeofday(...);
>> /* delay is 1.0 ms */
>> }
>> server is IPPORT_ECHO
>>
>
>You didn't make that explicit in your previous email, and anyway what kind
>of resolution can you expect from gettimeofday...
>
Depending on the processor, gettimeofday has very high resolution.

If I remember correctly, the TCP stacks put in delays for small sends so
they can pack multiple things together. I think there are ways to work
around this via some type of flush, but memory fails me on exactly how.

-Corey

2002-01-30 18:20:28

by Dan Maas

Subject: Re: TCP/IP Speed

> When I ping two linux machines on a private link, I get 0.1 ms delay.
> When I send large TCP/IP stream data between them, I get almost
> 10 megabytes per second on a 100-base link. Wonderful.
>
> However, if I send 64 bytes from one machine and send it back, simple
> TCP/IP stream connection, it takes 1 millisecond to get it back? There
> seems to be some artificial delay somewhere. How do I turn this OFF?

Stupid question - did you turn Nagle off?

int one = 1;
setsockopt(fd, SOL_TCP, TCP_NODELAY, &one, sizeof(one));

(I think; typing from memory...)
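
Spelled out with the headers it needs, the call might look like the sketch
below. The wrapper name set_tcp_nodelay() is hypothetical; the option name
TCP_NODELAY and the five-argument setsockopt() are the standard sockets
API.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Disable the Nagle algorithm on a TCP socket.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
int set_tcp_nodelay(int fd)
{
    int one = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}
```

The return value is worth checking, since a bad descriptor or a non-TCP
socket makes the call fail rather than silently do nothing.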

Regards,
Dan

2002-01-30 18:24:29

by Gregory Maxwell

Subject: Re: TCP/IP Speed

On Wed, Jan 30, 2002 at 12:07:17PM -0600, Corey Minyard wrote:
> Zwane Mwaikambo wrote:
> >You didn't make that explicit in your previous email, and anyway what kind
> >of resolution can you expect from gettimeofday...
> >
> Depending on the processor, gettimeofday has very high resolution.
> If I remember correctly, the TCP stacks put in delays for small sends so
> they can pack multiple things together. I think there are ways to work
> around this via some type of flush, but memory fails me on exactly how.

Please! People, if you don't know exactly what you are talking about, at
least keep your replies off list.

The problem he's having is caused by Nagle. I replied to him off list
because not knowing the API to disable a perfectly standard TCP behavior is
not related to kernel development.

I get enough traffic from this list without having a dozen blind people
trying to lead the blind.

It's great that you are trying to be helpful, but please do it off list,
unless it's actually pertinent to kernel development.

[For future archive searchers: The solution is to set the TCP_NODELAY socket
option]

2002-01-30 18:33:23

by Richard B. Johnson

Subject: Re: TCP/IP Speed

On Wed, 30 Jan 2002, Dan Maas wrote:

> > When I ping two linux machines on a private link, I get 0.1 ms delay.
> > When I send large TCP/IP stream data between them, I get almost
> > 10 megabytes per second on a 100-base link. Wonderful.
> >
> > However, if I send 64 bytes from one machine and send it back, simple
> > TCP/IP stream connection, it takes 1 millisecond to get it back? There
> > seems to be some artificial delay somewhere. How do I turn this OFF?
>
> Stupid question - did you turn Nagle off?
>
> int one = 1;
> setsockopt(fd, SOL_TCP, TCP_NODELAY, &one, sizeof(one));
>
> (I think; typing from memory...)
>
> Regards,
> Dan
>

I did, but I thought it was a TCP option, not a socket option.
I will change it and see if it does anything. Currently, it
seems like a no-op, no errors, but does nothing.


Early in code:

int on = 1;

#define ON &on


Where accept is called. Returned socket value is set to nodelay.

    len = sizeof(addr);
    if((hs = accept(s, SSAP &addr, &len)) == FAIL)
        ERRORS(Accept);
    if(setsockopt(hs, IPPROTO_TCP, TCP_NODELAY, ON, sizeof(on)) == FAIL)
        ERRORS(Setsockopt);



So, maybe it's supposed to be SOL_TCP? I'll look for it.
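
As far as I know, SOL_TCP and IPPROTO_TCP are both defined as 6 in the
Linux headers, so either level should reach the same option. One way to
tell whether the call is really a no-op is to read the option back with
getsockopt(). A minimal sketch, with tcp_nodelay_is_set() being a
hypothetical helper name:

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Returns 1 if TCP_NODELAY is set on fd, 0 if it is clear, -1 on error. */
int tcp_nodelay_is_set(int fd)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) < 0)
        return -1;
    return val != 0;
}
```

If this returns 1 on the socket doing the writes and the delay is still
there, the option was applied and the cause is probably elsewhere.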



Cheers,
Dick Johnson



2002-01-30 18:23:27

by Tobias Ringstrom

Subject: Re: TCP/IP Speed

On Wed, 30 Jan 2002, Corey Minyard wrote:

> If I remember correctly, the TCP stacks put in delays for small sends so

You do not remember correctly. I think you are confused by Nagle and
delayed ACKs. If there is no unacknowledged data, a small packet will be
sent immediately.
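
The rule can be written down as a small decision function. This is a
simplified model of the sender-side Nagle test for illustration only, not
kernel code; it ignores the receive window and congestion control, and the
name nagle_sends_now() and its parameters are invented here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of the Nagle decision on the sending side.
 *   queued  - bytes waiting to be sent
 *   mss     - maximum segment size
 *   unacked - bytes already sent but not yet acknowledged
 *   nodelay - true if TCP_NODELAY is set on the socket */
bool nagle_sends_now(size_t queued, size_t mss, size_t unacked, bool nodelay)
{
    assert(mss > 0);
    if (queued == 0)
        return false;      /* nothing to send */
    if (nodelay)
        return true;       /* Nagle disabled: never hold data back */
    if (queued >= mss)
        return true;       /* a full-sized segment always goes out */
    return unacked == 0;   /* small segment: only if nothing is in flight */
}
```

In a strict request/reply loop the previous reply has normally acknowledged
everything sent so far, so unacked is zero when the next 64-byte write is
issued and the model sends it at once.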

/Tobias

2002-01-30 21:24:28

by Rob Landley

Subject: Re: TCP/IP Speed

On Wednesday 30 January 2002 11:07 am, Richard B. Johnson wrote:
> When I ping two linux machines on a private link, I get 0.1 ms delay.
> When I send large TCP/IP stream data between them, I get almost
> 10 megabytes per second on a 100-base link. Wonderful.
>
> However, if I send 64 bytes from one machine and send it back, simple
> TCP/IP stream connection, it takes 1 millisecond to get it back? There
> seems to be some artificial delay somewhere. How do I turn this OFF?

This is just a guess, but it sounds to me like a scheduling issue. When
you're sending data from one network stack to another, how often the
receiving program scoops data out of the incoming file descriptor isn't too
much of a limiting factor, as long as you've got enough buffer space in the
receiving network stack that the sender doesn't have to pause.

But to bounce the data back, the program at the far end doing the receive and
resend has to be woken up and handed a time slice with which to receive,
process, and return the packet.

Have you tried Ingo's O(1) scheduler? :)

> Cheers,
> Dick Johnson

Rob

2002-01-30 21:52:31

by Richard B. Johnson

Subject: Re: TCP/IP Speed

On Wed, 30 Jan 2002, Rob Landley wrote:

> On Wednesday 30 January 2002 11:07 am, Richard B. Johnson wrote:
> > When I ping two linux machines on a private link, I get 0.1 ms delay.
> > When I send large TCP/IP stream data between them, I get almost
> > 10 megabytes per second on a 100-base link. Wonderful.
> >
> > However, if I send 64 bytes from one machine and send it back, simple
> > TCP/IP stream connection, it takes 1 millisecond to get it back? There
> > seems to be some artificial delay somewhere. How do I turn this OFF?
>
> This is just a guess, but it sounds to me like a scheduling issue. When
> you're sending data from one network stack to another, how often the
> receiving program scoops data out of the incoming file descriptor isn't too
> much of a limiting factor, as long as you've got enough buffer space in the
> receiving network stack that the sender doesn't have to pause.
>
> But to bounce the data back, the program at the far end doing the receive and
> resend has to be woken up and handed a time slice with which to receive,
> process, and return the packet.
>
> Have you tried Ingo's O(1) scheduler? :)

No. I set all sockets, even the original listen() socket to
TCP_NODELAY. Nothing makes any difference. I tried it on a
600+ MHz machine and a 133 MHz machine with no apparent
difference in the turn-around time. Of course the 600 MHz
machine sends large buffers of data faster:

With a 64k buffer:

600 MHz CPU, 133 MHz RAM   ~ 9.98 megabytes/second
133 MHz CPU, 100 MHz RAM   ~ 6.50 megabytes/second

With a 4k buffer:

600 MHz CPU, 133 MHz RAM   ~ 5.15 megabytes/second
133 MHz CPU, 100 MHz RAM   ~ 3.30 megabytes/second

With 64 bytes:

600 MHz CPU, 133 MHz RAM   ~ 1.2 kilobytes/second
133 MHz CPU, 100 MHz RAM   ~ 1.1 kilobytes/second

Time from transmission to reception of a small buffer
from the simplest echo server (read, write, no select):

600 MHz CPU, 133 MHz RAM   ~ 0.9 millisecond
133 MHz CPU, 100 MHz RAM   ~ 1.0 millisecond
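
For reference, an echo server of the "read, write, no select" kind can be
as small as the loop below. This is a sketch of the general shape, not the
program used for the numbers above; echo_until_eof() is a hypothetical
name and fd is assumed to be a connected stream socket.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Echo everything read from fd back to fd until the peer closes its side.
 * Returns the total number of bytes echoed, or -1 on error. */
long echo_until_eof(int fd)
{
    char buf[4096];
    long total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n == 0)
            return total;                 /* EOF: peer is done sending */
        if (n < 0)
            return -1;
        for (ssize_t off = 0; off < n; ) {    /* handle short writes */
            ssize_t w = write(fd, buf + off, (size_t)(n - off));
            if (w <= 0)
                return -1;
            off += w;
        }
        total += n;
    }
}
```

Each request costs the server one wake-up from read() and one write(), so
the server's share of the turn-around time is mostly scheduling and stack
latency rather than computation.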

This is with two eepro100 network boards and two
pairs of machines.

The turn-around time for small buffers is about
1 millisecond for both.

`ping` shows low microseconds in all cases.

The problem was discovered on an embedded system at
a customer site, so I set up two pairs of machines to
see what performance is possible. I thought first that
there was a half/full-duplex problem or a hub problem.
These two pairs are connected with a crossover cable, no
hubs.

Cheers,
Dick Johnson



2002-01-30 22:00:30

by David Miller

Subject: Re: TCP/IP Speed

From: "Richard B. Johnson"
Date: Wed, 30 Jan 2002 16:54:46 -0500 (EST)

   No. I set all sockets, even the original listen() socket to
   TCP_NODELAY.

How about setting it on the resulting socket, not just the
listen one? I.e., at both ends, always set TCP_NODELAY to 1
on each new socket created.

Nagle (i.e. TCP_NODELAY) is the only thing which could explain
the behavior you are complaining about.
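
A sketch of what "both ends, each new socket" could look like, using a
loopback connection so that it fits in one process. The function names are
invented for the example; in a real setup the connect() side and the
accept() side live in different programs, and each sets the option on its
own descriptor.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static int nodelay(int fd)
{
    int one = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}

/* Build a connected TCP pair over loopback with TCP_NODELAY set on both
 * the connect()ed socket (fds[0]) and the accept()ed socket (fds[1]).
 * Returns 0 on success, -1 on error. */
int nodelay_loopback_pair(int fds[2])
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int c = -1, s = -1;
    int ls = socket(AF_INET, SOCK_STREAM, 0);

    if (ls < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                       /* let the kernel pick a port */

    if (bind(ls, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(ls, 1) < 0 ||
        getsockname(ls, (struct sockaddr *)&addr, &len) < 0)
        goto fail;

    c = socket(AF_INET, SOCK_STREAM, 0);     /* client end */
    if (c < 0 || nodelay(c) < 0 ||
        connect(c, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto fail;

    s = accept(ls, NULL, NULL);              /* server end */
    if (s < 0 || nodelay(s) < 0)
        goto fail;

    close(ls);
    fds[0] = c;
    fds[1] = s;
    return 0;

fail:
    if (c >= 0) close(c);
    if (s >= 0) close(s);
    close(ls);
    return -1;
}
```

The point of the example is only where the calls go: once on the socket
returned by socket() before or after connect(), and once on the socket
returned by accept().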

2002-01-31 13:31:40

by Gregory Maxwell

Subject: Re: TCP/IP Speed

On Wed, Jan 30, 2002 at 04:54:46PM -0500, Richard B. Johnson wrote:
> No. I set all sockets, even the original listen() socket to
> TCP_NODELAY. Nothing makes any difference. I tried it on a
[snip]

Nagle is a sending side thing, not listening.

It clearly explains your issue.

2002-01-31 14:01:03

by Terje Eggestad

Subject: Re: TCP/IP Speed

Hmmm,

I tend to use the rdtsc instruction for this kind of measurement,
but I get ~110 us round trip with or without TCP_NODELAY.

I get ~100 us when pinging with raw Ethernet frames.

PS: Intel 82557 Ethernet Pro 100 NICs

TJ

On Wed, 2002-01-30 at 17:47, Richard B. Johnson wrote:
> On Wed, 30 Jan 2002, Zwane Mwaikambo wrote:
>
> > On Wed, 30 Jan 2002, Richard B. Johnson wrote:
> >
> > > But it's already connected.
> > >
> > >
> > > host:
> > > for (;;) {
> > > gettimeofday(...);
> > > write(s, buf, 64);
> > > read(s, buf, sizeof(buffer));
> > > gettimeofday(...);
> > > /* delay is 1.0 ms */
> > > }
> > > server is IPPORT_ECHO
> >
> > You didn't make that explicit in your previous email, and anyway what kind
> > of resolution can you expect from gettimeofday...
> >
>
> The resolution is in microseconds. That's the specification. Not
> all 'codes' are exercised of course, but the resolution is sufficient
> to discern a 10 to 30 microsecond difference. I'm trying to measure
> milliseconds, well within its capability. FYI, this is what `ping`
> and `traceroute` use. It's fine.
>
>
> Cheers,
> Dick Johnson
--
_________________________________________________________________________

Terje Eggestad
Scali Scalable Linux Systems http://www.scali.com

Olaf Helsets Vei 6 tel: +47 22 62 89 61 (OFFICE)
P.O.Box 150, Oppsal +47 975 31 574 (MOBILE)
N-0619 Oslo fax: +47 22 62 89 51
NORWAY
_________________________________________________________________________

2002-01-31 15:24:44

by Richard B. Johnson

Subject: Re: TCP/IP Speed

On 31 Jan 2002, Terje Eggestad wrote:

> Hmmm,
>
> I tend to use the rdtsc instruction for this kind of measurement,
> but I get ~110 us round trip with or without TCP_NODELAY.
>

The AMD SC520 (an i486-class CPU) doesn't have such an instruction.
Benchmarks that have to run 'everywhere' therefore need to be generic.

> I get ~100 us when pinging with raw Ethernet frames.

>
> PS: Intel 82557 Ethernet Pro 100 NICs
>

Ping doesn't count. ICMP is handled in the kernel a few procedures
removed from actual frame acquisition. It looks like the kernel
takes the data I'm waiting for, hides it away for awhile, scheduling
other tasks, then finally decides to wake me up, having been sleeping
in read() for a whole millisecond. This is entirely unacceptable.

Since this 1000 data-buffers-per-second limitation is independent of
CPU speed and/or RAM speed, it seems to be imposed artificially by
the network implementation. It may have something to do with additions
to stop DoS attacks or whatever. I need to turn it OFF.
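
The 1000-per-second figure follows directly from the turn-around time: a
loop that waits for each reply before sending the next buffer cannot
complete more than 1/RTT exchanges per second. A small sketch of that
arithmetic, with hypothetical function names:

```c
/* Upper bound on strictly serialized request/reply exchanges per second,
 * given the round-trip time in microseconds. */
double exchanges_per_second(double rtt_us)
{
    return 1e6 / rtt_us;
}

/* Payload bytes per second moved in one direction by such a loop. */
double payload_bytes_per_second(double rtt_us, double payload_bytes)
{
    return exchanges_per_second(rtt_us) * payload_bytes;
}
```

With a 1000 us round trip this gives 1000 exchanges per second. With the
~110 us round trip reported earlier in the thread it would be about 9090.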

I have done setsockopt(s, IPPROTO_TCP, TCP_NODELAY, &x, sizeof(x));
... SOL_TCP, TCP_NODELAY...
with int x = 1;, everywhere a socket is created. It seems like a
no-op.

My sun uses SunOS 5.5.1, with GNU 'C' runtime libraries, gcc 2.7.2, and
it doesn't display this problem. The speed is lower because it has only
a 10-base network connection, but the Tx/Rx time is running around
250 to 300 microseconds, not the awful 1 millisecond.


Cheers,
Dick Johnson



2002-01-31 16:07:04

by Terje Eggestad

Subject: Re: TCP/IP Speed

On Thu, 2002-01-31 at 16:26, Richard B. Johnson wrote:
> On 31 Jan 2002, Terje Eggestad wrote:
>
> > Hmmm,
> >
> > I tend to use the rdtsc instruction for this kind of measurement,
> > but I get ~110 us round trip with or without TCP_NODELAY.
> >
>
> The AMD SC520 (an i486-class CPU) doesn't have such an instruction.
> Benchmarks that have to run 'everywhere' therefore need to be generic.
>

too bad...

> > I get ~100 us when pinging with raw Ethernet frames.
>
> >
> > PS: Intel 82557 Ethernet Pro 100 NICs
> >
>
> Ping doesn't count. ICMP is handled in the kernel a few procedures
> removed from actual frame acquisition. It looks like the kernel
> takes the data I'm waiting for, hides it away for awhile, scheduling
> other tasks, then finally decides to wake me up, having been sleeping
> in read() for a whole millisecond. This is entirely unacceptable.
>

Actually, my "ping" is even less than ICMP. I open the ethX with family
= AF_PACKET, and provide the full 802 frame.
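
To give an idea of what "providing the full frame" involves, here is a
sketch that lays out an Ethernet II frame in a buffer. It is an
illustration under assumptions, not my actual test program: the function
name is made up, it uses an EtherType field rather than an 802.3 length
field, and it stops short of opening the AF_PACKET socket, which needs
root privileges.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HDR_LEN   14   /* dst MAC (6) + src MAC (6) + EtherType (2) */
#define ETH_MIN_FRAME 60   /* minimum frame length, excluding the FCS */
#define ETH_MAX_DATA  1500 /* largest payload of a standard frame */

/* Build an Ethernet II frame in out[]. Short frames are zero-padded to
 * the minimum length. Returns the frame length, or 0 if it does not fit. */
size_t build_eth_frame(uint8_t *out, size_t cap,
                       const uint8_t dst[6], const uint8_t src[6],
                       uint16_t ethertype,
                       const void *payload, size_t payload_len)
{
    size_t len = ETH_HDR_LEN + payload_len;

    assert(out != NULL && payload != NULL);
    if (len < ETH_MIN_FRAME)
        len = ETH_MIN_FRAME;
    if (payload_len > ETH_MAX_DATA || len > cap)
        return 0;

    memset(out, 0, len);
    memcpy(out, dst, 6);
    memcpy(out + 6, src, 6);
    out[12] = (uint8_t)(ethertype >> 8);      /* network byte order */
    out[13] = (uint8_t)(ethertype & 0xff);
    memcpy(out + ETH_HDR_LEN, payload, payload_len);
    return len;
}
```

A private test protocol could use 0x88B5, one of the EtherTypes set aside
for local experimental use, so that neither IP stack tries to interpret
the frames.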

> Since this 1000 data-buffers-per-second limitation is independent of
> CPU speed and/or RAM speed, it seems to be imposed artificially by
> the network implementation. It may have something to do with additions
> to stop DoS attacks or whatever. I need to turn it OFF.
>

Hmmm, guess I haven't tried actually sending 1000 writes.
If you send me your test program, I could try it on my hardware tomorrow.


> I have done setsockopt(s, IPPROTO_TCP, TCP_NODELAY, &x, sizeof(x));
> ... SOL_TCP, TCP_NODELAY...
> with int x = 1;, everywhere a socket is created. It seems like a
> no-op.
>
> My sun uses SunOS 5.5.1, with GNU 'C' runtime libraries, gcc 2.7.2, and
> it doesn't display this problem. The speed is lower because it has only
> a 10-base network connection, but the Tx/Rx time is running around
> 250 to 300 microseconds, not the awful 1 millisecond.
>
>
> Cheers,
> Dick Johnson