2005-03-10 22:11:35

by Felix Matathias

Subject: select() doesn't respect SO_RCVLOWAT ?


Dear all,

I am running a 2.4.21-9.0.3.ELsmp #1 kernel and I can setsockopt and
getsockopt correctly the SO_RCVLOWAT option, but select() seems to mark a
socket readable even if a single byte is ready to be read. Then, a read()
blocks until the specified number of bytes in SO_RCVLOWAT makes it to the
socket buffer.
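
For reference, a minimal sketch of the set-and-read-back step described above, assuming Python's standard socket module on a system that defines SO_RCVLOWAT. The helper name set_rcvlowat is illustrative and does not come from this thread. It only confirms that the kernel stored the value; it says nothing about whether select() honours it.

```python
import socket


def set_rcvlowat(sock: socket.socket, nbytes: int) -> int:
    """Set SO_RCVLOWAT on sock and return the value the kernel reports back."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVLOWAT, nbytes)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVLOWAT)
```

Calling set_rcvlowat(sock, 64) on a TCP socket and getting 64 back corresponds to the "setsockopt and getsockopt correctly" observation in the message.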

This is the exact opposite behaviour of what I would have
expected/desired. Our application receives data at a rate of many kHz and we
want to avoid reading the socket until a predetermined amount of data is
sent, to avoid partial reads. SO_RCVLOWAT seemed to be a nice way to
implement that.

An earlier message by Alan Cox was a bit cryptic:

"But is the cost of all those special case checks and all the handling
for it such as select computing if enough tcp packets together accumulated
worth the cost on every app not using LOWAT for the microscopic gain given
that essentially nobody uses it."

Does this mean that select() in Linux will wake up no matter what
SO_RCVLOWAT is set to ?

Best Regards,
Felix Matathias

P.S. I would appreciate if you could also cc your response to me.

--

______________________________________________________________________
Felix Matathias of Columbia University, Nevis Labs

Brookhaven National Lab cell : 631-988-3694
Bldg 1005, 3-304 web : http://www.matathias.com
Upton, NY, 11973 photo: http://www.pbase.com/matathias
tel/fax :631-344-7622/3253 email: [email protected]
_______________________________________________________________________


2005-03-11 05:50:49

by Willy Tarreau

Subject: Re: select() doesn't respect SO_RCVLOWAT ?

On Thu, Mar 10, 2005 at 04:58:51PM -0500, Felix Matathias wrote:
>
> I am running a 2.4.21-9.0.3.ELsmp #1 kernel and I can setsockopt and
> getsockopt correctly the SO_RCVLOWAT option, but select() seems to mark a
> socket readable even if a single byte is ready to be read. Then, a read()
> blocks until the specified number of bytes in SO_RCVLOWAT makes it to the
> socket buffer.

as discussed in a previous thread, if you use select(), you should also
use non-blocking sockets. There are cases where select() can wake you up
without anything to read, eg if there is a packet waiting with a wrong
checksum.
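
A minimal sketch of that advice, assuming Python's select and socket modules. The function name read_when_ready and its defaults are illustrative only. The socket is switched to non-blocking mode so that a wakeup with nothing to read yields an immediate "would block" error instead of a stalled read.

```python
import select
import socket


def read_when_ready(sock: socket.socket, bufsize: int = 4096, timeout: float = 1.0):
    """Wait with select(), then attempt a single non-blocking recv().

    Returns the bytes read (b"" means the peer closed the connection),
    or None if select() timed out or the wakeup had nothing to deliver.
    """
    sock.setblocking(False)
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None  # timed out, nothing reported as readable
    try:
        return sock.recv(bufsize)
    except BlockingIOError:
        return None  # select() woke us up, but there was nothing to read
```

The caller treats None as "try again later", so a spurious wakeup costs one extra loop iteration rather than a blocked process.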

> This is the exact opposite behaviour of what I would have
> expected/desired. Our application receives data at a rate of many kHz and we
> want to avoid reading the socket until a predetermined amount of data is
> sent, to avoid partial reads. SO_RCVLOWAT seemed to be a nice way to
> implement that.

I too came across this problem a long time ago and concluded that LOWAT
was not really usable on Linux. But in the end, this is not really a big
deal, because as long as your application doesn't eat all CPU, it does
not change anything performance-wise, and when it begins to eat a lot
of CPU, the latency will increase, letting more data come in when you
do one read.

> An earlier message by Alan Cox was a bit cryptic:
>
> "But is the cost of all those special case checks and all the handling
> for it such as select computing if enough tcp packets together accumulated
> worth the cost on every app not using LOWAT for the microscopic gain given
> that essentially nobody uses it."
>
> Does this mean that select() in Linux will wake up no matter what
> SO_RCVLOWAT is set to ?

Yes.

Regards,
Willy

2005-03-11 19:25:25

by Alan

Subject: Re: select() doesn't respect SO_RCVLOWAT ?

On Thu, 2005-03-10 at 21:58, Felix Matathias wrote:
> Dear all,
>
> I am running a 2.4.21-9.0.3.ELsmp #1 kernel and I can setsockopt and
> getsockopt correctly the SO_RCVLOWAT option

The only value the code at least used to support was setting it to 1.
Are you sure you are actually setting/checking ok ?

2005-03-11 20:35:44

by Felix Matathias

Subject: Re: select() doesn't respect SO_RCVLOWAT ?


Dear Alan,

I am positive. I can setsockopt, and then, getsockopt returns the value
that I requested.

Stevens very clearly states that SO_RCVLOWAT has a direct impact on
select() and I assumed that this would be the case for Linux.
What is the rationale for not complying with that ? Is it the micromanagement
of select() that you dislike ? Isn't a significant reduction in the
amount of read operations a real gain in high speed networking ?

Best Regards,
Felix


On Fri, 11 Mar 2005, Alan Cox wrote:

> On Thu, 2005-03-10 at 21:58, Felix Matathias wrote:
>> Dear all,
>>
>> I am running a 2.4.21-9.0.3.ELsmp #1 kernel and I can setsockopt and
>> getsockopt correctly the SO_RCVLOWAT option
>
> The only value the code at least used to support was setting it to 1.
> Are you sure you are actually setting/checking ok ?
>


2005-03-14 13:26:36

by Alan

Subject: Re: select() doesn't respect SO_RCVLOWAT ?

On Fri, 2005-03-11 at 20:26, Felix Matathias wrote:
> Dear Alan,
>
> I am positive. I can setsockopt, and then, getsockopt returns the value
> that I requested.

Ok I misremembered - it's SNDLOWAT that is locked to one in Linux.

> Stevens very clearly states that SO_RCVLOWAT has a direct impact on
> select() and I assumed that this would be the case for Linux.
> What is the rationale for not complying with that ? Is it the micromanagement
> of select() that you dislike ? Isn't a significant reduction in the
> amount of read operations a real gain in high speed networking ?

I believe since we implement SO_SNDLOWAT that it's a bug. Stevens and
1003.1g both agree with your expectations. The right list is probably
[email protected] however.

Alan

2005-03-14 13:32:43

by YOSHIFUJI Hideaki

Subject: Re: select() doesn't respect SO_RCVLOWAT ?

In article <[email protected]> (at Mon, 14 Mar 2005 13:24:24 +0000), Alan Cox <[email protected]> says:

> 1003.1g both agree with your expectations. The right list is probably
> [email protected] however.

I've just forwarded this thread to netdev.

--yoshfuji

2005-03-22 02:51:05

by Robert White

Subject: RE: select() doesn't respect SO_RCVLOWAT ?



-----Original Message-----
From: [email protected] [mailto:[email protected]]
On Behalf Of Felix Matathias
Sent: Friday, March 11, 2005 12:27 PM

> Isn't a significant reduction in the amount of read operations
> a real gain in high speed networking ?

In a word? No.

Here at my company we make various pieces of cell phone test equipment in diverse
configurations. One of these involves an XScale-based Linux board and a DSTnI-based
board running RTXC. The XScale board has several devices connected to it and a
private Ethernet segment connects the two boards (blah blah blah, I'll skip the
boring parts, but suffice it to say that I have intimate control over the RTXC end of
the link and we are doing 1-5 millisecond timing of real time events +/- uniform
event delay factoring).

The cost of receiving large numbers of small packets dwarfs the cost of read().
Depending on your actual network media, you will get better sustained throughput by
worrying about transmit fragmentation than you will ever get by concerning yourself
with (well-written) small-read() buffer reassembly.
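
One way to picture "small-read() buffer reassembly" is the sketch below, which assumes fixed-size records. The class name FrameAssembler and the fixed frame_size are illustrative assumptions, not something taken from this thread. Each read(), however short, is appended to a user-space buffer, and complete frames are handed back as soon as enough bytes have accumulated.

```python
class FrameAssembler:
    """Accumulate arbitrarily sized reads and emit fixed-size frames."""

    def __init__(self, frame_size: int) -> None:
        if frame_size <= 0:
            raise ValueError("frame_size must be positive")
        self.frame_size = frame_size
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> list:
        """Append one read's worth of data; return every frame now complete."""
        self._buf += chunk
        frames = []
        while len(self._buf) >= self.frame_size:
            frames.append(bytes(self._buf[:self.frame_size]))
            del self._buf[:self.frame_size]
        return frames

    def pending(self) -> int:
        """Number of bytes buffered that do not yet form a whole frame."""
        return len(self._buf)
```

With this in place the application can read whenever the socket is readable and still only ever process whole frames, which is the effect SO_RCVLOWAT was meant to provide.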

Consider Ethernet, where sending one byte uses something like 70 (?) bytes of wire
bandwidth. If you have the slightest chance of recognizing framing or sagging in
your datastream, using TCP_CORK to make sure you only transmit if you have more than
~45 characters pending can make a real difference. Your particular mileage will, of
course, vary. [On our box the win/win point was to cork for ~800 bytes or a known
end-of-frame, whichever came first; said calculation included the DSTnI board's
byte-copy and task switching rates and a bunch of other things.]
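
A minimal sketch of write combining with TCP_CORK, assuming Linux and Python's socket module (socket.TCP_CORK is only defined on Linux). The helper name send_corked is illustrative. While the cork is set the kernel holds back partial segments, and clearing it flushes whatever is queued.

```python
import socket


def send_corked(sock: socket.socket, chunks) -> None:
    """Queue several small writes behind TCP_CORK, then uncork to flush."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 1)
    try:
        for chunk in chunks:
            sock.sendall(chunk)
    finally:
        # Clearing the option pushes out any partial segment still queued.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 0)
```

The threshold at which to uncork (a byte count, an end-of-frame marker, or both) is an application decision, as the ~800 byte figure above illustrates; this sketch simply uncorks once the caller's batch has been written.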

In practical terms, if you can get to the read() before more data arrives, then,
unless you _really_ have something better to do, you might as well do the read(). If
your processing takes longer than the strobe on the read() you will get some
backlogging between reads that you will make up next time. There is a "natural
speed" for any given application, and as long as your data is slower than this speed,
the practical load doesn't matter much. Something somewhere is going to have to
combine fragments, or not, so until you get to the point where your particular
application is starting to waste too much time in context switching (the "real
overhead" cost of a syscall), you don't need to sweat the syscall density.

If you are "always" passing in a read buffer that is bigger than the pending data,
[e.g. if Y != X for all "Y = read(fd,buf,X);"] then you are pretty much under the
power curve and doing nicely.

Regardless, the real place to maximize network throughput is in intelligent write()
combining. "The media is always slower than the computer" is the watch-phrase for
eking out your best throughput.

Rob White,
Casabyte, Inc.

P.S. "High Speed Networking" is not the same thing as "Fair Resource Usage
Networking" for the purposes of this discussion... 8-)