2004-06-02 05:33:52

by jyotiraditya

Subject: Select/Poll

Hello All,

In one of the threads named "Linux's implementation of poll() not
scalable?", Linus stated the following:
**************
Neither poll() nor select() have this problem: they don't get more
expensive as you have more and more events - their expense is the number
of file descriptors, not the number of events per se. In fact, both poll()
and select() tend to perform _better_ when you have pending events, as
they are both amenable to optimizations when there is no need for waiting,
and scanning the arrays can use early-out semantics.
**************

Please help me understand the above. I'm using select in a server to read
on multiple FDs, and the clients are dumping messages (of fixed size) in a
loop on these FDs; the server maintaining those FDs is not able to get all
the messages. Some of the last messages sent by each client are lost.
If the number of clients, and hence the number of FDs in the server, is
increased, the loss of data is proportional.
e.g.: 5 clients send messages (100 each) to 1 server, and the server receives
96 messages from each client.
10 clients send messages (100 each) to 1 server, and the server again
receives 96 from each client.

If a small sleep is introduced between sending messages, the loss of data
decreases.
Also, please explain the algorithm select uses to read messages on FDs and
how it performs better when the number of FDs increases.

Thanks and Regards,
Jyotiraditya


2004-06-02 05:54:41

by David Schwartz

Subject: RE: Select/Poll


> In one of the threads named "Linux's implementation of poll() not
> scalable?", Linus stated the following:
> **************
> Neither poll() nor select() have this problem: they don't get more
> expensive as you have more and more events - their expense is the number
> of file descriptors, not the number of events per se. In fact, both poll()
> and select() tend to perform _better_ when you have pending events, as
> they are both amenable to optimizations when there is no need for waiting,
> and scanning the arrays can use early-out semantics.
> **************
>
> Please help me understand the above. I'm using select in a server to read
> on multiple FDs, and the clients are dumping messages (of fixed size) in a
> loop on these FDs; the server maintaining those FDs is not able to get all
> the messages. Some of the last messages sent by each client are lost.
> If the number of clients, and hence the number of FDs in the server, is
> increased, the loss of data is proportional.
> e.g.: 5 clients send messages (100 each) to 1 server, and the server receives
> 96 messages from each client.
> 10 clients send messages (100 each) to 1 server, and the server again
> receives 96 from each client.
>
> If a small sleep is introduced between sending messages, the loss of data
> decreases.
> Also, please explain the algorithm select uses to read messages on FDs and
> how it performs better when the number of FDs increases.

Your issue has nothing to do with select or poll scalability; it has to do
with the fact that UDP is unreliable and you must provide your own send
timing. A UDP server or client cannot just send 100 messages in one shot and
expect the other end to get all of them. They probably won't all even make
it to the wire, so the recipient can't solve the problem.

DS


2004-06-02 06:09:20

by Ben Greear

Subject: Re: Select/Poll

[email protected] wrote:
> Hello All,
> In one of the threads named "Linux's implementation of poll() not
> scalable?", Linus stated the following:
> **************
> Neither poll() nor select() have this problem: they don't get more
> expensive as you have more and more events - their expense is the number
> of file descriptors, not the number of events per se. In fact, both poll()
> and select() tend to perform _better_ when you have pending events, as
> they are both amenable to optimizations when there is no need for waiting,
> and scanning the arrays can use early-out semantics.
> **************
> Please help me understand the above. I'm using select in a server to read
> on multiple FDs, and the clients are dumping messages (of fixed size) in a
> loop on these FDs; the server maintaining those FDs is not able to get all
> the messages. Some of the last messages sent by each client are lost.
> If the number of clients, and hence the number of FDs in the server, is
> increased, the loss of data is proportional.
> e.g.: 5 clients send messages (100 each) to 1 server, and the server receives
> 96 messages from each client.
> 10 clients send messages (100 each) to 1 server, and the server again
> receives 96 from each client.
> If a small sleep is introduced between sending messages, the loss of data
> decreases.
> Also, please explain the algorithm select uses to read messages on FDs and
> how it performs better when the number of FDs increases.

Try increasing your socket buffers so that the kernel will queue up more
packets while your user-space server is waking up to read them.

I used to have no problem receiving data with up to 1024 file descriptors
using select, but if you need more than 1024, you will need to move to poll,
because fd_set has a maximum size of 1024 by default...

To increase your buffers, google for these files:
/proc/sys/net/core/wmem_max
/proc/sys/net/core/rmem_max
/proc/sys/net/core/netdev_max_backlog
...

Here is some sample code I use to set the buffer size based on the
maximum rate I think this socket will want to send:

int set_sock_wr_buffer_size(int desc, uint32 mx_rate) {
   int sz = (mx_rate / 40);
   if (sz < 32000) {
      sz = 32000;
   }
   if (sz > 4096000) {
      sz = 4096000;
   }

   while (sz >= 32000) {
      if (setsockopt(desc, SOL_SOCKET, SO_SNDBUF, (void*)&sz, sizeof(sz)) < 0) {
         VLOG_WRN(VLOG << "ERROR: setting send buffer to: " << sz << " failed: "
                  << strerror(errno) << endl);
         sz = sz >> 1;
      }
      else {
         VLOG_INF(VLOG << "Set SNDBUF sz to: " << sz << " for desc: " << desc << endl);
         break;
      }
   }

   sz = max(2048000, sz);
   while (sz >= 32000) {
      if (setsockopt(desc, SOL_SOCKET, SO_RCVBUF, (void*)&sz, sizeof(sz)) < 0) {
         VLOG_WRN(VLOG << "ERROR: setting receive buffer to: " << sz << " failed: "
                  << strerror(errno) << endl);
         sz = sz >> 1;
      }
      else {
         VLOG_INF(VLOG << "Set RCVBUF sz to: " << sz << " for desc: " << desc << endl);
         break;
      }
   }

   return sz;
}//set_sock_wr_buffer_size


Ben


> Thanks and Regards,
> Jyotiraditya -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>


--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com

2004-06-02 06:12:49

by Ben Greear

Subject: Re: Select/Poll

David Schwartz wrote:

> Your issue has nothing to do with select or poll scalability, it has to do
> with the fact that UDP is unreliable and you must provide your own send
> timing. A UDP server or client cannot just send 100 messages in one shot and
> expect the other end to get all of them. They probably won't all even make
> it to the wire, so the recipient can't solve the problem.

You can check that they get to the wire in (almost?) all cases by watching
the return value of the sendto call. And if you have decent buffers on
the receive side, and a clean transport, then you can send at very high speeds
without dropping any significant number of packets, even when using select/poll
and non-blocking sockets...

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com

2004-06-02 06:38:06

by bert hubert

Subject: Re: Select/Poll

> You can check that they get to the wire in (almost?) all cases by watching

QoS settings may drop your packet before actually hitting the wire, but
after being enqueued to the kernel.

--
http://www.PowerDNS.com Open source, database driven DNS Software
http://lartc.org Linux Advanced Routing & Traffic Control HOWTO

2004-06-02 07:05:32

by Vadim Lobanov

Subject: Re: Select/Poll

On Tue, 1 Jun 2004 [email protected] wrote:

> Hello All,
>
> In one of the threads named "Linux's implementation of poll() not
> scalable?", Linus stated the following:
> **************
> Neither poll() nor select() have this problem: they don't get more
> expensive as you have more and more events - their expense is the number
> of file descriptors, not the number of events per se. In fact, both poll()
> and select() tend to perform _better_ when you have pending events, as
> they are both amenable to optimizations when there is no need for waiting,
> and scanning the arrays can use early-out semantics.
> **************
>
> Please help me understand the above. I'm using select in a server to read
> on multiple FDs, and the clients are dumping messages (of fixed size) in a
> loop on these FDs; the server maintaining those FDs is not able to get all
> the messages. Some of the last messages sent by each client are lost.
> If the number of clients, and hence the number of FDs in the server, is
> increased, the loss of data is proportional.
> e.g.: 5 clients send messages (100 each) to 1 server, and the server receives
> 96 messages from each client.
> 10 clients send messages (100 each) to 1 server, and the server again
> receives 96 from each client.
>
> If a small sleep is introduced between sending messages, the loss of data
> decreases.
> Also, please explain the algorithm select uses to read messages on FDs and
> how it performs better when the number of FDs increases.
>
> Thanks and Regards,
> Jyotiraditya

I think everyone else already hit on the main points of UDP, so I'll pass
on to the second question. :)

I believe that there is some confusion between the terms "events" and
"FDs". As far as I know, both poll() and select() scale O(n) (in other
words, linearly) with the number of watched FDs, but O(1) (in other
words, constant) with the number of received events. Let's put this into
more concrete terms:

Suppose you select/poll on an array of 100 FDs, which currently have no
pending events. What the kernel will do for you, in essence, is loop,
querying each of the 100 FDs in turn as to whether it has received new
events, and sleep until one of them has. If one of them has received an
event, then select/poll will return that FD. In the end, it reduces to a
simple scan over the FDs to determine when events arrive, and it is
exactly this scan that gives it O(n) behavior.

However, if by the time that select/poll are called, there are already
pending events upon the FD set, then that syscall can return immediately
with the events already present. In this case, you will not need to begin
looping over the FDs, and hence you will not observe the O(n) behavior.
Notice that this favorable scenario is more likely to occur when you have
more events coming in. I think that this is what Linus meant when he said
that select/poll like to have events waiting, for a faster return time.

As a very quick and very simplistic summary: for select/poll, the
more incoming events you get, and the fewer FDs you watch, the better off
you are. But in your case, I do not think you have to worry about
scalability much. If you _really_ want to, however, check epoll - should
be standardized on the 2.6.x kernels (though my glibc still has VERY big
issues with it).

And as a final word, I have no doubts that someone out there who is more
knowledgeable can correct me wherever it may be needed. Such corrections
are welcome, since I get to learn something new in that case. :)

-VadimL

2004-06-02 14:11:10

by Davide Libenzi

Subject: Re: Select/Poll

On Wed, 2 Jun 2004, Vadim Lobanov wrote:

> scalability much. If you _really_ want to, however, check epoll - should
> be standardized on the 2.6.x kernels (though my glibc still has VERY big
> issues with it).

(s/should/is/)
The "VERY big issues" statement is kinda hard to debug. Did you try to
report your issues here (if kernel related) or to the glibc mailing list?



- Davide

2004-06-02 15:28:35

by Amit

Subject: Re: Select/Poll

Hello,
Can you give more details - like which machine, which vendor, etc.?
On a Sony Vaio PCG-FRV31 laptop / Red Hat 9.0, after firing some 36,000+ requests
my select-multiplexed server used to fail. With select I believe you do not get
any packet loss...

- Amit

P.S. If you can post the code, that would be great...


Quoting [email protected]:

> Hello All,
>
> In one of the threads named "Linux's implementation of poll() not
> scalable?", Linus stated the following:
> **************
> Neither poll() nor select() have this problem: they don't get more
> expensive as you have more and more events - their expense is the number
> of file descriptors, not the number of events per se. In fact, both poll()
> and select() tend to perform _better_ when you have pending events, as
> they are both amenable to optimizations when there is no need for waiting,
> and scanning the arrays can use early-out semantics.
> **************
>
> Please help me understand the above. I'm using select in a server to read
> on multiple FDs, and the clients are dumping messages (of fixed size) in a
> loop on these FDs; the server maintaining those FDs is not able to get all
> the messages. Some of the last messages sent by each client are lost.
> If the number of clients, and hence the number of FDs in the server, is
> increased, the loss of data is proportional.
> e.g.: 5 clients send messages (100 each) to 1 server, and the server receives
> 96 messages from each client.
> 10 clients send messages (100 each) to 1 server, and the server again
> receives 96 from each client.
>
> If a small sleep is introduced between sending messages, the loss of data
> decreases.
> Also, please explain the algorithm select uses to read messages on FDs and
> how it performs better when the number of FDs increases.
>
> Thanks and Regards,
> Jyotiraditya


2004-06-03 15:24:22

by Mike Jagdis

Subject: Re: Select/Poll

On Wed, Jun 02, 2004 at 11:28:29AM -0400, [email protected] wrote:
> Hello,
> Can you give more details - like which machine, which vendor, etc.?
> On a Sony Vaio PCG-FRV31 laptop / Red Hat 9.0, after firing some 36,000+ requests
> my select-multiplexed server used to fail. With select I believe you do not get
> any packet loss...

Then you'd be wrong. Poll/select tell you when descriptors
are readable/writable. They do *not* impose any magic queuing
mechanism that guarantees the buffers won't overflow. If the
low-level protocol is non-flow-controlled, like UDP, you *have*
to read data faster than it arrives and not write data faster
than it is being transmitted.

Mike

--
Mike Jagdis Web: http://www.eris-associates.co.uk
Eris Associates Limited Tel: +44 7780 608 368
Reading, England Fax: +44 118 926 6974

2004-06-03 15:55:05

by Amit

Subject: Re: Select/Poll

I meant it in the context of TCP. I thought it was implicit enough, because if
he was using UDP then packet loss is to be expected (though it will not
necessarily happen).

- Amit Khandelwal

Quoting Mike Jagdis <[email protected]>:

> On Wed, Jun 02, 2004 at 11:28:29AM -0400, [email protected] wrote:
> > Hello,
> > Can you give more details - like which machine, which vendor, etc.?
> > On a Sony Vaio PCG-FRV31 laptop / Red Hat 9.0, after firing some 36,000+
> > requests my select-multiplexed server used to fail. With select I believe
> > you do not get any packet loss...
>
> Then you'd be wrong. Poll/select tell you when descriptors
> are readable/writable. They do *not* impose any magic queuing
> mechanism that guarantees the buffers won't overflow. If the
> low-level protocol is non-flow-controlled, like UDP, you *have*
> to read data faster than it arrives and not write data faster
> than it is being transmitted.
>
> Mike
>
> --
> Mike Jagdis Web: http://www.eris-associates.co.uk
> Eris Associates Limited Tel: +44 7780 608 368
> Reading, England Fax: +44 118 926 6974
>