2001-07-13 13:59:56

by Chuck Winters

Subject: Number of File descriptors

Hello All,
My interest has been piqued by a recent email. At one point, I heard two people speaking
about how some database guy wanted to have 2000 open files (or something crazy like that).
They said that he must be crazy because the kernel does a sequential search through the open
file descriptors. Anyway, I read a posting on a mailing list that someone wanted select to
select on 3000 files. Alright, the question (finally!):
To have select() select on 3000 file descriptors, they must be open. That's 3000 open
files. Will select be ultra slow trying to select on 3000 file descriptors? Also,
what is the clarification on the kernel doing a sequential search through the open
file descriptors?

Thanks In Advance,
Chuck Winters


2001-07-14 06:03:29

by David Schwartz

Subject: RE: Number of File descriptors


> Hello All,
> My interest has been piqued by a recent email. At one point, I heard
> two people speaking about how some database guy wanted to have 2000
> open files (or something crazy like that). They said that he must be
> crazy because the kernel does a sequential search through the open
> file descriptors. Anyway, I read a posting on a mailing list that
> someone wanted select to select on 3000 files. Alright, the question
> (finally!):
> To have select() select on 3000 file descriptors, they must be open.
> That's 3000 open files. Will select be ultra slow trying to select on
> 3000 file descriptors? Also, what is the clarification on the kernel
> doing a sequential search through the open file descriptors?

Using 'select' on 3,000 file descriptors is not a problem. I have used
'poll' on 12,000 file descriptors with no problems at all. Performance is
not exactly stellar (you can use threads to improve it) but it's quite good.

DS

2001-07-14 15:02:39

by Dan Kegel

Subject: Re: Number of File descriptors

Chuck wrote:
> Will select be ultra slow trying to select on 3000 file descriptors?

Note that using select in a program that uses more than 1024 file
descriptors is not completely portable; you have to redefine __FD_SETSIZE.
I have heard people say that's easy, but as recently as March 2000,
Ulrich Drepper advised against it:
http://sources.redhat.com/ml/bug-glibc/2000-03/msg00051.html
As far as I can see, it's not easy on Red Hat 6.2 or 7.1.
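
The 1024 ceiling can be observed without touching any glibc headers. The
following is only a sketch, written in Python because its select module
wraps the same system calls. It assumes CPython on a Unix-like system, where
select.select() rejects any descriptor number at or above the FD_SETSIZE it
was compiled with (1024 with glibc). The helper name fits_in_fd_set is made
up for this example.

```python
import os
import select

def fits_in_fd_set(fd):
    """Return True if select() will accept this descriptor number.

    Assumption: CPython raises ValueError for a descriptor number at or
    above its compiled-in FD_SETSIZE, before it ever calls select(2).
    """
    try:
        select.select([fd], [], [], 0)   # timeout 0: probe only, never block
    except ValueError:
        return False
    return True

r, w = os.pipe()
low_ok = fits_in_fd_set(r)       # a freshly opened pipe gets a small number
high_ok = fits_in_fd_set(5000)   # 5000 does not fit in a 1024-entry fd_set
os.close(r)
os.close(w)
```

With a 1024-entry fd_set, low_ok should come out True and high_ok False,
which is the same wall a C program hits unless FD_SETSIZE is redefined.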

But that's ok; you can use poll() instead.
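
As a rough illustration of why poll() sidesteps the limit: it takes a list of
descriptors rather than a fixed-size bitmap, so the only ceiling is the
process's open-file limit. The sketch below uses Python's select.poll on a
handful of pipes. The function name ready_for_read and the pipe setup are
assumptions made for this example, not anything from the benchmark code.

```python
import os
import select

def ready_for_read(fds, timeout_ms=0):
    """Return the descriptors in fds that poll() reports as readable."""
    poller = select.poll()
    for fd in fds:
        poller.register(fd, select.POLLIN)
    # poll() returns (fd, event mask) pairs only for descriptors with events.
    return [fd for fd, events in poller.poll(timeout_ms)
            if events & select.POLLIN]

pipes = [os.pipe() for _ in range(8)]    # 8 (read end, write end) pairs
read_ends = [r for r, _ in pipes]
os.write(pipes[3][1], b"x")              # make exactly one pipe readable
ready = ready_for_read(read_ends)
for r, w in pipes:
    os.close(r)
    os.close(w)
```

Note that every call hands the whole list to the kernel again, which is where
the per-call cost for large descriptor counts comes from.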

See http://www.kegel.com/dkftpbench/Poller_bench.html for
some measurements on the speed of select() and poll() for large numbers
of file descriptors. Here's an excerpt:

Time to select or poll n file descriptors, in microseconds,
on 650 MHz dual Pentium III with kernel 2.4.0-test10-pre4 smp:

              file descriptors
           100     1000    10000
select      52        -        -
poll        49     1184    14660

So poll() is indeed slow at 1000 to 10000 file descriptors; whether
it's too slow depends on your application.
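
To put the poll() row in perspective, the arithmetic below works out the cost
per descriptor and the upper bound on calls per second, using only the three
timings quoted above. The variable names are made up for this example.

```python
# poll() timings quoted above: number of descriptors -> microseconds per call
poll_usec = {100: 49, 1000: 1184, 10000: 14660}

# Cost per descriptor, in microseconds.
per_fd_usec = {n: round(t / n, 3) for n, t in poll_usec.items()}

# Upper bound on back-to-back poll() calls per second at each size.
calls_per_sec = {n: 1_000_000 // t for n, t in poll_usec.items()}
```

The per-descriptor cost does not shrink as the set grows (about 0.5, 1.2 and
1.5 microseconds), so one call on 10000 descriptors takes nearly 15 ms and a
single-threaded loop cannot get past roughly 68 calls per second.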

> Also, what is the clarification on the kernel doing a sequential
> search through the open file descriptors?

Yep, that's where the slowness comes from. Linux offers several ways
around it:
* RT signal stuff that comes standard with 2.4
* Provos' /dev/poll patch
* Vitaly Luban's enhanced RT signal patch
* Davide Libenzi's enhanced /dev/epoll patch
You can read about all of these at http://www.kegel.com/c10k.html#nb

The first has the advantage of being part of the 2.4 kernel already,
but is a bit of a pain to use (you have to handle signal overflow).
The second still has one linear scan in it, so it doesn't scale well.
The third and fourth are the top contenders for 'fastest replacement for
select()' on Linux, but aren't part of the standard kernel yet.
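
For a feel of the register-once idea behind the /dev/epoll approach, here is
a hedged sketch using Python's select.epoll. That module wraps the epoll
interface of later mainline Linux kernels, not the patch listed above, so
treat the API shown as an assumption of this example rather than the patch's
own interface. It is Linux-only.

```python
import os
import select

pipes = [os.pipe() for _ in range(8)]    # 8 (read end, write end) pairs
ep = select.epoll()
try:
    # Interest is registered once up front, not passed in on every wait.
    for r, _ in pipes:
        ep.register(r, select.EPOLLIN)

    target = pipes[5][0]
    os.write(pipes[5][1], b"x")               # make one pipe readable
    first = [fd for fd, _ in ep.poll(0)]      # only ready descriptors return

    os.read(target, 1)                        # drain it again
    second = [fd for fd, _ in ep.poll(0)]     # nothing is ready now
finally:
    ep.close()
    for r, w in pipes:
        os.close(r)
        os.close(w)
```

The point of the design is that each wait returns just the ready descriptors,
so the application does not rescan the full set the way it must after
select() or poll().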

- Dan

2001-07-17 13:47:15

by Chuck Winters

Subject: Re: Number of File descriptors

Thank you all for the information.

Chuck


2001-07-18 13:11:42

by Patrick O'Rourke

Subject: Re: Number of File descriptors

Abhishek Chandra and David Mosberger wrote an interesting paper on
this subject which was presented at this year's Usenix conference.

See "Scalability of Linux Event-Dispatch Mechanisms" at (you must
be a member of Usenix to download it):

http://www.usenix.org/publications/library/proceedings/usenix01/technical.html

Pat

--
Patrick O'Rourke