2008-11-01 16:38:42

by Olaf van der Spek

Subject: epoll behaviour after running out of descriptors

Hi,

I noticed some strange behaviour of epoll after running out of descriptors.
I've registered a listen socket to epoll with edge triggering. On the
client-side I use an app that simply keeps opening connections.
When accept returns EMFILE, I call epoll_wait and accept and it
returns with another EMFILE.
This happens 10 times or so, after that epoll_wait no longer returns
with the listen socket ready.
I then close all file descriptors, but epoll_wait will still not return.
So my question is, why does it 'only' happen 10 times and what is the
expected behaviour?
And how should an app handle this?

The example in the epoll man page doesn't seem to handle this.

An idea I had was for epoll_wait to only return with accept / EMFILE
once. Then after a descriptor becomes available, epoll_wait would
return again.

See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=502901

Hi,

I've written a web app that should be able to handle a lot of new
connections per second (1000+). On multiple servers I've hit a bug.
After running out of descriptors, then closing descriptors, epoll_wait
doesn't return anymore for the listen socket.
I've attached code to reproduce the issue, and an strace log. Even
before closing the descriptors, you can see that epoll_wait stops returning.

On the other side, I used a self-written app that just opens tons of
connections. Is there a standard utility to do that?

#include <arpa/inet.h>
#include <cassert>
#include <climits>
#include <ctime>
#include <errno.h>
#include <netinet/in.h>
#include <sys/epoll.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <vector>

using namespace std;

int main()
{
    // Non-blocking listening socket on port 2710.
    int l = socket(AF_INET, SOCK_STREAM, 0);
    unsigned long p = true;
    ioctl(l, FIONBIO, &p);
    sockaddr_in a = {0};
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = INADDR_ANY;
    a.sin_port = htons(2710);
    bind(l, reinterpret_cast<sockaddr*>(&a), sizeof(sockaddr_in));
    listen(l, SOMAXCONN);
    // Register the listening socket with epoll, edge-triggered.
    int fd = epoll_create(1 << 10);
    epoll_event e;
    e.data.fd = l;
    e.events = EPOLLIN | EPOLLOUT | EPOLLPRI | EPOLLERR | EPOLLHUP | EPOLLET;
    epoll_ctl(fd, EPOLL_CTL_ADD, l, &e);
    const int c_events = 64;
    epoll_event events[c_events];
    typedef vector<int> sockets_t;
    sockets_t sockets;
    time_t t = time(NULL);
    while (1)
    {
        int r = epoll_wait(fd, events, c_events, 5000);
        if (r == -1)
            continue;
        // After 30 seconds without events, close all accepted sockets.
        if (!r && time(NULL) - t > 30)
        {
            for (size_t i = 0; i < sockets.size(); i++)
                close(sockets[i]);
            sockets.clear();
            t = INT_MAX;
        }
        for (int i = 0; i < r; i++)
        {
            if (events[i].data.fd == l)
            {
                // Accept until the backlog is drained.
                while (1)
                {
                    int s = accept(l, NULL, NULL);
                    if (s == -1)
                    {
                        if (errno == EAGAIN)
                            break;
                        break; // any other error (e.g. EMFILE): stop accepting for now
                    }
                    sockets.push_back(s);
                }
            }
            else
                assert(false);
        }
    }
    return 0;
}

socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
ioctl(3, FIONBIO, [1]) = 0
bind(3, {sa_family=AF_INET, sin_port=htons(2710),
sin_addr=inet_addr("0.0.0.0")}, 16) = 0
listen(3, 128) = 0
epoll_create(1024) = 4
epoll_ctl(4, EPOLL_CTL_ADD, 3,
{EPOLLIN|EPOLLPRI|EPOLLOUT|EPOLLERR|EPOLLHUP|EPOLLET, {u32=3,
u64=13806959039201935363}}) = 0
time(NULL) = 1224527442
epoll_wait(4, {}, 64, 5000) = 0
time(NULL) = 1224527447
epoll_wait(4, {{EPOLLIN, {u32=3, u64=13806959039201935363}}}, 64, 5000) = 1
accept(3, 0, NULL) = 5
brk(0) = 0x804c000
brk(0x806d000) = 0x806d000
accept(3, 0, NULL) = 6
accept(3, 0, NULL) = 7
accept(3, 0, NULL) = 8
accept(3, 0, NULL) = -1 EAGAIN (Resource
temporarily unavailable)
epoll_wait(4, {{EPOLLIN, {u32=3, u64=13806959039201935363}}}, 64, 5000) = 1
accept(3, 0, NULL) = 9
...
accept(3, 0, NULL) = 85
accept(3, 0, NULL) = -1 EAGAIN (Resource
temporarily unavailable)
epoll_wait(4, {{EPOLLIN, {u32=3, u64=13806959039201935363}}}, 64, 5000) = 1
accept(3, 0, NULL) = 86
...
accept(3, 0, NULL) = 1023
accept(3, 0, NULL) = -1 EMFILE (Too many open files)
epoll_wait(4, {{EPOLLIN, {u32=3, u64=13806959039201935363}}}, 64, 5000) = 1
accept(3, 0, NULL) = -1 EMFILE (Too many open files)
epoll_wait(4, {{EPOLLIN, {u32=3, u64=13806959039201935363}}}, 64, 5000) = 1
...
epoll_wait(4, {{EPOLLIN, {u32=3, u64=13806959039201935363}}}, 64, 5000) = 1
accept(3, 0, NULL) = -1 EMFILE (Too many open files)
epoll_wait(4, {}, 64, 5000) = 0
time(NULL) = 1224527454
epoll_wait(4, {}, 64, 5000) = 0
time(NULL) = 1224527459
epoll_wait(4, {}, 64, 5000) = 0
time(NULL) = 1224527464
epoll_wait(4, {}, 64, 5000) = 0
time(NULL) = 1224527469
epoll_wait(4, {}, 64, 5000) = 0
time(NULL) = 1224527474
close(5) = 0
...
close(1023) = 0
epoll_wait(4, {}, 64, 5000) = 0
time(NULL) = 1224527479
epoll_wait(4, {}, 64, 5000) = 0
time(NULL) = 1224527484
epoll_wait(4, {}, 64, 5000) = 0
time(NULL) = 1224527489
epoll_wait(4, {}, 64, 5000) = 0
time(NULL) = 1224527494
epoll_wait(4, {}, 64, 5000) = 0
time(NULL) = 1224527499
epoll_wait(4, {}, 64, 5000) = 0
time(NULL) = 1224527504

-- Package-specific info:
** Version:
Linux version 2.6.24-etchnhalf.1-686 (Debian 2.6.24-6~etchnhalf.5)
([email protected]) (gcc version 4.1.2 20061115 (prerelease) (Debian
4.1.1-21)) #1 SMP Mon Sep 8 06:19:11 UTC 2008

** Command line:
root=/dev/sda1 ro

** Not tainted


2008-11-02 18:25:45

by Davide Libenzi

Subject: Re: epoll behaviour after running out of descriptors

On Sat, 1 Nov 2008, Olaf van der Spek wrote:

> Hi,
>
> I noticed some strange behaviour of epoll after running out of descriptors.
> I've registered a listen socket to epoll with edge triggering. On the
> client-side I use an app that simply keeps opening connections.
> When accept returns EMFILE, I call epoll_wait and accept and it
> returns with another EMFILE.
> This happens 10 times or so, after that epoll_wait no longer returns
> with the listen socket ready.
> I then close all file descriptors, but epoll_wait will still not return.
> So my question is, why does it 'only' happen 10 times and what is the
> expected behaviour?
> And how should an app handle this?
>
> The example in the epoll man page doesn't seem to handle this.
>
> An idea I had was for epoll_wait to only return with accept / EMFILE
> once. Then after a descriptor becomes available, epoll_wait would
> return again.
>
> See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=502901
>
> Hi,
>
> I've written a web app that should be able to handle a lot of new
> connections per second (1000+). On multiple servers I've hit a bug.
> After running out of descriptors, then closing descriptors, epoll_wait
> doesn't return anymore for the listen socket.
> I've attached code to reproduce the issue, and an strace log. Even
> before closing the descriptors, you can see that epoll_wait stops returning.

A bug? For starters, epoll_wait does NOT create new files, so no EMFILE
can come out from there.
You are saturating the port space, and your whole code logic is rather (at
least) buggy. Try a `netstat -n -t | grep TIME_WAIT | wc -l`



- Davide

2008-11-02 18:33:19

by Olaf van der Spek

Subject: Re: epoll behaviour after running out of descriptors

On Sun, Nov 2, 2008 at 7:25 PM, Davide Libenzi <[email protected]> wrote:
> A bug? For starters, epoll_wait does NOT create new files, so no EMFILE
> can come out from there.

It's accept that returns EMFILE.

> You are saturating the port space, and your whole code logic is rather (at
> least) buggy. Try a `netstat -n -t | grep TIME_WAIT | wc -l`

What makes you think I'm saturating the port space?
That space is way bigger than 1 k AFAIK.

EMFILE The per-process limit of open file descriptors has been reached.

And what part of my code logic is buggy?

Olaf

2008-11-02 18:55:00

by Davide Libenzi

Subject: Re: epoll behaviour after running out of descriptors

On Sun, 2 Nov 2008, Olaf van der Spek wrote:

> On Sun, Nov 2, 2008 at 7:25 PM, Davide Libenzi <[email protected]> wrote:
> > A bug? For starters, epoll_wait does NOT create new files, so no EMFILE
> > can come out from there.
>
> It's accept that returns EMFILE.
>
> > You are saturating the port space, and your whole code logic is rather (at
> > least) buggy. Try a `netstat -n -t | grep TIME_WAIT | wc -l`
>
> What makes you think I'm saturating the port space?
> That space is way bigger than 1 k AFAIK.

Why don't you grep for TIME_WAIT?



- Davide

2008-11-02 18:55:49

by Olaf van der Spek

Subject: Re: epoll behaviour after running out of descriptors

On Sun, Nov 2, 2008 at 7:48 PM, Davide Libenzi <[email protected]> wrote:
> Why don't you grep for TIME_WAIT?

Because I don't have access to the test environment at the moment.

2008-11-02 19:10:54

by Eric Dumazet

Subject: Re: epoll behaviour after running out of descriptors

Olaf van der Spek wrote:
> On Sun, Nov 2, 2008 at 7:48 PM, Davide Libenzi <[email protected]> wrote:
>> Why don't you grep for TIME_WAIT?
>
> Because I don't have access to the test environment at the moment.

Hello Olaf

If your application calls accept() and accept() returns EMFILE, it's a no-op.

The socket is still on the listen queue, ready for an accept().


Since you use edge-triggered epoll, you'll only receive new notifications.

You probably have in your app a listen(sock, 10), so after 10 notifications
your listen queue is full and the TCP stack refuses to handle new connections.

In order to cope with this kind of thing, the trick I personally use is to always keep
around a *free* fd, that is:

At the start of the program, reserve an "emergency fd":
free_fd = open("/dev/null", O_RDONLY);

Then later:

newfd = accept(...);
if (newfd == -1 && errno == EMFILE) {
    /* emergency action: drain one entry from the listen queue */
    close(free_fd);
    newfd = accept(...);
    close(newfd); /* forget this incoming connection, we don't have enough fds */
    free_fd = open("/dev/null", O_RDONLY);
}

Of course, if your application is multi-threaded, you might adapt this (and possibly
reserve one emergency fd per thread).
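
A self-contained version of this trick might look as follows (a sketch only: the accept_with_reserve name and the minimal error handling are illustrative, and the reserve fd is assumed to have been opened at startup as described above):

#include <cerrno>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

// Reserved once at program start, as described above:
//   free_fd = open("/dev/null", O_RDONLY);
static int free_fd = -1;

// accept() wrapper implementing the emergency-fd trick: when the process is
// out of descriptors, temporarily give back the reserve, accept and
// immediately close the pending connection (so the listen queue keeps
// draining), then take the reserve back.
int accept_with_reserve(int listen_fd)
{
    int newfd = accept(listen_fd, NULL, NULL);
    if (newfd == -1 && errno == EMFILE)
    {
        close(free_fd);                        // free one descriptor
        newfd = accept(listen_fd, NULL, NULL); // pull one connection off the queue
        if (newfd != -1)
            close(newfd);                      // drop it, we cannot serve it anyway
        free_fd = open("/dev/null", O_RDONLY); // re-arm the reserve
        newfd = -1;
        errno = EMFILE;                        // report the original condition
    }
    return newfd;
}

The trade-off is that new clients are actively turned away while the process is out of descriptors, rather than left waiting in the backlog.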

2008-11-02 19:17:39

by Davide Libenzi

Subject: Re: epoll behaviour after running out of descriptors

On Sun, 2 Nov 2008, Olaf van der Spek wrote:

> On Sun, Nov 2, 2008 at 7:48 PM, Davide Libenzi <[email protected]> wrote:
> > Why don't you grep for TIME_WAIT?
>
> Because I don't have access to the test environment at the moment.

Here:

http://tinyurl.com/5ay86v



- Davide

2008-11-02 19:20:35

by Olaf van der Spek

Subject: Re: epoll behaviour after running out of descriptors

On Sun, Nov 2, 2008 at 8:17 PM, Davide Libenzi <[email protected]> wrote:
> On Sun, 2 Nov 2008, Olaf van der Spek wrote:
>
>> On Sun, Nov 2, 2008 at 7:48 PM, Davide Libenzi <[email protected]> wrote:
>> > Why don't you grep for TIME_WAIT?
>>
>> Because I don't have access to the test environment at the moment.
>
> Here:
>
> http://tinyurl.com/5ay86v

I know what TIME_WAIT is. I just think it's not applicable to this situation.

2008-11-02 19:27:49

by Davide Libenzi

Subject: Re: epoll behaviour after running out of descriptors

On Sun, 2 Nov 2008, Olaf van der Spek wrote:

> On Sun, Nov 2, 2008 at 8:17 PM, Davide Libenzi <[email protected]> wrote:
> > On Sun, 2 Nov 2008, Olaf van der Spek wrote:
> >
> >> On Sun, Nov 2, 2008 at 7:48 PM, Davide Libenzi <[email protected]> wrote:
> >> > Why don't you grep for TIME_WAIT?
> >>
> >> Because I don't have access to the test environment at the moment.
> >
> > Here:
> >
> > http://tinyurl.com/5ay86v
>
> I know what TIME_WAIT is. I just think it's not applicable to this situation.

It is. You are saturating the port space, so no new POLLIN/accept events
are sent (until some TIME_WAIT clears), so epoll_wait() returns nothing
(or does not return, with an infinite timeout).
Keeping only 1K connections *alive* (if this is what you meant with your
*only* 1K) does not mean the trail that moving 1K connections leaves
behind is free.
If you have ever played with things like httperf, you know what I'm
talking about.



- Davide

2008-11-02 19:45:20

by Olaf van der Spek

Subject: Re: epoll behaviour after running out of descriptors

On Sun, Nov 2, 2008 at 8:10 PM, Eric Dumazet <[email protected]> wrote:
> The socket is still on the listen queue, ready for an accept().

True, but not handy.

> Since you use edge-triggered epoll, you'll only receive new notifications.

The strace shows I receive 10+.
If a return with EMFILE is indeed a no-op, I should receive only one.

> You probably have in your app a listen(sock, 10), so after 10 notifications
> your listen queue is full and the TCP stack refuses to handle new connections.

I've got listen(l, SOMAXCONN);
IIRC SOMAXCONN is 128.

> close(newfd); /* forget this incoming connection, we don't have enough fds */

Why not keep them in the queue until you do have enough descriptors?

> Of course, if your application is multi-threaded, you might adapt this (and
> possibly reserve one emergency fd per thread).

Sounds like a great recipe for race conditions. ;)

2008-11-02 20:41:18

by Olaf van der Spek

Subject: Re: epoll behaviour after running out of descriptors

On Sun, Nov 2, 2008 at 8:27 PM, Davide Libenzi <[email protected]> wrote:
>> I know what TIME_WAIT is. I just think it's not applicable to this situation.
>
> It is. You are saturating the port space, so no new POLLIN/accept events
> are sent (until some TIME_WAIT clears), so epoll_wait() returns nothing
> (or does not return, with an infinite timeout).
> Keeping only 1K connections *alive* (if this is what you meant with your
> *only* 1K) does not mean the trail that moving 1K connections leaves
> behind is free.
> If you have ever played with things like httperf, you know what I'm
> talking about.

Wouldn't the port space require about 20+ k connects? This issue
happens after 1 k.

2008-11-02 21:17:24

by Davide Libenzi

Subject: Re: epoll behaviour after running out of descriptors

On Sun, 2 Nov 2008, Olaf van der Spek wrote:

> On Sun, Nov 2, 2008 at 8:27 PM, Davide Libenzi <[email protected]> wrote:
> >> I know what TIME_WAIT is. I just think it's not applicable to this situation.
> >
> > It is. You are saturating the port space, so no new POLLIN/accept events
> > are sent (until some TIME_WAIT clears), so epoll_wait() returns nothing
> > (or does not return, with an infinite timeout).
> > Keeping only 1K connections *alive* (if this is what you meant with your
> > *only* 1K) does not mean the trail that moving 1K connections leaves
> > behind is free.
> > If you have ever played with things like httperf, you know what I'm
> > talking about.
>
> Wouldn't the port space require about 20+ k connects? This issue
> happens after 1 k.

The reason for "When accept returns EMFILE, I call epoll_wait and accept
and it returns with another EMFILE." is because your sockets-close logic
is broken. You get an event for the listening fd, you go call accept(2)
and in one or two passes you fill up the avail fd space, then you go back
calling epoll_wait(), and yet back to accept(2). This w/out triggering the
file-close-relief code (yes, you fill up 1K fds *before* 30 seconds). Of
course you get another EMFILE. When the close loop finally triggers a
little while later, the client has likely quit trying, or the kernel accept
backlog is full and no new events (remember, you chose ET) are triggered.
EMFILE is not EAGAIN, and it means that the fd can still have something
for you. Going back to sleep with (EMFILE && ET) is bad mojo.
This is more food for linux-userspace than linux-kernel though.



- Davide
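
As an illustration of that point about (EMFILE && ET), an accept loop could keep track of whether the backlog was actually drained (a sketch only; the handle_listen_ready helper and its return-value convention are illustrative, not from the test program above):

#include <cerrno>
#include <sys/socket.h>
#include <vector>

// Returns true if the listening socket may still have queued connections
// (EMFILE/ENFILE: the backlog was NOT drained), false if it was drained
// (EAGAIN) and an edge-triggered epoll_wait() is safe again.
bool handle_listen_ready(int l, std::vector<int>& sockets)
{
    while (true)
    {
        int s = accept(l, NULL, NULL);
        if (s != -1)
        {
            sockets.push_back(s);
            continue;
        }
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return false;  // backlog drained
        if (errno == EMFILE || errno == ENFILE)
            return true;   // out of descriptors, backlog not drained: retry later
        return false;      // other errors (e.g. ECONNABORTED): treat as drained
    }
}

A caller that got true back would call this again right after closing descriptors, instead of sleeping in epoll_wait() and hoping for a new edge.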

2008-11-02 22:17:21

by Olaf van der Spek

Subject: Re: epoll behaviour after running out of descriptors

On Sun, Nov 2, 2008 at 10:17 PM, Davide Libenzi <[email protected]> wrote:
>> Wouldn't the port space require about 20+ k connects? This issue
>> happens after 1 k.
>
> The reason for "When accept returns EMFILE, I call epoll_wait and accept
> and it returns with another EMFILE." is because your sockets-close logic
> is broken.

It's not broken, it's designed that way. It's designed to hit the
descriptor limit and then close all sockets some time after.

> You get an event for the listening fd, you go call accept(2)
> and in one or two passes you fill up the avail fd space, then you go back
> calling epoll_wait(), and yet back to accept(2). This w/out triggering the
> file-close-relief code (yes, you fill up 1K fds *before* 30 seconds). Of
> course you get another EMFILE.

The second EMFILE doesn't make sense, epoll_wait shouldn't signal the
socket as ready again, right?

2008-11-02 22:49:56

by Davide Libenzi

Subject: Re: epoll behaviour after running out of descriptors

On Sun, 2 Nov 2008, Olaf van der Spek wrote:

> On Sun, Nov 2, 2008 at 10:17 PM, Davide Libenzi <[email protected]> wrote:
> >> Wouldn't the port space require about 20+ k connects? This issue
> >> happens after 1 k.
> >
> > The reason for "When accept returns EMFILE, I call epoll_wait and accept
> > and it returns with another EMFILE." is because your sockets-close logic
> > is broken.
>
> It's not broken, it's designed that way. It's designed to hit the
> descriptor limit and then close all sockets some time after.
>
> > You get an event for the listening fd, you go call accept(2)
> > and in one or two passes you fill up the avail fd space, then you go back
> > calling epoll_wait(), and yet back to accept(2). This w/out triggering the
> > file-close-relief code (yes, you fill up 1K fds *before* 30 seconds). Of
> > course you get another EMFILE.
>
> The second EMFILE doesn't make sense, epoll_wait shouldn't signal the
> socket as ready again, right?

At the time of the first EMFILE, you've filled up the fd space, but not
the kernel listen backlog. Additions to the backlog trigger new events,
which you see after the first EMFILE. At a given point the backlog is
full, so no new pending connections are placed in there, and no new
events are generated.
Again, sleeping on (EMFILE && ET) is bad mojo, and nowhere is it written
that events should be generated on the EMFILE->no-EMFILE transition.



- Davide
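
One way to avoid depending on an EMFILE->no-EMFILE transition at all (a sketch of an alternative, not something proposed in the thread) is to register the listening socket level-triggered, so epoll_wait() keeps reporting it readable for as long as the backlog is non-empty:

#include <sys/epoll.h>

// Level-triggered registration for the listening socket only (no EPOLLET):
// while the accept backlog is non-empty, every epoll_wait() call reports the
// fd readable again, so an accept() that failed with EMFILE is simply retried
// on a later wakeup once descriptors have been freed.
int add_listen_fd_level_triggered(int epfd, int listen_fd)
{
    epoll_event e = {};
    e.data.fd = listen_fd;
    e.events = EPOLLIN;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &e);
}

Per-connection sockets can still be registered edge-triggered; only the listening socket needs the level-triggered behaviour here.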

2008-11-03 08:07:50

by Olaf van der Spek

Subject: Re: epoll behaviour after running out of descriptors

On Sun, Nov 2, 2008 at 11:49 PM, Davide Libenzi <[email protected]> wrote:
> At the time of the first EMFILE, you've filled up the fd space, but not
> the kernel listen backlog. Additions to the backlog trigger new events,

Shouldn't ET only fire again *after* you drained the queue? When
accept returns EMFILE, you did not drain the queue.

> which you see after the first EMFILE. At a given point the backlog is
> full, so no new pending connections are placed in there, and no new
> events are generated.

The backlog is 128 entries, though; I don't see that many EMFILEs.

> Again, sleeping on (EMFILE && ET) is bad mojo,

It's not always best to free up descriptors right away.

> and nowhere is it written
> that events should be generated on the EMFILE->no-EMFILE transition.

That's true, but I'm saying that this might be handy to have.