LinuxLists.cc - epoll (was Re: [PATCH] async poll for 2.5)

2002-10-17 17:38:06

Subject: epoll (was Re: [PATCH] async poll for 2.5)

Dan Kegel wrote:

> As long as we agree that the kernel may provide spurious readiness
> notifications on occasion, I agree.

Great! We agree! Progress!

>>> while (read() == EAGAIN)
>>> wait(POLLIN);
>>>
>> Assuming registration of interest is inside wait(), this has a race.
>> If the file becomes readable between the time that read() returns and
>> the time that wait() can register interest, the connection will hang.
>
>
> Shouldn't the interest be rearmed inside read() when it returns EAGAIN?

The key phrase is "assuming registration of interest is inside wait()."
The code fragment didn't cover when registration of interest occurs.
If registration of interest occurs before the read() or if registration
of interest while the fd is ready generates an event, there is no race.
If registration of interest occurs after the read() and registration of
interest while the fd is ready does not generate an event, there is a race.
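
To make the first, race-free ordering concrete, here is a minimal sketch. It assumes the present-day epoll(7) system calls rather than the /dev/epoll patch discussed in this thread, and the helper names register_interest() and read_or_wait() are made up for illustration. Interest is registered once, before the first read(), so readiness can never arrive in an unobserved window between read() returning EAGAIN and the wait.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: make fd non-blocking and register edge-triggered
 * read interest. Doing this BEFORE the first read() closes the window. */
int register_interest(int epfd, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN | EPOLLET, .data.fd = fd };
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* The loop from the thread: retry read(), sleep only after EAGAIN.
 * Precondition: register_interest() already succeeded for fd, and fd is
 * the only descriptor registered on epfd (a simplification). */
ssize_t read_or_wait(int epfd, int fd, void *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0)
            return n;                       /* data or EOF */
        if (errno == EINTR)
            continue;
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;                      /* real error */

        struct epoll_event ev;              /* plays the role of wait(POLLIN) */
        if (epoll_wait(epfd, &ev, 1, -1) < 0 && errno != EINTR)
            return -1;
    }
}
```

Because a wakeup is only a hint here and read() is always retried, a spurious notification costs one extra EAGAIN and nothing more.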

2002-10-17 17:40:26

by John Myers

Subject: Re: epoll (was Re: [PATCH] async poll for 2.5)

Davide Libenzi wrote:

>The poll()-like code :
>
>int my_io(...) {
>
> if (poll(...))
> do_io(...);
>
>}
>
>
This is not my example of a correct code scheme. You've made a strawman
argument, which proves nothing.

2002-10-17 18:00:18

by John Myers

Subject: Re: epoll (was Re: [PATCH] async poll for 2.5)

Davide Libenzi wrote:

>>Nonsense. If you wish to make such a claim, you need to provide an
>>example of a situation in which it won't work.
>>
>>
>
>You're welcome. This is your code :
>
>for (;;) {
> fd = event_wait(...);
> while (do_io(fd) != EAGAIN);
>}
>
>If the I/O space is not exhausted when you call event_wait(...); you'll
>never receive the event because you'll be waiting for a 0->1 transition
>without bringing the signal to 0 ( I/O space exhausted ).
>
My code above does exhaust the I/O space.

> That one is a
>typical use of poll() - select() - /dev/poll and you showed pretty clearly
>that you do not seem to understand edge triggered event APIs. If you code
>your I/O function like :
>
>int my_io(...) {
>
> if (event_wait(...))
> do_io(...);
>
>}
>
This is not how my example is coded.

while (do_io(...) != EAGAIN);

is not equivalent to:

do_io(...);

The former is guaranteed to exhaust the I/O space, the latter is not.
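
As a concrete reading of "exhaust the I/O space", here is a minimal sketch with read() standing in for the thread's do_io(). The names drain_fd() and set_nonblocking() are illustrative only. The loop stops only when the kernel actually reports EAGAIN (or EOF, or a real error), which a single do_io() call cannot promise.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: without O_NONBLOCK the loop below would block
 * instead of ever seeing EAGAIN. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Read until EAGAIN, EOF or error. Returns the total bytes consumed, or -1
 * on a real error. *exhausted becomes 1 only if EAGAIN was observed. */
long drain_fd(int fd, int *exhausted)
{
    char buf[4096];
    long total = 0;

    assert(exhausted != NULL);
    *exhausted = 0;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            total += n;                 /* keep going: more may be queued */
            continue;
        }
        if (n == 0)
            return total;               /* EOF: peer closed */
        if (errno == EINTR)
            continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            *exhausted = 1;             /* the "signal" is back at 0 */
            return total;
        }
        return -1;
    }
}
```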

You're spouting nonsense.

2002-10-17 18:19:02

by Davide Libenzi

Subject: Re: epoll (was Re: [PATCH] async poll for 2.5)

On Thu, 17 Oct 2002, John Gardiner Myers wrote:

> Davide Libenzi wrote:
>
> >>Nonsense. If you wish to make such a claim, you need to provide an
> >>example of a situation in which it won't work.
> >>
> >>
> >
> >You're welcome. This is your code :
> >
> >for (;;) {
> > fd = event_wait(...);
> > while (do_io(fd) != EAGAIN);
> >}
> >
> >If the I/O space is not exhausted when you call event_wait(...); you'll
> >never receive the event because you'll be waiting for a 0->1 transition
> >without bringing the signal to 0 ( I/O space exhausted ).
> >
> My code above does exhaust the I/O space.

Look, I'm usually very polite but you're really wasting my time. You
should know that an instruction at line N is usually executed before an
instruction at line N+1. Now this IS your code :

[N-1] for (;;) {
[N  ]     fd = event_wait(...);
[N+1]     while (do_io(fd) != EAGAIN);
[N+2] }

I will leave it to you as an exercise to understand what happens when you call
the first event_wait(...); and there is still data to be read/written on the
file descriptor. The reason you're asking /dev/epoll to drop an event at
fd insertion time shows very clearly that you're going to use the API in
the WRONG way and that you do not understand how such APIs work. And the
fact that there are users currently using the rt-sig and epoll APIs means
that either those guys are geniuses or you're missing something.

- Davide

2002-10-18 17:39:43

by Mark Mielke

Subject: Re: epoll (was Re: [PATCH] async poll for 2.5)

> >>> while (read() == EAGAIN)
> >>> wait(POLLIN);

I find myself still not understanding this thread. Lots of examples of
code that should or should not be used, but I would always choose:

... ensure file descriptor is blocking ...
for (;;) {
int nread = read(...);
...
}

Over the above, or any derivative of the above.

What would be the point of using an event notification mechanism for
synchronous reads with no other multiplexed options?

A 'proper' event loop is significantly more complicated. Since everybody
here knows this... I'm still confused...

mark

--
[email protected]/[email protected]/[email protected] __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/

2002-10-18 18:57:22

by John Myers

Subject: Re: epoll (was Re: [PATCH] async poll for 2.5)

Davide Libenzi wrote:

>Look, I'm usually very polite but you're really wasting my time. You
>should know that an instruction at line N is usually executed before an
>instruction at line N+1. Now this IS your code :
>
>[N-1] for (;;) {
>[N  ]     fd = event_wait(...);
>[N+1]     while (do_io(fd) != EAGAIN);
>[N+2] }
>
>I will leave it to you as an exercise to understand what happens when you call
>the first event_wait(...); and there is still data to be read/written on the
>file descriptor.
>
Your claim was that even if the API will drop an event at registration
time, my code scheme would not work. Thus, we can take "the API will
drop an event at registration time" as postulated. That being
postulated, if there is still data to be read/written on the file
descriptor then the first event_wait will return immediately.

In fact, given that postulate and the appropriate axioms about the
behavior of event_wait() and do_io(), one can prove that my code scheme
is equivalent to yours. The logical conclusion from that and your claim
would be that you don't understand how edge triggered APIs have to be used.

>The reason you're asking /dev/epoll to drop an event at
>fd insertion time shows very clearly that you're going to use the API in
>the WRONG way and that you do not understand how such APIs work.
>
The wrong way as defined by what? Having /dev/epoll drop appropriate
events at registration time permits a useful simplification/optimization
and makes the system significantly less prone to subtle programming errors.

I do understand how such APIs work, to the extent that I am pointing out
a flaw in their current models.

>And the fact that there are users currently using the rt-sig and epoll APIs means
>that either those guys are geniuses or you're missing something.
>
>
Nonsense. People are able to use flawed APIs all of the time.

2002-10-18 19:38:19

by Davide Libenzi

Subject: Re: epoll (was Re: [PATCH] async poll for 2.5)

On Fri, 18 Oct 2002, John Gardiner Myers wrote:

> Your claim was that even if the API will drop an event at registration
> time, my code scheme would not work. Thus, we can take "the API will
> drop an event at registration time" as postulated. That being
> postulated, if there is still data to be read/written on the file
> descriptor then the first event_wait will return immediately.
>
> In fact, given that postulate and the appropriate axioms about the
> behavior of event_wait() and do_io(), one can prove that my code scheme
> is equivalent to yours. The logical conclusion from that and your claim
> would be that you don't understand how edge triggered APIs have to be used.

No, the concept of edge triggered APIs is that you have to use the fd
until EAGAIN. It's a very simple concept. That means that after a
connect()/accept() you have to start using the fd because I/O space might
be available for read()/write(). Dropping an event is an attempt to use
the API like poll() & Co., where after an fd is born, it is put inside the
set to be woken up later. You're basically saying "the kernel should drop an
event at creation time" and I'm saying that, to keep the API usage
consistent with "use the fd until EAGAIN", you have to use the fd as soon as
it becomes available.

> >The reason you're asking /dev/epoll to drop an event at
> >fd insertion time shows very clearly that you're going to use the API in
> >the WRONG way and that you do not understand how such APIs work.
> >
> The wrong way as defined by what? Having /dev/epoll drop appropriate
> events at registration time permits a useful simplification/optimization
> and makes the system significantly less prone to subtle programming errors.
>
> I do understand how such APIs work, to the extent that I am pointing out
> a flaw in their current models.

I'm sorry, but why do you want to pass your mistakes off as API flaws?

- Davide

2002-10-18 21:00:27

by Charlie Krasic

Subject: Re: epoll (was Re: [PATCH] async poll for 2.5)

> >[N-1] for (;;) {
> >[N  ]     fd = event_wait(...);
> >[N+1]     while (do_io(fd) != EAGAIN);
> >[N+2] }

I'm getting confused over what minute details are being disputed here.

This debate might get clearer, to me anyway, if the example code
fragments were more concrete.

So if anybody still cares at this point, here is my stab at clarifying
some things.

PART I: THE RACE

Suppose we have the following:

 1  for(;;) {
 2      fd = event_wait(...);
 3      if(fd == my_listen_fd) {
 4          /* new connections */
 5          while((new_fd = my_accept(my_listen_fd, ...)) != EAGAIN)
 6              epoll_addf(new_fd, ...);
 7      } else {
 8          /* established connections */
 9          while(do_io(fd) != EAGAIN);
10      }
11  }

With the current epoll/rtsig semantics, there is a race condition
above. I think this is essentially the same race condition as the
snippet at the top of this message.

Just to be clear, I walk completely through the steps in the race
scenario, as follows.

We start with our application blocked in line 2.

A new connection is initiated by the application on other side.

The kernels exchange SYNs, causing the connection to be established.

The kernel on our side queues the new connection, waiting for the
application on this side to call accept(). In the process it fires an
edge POLLIN on the listen_fd, which wakes up the kernel side of line
2. However, some time may pass before we actually wake up.

Meanwhile, the other side immediately sends some application level
data. The other side is going to wait for us to read the application
level data and respond. So it is now blocked.

All of this happens before our application runs line 5 to pick up the
new connection from the kernel.

Here comes the race:

Before we reach line 6, new_fd is not in epoll mode, so packet
arrivals do not trigger a POLLIN edge notification on new_fd.

After line 6, there will be no data from the other side, so there will
still be no POLLIN edge notification for new_fd.

Therefore, line 2 will never yield a POLLIN event for new_fd, and the
new connection is now deadlocked.

Is this the kind of race we're talking about?

If so, we proceed as follows.

PART 2: SOLUTIONS

A race free alternative to write the code above is as follows. Only
one new line (marked with *) is added.

 1   for(;;) {
 2       fd = event_wait(...);
 3       if(fd == my_listen_fd) {
 4           /* new connections */
 5           while((new_fd = my_accept(my_listen_fd, ...)) != EAGAIN) {
 6               epoll_addf(new_fd, ...);
 7*              while(do_io(new_fd) != EAGAIN);
 8           }
 9       } else {
10           /* established connections */
11           while(do_io(fd) != EAGAIN);
12       }
13   }

The example above works with current epoll and rtsig semantics. This
is just rephrasing what Davide has been saying: "Never call event_wait
without first ensuring that IO space is definitively exhausted".
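
For readers who want something compilable, the listen branch of that loop might look roughly like the sketch below. It assumes today's epoll(7) calls rather than the epoll_addf() primitive above, uses read() as a stand-in for do_io(), and every function name in it (make_listener, do_io_until_eagain, on_listen_ready) is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: non-blocking TCP listener on 127.0.0.1, port chosen
 * by the kernel and written back into *addr. */
int make_listener(struct sockaddr_in *addr)
{
    socklen_t len = sizeof *addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, (struct sockaddr *)addr, sizeof *addr) < 0 ||
        listen(fd, 16) < 0 ||
        getsockname(fd, (struct sockaddr *)addr, &len) < 0 ||
        set_nonblocking(fd) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Stand-in for do_io(): consume input until EAGAIN. Returns bytes read. */
static long do_io_until_eagain(int fd)
{
    char buf[4096];
    long total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) { total += n; continue; }
        if (n == 0) return total;                       /* EOF */
        if (errno == EINTR) continue;
        return (errno == EAGAIN || errno == EWOULDBLOCK) ? total : -1;
    }
}

/* Call when listen_fd is reported readable. Accepts until EAGAIN; each new
 * fd is registered and then used at once, as in the line marked 7*.
 * Returns the number of connections accepted, or -1 on error. */
int on_listen_ready(int epfd, int listen_fd, long *bytes_seen)
{
    int accepted = 0;

    assert(bytes_seen != NULL);
    *bytes_seen = 0;
    for (;;) {
        int new_fd = accept(listen_fd, NULL, NULL);
        if (new_fd < 0) {
            if (errno == EINTR || errno == ECONNABORTED)
                continue;
            return (errno == EAGAIN || errno == EWOULDBLOCK) ? accepted : -1;
        }
        struct epoll_event ev = { .events = EPOLLIN | EPOLLET, .data.fd = new_fd };
        if (set_nonblocking(new_fd) < 0 ||
            epoll_ctl(epfd, EPOLL_CTL_ADD, new_fd, &ev) < 0) {
            close(new_fd);
            return -1;
        }
        long n = do_io_until_eagain(new_fd);   /* do not wait for an event first */
        if (n > 0)
            *bytes_seen += n;
        accepted++;
    }
}
```

Error handling for a failed first do_io pass (closing and deregistering new_fd) is left out to keep the sketch short.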

Or we could have (to make John happier?):

 1   for(;;) {
 2       fd = event_wait(...);
 3       if(fd == my_listen_fd) {
 4           /* new connections */
 5           while((new_fd = my_accept(my_listen_fd, ...)) != EAGAIN) {
 6*              epoll_addf(new_fd, &pfd, ...);
 7*              if(pfd.revents & POLLIN) {
 7*                  while(do_io(new_fd) != EAGAIN);
 8*              }
 8           }
 9       } else {
10           /* established connections */
11           while(do_io(fd) != EAGAIN);
12       }
13   }

Here, the epoll_addf primitive has been modified to return the initial
status, presumably so that we avoid the first call to do_io if there is
nothing to do yet.

If it's easy to do (change add primitive that is), why not?

The first solution works either way.
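
An "add that returns the initial status" can be approximated from user space, as in this minimal sketch. It assumes the modern epoll(7) and poll(2) calls, and the names add_and_probe() and add_then_drain_if_ready() are invented for illustration. The fd is registered first and probed second, so data arriving between the two steps still produces an edge event.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Register fd edge-triggered (and make it non-blocking), then sample its
 * state once with a zero-timeout poll(). Returns the revents bits, which
 * are 0 if nothing is pending, or -1 on error. */
int add_and_probe(int epfd, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN | EPOLLET, .data.fd = fd };
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int flags = fcntl(fd, F_GETFL);

    assert(epfd >= 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0)
        return -1;
    if (poll(&pfd, 1, 0) < 0)       /* timeout 0: report state, never block */
        return -1;
    return pfd.revents;
}

/* Caller side of the lines marked 6*-8*: skip the first do_io pass when the
 * probe says nothing is pending. Returns bytes consumed, or -1 on error. */
long add_then_drain_if_ready(int epfd, int fd)
{
    char buf[4096];
    long total = 0;
    int revents = add_and_probe(epfd, fd);

    if (revents < 0)
        return -1;
    if (!(revents & (POLLIN | POLLHUP | POLLERR)))
        return 0;                   /* saves one read() that would hit EAGAIN */
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) { total += n; continue; }
        if (n == 0) return total;                       /* EOF */
        if (errno == EINTR) continue;
        return (errno == EAGAIN || errno == EWOULDBLOCK) ? total : -1;
    }
}
```

A later event for the same fd may then find nothing left to read, which is the kind of spurious notification the thread has already agreed to tolerate.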

-- Buck

2002-10-18 21:19:14

by Davide Libenzi

Subject: Re: epoll (was Re: [PATCH] async poll for 2.5)

On 18 Oct 2002, Charles 'Buck' Krasic wrote:

> I'm getting confused over what minute details are being disputed here.
>
> This debate might get clearer, to me anyway, if the example code
> fragments were more concrete.
>
> So if anybody still cares at this point, here is my stab at clarifying
> some things.
>
> PART I: THE RACE
>
> Suppose we have the following:
>
>  1  for(;;) {
>  2      fd = event_wait(...);
>  3      if(fd == my_listen_fd) {
>  4          /* new connections */
>  5          while((new_fd = my_accept(my_listen_fd, ...)) != EAGAIN)
>  6              epoll_addf(new_fd, ...);
>  7      } else {
>  8          /* established connections */
>  9          while(do_io(fd) != EAGAIN);
> 10      }
> 11  }
>
> With the current epoll/rtsig semantics, there is a race condition
> above. I think this is essentially the same race condition as the
> snippet at the top of this message.
>
> Just to be clear, I walk completely through the steps in the race
> scenario, as follows.
>
> We start with our application blocked in line 2.
>
> A new connection is initiated by the application on other side.
>
> The kernels exchange SYNs, causing the connection to be established.
>
> The kernel on our side queues the new connection, waiting for the
> application on this side to call accept(). In the process it fires an
> edge POLLIN on the listen_fd, which wakes up the kernel side of line
> 2. However, some time may pass before we actually wake up.
>
> Meanwhile, the other side immediately sends some application level
> data. The other side is going to wait for us to read the application
> level data and respond. So it is now blocked.
>
> All of this happens before our application runs line 5 to pick up the
> new connection from the kernel.
>
> Here comes the race:
>
> Before we reach line 6, new_fd is not in epoll mode, so packet
> arrivals do not trigger a POLLIN edge notification on new_fd.
>
> After line 6, there will be no data from the other side, so there will
> still be no POLLIN edge notification for new_fd.
>
> Therefore, line 2 will never yield a POLLIN event for new_fd, and the
> new connection is now deadlocked.
>
> Is this the kind of race we're talking about?

Exactly, you're going to wait for an event w/out having consumed the
possibly available I/O space.

> If so, we proceed as follows.
>
> PART 2: SOLUTIONS
>
> A race free alternative to write the code above is as follows. Only
> one new line (marked with *) is added.
>
>  1   for(;;) {
>  2       fd = event_wait(...);
>  3       if(fd == my_listen_fd) {
>  4           /* new connections */
>  5           while((new_fd = my_accept(my_listen_fd, ...)) != EAGAIN) {
>  6               epoll_addf(new_fd, ...);
>  7*              while(do_io(new_fd) != EAGAIN);
>  8           }
>  9       } else {
> 10           /* established connections */
> 11           while(do_io(fd) != EAGAIN);
> 12       }
> 13   }

Exactly, this is a sketch of the solution ( though event_wait() returns more
than one fd ).

- Davide

2002-10-19 00:49:28

by John Myers

Subject: Re: epoll (was Re: [PATCH] async poll for 2.5)

Davide Libenzi wrote:

>No, the concept of edge triggered APIs is that you have to use the fd
>until EAGAIN.
>
Which my code does, given the postulate.

>It's a very simple concept. That means that after a
>connect()/accept() you have to start using the fd because I/O space might
>be available for read()/write(). Dropping an event is an attempt to use
>the API like poll() & Co., where after an fd is born, it is put inside the
>set to be woken up later. You're basically saying "the kernel should drop an
>event at creation time" and I'm saying that, to keep the API usage
>consistent with "use the fd until EAGAIN", you have to use the fd as soon as
>it becomes available.
>
Here's where your argument is inconsistent with the Linux philosophy.

Linux has a strong philosophy of practicality. The goal of Linux is to
do useful things, including providing applications with the semantics they
need to do useful things. The criteria for deciding what goes into
Linux are heavily weighted towards what works best in practice.

Whether or not some API matches someone's Platonic ideal of an OS
interface is not a criterion. In Linux, APIs are judged by their
practical merits. This is why Linux does not have such things as
message passing and separate address spaces for drivers.

So whether or not a proposed set of epoll semantics is consistent with
your Platonic ideal of "use the fd until EAGAIN" is simply not an issue.
What matters is what works best in practice.

2002-10-19 00:59:25

by John Myers

Subject: Re: epoll (was Re: [PATCH] async poll for 2.5)

Charles 'Buck' Krasic wrote:

>Or we could have (to make John happier?):
>
> 1   for(;;) {
> 2       fd = event_wait(...);
> 3       if(fd == my_listen_fd) {
> 4           /* new connections */
> 5           while((new_fd = my_accept(my_listen_fd, ...)) != EAGAIN) {
> 6*              epoll_addf(new_fd, &pfd, ...);
> 7*              if(pfd.revents & POLLIN) {
> 7*                  while(do_io(new_fd) != EAGAIN);
> 8*              }
> 8           }
> 9       } else {
>10           /* established connections */
>11           while(do_io(fd) != EAGAIN);
>12       }
>13   }
>
>
Close. What we would have is a modification of the epoll_addf()
semantics such that it would have an additional postcondition that if
the new_fd is in the ready state (has data available) then at least one
notification has been generated. In the code above, the three lines
comprising the if statement labeled "7*" would be removed.
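
The proposed postcondition is easy to probe from user space. The sketch below assumes the present-day epoll(7) interface, not the /dev/epoll patch under discussion, and the function name is made up. It registers an fd in a fresh epoll instance and peeks at the event queue without blocking. Current Linux kernels are expected to return 1 here for an fd that already has data, which matches the postcondition, but treat that as an observation to check on your own system rather than a documented guarantee.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Register fd edge-triggered in a fresh epoll instance and report whether an
 * event is already queued. Returns 1 (event queued), 0 (none) or -1 (error). */
int event_queued_at_registration(int fd)
{
    struct epoll_event ev = { .events = EPOLLIN | EPOLLET, .data.fd = fd };
    struct epoll_event out;
    int n, epfd = epoll_create1(0);

    assert(fd >= 0);
    if (epfd < 0)
        return -1;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
        close(epfd);
        return -1;
    }
    n = epoll_wait(epfd, &out, 1, 0);   /* timeout 0: peek, never block */
    close(epfd);
    return n;
}
```

Under semantics with this guarantee, the application can register a new fd and go straight back to waiting. Without it, the application has to act as though an event had been delivered at registration.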

2002-10-19 01:21:22

by Tervel Atanassov

Subject: RE: epoll (was Re: [PATCH] async poll for 2.5)

I am just joining your discussion today for the first time. I come from
a Windows implementation of async I/O, so please don't hold it against
me. I can't say that I am following 100%, but I think you guys
are talking about what the user API will look like, correct?

Assuming the answer is yes, here are my two cents. The code you have
below seems a bit awkward -- the line while(do_io(fd) != EAGAIN) appears
twice. I think the reason for that is that you're trying to do too many
things at once, namely, you're trying to handle both the initial
accept/setup of the socket and its steady state servicing. I don't see
any benefit to that -- it definitely doesn't make for cleaner code. Why
not do things separately?

1. Have a setup phase which more or less does:

* listen()
* accept()
* add the new fd/socket to an "event" which all the worker threads are
waiting on.

2. Have the worker thread/steady state operation be:

* event_wait() which returns the fd, some descriptor of what exactly
happened (read/write), the number of bytes transferred.
* based upon the return from event wait the user updates his state, and
posts the next operation (read/write).

Thanks,

Tervel Atanassov

-----Original Message-----
From: [email protected] [mailto:[email protected]] On
Behalf Of John Myers
Sent: Friday, October 18, 2002 6:05 PM
To: Charles 'Buck' Krasic
Cc: Davide Libenzi; Benjamin LaHaise; Dan Kegel; Shailabh Nagar;
linux-kernel; linux-aio; Andrew Morton; David Miller; Linus Torvalds;
Stephen Tweedie
Subject: Re: epoll (was Re: [PATCH] async poll for 2.5)

Charles 'Buck' Krasic wrote:

>Or we could have (to make John happier?):
>
> 1   for(;;) {
> 2       fd = event_wait(...);
> 3       if(fd == my_listen_fd) {
> 4           /* new connections */
> 5           while((new_fd = my_accept(my_listen_fd, ...)) != EAGAIN) {
> 6*              epoll_addf(new_fd, &pfd, ...);
> 7*              if(pfd.revents & POLLIN) {
> 7*                  while(do_io(new_fd) != EAGAIN);
> 8*              }
> 8           }
> 9       } else {
>10           /* established connections */
>11           while(do_io(fd) != EAGAIN);
>12       }
>13   }
>
>
Close. What we would have is a modification of the epoll_addf()
semantics such that it would have an additional postcondition that if
the new_fd is in the ready state (has data available) then at least one
notification has been generated. In the code above, the three lines
comprising the if statement labeled "7*" would be removed.

2002-10-19 04:02:12

by Charlie Krasic

Subject: Re: epoll (was Re: [PATCH] async poll for 2.5)

[email protected] (John Myers) writes:

> Close. What we would have is a modification of the epoll_addf()
> semantics such that it would have an additional postcondition that if
> the new_fd is in the ready state (has data available) then at least
> one notification has been generated. In the code above, the three
> lines comprising the if statement labeled "7*" would be removed.

I see.

I assume the kernel implementation is no big deal: epoll_addf() has to
call the kernel internal equivalent to poll() with a zero timeout.

This wouldn't break the first "solution" in my earlier post, but it
would cause every new connection to experience one extra EAGAIN.

I see three possibilities:

1) keep the current epoll_addf()
2) modify it as John suggests, posting the initial ready state in
the next epoll_getevents()
3) both: add an option to epoll_addf() that says which of 1 or 2 is desired.

-- Buck

How hard would it be to modify the current epoll code to work that
way? I'd assume it's just a matter of having epoll_addf call the legacy
poll() code to check the condition (with a zero timeout).

-- Buck

2002-10-19 05:26:17

by Davide Libenzi

Subject: Re: epoll (was Re: [PATCH] async poll for 2.5)

On Fri, 18 Oct 2002, John Myers wrote:

> >It's a very simple concept. That means that after a
> >connect()/accept() you have to start using the fd because I/O space might
> >be available for read()/write(). Dropping an event is an attempt to use
> >the API like poll() & Co., where after an fd is born, it is put inside the
> >set to be woken up later. You're basically saying "the kernel should drop an
> >event at creation time" and I'm saying that, to keep the API usage
> >consistent with "use the fd until EAGAIN", you have to use the fd as soon as
> >it becomes available.
> >
> Here's where your argument is inconsistent with the Linux philosophy.
>
> Linux has a strong philosophy of practicality. The goal of Linux is to
> do useful things, including providing applications with the semantics they
> need to do useful things. The criteria for deciding what goes into
> Linux are heavily weighted towards what works best in practice.
>
> Whether or not some API matches someone's Platonic ideal of an OS
> interface is not a criterion. In Linux, APIs are judged by their
> practical merits. This is why Linux does not have such things as
> message passing and separate address spaces for drivers.
>
> So whether or not a proposed set of epoll semantics is consistent with
> your Platonic ideal of "use the fd until EAGAIN" is simply not an issue.
> What matters is what works best in practice.

Luckily enough, being the only one who has wasted my time over the last couple
of days arguing against the API semantics, you're pretty far down the list of
people who are able to decide what "works best in practice".

- Davide

2002-10-19 06:53:09

by Mark Mielke

Subject: Re: epoll (was Re: [PATCH] async poll for 2.5)

On Fri, Oct 18, 2002 at 05:55:21PM -0700, John Myers wrote:
> So whether or not a proposed set of epoll semantics is consistent with
> your Platonic ideal of "use the fd until EAGAIN" is simply not an issue.
> What matters is what works best in practice.

From this side of the fence: One vote for "use the fd until EAGAIN" being
flawed. If I wanted a method of monopolizing the event loop with real time
priorities, I would implement real time priorities within the event loop.

mark

--
[email protected]/[email protected]/[email protected] __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/

2002-10-19 17:11:59

by Davide Libenzi

Subject: Re: epoll (was Re: [PATCH] async poll for 2.5)

On Sat, 19 Oct 2002, Mark Mielke wrote:

> On Fri, Oct 18, 2002 at 05:55:21PM -0700, John Myers wrote:
> > So whether or not a proposed set of epoll semantics is consistent with
> > your Platonic ideal of "use the fd until EAGAIN" is simply not an issue.
> > What matters is what works best in practice.
>
> From this side of the fence: One vote for "use the fd until EAGAIN" being
> flawed. If I wanted a method of monopolizing the event loop with real time
> priorities, I would implement real time priorities within the event loop.

You don't need to "use the fd until EAGAIN"; you can consume even only one
byte out of 10000 and stop using the fd, as long as you keep that fd in
your ready-list. As soon as you receive an EAGAIN from that fd, you remove
it from your ready-list, and the next time you go fishing for events it
will reemerge as soon as it has something for you. The concept is very
simple: "you must not go waiting for events for a given fd before
having consumed its I/O space".
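
A minimal sketch of that ready-list bookkeeping follows. The struct and function names are hypothetical and read() stands in for the real I/O. Each pass consumes a bounded amount, and the entry leaves the ready list only when the kernel reports EAGAIN (or EOF, or an error).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* One entry of an application-side ready list (hypothetical structure). */
struct ready_fd {
    int fd;
    int ready;      /* stays 1 until read() reports EAGAIN, EOF or an error */
};

/* Make fd non-blocking and treat it as ready from the start, following the
 * rule that a new fd is used before any event is waited for. */
int ready_fd_init(struct ready_fd *r, int fd)
{
    int flags = fcntl(fd, F_GETFL);

    assert(r != NULL);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;
    r->fd = fd;
    r->ready = 1;
    return 0;
}

/* Consume at most `budget` bytes, then yield to the other fds.
 * Returns the bytes consumed, 0 once the fd is exhausted, -1 on error. */
long service_once(struct ready_fd *r, size_t budget)
{
    char buf[256];
    ssize_t n;

    assert(r != NULL && budget > 0);
    if (budget > sizeof buf)
        budget = sizeof buf;
    do {
        n = read(r->fd, buf, budget);
    } while (n < 0 && errno == EINTR);

    if (n > 0)
        return n;               /* more may remain: keep r->ready == 1 */
    r->ready = 0;               /* only now is it safe to wait for an event */
    return (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK) ? -1 : 0;
}
```

An event loop built this way would call the event-wait primitive only when no entry has ready set, and would set ready again for each fd the wait returns.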

- Davide

2002-10-19 18:46:48

by John Myers

Subject: Re: epoll (was Re: [PATCH] async poll for 2.5)

On Friday, October 18, 2002, at 06:27 PM, Tervel Atanassov wrote:
> The code you have below seems a bit awkward -- the line
> while(do_io(fd) != EAGAIN) appears twice. I think the reason for that
> is that you're trying to do too many things at once, namely, you're
> trying to handle both the initial accept/setup of the socket and its
> steady state servicing. I don't see any benefit to that -- it
> definitely doesn't make for cleaner code. Why not do things separately.

If you carefully reread the message you replied to, you will see that
this is exactly what I am proposing. The redundant copy of the line
you consider awkward would be removed.

2002-10-22 19:29:13

by John Myers

Subject: Re: epoll (was Re: [PATCH] async poll for 2.5)

Dan Kegel wrote:

> The choice I see is between:
> 1. re-arming the one-shot notification when the user gets EAGAIN
> 2. re-arming the one-shot notification when the user reads all the data
> that was waiting (such that the very next read would return EAGAIN).
>
> #1 is what Davide wants; I think John and Mark are arguing for #2.

No, this is not what I'm arguing. Once an event arrives for a fd, my
proposed semantics are no different than Mr. Libenzi's. The only
difference is what happens upon registration of interest for a fd. With
my semantics, the kernel guarantees that if the fd is ready then at
least one event has been generated. With Mr Libenzi's semantics, there
is no such guarantee and the application is required to behave as if an
event had been generated upon registration.

2002-10-22 19:51:39

by Davide Libenzi

Subject: Re: epoll (was Re: [PATCH] async poll for 2.5)

On Tue, 22 Oct 2002, John Gardiner Myers wrote:

>
>
> Dan Kegel wrote:
>
> > The choice I see is between:
> > 1. re-arming the one-shot notification when the user gets EAGAIN
> > 2. re-arming the one-shot notification when the user reads all the data
> > that was waiting (such that the very next read would return EAGAIN).
> >
> > #1 is what Davide wants; I think John and Mark are arguing for #2.
>
> No, this is not what I'm arguing. Once an event arrives for a fd, my
> proposed semantics are no different than Mr. Libenzi's. The only
> difference is what happens upon registration of interest for a fd. With
> my semantics, the kernel guarantees that if the fd is ready then at
> least one event has been generated. With Mr Libenzi's semantics, there
> is no such guarantee and the application is required to behave as if an
> event had been generated upon registration.

sed s/Mr. Libenzi/Davide/g ... I'm not that old :)
There are a couple of reasons why the drop of the initial event is a waste
of time :

1) The I/O write space is completely available at fd creation
2) For sockets it's very likely that the first packet brought something
more than the SYN == The I/O read space might have something for you

I strongly believe that the concept "use the fd until EAGAIN" should be
applied even at creation time, w/out making exceptions to what is the
API's rule to follow.

- Davide

2002-10-22 21:49:40

by Erich Nahum

Subject: Re: epoll (was Re: [PATCH] async poll for 2.5)

Davide Libenzi writes:
> On Tue, 22 Oct 2002, John Gardiner Myers wrote:
>
> > > 1. re-arming the one-shot notification when the user gets EAGAIN
> > > 2. re-arming the one-shot notification when the user reads all the data
> > > that was waiting (such that the very next read would return EAGAIN).
> > >
> > > #1 is what Davide wants; I think John and Mark are arguing for #2.
> >
> > No, this is not what I'm arguing. Once an event arrives for a fd, my
> > proposed semantics are no different than Mr. Libenzi's. The only
> > difference is what happens upon registration of interest for a fd. With
> > my semantics, the kernel guarantees that if the fd is ready then at
> > least one event has been generated. With Mr Libenzi's semantics, there
> > is no such guarantee and the application is required to behave as if an
> > event had been generated upon registration.
>
> There are a couple of reasons why the drop of the initial event is a waste
> of time :
>
> 1) The I/O write space is completely available at fd creation
> 2) For sockets it's very likely that the first packet brought something
> more than the SYN == The I/O read space might have something for you
>
> I strongly believe that the concept "use the fd until EAGAIN" should be
> applied even at creation time, w/out making exceptions to what is the
> API's rule to follow.

There is a third way, described in the original Banga/Mogul/Druschel
paper, available via Dan Kegel's web site: extend the accept() call to
return whether an event has already happened on that FD. That way you
can service a ready FD without reading /dev/epoll or calling
sigtimedwait, and you don't have to waste a read() call on the socket
only to find out you got EAGAIN.

Of course, this changes the accept API, which is another matter. But
if we're talking a new API then there's no problem.

-Erich

2002-10-22 22:10:41

by Davide Libenzi

Subject: Re: epoll (was Re: [PATCH] async poll for 2.5)

On Tue, 22 Oct 2002, Erich Nahum wrote:

> There is a third way, described in the original Banga/Mogul/Druschel
> paper, available via Dan Kegel's web site: extend the accept() call to
> return whether an event has already happened on that FD. That way you
> can service a ready FD without reading /dev/epoll or calling
> sigtimedwait, and you don't have to waste a read() call on the socket
> only to find out you got EAGAIN.
>
> Of course, this changes the accept API, which is another matter. But
> if we're talking a new API then there's no problem.

Why differentiate between connect and accept? At that point you should
also handle connect as a special case, that's the point. And that's why
I like the API's rule to be consistent, and I would not like to put
explicit event dispatch inside accept/connect in the kernel source code.

- Davide