2005-02-28 04:26:07

by Andrea Arcangeli

[permalink] [raw]
Subject: two pipe bugfixes

Hello,

This testcase from Thomas Crhak:

#include <unistd.h>
#include <sys/select.h>

int
main(int argv, char **argc) {
int inout[2];
fd_set readfds, writefds, exceptfds;
struct timeval timeout;

timeout.tv_sec = 0;
timeout.tv_usec = 0;

FD_ZERO(&readfds);
FD_ZERO(&writefds);
FD_ZERO(&exceptfds);

pipe(inout);
write(inout[1], "qqqqqqqq", 5);

FD_SET(inout[1], &readfds);
FD_SET(inout[1], &writefds);
FD_SET(inout[1], &exceptfds);

select(inout[1] + 1, &readfds, &writefds, &exceptfds, &timeout);

close(inout[0]);

write(inout[1], "qqqqqqqq", 5);
return 0;
}

was returning this in 2.6.9:

pipe([3, 4]) = 0
write(4, "qqqqq", 5) = 5
select(5, [4], [4], [4], {0, 0}) = 1 (in [4], left {0, 0})
close(3) = 0
write(4, "qqqqq", 5) = -1 EPIPE (Broken pipe)

and it started to return this since 2.6.11-rc:

pipe([3, 4]) = 0
write(4, "qqqqq", 5) = 5
select(5, [4], [4], [4], {0, 0}) = 2 (in [4], out [4], left {0,
0})
close(3) = 0
write(4, "qqqqq", 5) = 5

There are two separate bugs: the first is that it doesn't return
-EPIPE because the input side has been closed already, the second is
that it returns POLLIN|POLLOUT. I heard POLLIN|POLLOUT is sometime
detected as a special case that means the input side has disconnected.
But anyway POLLIN|POLLOUT at the same time seems wrong to me (at least I
didn't find anything in SUS specs mentioning this magic for pipes) and
2.6.11-rc definitely breaks python-twisted (which apparently understand
the magic I've heard). But clearly the change in 2.6.11-rc that sets
POLLOUT if not all buffers are full makes plenty of sense for
performance reasons.

IMHO the really wrong thing is that we always set POLLIN (even for
output filedescriptors that will never allow any data to be read).

So now I check if the file is open in read or write mode, and I return
POLLIN or POLLOUT and never both of them at the same time. I've no idea
if this is the correct semantics that all applications expects, but
since we use the same pipe_poll callback for all read/write/rdwr cases,
I believe this is the most correct one and it certainly looks better
than the 2.6.11-rc code that breaks twisted, and this new logic still
allows the optimizations in the 2.6.11-rc to work.

Current output of strace with this patch applied is this:

pipe([3, 4]) = 0
write(4, "qqqqq", 5) = 5
select(5, [4], [4], [4], {0, 0}) = 1 (out [4], left {0, 0})
close(3) = 0
write(4, "qqqqq", 5) = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
+++ killed by SIGPIPE +++

which is different from 2.6.9 but it makes a lot more sense to me for a
"write-only" fd IMHO, and I had no pratical problems with this patch
yet (not that I've run any big stress test though, just normal misc
usage with all sort of desktop apps and twisted). Comments welcome thanks!

Patch is against 2.6.11-rc5.

Signed-off-by: Andrea Arcangeli <[email protected]>

--- xx/fs/pipe.c.~1~ 2005-02-28 00:43:42.000000000 +0100
+++ xx/fs/pipe.c 2005-02-28 04:47:26.000000000 +0100
@@ -235,6 +235,12 @@ pipe_writev(struct file *filp, const str
down(PIPE_SEM(*inode));
info = inode->i_pipe;

+ if (!PIPE_READERS(*inode)) {
+ send_sig(SIGPIPE, current, 0);
+ ret = -EPIPE;
+ goto out;
+ }
+
/* We try to merge small writes */
if (info->nrbufs && total_len < PAGE_SIZE) {
int lastbuf = (info->curbuf + info->nrbufs - 1) & (PIPE_BUFFERS-1);
@@ -398,8 +404,15 @@ pipe_poll(struct file *filp, poll_table

/* Reading only -- no need for acquiring the semaphore. */
nrbufs = info->nrbufs;
- mask = (nrbufs > 0) ? POLLIN | POLLRDNORM : 0;
- mask |= (nrbufs < PIPE_BUFFERS) ? POLLOUT | POLLWRNORM : 0;
+ mask = 0;
+ /*
+ * Returning POLLIN|POLLOUT for a output channel has special semantics
+ * and it's used by some app to detect when the input has been closed.
+ */
+ if (filp->f_mode & FMODE_READ && nrbufs > 0)
+ mask |= POLLIN | POLLRDNORM;
+ if (filp->f_mode & FMODE_WRITE && nrbufs < PIPE_BUFFERS)
+ mask |= POLLOUT | POLLWRNORM;

if (!PIPE_WRITERS(*inode) && filp->f_version != PIPE_WCOUNTER(*inode))
mask |= POLLHUP;

PS. this still can return POLLIN|POLLOUT if somebody opens a fifo in RDWRITE
mode, but I guess that's ok and the above beahviour still should make sense
for such a corner case usage too.


2005-02-28 16:24:17

by Linus Torvalds

[permalink] [raw]
Subject: Re: two pipe bugfixes



On Mon, 28 Feb 2005, Andrea Arcangeli wrote:
>
> and it started to return this since 2.6.11-rc:
>
> pipe([3, 4]) = 0
> write(4, "qqqqq", 5) = 5
> select(5, [4], [4], [4], {0, 0}) = 2 (in [4], out [4], left {0,
> 0})
> close(3) = 0
> write(4, "qqqqq", 5) = 5

Ahh, yes. The write merging code doesn't check for no readers.

> IMHO the really wrong thing is that we always set POLLIN (even for
> output filedescriptors that will never allow any data to be read).

However, that has always been true. Look at the old code: it would set
POLLIN for a non-empty pipe for both readers and writers (and do POLLOUT
for empty pipes both for readers and writers). In fact, your very own
original strace shows that - it shows "in [4]" even though fd 4 is a
write-only fd.

The new code does nothing really different. POLLIN is still there for a
non-empty pipe, just like it was before. It's just that when you have
multiple buffers, POLLOUT can _also_ be true, since even if you have
_some_ data in the pipe, you can still do a write of a full PIPE_BUF.

So the difference is not at all the one you're talking about, and the
"bug" you claim to fix was there before too.

The fact is that if this broke python-twisted, then it just happened to
work before by mistake. And python-twisted is just plain bogus.

That said, I agree with the fact that it's probably not the right thing to
do, and never was. And if fixing it makes a difference to python-twisted,
then hey, that's a benefit, but not a reason for the patch.

I don't agree with your patch, though - I don't like your lack of
parenthesis ;)

Linus

2005-02-28 19:08:13

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: two pipe bugfixes

> > IMHO the really wrong thing is that we always set POLLIN (even for
> > output filedescriptors that will never allow any data to be read).
>
On Mon, Feb 28, 2005 at 08:25:07AM -0800, Linus Torvalds wrote:
> However, that has always been true. Look at the old code: it would set
> POLLIN for a non-empty pipe for both readers and writers (and do POLLOUT
> for empty pipes both for readers and writers). In fact, your very own
> original strace shows that - it shows "in [4]" even though fd 4 is a
> write-only fd.

Sure, that has always been true, I also wanted to say it wasn't a
mistake of the new code, but just a mistake of the old code that has
seen the light thanks to the recent optimizations.

> The new code does nothing really different. POLLIN is still there for a
> non-empty pipe, just like it was before. It's just that when you have
> multiple buffers, POLLOUT can _also_ be true, since even if you have
> _some_ data in the pipe, you can still do a write of a full PIPE_BUF.
>
> So the difference is not at all the one you're talking about, and the
> "bug" you claim to fix was there before too.
>
> The fact is that if this broke python-twisted, then it just happened to
> work before by mistake. [..]

Yes of course.

> [..] And python-twisted is just plain bogus.

What do you mean with this, could you elaborate? You mean it shouldn't
check for in/out set at the same time? I've no idea why it got confused
by out/in set at the same time, but I guess it could be some
compatibility thing with some other os.

Still my point is that such code should never trigger since pollin
should never be set for an output-pipe-fd.

> That said, I agree with the fact that it's probably not the right thing to
> do, and never was. And if fixing it makes a difference to python-twisted,
> then hey, that's a benefit, but not a reason for the patch.

Sure, I had no idea myself if it was going to work with python-twisted,
because I changed the behaviour compared to the 2.6.9 codebase, but I
tested it and it worked fine as well as the "old 2.6.9" behaviour.

I didn't write the patch to make python-twisted work but only to do
something that would remotely resemble the sus specs and it happened to
fix twisted as well in my testing.

> I don't agree with your patch, though - I don't like your lack of
> parenthesis ;)

;)

2005-02-28 19:22:27

by Linus Torvalds

[permalink] [raw]
Subject: Re: two pipe bugfixes



On Mon, 28 Feb 2005, Andrea Arcangeli wrote:
>
> > [..] And python-twisted is just plain bogus.
>
> What do you mean with this, could you elaborate? You mean it shouldn't
> check for in/out set at the same time? I've no idea why it got confused
> by out/in set at the same time, but I guess it could be some
> compatibility thing with some other os.

I wonder. It migth just be a latent bug in python-twisted, rather than any
"designed behaviour".

> Still my point is that such code should never trigger since pollin
> should never be set for an output-pipe-fd.

Equally arguably, POLLERR should _always_ be set if you select a
write-only pipe for reading, and guess what? That would cause "select()"
to return readable. It's true too: select returns whether a read() would
return immediately, and it _would_ - with an error code.

The basic fact is that an application that asks whether the pipe that it
opened for writing is readable is doing something stupid, and the old
behaviour was at most surprising, but I'd argue it isn't really
necessarily a _bug_.

Of course, "surprising" is bad, even if it's not necessarily a bug. So
making the return value be "unsurprising" can in any case be considered an
improvement.

> > I don't agree with your patch, though - I don't like your lack of
> > parenthesis ;)
>
> ;)

I ended up editing it a bit more: the other bits (POLLHUP and POLLERR)
also really only make sense for only one side of the reader/writer
schenario, so logically they should be grouped the same way.

Of course, in those cases, you can't get the "wrong" answer anyway, since
those only trigger if there are no readers or no writers (and if you're
open as a reader, that in itself obviously guarantees that there _are_
readers, and POLLERR cannot happen according to either the old or the new
rules).

Anyway, I think I made the code look more logical while there.

Linus

2005-02-28 19:56:11

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: two pipe bugfixes

On Mon, Feb 28, 2005 at 11:22:18AM -0800, Linus Torvalds wrote:
> I wonder. It migth just be a latent bug in python-twisted, rather than any
> "designed behaviour".

Twisted is doing this for the process writer doRead operation:

def doRead(self):
"""The only way this pipe can become readable is at EOF, because the
child has closed it.
"""
fd = self.fd
r, w, x = select.select([fd], [fd], [], 0)
if r and w:
return CONNECTION_LOST

This apparently means it consider the connection lost when
POLLIN|POLLOUT are set at the same time (which will never happen anymore
with my patch and that was happening all the time with the new
optimizations). It could happen with 2.6.8 too infact, if the write
buffer was empty.

But it should never try to read a writeable fd as you said, so I'm not
so convinced that what broke twisted is really related to the above code
or something else, perhaps the above is an hack for some other OS.

The reactor never listens to writeable fds of course:

mask = 0
if reads.has_key(fd): mask = mask | select.POLLIN
if writes.has_key(fd): mask = mask | select.POLLOUT
if mask != 0:
poller.register(fd, mask)
else:
if selectables.has_key(fd): del selectables[fd]

So if there's a breakage, that could be just the process protocol, and
not everything else (the process protocol is a protocol implementation
based on popen but it's asynchronous with poll and it works the same as
the networking protocols that uses sockets).

I will ask to the proper lists, this is getting offtopic for l-k.

> Equally arguably, POLLERR should _always_ be set if you select a
> write-only pipe for reading, and guess what? That would cause "select()"
> to return readable. It's true too: select returns whether a read() would
> return immediately, and it _would_ - with an error code.
>
> The basic fact is that an application that asks whether the pipe that it
> opened for writing is readable is doing something stupid, and the old
> behaviour was at most surprising, but I'd argue it isn't really
> necessarily a _bug_.

My point is that if we're allowed to return "undefined" then we'd better
return -EINVAL instead ;)

> Of course, "surprising" is bad, even if it's not necessarily a bug. So
> making the return value be "unsurprising" can in any case be considered an
> improvement.

Agreed.

> I ended up editing it a bit more: the other bits (POLLHUP and POLLERR)
> also really only make sense for only one side of the reader/writer
> schenario, so logically they should be grouped the same way.
>
> Of course, in those cases, you can't get the "wrong" answer anyway, since
> those only trigger if there are no readers or no writers (and if you're
> open as a reader, that in itself obviously guarantees that there _are_
> readers, and POLLERR cannot happen according to either the old or the new
> rules).
>
> Anyway, I think I made the code look more logical while there.

Thanks, I'll check it in the next bk snapshot.

2005-02-28 20:28:34

by Linus Torvalds

[permalink] [raw]
Subject: Re: two pipe bugfixes



On Mon, 28 Feb 2005, Andrea Arcangeli wrote:
>
> Thanks, I'll check it in the next bk snapshot.

Here's my version of the poll changes. The EPIPE one is just your original
first patch hunk (with a properly updated commit message).

Linus

-----

# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
# 2005/02/28 08:36:14-08:00 [email protected]
# Make pipe "poll()" take direction of pipe into account.
#
# The pipe code has traditionally not cared about which end-point of the
# pipe you are polling, meaning that if you poll the write-only end of a
# pipe, it will still set the "this pipe is readable" bits if there is
# data to be read on the read side.
#
# That makes no sense, and together with the new bigger buffers breaks
# python-twisted.
#
# Based on debugging/patch by Andrea Arcangeli and testcase from
# Thomas Crhak
#
# fs/pipe.c
# 2005/02/28 08:36:06-08:00 [email protected] +11 -6
# Make pipe "poll()" take direction of pipe into account.
#
diff -Nru a/fs/pipe.c b/fs/pipe.c
--- a/fs/pipe.c 2005-02-28 12:28:35 -08:00
+++ b/fs/pipe.c 2005-02-28 12:28:35 -08:00
@@ -398,13 +398,18 @@

/* Reading only -- no need for acquiring the semaphore. */
nrbufs = info->nrbufs;
- mask = (nrbufs > 0) ? POLLIN | POLLRDNORM : 0;
- mask |= (nrbufs < PIPE_BUFFERS) ? POLLOUT | POLLWRNORM : 0;
+ mask = 0;
+ if (filp->f_mode & FMODE_READ) {
+ mask = (nrbufs > 0) ? POLLIN | POLLRDNORM : 0;
+ if (!PIPE_WRITERS(*inode) && filp->f_version != PIPE_WCOUNTER(*inode))
+ mask |= POLLHUP;
+ }

- if (!PIPE_WRITERS(*inode) && filp->f_version != PIPE_WCOUNTER(*inode))
- mask |= POLLHUP;
- if (!PIPE_READERS(*inode))
- mask |= POLLERR;
+ if (filp->f_mode & FMODE_WRITE) {
+ mask |= (nrbufs < PIPE_BUFFERS) ? POLLOUT | POLLWRNORM : 0;
+ if (!PIPE_READERS(*inode))
+ mask |= POLLERR;
+ }

return mask;
}