2023-06-23 22:57:57

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] pipe: Make a partially-satisfied blocking read wait for more

On Fri, 23 Jun 2023 at 15:34, David Howells <[email protected]> wrote:
>
> Can you consider merging something like the attached patch? Unfortunately,
> there are applications out there that depend on a read from pipe() waiting
> until the buffer is full under some circumstances. Patch a28c8b9db8a1
> removed the conditionality on there being an attached writer.

This patch seems actively wrong, in that now it's possibly waiting for
data that won't come, even if it's nonblocking.

What are these alleged broken apps? That commit a28c8b9db8a1 ("pipe:
remove 'waiting_writers' merging logic") is 3+ years old, and we
haven't heard people complain about it.

POSIX guarantees PIPE_BUF data, but that's 4kB. Your made-up test-case
is not valid, and never has been.

Yes, we used to do that write merging for performance reasons to avoid
extra system calls. And yes, we'll maintain semantics if people
actually end up having broken apps that depend on them, but I want to
know *what* broken app depends on this before I re-instate the write
merging.

And if we do re-instate it, I'm afraid we will have to do so with that
whole "waiting_writers" logic, so that we don't have the "reader waits
for data that might not come".

Because that patch of yours seems really broken. Nobody has *ever*
said "a read() on a pipe will always satisfy the whole buffer". That's
just completely bogus.

So let's name and shame the code that actually depended on it. And
maybe we'll have to revert commit a28c8b9db8a1, but after three+ years
of nobody reporting it I'm not really super-convinced.

Linus


2023-06-23 23:26:44

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] pipe: Make a partially-satisfied blocking read wait for more

On Fri, 23 Jun 2023 at 15:41, Linus Torvalds
<[email protected]> wrote:
>
> This patch seems actively wrong, in that now it's possibly waiting for
> data that won't come, even if it's nonblocking.

In fact, I'd expect that patch to fail immediately on a perfectly
normal program that passes a token around by doing a small write to a
pipe, and have the "token reader" do a bigger write.

Blocking on read(), waiting for more data, would be blocking forever.
The read already got the token, there isn't going to be anything else.

So I'm pretty sure that patch is completely wrong, and whatever
program is "fixed" by it is very very buggy.

Again - we do have the rule that regressions are regressions even for
buggy user space, but when it's been 3+ years and you don't even
mention the broken app, I am not impressed.

Linus

2023-06-23 23:53:58

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] pipe: Make a partially-satisfied blocking read wait for more

On Fri, 23 Jun 2023 at 16:08, Linus Torvalds
<[email protected]> wrote:
>
> In fact, I'd expect that patch to fail immediately on a perfectly
> normal program that passes a token around by doing a small write to a
> pipe, and have the "token reader" do a bigger write.

Bigger _read_, of course.

This might be hidden by such programs typically doing a single byte
write and a single byte read, but I could easily imagine situations
where people actually depend on the POSIX atomicity guarantees, ie you
write a "token packet" that might be variable-sized, and the reader
then just does a maximally sized read, knowing that it will get a full
packet or nothing.

So a read() of a pipe absolutely has to return when it has gotten
*any* data. Except if it can know that there is a writer that is still
busy and still in the process of writing more data.

Which was that old 'pipe->waiting_writers' logic - it basically
counted "there are <N> active writers that still have more data to
write, but the buffer filled up".

That logic went back to ancient times, when our pipe buffer was just a
single page - so it helped throughput immensely if we had writers that
did big writes, and readers would continue to read even when the small
buffer was completely used up (rather than return data just one page
at a time for each read() system call).

Linus

2023-06-26 09:44:06

by David Laight

[permalink] [raw]
Subject: RE: [PATCH] pipe: Make a partially-satisfied blocking read wait for more

From: Linus Torvalds
> Sent: 24 June 2023 00:32
>
> On Fri, 23 Jun 2023 at 16:08, Linus Torvalds
> <[email protected]> wrote:
> >
> > In fact, I'd expect that patch to fail immediately on a perfectly
> > normal program that passes a token around by doing a small write to a
> > pipe, and have the "token reader" do a bigger write.
>
> Bigger _read_, of course.
>
> This might be hidden by such programs typically doing a single byte
> write and a single byte read, but I could easily imagine situations
> where people actually depend on the POSIX atomicity guarantees, ie you
> write a "token packet" that might be variable-sized, and the reader
> then just does a maximally sized read, knowing that it will get a full
> packet or nothing.

There are definitely programs that just do a large read in order
to consume all the single byte 'wakeup' writes.

(The 'must check' on these reads is a right PITA.)

They ought to set the pipe non-blocking, but I suspect many
don't - because it all works anyway.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2023-06-26 10:08:02

by David Laight

[permalink] [raw]
Subject: RE: [PATCH] pipe: Make a partially-satisfied blocking read wait for more

From: Linus Torvalds
> Sent: 23 June 2023 23:42
>
> On Fri, 23 Jun 2023 at 15:34, David Howells <[email protected]> wrote:
> >
> > Can you consider merging something like the attached patch? Unfortunately,
> > there are applications out there that depend on a read from pipe() waiting
> > until the buffer is full under some circumstances. Patch a28c8b9db8a1
> > removed the conditionality on there being an attached writer.
>
> This patch seems actively wrong, in that now it's possibly waiting for
> data that won't come, even if it's nonblocking.

I think it pretty much breaks:
command | tee file
where 'command' is careful to fflush(stdout).

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)