2019-02-07 15:47:26

by Slavomir Kaslev

[permalink] [raw]
Subject: [RFC PATCH] fs: Make splice() and tee() take into account O_NONBLOCK flag on pipes

The current implementation of splice() and tee() ignores O_NONBLOCK set on pipe
file descriptors and checks only the SPLICE_F_NONBLOCK flag for blocking on pipe
arguments. This is inconsistent since splice()-ing from/to non-pipe file
descriptors does take O_NONBLOCK into consideration.

Fix this by promoting O_NONBLOCK, when set on a pipe, to SPLICE_F_NONBLOCK.

Some context for how the current implementation of splice() leads to
inconsistent behavior. In the ongoing work[1] to add VM tracing capability to
trace-cmd we stream tracing data over named FIFOs or vsockets from guests back
to the host.

When we receive SIGINT from user to stop tracing, we set O_NONBLOCK on the input
file descriptor and set SPLICE_F_NONBLOCK for the next call to splice(). If
splice() was blocked waiting on data from the input FIFO, after SIGINT splice()
restarts with the same arguments (no SPLICE_F_NONBLOCK) and blocks again instead
of returning -EAGAIN when no data is available.

This differs from the splice() behavior when reading from a vsocket or when
we're doing a traditional read()/write() loop (trace-cmd's --nosplice argument).

With this patch applied we get the same behavior in all situations after setting
O_NONBLOCK which also matches the behavior of doing a read()/write() loop
instead of splice().

This change does have potential of breaking users who don't expect EAGAIN from
splice() when SPLICE_F_NONBLOCK is not set. OTOH programs that set O_NONBLOCK
and don't anticipate EAGAIN are arguably buggy[2].

[1] https://github.com/skaslev/trace-cmd/tree/vsock
[2] https://github.com/torvalds/linux/blob/d47e3da1759230e394096fd742aad423c291ba48/fs/read_write.c#L1425

Signed-off-by: Slavomir Kaslev <[email protected]>
---
fs/splice.c | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/fs/splice.c b/fs/splice.c
index de2ede048473..6a1761b74f8d 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1123,6 +1123,9 @@ static long do_splice(struct file *in, loff_t __user *off_in,
if (ipipe == opipe)
return -EINVAL;

+ if ((in->f_flags | out->f_flags) & O_NONBLOCK)
+ flags |= SPLICE_F_NONBLOCK;
+
return splice_pipe_to_pipe(ipipe, opipe, len, flags);
}

@@ -1148,6 +1151,9 @@ static long do_splice(struct file *in, loff_t __user *off_in,
if (unlikely(ret < 0))
return ret;

+ if (in->f_flags & O_NONBLOCK)
+ flags |= SPLICE_F_NONBLOCK;
+
file_start_write(out);
ret = do_splice_from(ipipe, out, &offset, len, flags);
file_end_write(out);
@@ -1172,6 +1178,9 @@ static long do_splice(struct file *in, loff_t __user *off_in,
offset = in->f_pos;
}

+ if (out->f_flags & O_NONBLOCK)
+ flags |= SPLICE_F_NONBLOCK;
+
pipe_lock(opipe);
ret = wait_for_space(opipe, flags);
if (!ret)
@@ -1717,6 +1726,9 @@ static long do_tee(struct file *in, struct file *out, size_t len,
* copying the data.
*/
if (ipipe && opipe && ipipe != opipe) {
+ if ((in->f_flags | out->f_flags) & O_NONBLOCK)
+ flags |= SPLICE_F_NONBLOCK;
+
/*
* Keep going, unless we encounter an error. The ipipe/opipe
* ordering doesn't really matter.
--
2.19.1



2019-02-13 15:22:42

by Steven Rostedt

[permalink] [raw]
Subject: Re: [RFC PATCH] fs: Make splice() and tee() take into account O_NONBLOCK flag on pipes

On Thu, 7 Feb 2019 17:45:19 +0200
Slavomir Kaslev <[email protected]> wrote:

> The current implementation of splice() and tee() ignores O_NONBLOCK set on pipe
> file descriptors and checks only the SPLICE_F_NONBLOCK flag for blocking on pipe
> arguments. This is inconsistent since splice()-ing from/to non-pipe file
> descriptors does take O_NONBLOCK into consideration.
>
> Fix this by promoting O_NONBLOCK, when set on a pipe, to SPLICE_F_NONBLOCK.
>
> Some context for how the current implementation of splice() leads to
> inconsistent behavior. In the ongoing work[1] to add VM tracing capability to
> trace-cmd we stream tracing data over named FIFOs or vsockets from guests back
> to the host.
>
> When we receive SIGINT from user to stop tracing, we set O_NONBLOCK on the input
> file descriptor and set SPLICE_F_NONBLOCK for the next call to splice(). If
> splice() was blocked waiting on data from the input FIFO, after SIGINT splice()
> restarts with the same arguments (no SPLICE_F_NONBLOCK) and blocks again instead
> of returning -EAGAIN when no data is available.
>
> This differs from the splice() behavior when reading from a vsocket or when
> we're doing a traditional read()/write() loop (trace-cmd's --nosplice argument).
>
> With this patch applied we get the same behavior in all situations after setting
> O_NONBLOCK which also matches the behavior of doing a read()/write() loop
> instead of splice().
>
> This change does have potential of breaking users who don't expect EAGAIN from
> splice() when SPLICE_F_NONBLOCK is not set. OTOH programs that set O_NONBLOCK
> and don't anticipate EAGAIN are arguably buggy[2].

Reviewed-by: Steven Rostedt (VMware) <[email protected]>

Does anyone have any issues or comments about this patch?

Thanks!

-- Steve

>
> [1] https://github.com/skaslev/trace-cmd/tree/vsock
> [2] https://github.com/torvalds/linux/blob/d47e3da1759230e394096fd742aad423c291ba48/fs/read_write.c#L1425
>
> Signed-off-by: Slavomir Kaslev <[email protected]>
> ---
> fs/splice.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/fs/splice.c b/fs/splice.c
> index de2ede048473..6a1761b74f8d 100644
> --- a/fs/splice.c
> +++ b/fs/splice.c
> @@ -1123,6 +1123,9 @@ static long do_splice(struct file *in, loff_t __user *off_in,
> if (ipipe == opipe)
> return -EINVAL;
>
> + if ((in->f_flags | out->f_flags) & O_NONBLOCK)
> + flags |= SPLICE_F_NONBLOCK;
> +
> return splice_pipe_to_pipe(ipipe, opipe, len, flags);
> }
>
> @@ -1148,6 +1151,9 @@ static long do_splice(struct file *in, loff_t __user *off_in,
> if (unlikely(ret < 0))
> return ret;
>
> + if (in->f_flags & O_NONBLOCK)
> + flags |= SPLICE_F_NONBLOCK;
> +
> file_start_write(out);
> ret = do_splice_from(ipipe, out, &offset, len, flags);
> file_end_write(out);
> @@ -1172,6 +1178,9 @@ static long do_splice(struct file *in, loff_t __user *off_in,
> offset = in->f_pos;
> }
>
> + if (out->f_flags & O_NONBLOCK)
> + flags |= SPLICE_F_NONBLOCK;
> +
> pipe_lock(opipe);
> ret = wait_for_space(opipe, flags);
> if (!ret)
> @@ -1717,6 +1726,9 @@ static long do_tee(struct file *in, struct file *out, size_t len,
> * copying the data.
> */
> if (ipipe && opipe && ipipe != opipe) {
> + if ((in->f_flags | out->f_flags) & O_NONBLOCK)
> + flags |= SPLICE_F_NONBLOCK;
> +
> /*
> * Keep going, unless we encounter an error. The ipipe/opipe
> * ordering doesn't really matter.


2019-02-19 17:13:23

by Steven Rostedt

[permalink] [raw]
Subject: Re: [RFC PATCH] fs: Make splice() and tee() take into account O_NONBLOCK flag on pipes

On Wed, 13 Feb 2019 09:44:45 -0500
Steven Rostedt <[email protected]> wrote:

> Reviewed-by: Steven Rostedt (VMware) <[email protected]>
>
> Does anyone have any issues or comments about this patch?

I'm going to start pinging people once a week, looking for comments on
this patch ;-)

-- Steve

2019-03-04 16:16:39

by Steven Rostedt

[permalink] [raw]
Subject: Re: [RFC PATCH] fs: Make splice() and tee() take into account O_NONBLOCK flag on pipes

On Tue, 19 Feb 2019 12:12:12 -0500
Steven Rostedt <[email protected]> wrote:

> On Wed, 13 Feb 2019 09:44:45 -0500
> Steven Rostedt <[email protected]> wrote:
>
> > Reviewed-by: Steven Rostedt (VMware) <[email protected]>
> >
> > Does anyone have any issues or comments about this patch?
>
> I'm going to start pinging people once a week, looking for comments on
> this patch ;-)
>

I'm still just hearing crickets about this? Should I just send this
directly to Linus or Andrew?

-- Steve

2019-03-05 00:05:25

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC PATCH] fs: Make splice() and tee() take into account O_NONBLOCK flag on pipes

On Mon, Mar 4, 2019 at 7:58 AM Steven Rostedt <[email protected]> wrote:
>
> I'm still just hearing crickets about this? Should I just send this
> directly to Linus or Andrew?

I'll take it. It looks sane.

Of course, if it turns out that this breaks something that assumes
that splice blocks purely based on the SPLICE_F_NONBLOCK flag, we'll
have to revert it. Looking at the history of splice, it does look like
it has always ignored O_NONBLOCK.

Linus

2019-03-05 00:11:22

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC PATCH] fs: Make splice() and tee() take into account O_NONBLOCK flag on pipes

On Mon, Mar 4, 2019 at 4:04 PM Linus Torvalds
<[email protected]> wrote:
>
> Of course, if it turns out that this breaks something that assumes
> that splice blocks purely based on the SPLICE_F_NONBLOCK flag, we'll
> have to revert it. Looking at the history of splice, it does look like
> it has always ignored O_NONBLOCK.

Note that the "arguably buggy" argument is not an actual real argument
in the presence of regressions. It's more a "I wish reality wasn't
that way" argument, but it doesn't actually _change_ reality.

Of course, in the case of sendfile() (which is where that comment is),
I don't think we ever really even tested it either way.

Linus

2019-03-05 01:56:14

by Steven Rostedt

[permalink] [raw]
Subject: Re: [RFC PATCH] fs: Make splice() and tee() take into account O_NONBLOCK flag on pipes


[ Replying again, but without Cc'ing Trump due to a bug in claws-mail ]

On Mon, 4 Mar 2019 16:09:22 -0800
Linus Torvalds <[email protected]> wrote:
>
> Note that the "arguably buggy" argument is not an actual real argument

Note that "arguably buggy" just means that you can argue that it is
buggy. It doesn't actually mean that it *is* buggy ;-)

-- Steve