2023-12-14 18:46:14

by Ahelenia Ziemiańska

[permalink] [raw]
Subject: [PATCH RERESEND 10/11] splice: file->pipe: -EINVAL for non-regular files w/o FMODE_NOWAIT

We request non-blocking I/O in the generic implementation, but some
files ‒ ttys ‒ only check O_NONBLOCK. Refuse them here, lest we
risk sleeping with the pipe locked for indeterminate lengths of
time.

This also masks inconsistent wake-ups (usually every second line)
when splicing from ttys in icanon mode.

Regular files don't /have/ a distinct O_NONBLOCK mode,
because they always behave non-blockingly, and for them FMODE_NOWAIT is
used in the purest sense of
/* File is capable of returning -EAGAIN if I/O will block */
which is not set by the vast majority of filesystems,
and it's not the semantic we want here.

Link: https://lore.kernel.org/linux-fsdevel/qk6hjuam54khlaikf2ssom6custxf5is2ekkaequf4hvode3ls@zgf7j5j4ubvw/t/#u
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
---
fs/splice.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/fs/splice.c b/fs/splice.c
index 9d29664f23ee..81788bf7daa1 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1300,6 +1300,8 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out,
} else if (opipe) {
if (off_out)
return -ESPIPE;
+ if (!((in->f_mode & FMODE_NOWAIT) || S_ISREG(in->f_inode->i_mode)))
+ return -EINVAL;
if (off_in) {
if (!(in->f_mode & FMODE_PREAD))
return -EINVAL;
--
2.39.2


Attachments:
(No filename) (1.31 kB)
signature.asc (849.00 B)
Download all attachments

2023-12-15 15:47:33

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH RERESEND 10/11] splice: file->pipe: -EINVAL for non-regular files w/o FMODE_NOWAIT

On 12/14/23 11:45 AM, Ahelenia Ziemia?ska wrote:
> We request non-blocking I/O in the generic implementation, but some
> files ? ttys ? only check O_NONBLOCK. Refuse them here, lest we
> risk sleeping with the pipe locked for indeterminate lengths of
> time.

A worthy goal here is ensuring that _everybody_ honors IOCB_NOWAIT,
rather than just rely on O_NONBLOCK. This does involve converting to
->read_iter/->write_iter if the driver isn't already using it, but some
of them already have that, yet don't check IOCB_NOWAIT or treat it the
same as O_NONBLOCK.

Adding special checks like this is not a good idea, imho.

> This also masks inconsistent wake-ups (usually every second line)
> when splicing from ttys in icanon mode.
>
> Regular files don't /have/ a distinct O_NONBLOCK mode,
> because they always behave non-blockingly, and for them FMODE_NOWAIT is
> used in the purest sense of
> /* File is capable of returning -EAGAIN if I/O will block */
> which is not set by the vast majority of filesystems,
> and it's not the semantic we want here.

The main file systems do very much set it, like btrfs, ext4, and xfs. If
you look at total_file_systems / ones_flagging_it the ratio may be high,
but in terms of installed userbase, the majority definitely will have
it. Also see comment on cover letter for addressing this IOCB_NOWAIT
confusion.

--
Jens Axboe


2023-12-16 05:36:27

by Ahelenia Ziemiańska

[permalink] [raw]
Subject: Re: [PATCH RERESEND 10/11] splice: file->pipe: -EINVAL for non-regular files w/o FMODE_NOWAIT

On Fri, Dec 15, 2023 at 08:47:15AM -0700, Jens Axboe wrote:
> On 12/14/23 11:45 AM, Ahelenia Ziemiańska wrote:
> > We request non-blocking I/O in the generic implementation, but some
> > files ‒ ttys ‒ only check O_NONBLOCK. Refuse them here, lest we
> > risk sleeping with the pipe locked for indeterminate lengths of
> > time.
> A worthy goal here is ensuring that _everybody_ honors IOCB_NOWAIT,
> rather than just rely on O_NONBLOCK. This does involve converting to
> ->read_iter/->write_iter if the driver isn't already using it, but some
> of them already have that, yet don't check IOCB_NOWAIT or treat it the
> same as O_NONBLOCK.
This doesn't really mean much to me, sorry.

> Adding special checks like this is not a good idea, imho.
That's what Linus said I should do so that's what I did
https://lore.kernel.org/linux-fsdevel/CAHk-=wimmqG_wvSRtMiKPeGGDL816n65u=Mq2+H3-=uM2U6FmA@mail.gmail.com/

I can't fix the tty layer :/

> > This also masks inconsistent wake-ups (usually every second line)
> > when splicing from ttys in icanon mode.
> >
> > Regular files don't /have/ a distinct O_NONBLOCK mode,
> > because they always behave non-blockingly, and for them FMODE_NOWAIT is
> > used in the purest sense of
> > /* File is capable of returning -EAGAIN if I/O will block */
> > which is not set by the vast majority of filesystems,
> > and it's not the semantic we want here.
>
> The main file systems do very much set it, like btrfs, ext4, and xfs. If
> you look at total_file_systems / ones_flagging_it the ratio may be high,
> but in terms of installed userbase, the majority definitely will have
> it. Also see comment on cover letter for addressing this IOCB_NOWAIT
> confusion.
Reassessing
[1] https://lore.kernel.org/linux-fsdevel/5osglsw36dla3mubtpsmdwdid4fsdacplyd6acx2igo4atogdg@yur3idyim3cc/
I see FMODE_NOWAIT in
blockdevs
/dev/{null,zero,random,urandom}
btrfs/ext4/f2fs/ocfs2/xfs
eventfd
pipes
sockets
tun/tap
which means that vfat/fuse/nfs/tmpfs/ramfs/procfs/sysfs don't.
(zfs also doesn't, but that's not for this list.)

I don't know if that's actually a "majority" in a meaningful sense,
I agree, but I think I primarily committed to this exclusion because
tmpfs/ramfs didn't.

I s'pose ramfs can already be tagged since it already returns
-EAGAIN when I/O would block (never).

tmpfs not being spliceable is also questionable.

But this'd also mean effectively deleting
afs_file_splice_read
ceph_splice_read
coda_file_splice_read
ecryptfs_splice_read_update_atime
fuse_dev_splice_read
nfs_file_splice_read
orangefs_file_splice_read
shmem_file_splice_read
v9fs_file_splice_read
(not to mention the many others (adfs/affs/bfs/bcachefs/cramfs/erofs/fat/hfs*/hostfs/hpfs/jffs2/jfs/minix/nilfs/ntfs/omfs/reiserfs/isofs/sysv/ubifs/udf/ufs/vboxsf/squashfs/romfs)
which just use the filemap impl verbatim).

There's no point to restricting splice access on a per-filesystem level
(which this'd do), since to mount a malicious network filesystem you
need to be root.

A denial of service attack makes no sense if you're already root.

(Maybe except for fuse, which people typically run suid;
that I could see potentially making sense to disable..)


I have indeed managed to confuse myself into the NOWAIT hole,
but this is actually about
"not letting unprivileged users escalate into
hanging system daemons by writing to a pipe"
rather than
"if we ever hold the pipe lock for >2µs we die instantly".

O_NONBLOCK filtered by FMODE_NOWAIT is used as a semantic proxy for
the 10 different types of files anyone can create that we know are safe.

Anyone can open a socket and not write to it, so we must refuse to
splice from a socket with no data in it.
But only root can mount filesystems, so a regular file is always safe.

And, actually defining a slightly-heuristic per-file policy in the syscall
itself is stupid, you've talked me out of this.
This check only actually applies to the generic copy_splice_read()
implementation, since the "real"/non-generic splices
(fiemap_splice_read/per-filesystem ‒
all the others that this patchset touches)
are already known to be safe
(and aren't reads so FMODE_NOWAIT doesn't factor in at all).

I've dropped this patch and have instead added this to 01/11:
diff --git a/fs/splice.c b/fs/splice.c
index f8bfc9cf8cdc..6d369d7d56d5 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -331,0 +332,7 @@ ssize_t copy_splice_read(struct file *in, loff_t *ppos,
+ /*
+ * This generic implementation is only safe w.r.t. the pipe lock
+ * if the file actually respects IOCB_NOWAIT, which ttys don't.
+ */
+ if (!(in->f_mode & FMODE_NOWAIT))
+ return -EINVAL;

(Indeed, in many ways, Linus' post to which I reply in [1] pretty much
says this explcitly. Actually he literally says this. I just don't
realise and instead of adding the snippet to copy_splice_read(),
which he already diffed and talks about, I copied it to the syscall.)

Now I just need to re-consider the prose in a way
that avoids this deeply embarrassing IOCB_NOWAIT/regular-file nonsense,
and this series ends up being just "fixing splice implementations"
without also special-casing the syscall itself.

Thanks for asking the right questions.
Sorry for longposting.


Attachments:
(No filename) (5.26 kB)
signature.asc (849.00 B)
Download all attachments