2023-12-21 03:09:09

by Ahelenia Ziemiańska

[permalink] [raw]
Subject: [PATCH v2 00/11] Avoid unprivileged splice(file->)/(->socket) pipe exclusion

Hi!

As it stands, splice(file -> pipe):
1. locks the pipe,
2. does a read from the file,
3. unlocks the pipe.

When the file resides on a normal filesystem, this isn't an issue
because the filesystem has been defined as trusted by root having
mounted it.

But when the file is actually IPC (FUSE) or is just IPC (sockets)
or is a tty, this means that the pipe lock will be held for an
attacker-controlled length of time, and during that time every
process trying to read from, write to, open, or close the pipe
enters an uninterruptible sleep, and will only exit it if the
splicing process is killed.

This trivially denies service to:
* any hypothetical pipe-based log collexion system
* all nullmailer installations
* me, personally, when I'm pasting stuff into qemu -serial chardev:pipe

A symmetric situation happens for splicing(pipe -> socket):
the pipe is locked for as long as the socket is full.

This follows:
1. https://lore.kernel.org/linux-fsdevel/qk6hjuam54khlaikf2ssom6custxf5is2ekkaequf4hvode3ls@zgf7j5j4ubvw/t/#u
2. a security@ thread rooted in
<irrrblivicfc7o3lfq7yjm2lrxq35iyya4gyozlohw24gdzyg7@azmluufpdfvu>
3. https://nabijaczleweli.xyz/content/blogn_t/011-linux-splice-exclusion.html
4. https://lore.kernel.org/lkml/[email protected]/t/#u (v1)
https://lore.kernel.org/lkml/[email protected]/t/#u (resend)
https://lore.kernel.org/lkml/[email protected]/t/#u (reresend)
5. https://lore.kernel.org/lkml/dtexwpw6zcdx7dkx3xj5gyjp5syxmyretdcbcdtvrnukd4vvuh@tarta.nabijaczleweli.xyz/t/#u
(relay_file_splice_read removal)

1-7/11 request MSG_DONTWAIT (sockets)/IOCB_NOWAIT (generic) on the read

8/11 removes splice_read from tty completely

9/11 removes splice_read from FUSE filesystems
(except virtiofs which has normal mounting security semantics,
but is handled via FUSE code)

10/11 allows splice_read from FUSE filesystems mounted by real root
(this matches the blessing received by non-FUSE network filesystems)

11/11 requests MSG_DONTWAIT for splice(pipe -> socket).

12/11 has the man-pages patch with draft wording.

All but 5/11 (AF_SMC) have been tested and embed shell programs to
repro them. AIUI I'd need an s390 machine for it? It's trivial.

6/11 (AF_KCM) also fixes kcm_splice_read() passing SPLICE_F_*-style
flags to skb_recv_datagram(), which takes MSG_*-style flags. I don't
think they did anything anyway? But.

There are two implementations that definitely sleep all the time
and I didn't touch them:
tracing_splice_read_pipe
tracing_buffers_splice_read (dropped in v2, v1 4/11)
the semantics are lost on me, but they're in debugfs/tracefs, so
it doesn't matter if they block so long as they work, and presumably
they do right now.

There is also relay_file_splice_read (dropped in v2, v1 5/11),
which isn't an implementation at all because it's dead code, broken,
and removed in -mm.

The diffs in 1-7,11/11 are unchanged, save for a rebase in 7/11.
8/11 replaces the file type test in v1 10/11.
9/11 and 10/11 are new in v2.

Ahelenia Ziemiańska (11):
splice: copy_splice_read: do the I/O with IOCB_NOWAIT
af_unix: unix_stream_splice_read: always request MSG_DONTWAIT
fuse: fuse_dev_splice_read: use nonblocking I/O
net/smc: smc_splice_read: always request MSG_DONTWAIT
kcm: kcm_splice_read: always request MSG_DONTWAIT
tls/sw: tls_sw_splice_read: always request non-blocking I/O
net/tcp: tcp_splice_read: always do non-blocking reads
tty: splice_read: disable
fuse: file: limit splice_read to virtiofs
fuse: allow splicing from filesystems mounted by real root
splice: splice_to_socket: always request MSG_DONTWAIT

drivers/tty/tty_io.c | 2 --
fs/fuse/dev.c | 10 ++++++----
fs/fuse/file.c | 17 ++++++++++++++++-
fs/fuse/fuse_i.h | 4 ++++
fs/fuse/inode.c | 2 ++
fs/fuse/virtio_fs.c | 1 +
fs/splice.c | 5 ++---
net/ipv4/tcp.c | 32 +++-----------------------------
net/kcm/kcmsock.c | 2 +-
net/smc/af_smc.c | 6 +-----
net/tls/tls_sw.c | 5 ++---
net/unix/af_unix.c | 5 +----
12 files changed, 39 insertions(+), 52 deletions(-)

base-commit: 2cf4f94d8e8646803f8fb0facf134b0cd7fb691a
--
2.39.2


Attachments:
(No filename) (4.29 kB)
signature.asc (849.00 B)
Download all attachments

2023-12-21 03:09:20

by Ahelenia Ziemiańska

[permalink] [raw]
Subject: [PATCH v2 01/11] splice: copy_splice_read: do the I/O with IOCB_NOWAIT

Otherwise we risk sleeping with the pipe locked for indeterminate
lengths of time ‒ given:
cat > udp.c <<^D
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/udp.h>
int main()
{
int s = socket(AF_INET, SOCK_DGRAM, 0);
bind(s,
&(struct sockaddr_in){ .sin_family = AF_INET,
.sin_addr.s_addr = htonl(INADDR_ANY) },
sizeof(struct sockaddr_in));
for (;;)
splice(s, 0, 1, 0, 128 * 1024 * 1024, 0);
}
^D
cc udp.c -o udp
mkfifo fifo
./udp > fifo &
read -r _ < fifo &
sleep 0.1
echo zupa > fifo
udp used to sleep in splice and the shell used to enter an
uninterruptible sleep in open("fifo");
now the splice returns -EAGAIN and the whole program completes.

Signed-off-by: Ahelenia Ziemiańska <[email protected]>
---
fs/splice.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/fs/splice.c b/fs/splice.c
index d983d375ff11..9d29664f23ee 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -361,6 +361,7 @@ ssize_t copy_splice_read(struct file *in, loff_t *ppos,
iov_iter_bvec(&to, ITER_DEST, bv, npages, len);
init_sync_kiocb(&kiocb, in);
kiocb.ki_pos = *ppos;
+ kiocb.ki_flags |= IOCB_NOWAIT;
ret = call_read_iter(in, &kiocb, &to);

if (ret > 0) {
--
2.39.2


Attachments:
(No filename) (1.30 kB)
signature.asc (849.00 B)
Download all attachments

2023-12-21 03:09:44

by Ahelenia Ziemiańska

[permalink] [raw]
Subject: [PATCH v2 02/11] af_unix: unix_stream_splice_read: always request MSG_DONTWAIT

Otherwise we risk sleeping with the pipe locked for indeterminate
lengths of time ‒ given:
cat > unix.c <<^D
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/un.h>
int main()
{
int sp[2];
socketpair(AF_UNIX, SOCK_STREAM, 0, sp);
for (;;)
splice(sp[0], 0, 1, 0, 128 * 1024 * 1024, 0);
}
^D
cc unix.c -o unix
mkfifo fifo
./unix > fifo &
read -r _ < fifo &
sleep 0.1
echo zupa > fifo
unix used to sleep in splice and the shell used to enter an
uninterruptible sleep in open("fifo");
now the splice returns -EAGAIN and the whole program completes.

Signed-off-by: Ahelenia Ziemiańska <[email protected]>
---
net/unix/af_unix.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index ac1f2bc18fc9..bae84552bf58 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2921,15 +2921,12 @@ static ssize_t unix_stream_splice_read(struct socket *sock, loff_t *ppos,
.pipe = pipe,
.size = size,
.splice_flags = flags,
+ .flags = MSG_DONTWAIT,
};

if (unlikely(*ppos))
return -ESPIPE;

- if (sock->file->f_flags & O_NONBLOCK ||
- flags & SPLICE_F_NONBLOCK)
- state.flags = MSG_DONTWAIT;
-
return unix_stream_read_generic(&state, false);
}

--
2.39.2


Attachments:
(No filename) (1.33 kB)
signature.asc (849.00 B)
Download all attachments

2023-12-21 03:10:05

by Ahelenia Ziemiańska

[permalink] [raw]
Subject: [PATCH v2 03/11] fuse: fuse_dev_splice_read: use nonblocking I/O

Otherwise we risk sleeping with the pipe locked for indeterminate
lengths of time ‒ since FUSE is usually installed with the fusermount
helper suid, given
cat > fusedev.c <<^D
#define _GNU_SOURCE
#include <fcntl.h>
#define FUSE_USE_VERSION 30
#include <fuse.h>
static void *fop_init(struct fuse_conn_info *conn, struct fuse_config *cfg)
{
for (;;)
splice(3, 0, 4, 0, 128 * 1024 * 1024, 0);
}
static const struct fuse_operations fops = { .init = fop_init };
int main(int argc, char **argv)
{
return fuse_main(argc, argv, &fops, NULL);
}
^D
cc nullsleep.c $(pkg-config fuse3 --cflags --libs) -o nullsleep
mkfifo fifo
mkdir dir
./nullsleep dir 4>fifo &
read -r _ < fifo &
sleep 0.1
echo zupa > fifo
nullsleep used to sleep in splice and the shell used to enter an
uninterruptible sleep in open("fifo");
now the splice returns -EAGAIN and the whole program completes.

Signed-off-by: Ahelenia Ziemiańska <[email protected]>
---
fs/fuse/dev.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 1a8f82f478cb..4e8caf66c01e 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1202,7 +1202,8 @@ __releases(fiq->lock)
* the 'sent' flag.
*/
static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
- struct fuse_copy_state *cs, size_t nbytes)
+ struct fuse_copy_state *cs, size_t nbytes,
+ bool nonblock)
{
ssize_t err;
struct fuse_conn *fc = fud->fc;
@@ -1238,7 +1239,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
break;
spin_unlock(&fiq->lock);

- if (file->f_flags & O_NONBLOCK)
+ if (nonblock)
return -EAGAIN;
err = wait_event_interruptible_exclusive(fiq->waitq,
!fiq->connected || request_pending(fiq));
@@ -1364,7 +1365,8 @@ static ssize_t fuse_dev_read(struct kiocb *iocb, struct iov_iter *to)

fuse_copy_init(&cs, 1, to);

- return fuse_dev_do_read(fud, file, &cs, iov_iter_count(to));
+ return fuse_dev_do_read(fud, file, &cs, iov_iter_count(to),
+ file->f_flags & O_NONBLOCK);
}

static ssize_t fuse_dev_splice_read(struct file *in, loff_t *ppos,
@@ -1388,7 +1390,7 @@ static ssize_t fuse_dev_splice_read(struct file *in, loff_t *ppos,
fuse_copy_init(&cs, 1, NULL);
cs.pipebufs = bufs;
cs.pipe = pipe;
- ret = fuse_dev_do_read(fud, in, &cs, len);
+ ret = fuse_dev_do_read(fud, in, &cs, len, true);
if (ret < 0)
goto out;

--
2.39.2


Attachments:
(No filename) (2.47 kB)
signature.asc (849.00 B)
Download all attachments

2023-12-21 03:10:20

by Ahelenia Ziemiańska

[permalink] [raw]
Subject: [PATCH v2 04/11] net/smc: smc_splice_read: always request MSG_DONTWAIT

Otherwise we risk sleeping with the pipe locked for indeterminate
lengths of time ‒ this meant that splice(smc -> pipe) with no data
would hold the pipe lock, and any open/read/write/close on the pipe
would enter uninterruptible sleep.

Signed-off-by: Ahelenia Ziemiańska <[email protected]>
---
net/smc/af_smc.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 73eebddbbf41..a11a966d031a 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -3248,12 +3248,8 @@ static ssize_t smc_splice_read(struct socket *sock, loff_t *ppos,
rc = -ESPIPE;
goto out;
}
- if (flags & SPLICE_F_NONBLOCK)
- flags = MSG_DONTWAIT;
- else
- flags = 0;
SMC_STAT_INC(smc, splice_cnt);
- rc = smc_rx_recvmsg(smc, NULL, pipe, len, flags);
+ rc = smc_rx_recvmsg(smc, NULL, pipe, len, MSG_DONTWAIT);
}
out:
release_sock(sk);
--
2.39.2


Attachments:
(No filename) (955.00 B)
signature.asc (849.00 B)
Download all attachments

2023-12-21 03:10:40

by Ahelenia Ziemiańska

[permalink] [raw]
Subject: [PATCH v2 05/11] kcm: kcm_splice_read: always request MSG_DONTWAIT

Otherwise we risk sleeping with the pipe locked for indeterminate
lengths of time ‒ given:
cat > kcm.c <<^D
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/kcm.h>
int main()
{
int kcm = socket(AF_KCM, SOCK_SEQPACKET, KCMPROTO_CONNECTED);
for (;;)
splice(kcm, 0, 1, 0, 128 * 1024 * 1024, 0);
}
^D
cc kcm.c -o kcm
mkfifo fifo
./kcm > fifo &
read -r _ < fifo &
sleep 0.1
echo zupa > fifo
kcm used to sleep in splice and the shell used to enter an
uninterruptible sleep in open("fifo");
now the splice returns -EAGAIN and the whole program completes.

Also: don't pass the SPLICE_F_*-style flags argument to
skb_recv_datagram(), which expects MSG_*-style flags.
This fixes SPLICE_F_NONBLOCK not having worked.

Signed-off-by: Ahelenia Ziemiańska <[email protected]>
---
net/kcm/kcmsock.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c
index 65d1f6755f98..ccfc46f31891 100644
--- a/net/kcm/kcmsock.c
+++ b/net/kcm/kcmsock.c
@@ -1028,7 +1028,7 @@ static ssize_t kcm_splice_read(struct socket *sock, loff_t *ppos,

/* Only support splice for SOCKSEQPACKET */

- skb = skb_recv_datagram(sk, flags, &err);
+ skb = skb_recv_datagram(sk, MSG_DONTWAIT, &err);
if (!skb)
goto err_out;

--
2.39.2


Attachments:
(No filename) (1.37 kB)
signature.asc (849.00 B)
Download all attachments

2023-12-21 03:10:57

by Ahelenia Ziemiańska

[permalink] [raw]
Subject: [PATCH v2 06/11] tls/sw: tls_sw_splice_read: always request non-blocking I/O

Otherwise we risk sleeping with the pipe locked for indeterminate
lengths of time ‒ given
cat > tls_sw.c <<^D
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <linux/tls.h>
int main()
{
int s = socket(AF_INET, SOCK_STREAM, 0);
struct sockaddr_in addr = {
.sin_family = AF_INET,
.sin_addr = { htonl(INADDR_LOOPBACK) },
.sin_port = htons(getpid() % (0xFFFF - 1000) + 1000)
};
bind(s, &addr, sizeof(addr));
listen(s, 1);
if (!fork()) {
connect(socket(AF_INET, SOCK_STREAM, 0), &addr, sizeof(addr));
sleep(100);
return 0;
}

s = accept(s, NULL, NULL);
setsockopt(s, SOL_TCP, TCP_ULP, "tls", sizeof("tls"));
setsockopt(s, SOL_TLS, TLS_RX,
&(struct tls12_crypto_info_aes_gcm_128){
.info.version = TLS_1_2_VERSION,
.info.cipher_type = TLS_CIPHER_AES_GCM_128 },
sizeof(struct tls12_crypto_info_aes_gcm_128));

for (;;)
splice(s, 0, 1, 0, 128 * 1024 * 1024, 0);
}
^D
cc tls_sw.c -o tls_sw
mkfifo fifo
./tls_sw > fifo &
read -r _ < fifo &
sleep 0.1
echo zupa > fifo
tls_sw used to sleep in splice and the shell used to enter an
uninterruptible sleep in open("fifo");
now the splice returns -EAGAIN and the whole program completes.

Signed-off-by: Ahelenia Ziemiańska <[email protected]>
---
net/tls/tls_sw.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index e37b4d2e2acd..3f474deed94d 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -2157,7 +2157,7 @@ ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos,
int chunk;
int err;

- err = tls_rx_reader_lock(sk, ctx, flags & SPLICE_F_NONBLOCK);
+ err = tls_rx_reader_lock(sk, ctx, true);
if (err < 0)
return err;

@@ -2166,8 +2166,7 @@ ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos,
} else {
struct tls_decrypt_arg darg;

- err = tls_rx_rec_wait(sk, NULL, flags & SPLICE_F_NONBLOCK,
- true);
+ err = tls_rx_rec_wait(sk, NULL, true, true);
if (err <= 0)
goto splice_read_end;

--
2.39.2


Attachments:
(No filename) (2.18 kB)
signature.asc (849.00 B)
Download all attachments

2023-12-21 03:11:41

by Ahelenia Ziemiańska

[permalink] [raw]
Subject: [PATCH v2 07/11] net/tcp: tcp_splice_read: always do non-blocking reads

Otherwise we risk sleeping with the pipe locked for indeterminate
lengths of time ‒ given:
cat > tcp.c <<^D
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/tls.h>
int main()
{
int s = socket(AF_INET, SOCK_STREAM, 0);
struct sockaddr_in addr = {
.sin_family = AF_INET,
.sin_addr = { htonl(INADDR_LOOPBACK) },
.sin_port = htons(getpid() % (0xFFFF - 1000) + 1000)
};
bind(s, &addr, sizeof(addr));
listen(s, 1);
if (!fork()) {
connect(socket(AF_INET, SOCK_STREAM, 0), &addr, sizeof(addr));
sleep(100);
return 0;
}

s = accept(s, NULL, NULL);
for (;;)
splice(s, 0, 1, 0, 128 * 1024 * 1024, 0);
}
^D
cc tcp.c -o tcp
mkfifo fifo
./tcp > fifo &
read -r _ < fifo &
sleep 0.1
echo zupa > fifo
tcp used to sleep in splice and the shell used to enter an
uninterruptible sleep in open("fifo");
now the splice returns -EAGAIN and the whole program completes.

sock_rcvtimeo() returns 0 if the second argument is true, so the
explicit re-try loop for empty read conditions can be removed
entirely.

Signed-off-by: Ahelenia Ziemiańska <[email protected]>
---
net/ipv4/tcp.c | 32 +++-----------------------------
1 file changed, 3 insertions(+), 29 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index ff6838ca2e58..17a0e2a766b7 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -782,7 +782,6 @@ ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos,
.len = len,
.flags = flags,
};
- long timeo;
ssize_t spliced;
int ret;

@@ -797,7 +796,6 @@ ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos,

lock_sock(sk);

- timeo = sock_rcvtimeo(sk, sock->file->f_flags & O_NONBLOCK);
while (tss.len) {
ret = __tcp_splice_read(sk, &tss);
if (ret < 0)
@@ -821,37 +819,13 @@ ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos,
ret = -ENOTCONN;
break;
}
- if (!timeo) {
- ret = -EAGAIN;
- break;
- }
- /* if __tcp_splice_read() got nothing while we have
- * an skb in receive queue, we do not want to loop.
- * This might happen with URG data.
- */
- if (!skb_queue_empty(&sk->sk_receive_queue))
- break;
- ret = sk_wait_data(sk, &timeo, NULL);
- if (ret < 0)
- break;
- if (signal_pending(current)) {
- ret = sock_intr_errno(timeo);
- break;
- }
- continue;
+ ret = -EAGAIN;
+ break;
}
tss.len -= ret;
spliced += ret;

- if (!tss.len || !timeo)
- break;
- release_sock(sk);
- lock_sock(sk);
-
- if (sk->sk_err || sk->sk_state == TCP_CLOSE ||
- (sk->sk_shutdown & RCV_SHUTDOWN) ||
- signal_pending(current))
- break;
+ break;
}

release_sock(sk);
--
2.39.2


Attachments:
(No filename) (2.78 kB)
signature.asc (849.00 B)
Download all attachments

2023-12-21 03:12:04

by Ahelenia Ziemiańska

[permalink] [raw]
Subject: [PATCH v2 08/11] tty: splice_read: disable

We request non-blocking I/O in the generic copy_splice_read, but
"the tty layer doesn't actually honor the IOCB_NOWAIT flag for
various historical reasons.". This means that a tty->pipe splice
will happily sleep with the pipe locked forever, and any process
trying to take it (due to an open/read/write/&c.) will enter
uninterruptible sleep.

This also masks inconsistent wake-ups (usually every second line)
when splicing from ttys in icanon mode.

Link: https://lore.kernel.org/linux-fsdevel/CAHk-=wimmqG_wvSRtMiKPeGGDL816n65u=Mq2+H3-=uM2U6FmA@mail.gmail.com/
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
---
drivers/tty/tty_io.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index 06414e43e0b5..50c2957a9c7f 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -465,7 +465,6 @@ static const struct file_operations tty_fops = {
.llseek = no_llseek,
.read_iter = tty_read,
.write_iter = tty_write,
- .splice_read = copy_splice_read,
.splice_write = iter_file_splice_write,
.poll = tty_poll,
.unlocked_ioctl = tty_ioctl,
@@ -480,7 +479,6 @@ static const struct file_operations console_fops = {
.llseek = no_llseek,
.read_iter = tty_read,
.write_iter = redirected_tty_write,
- .splice_read = copy_splice_read,
.splice_write = iter_file_splice_write,
.poll = tty_poll,
.unlocked_ioctl = tty_ioctl,
--
2.39.2


Attachments:
(No filename) (1.43 kB)
signature.asc (849.00 B)
Download all attachments

2023-12-21 03:12:06

by Ahelenia Ziemiańska

[permalink] [raw]
Subject: [PATCH v2 09/11] fuse: file: limit splice_read to virtiofs

Potentially-blocking splice_reads are allowed for normal filesystems
like NFS because they're blessed by root.

FUSE is commonly used suid-root, and allows anyone to trivially create
a file that, when spliced from, will just sleep forever with the pipe
lock held.

The only way IPC to the fusing process could be avoided is if
!(ff->open_flags & FOPEN_DIRECT_IO) and the range was already cached
and we weren't past the end. Just refuse it.

virtiofs behaves like a normal filesystem and can only be mounted
by root, it's unaffected by use of a new "trusted" connection flag.
This may be extended to include real FUSE mounts by processes which
aren't suid, to match the semantics for normal filesystems.

Signed-off-by: Ahelenia Ziemiańska <[email protected]>
---
fs/fuse/file.c | 17 ++++++++++++++++-
fs/fuse/fuse_i.h | 3 +++
fs/fuse/virtio_fs.c | 1 +
3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index a660f1f21540..20bb16ddfcc9 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -3200,6 +3200,21 @@ static ssize_t fuse_copy_file_range(struct file *src_file, loff_t src_off,
return ret;
}

+static long fuse_splice_read(struct file *in, loff_t *ppos,
+ struct pipe_inode_info *pipe, size_t len,
+ unsigned int flags)
+{
+ struct inode *inode = file_inode(in);
+
+ if (fuse_is_bad(inode))
+ return -EIO;
+
+ if (get_fuse_conn(inode)->trusted)
+ return filemap_splice_read(in, ppos, pipe, len, flags);
+
+ return -EINVAL;
+}
+
static const struct file_operations fuse_file_operations = {
.llseek = fuse_file_llseek,
.read_iter = fuse_file_read_iter,
@@ -3212,7 +3227,7 @@ static const struct file_operations fuse_file_operations = {
.lock = fuse_file_lock,
.get_unmapped_area = thp_get_unmapped_area,
.flock = fuse_file_flock,
- .splice_read = filemap_splice_read,
+ .splice_read = fuse_splice_read,
.splice_write = iter_file_splice_write,
.unlocked_ioctl = fuse_file_ioctl,
.compat_ioctl = fuse_file_compat_ioctl,
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 1df83eebda92..463c5d4ad8b4 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -818,6 +818,9 @@ struct fuse_conn {
/* Is statx not implemented by fs? */
unsigned int no_statx:1;

+ /* Do we trust this connection to always respond? */
+ bool trusted:1;
+
/** The number of requests waiting for completion */
atomic_t num_waiting;

diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 5f1be1da92ce..fce0fe24899a 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -1448,6 +1448,7 @@ static int virtio_fs_get_tree(struct fs_context *fsc)
fc->delete_stale = true;
fc->auto_submounts = true;
fc->sync_fs = true;
+ fc->trusted = true;

/* Tell FUSE to split requests that exceed the virtqueue's size */
fc->max_pages_limit = min_t(unsigned int, fc->max_pages_limit,
--
2.39.2


Attachments:
(No filename) (2.92 kB)
signature.asc (849.00 B)
Download all attachments

2023-12-21 03:12:37

by Ahelenia Ziemiańska

[permalink] [raw]
Subject: [PATCH v2 10/11] fuse: allow splicing from filesystems mounted by real root

FUSE tends to be installed suid 0: this allows normal users to mount
anything, including a program whose read implementation consists
of for(;;) sleep(1);, which, if splice were allowed, would sleep
forever with the pipe lock held.

Normal filesystems can only be mounted by root, and are thus deemed
safe. Extend this to when root mounts a FUSE filesystem with an
explicit check.

Signed-off-by: Ahelenia Ziemiańska <[email protected]>
---
fs/fuse/fuse_i.h | 1 +
fs/fuse/inode.c | 2 ++
2 files changed, 3 insertions(+)

diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 463c5d4ad8b4..a9ceaf10c1d2 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -532,6 +532,7 @@ struct fuse_fs_context {
bool no_control:1;
bool no_force_umount:1;
bool legacy_opts_show:1;
+ bool trusted:1;
enum fuse_dax_mode dax_mode;
unsigned int max_read;
unsigned int blksize;
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 2a6d44f91729..91108ba9acec 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1779,6 +1779,7 @@ static int fuse_get_tree(struct fs_context *fsc)

fuse_conn_init(fc, fm, fsc->user_ns, &fuse_dev_fiq_ops, NULL);
fc->release = fuse_free_conn;
+ fc->trusted = ctx->trusted;

fsc->s_fs_info = fm;

@@ -1840,6 +1841,7 @@ static int fuse_init_fs_context(struct fs_context *fsc)
ctx->max_read = ~0;
ctx->blksize = FUSE_DEFAULT_BLKSIZE;
ctx->legacy_opts_show = true;
+ ctx->trusted = uid_eq(current_uid(), GLOBAL_ROOT_UID);

#ifdef CONFIG_BLOCK
if (fsc->fs_type == &fuseblk_fs_type) {
--
2.39.2


Attachments:
(No filename) (1.57 kB)
signature.asc (849.00 B)
Download all attachments

2023-12-21 03:12:51

by Ahelenia Ziemiańska

[permalink] [raw]
Subject: [PATCH v2 11/11] splice: splice_to_socket: always request MSG_DONTWAIT

The pipe is locked at the top of the function, so sock_sendmsg
sleeps for space with the pipe lock held ‒ given:
cat > to_socket.c <<^D
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>
int main()
{
int sp[2];
socketpair(AF_UNIX, SOCK_STREAM | SOCK_NONBLOCK, 0, sp);
while(write(sp[1], sp, 1) == 1)
;
fcntl(sp[1], F_SETFL, 0);
for (;;)
splice(0, 0, sp[1], 0, 128 * 1024 * 1024, 0);
}
^D
cc to_socket.c -o to_socket
mkfifo fifo
sleep 10 > fifo &
./to_socket < fifo &
echo zupa > fifo
to_socket used to sleep in splice and the shell used to enter an
uninterruptible sleep in closing the fifo in dup2(10, 1);
now the splice returns -EAGAIN and the whole program completes.

Signed-off-by: Ahelenia Ziemiańska <[email protected]>
---
fs/splice.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index 9d29664f23ee..2871c6f9366f 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -869,13 +869,11 @@ ssize_t splice_to_socket(struct pipe_inode_info *pipe, struct file *out,
if (!bc)
break;

- msg.msg_flags = MSG_SPLICE_PAGES;
+ msg.msg_flags = MSG_SPLICE_PAGES | MSG_DONTWAIT;
if (flags & SPLICE_F_MORE)
msg.msg_flags |= MSG_MORE;
if (remain && pipe_occupancy(pipe->head, tail) > 0)
msg.msg_flags |= MSG_MORE;
- if (out->f_flags & O_NONBLOCK)
- msg.msg_flags |= MSG_DONTWAIT;

iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, bvec, bc,
len - remain);
--
2.39.2


Attachments:
(No filename) (1.55 kB)
signature.asc (849.00 B)
Download all attachments

2023-12-21 03:13:00

by Ahelenia Ziemiańska

[permalink] [raw]
Subject: [PATCH v2 12/11 man-pages] splice.2: document 6.8 blocking behaviour

Hypothetical text that matches v2.

Signed-off-by: Ahelenia Ziemiańska <[email protected]>
---
man2/splice.2 | 47 +++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 43 insertions(+), 4 deletions(-)

diff --git a/man2/splice.2 b/man2/splice.2
index e5d05a05c..d2c7ac8d5 100644
--- a/man2/splice.2
+++ b/man2/splice.2
@@ -139,10 +139,11 @@ .SH ERRORS
.B EAGAIN
.B SPLICE_F_NONBLOCK
was specified in
-.I flags
-or one of the file descriptors had been marked as nonblocking
-.RB ( O_NONBLOCK ) ,
-and the operation would block.
+.IR flags ,
+one of the file descriptors had been marked as nonblocking
+.RB ( O_NONBLOCK )
+and the operation would block,
+or splicing from an untrusted IPC mechanism and no data was available (see HISTORY below).
.TP
.B EBADF
One or both file descriptors are not valid,
@@ -192,6 +193,44 @@ .SH HISTORY
Since Linux 2.6.31,
.\" commit 7c77f0b3f9208c339a4b40737bb2cb0f0319bb8d
both arguments may refer to pipes.
+.P
+Between Linux 4.9 and 6.7,
+.\" commit 8924feff66f35fe22ce77aafe3f21eb8e5cff881
+splicing from a non-pipe to a pipe without
+.B SPLICE_F_NONBLOCK
+would hold the pipe lock and wait for data on the non-pipe.
+This isn't an issue for files, but if the non-pipe is a tty,
+or an IPC mechanism like a socket or a
+.BR fuse (4)
+filesystem, this means that a thread attempting any operation (like
+.BR open (2)/ read (2)/ write (2)/ close (2))
+on the pipe would enter uninterruptible sleep until data appeared,
+which may never happen.
+The same applies to splicing from a pipe to a full socket.
+.P
+Since Linux 6.8,
+.\" commit TBD
+splicing from ttys is disabled
+.RB ( EINVAL ),
+reads done when splicing from sockets happen in non-blocking mode
+(as-if
+.BR MSG_DONTWAIT ,
+returning
+.B EAGAIN
+if no data is available),
+and splicing from
+.BR fuse (4)
+filesystems is only allowed if they were mounted by
+root in the initial user namespace
+(this matches security semantics for normal filesystems).
+If a splice implementation is devised that doesn't need to lock the pipe
+while waiting for data, this may be reversed in a future version.
+Writes when splicing to sockets are also done non-blockingly
+(as-if
+.BR MSG_DONTWAIT ,
+returning
+.B EAGAIN
+if the socket is full).
.SH NOTES
The three system calls
.BR splice (),
--
2.39.2


Attachments:
(No filename) (2.35 kB)
signature.asc (849.00 B)
Download all attachments

2023-12-21 08:11:47

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v2 08/11] tty: splice_read: disable

On Thu, Dec 21, 2023 at 04:09:10AM +0100, Ahelenia Ziemiańska wrote:
> We request non-blocking I/O in the generic copy_splice_read, but
> "the tty layer doesn't actually honor the IOCB_NOWAIT flag for
> various historical reasons.". This means that a tty->pipe splice
> will happily sleep with the pipe locked forever, and any process
> trying to take it (due to an open/read/write/&c.) will enter
> uninterruptible sleep.
>
> This also masks inconsistent wake-ups (usually every second line)
> when splicing from ttys in icanon mode.
>
> Link: https://lore.kernel.org/linux-fsdevel/CAHk-=wimmqG_wvSRtMiKPeGGDL816n65u=Mq2+H3-=uM2U6FmA@mail.gmail.com/
> Signed-off-by: Ahelenia Ziemiańska <[email protected]>
> ---

Acked-by: Greg Kroah-Hartman <[email protected]>

2023-12-21 08:27:29

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v2 01/11] splice: copy_splice_read: do the I/O with IOCB_NOWAIT

On Thu, Dec 21, 2023 at 04:08:45AM +0100, Ahelenia Ziemiańska wrote:
> Otherwise we risk sleeping with the pipe locked for indeterminate

You can't just assume that any ->read_iter support IOCB_NOWAIT.


2023-12-21 16:32:13

by Ahelenia Ziemiańska

[permalink] [raw]
Subject: Re: [PATCH v2 01/11] splice: copy_splice_read: do the I/O with IOCB_NOWAIT

On Thu, Dec 21, 2023 at 12:27:12AM -0800, Christoph Hellwig wrote:
> On Thu, Dec 21, 2023 at 04:08:45AM +0100, Ahelenia Ziemiańska wrote:
> > Otherwise we risk sleeping with the pipe locked for indeterminate
> You can't just assume that any ->read_iter support IOCB_NOWAIT.
Let's see.

zero_fops drivers/char/mem.c: .splice_read = copy_splice_read,
full_fops drivers/char/mem.c: .splice_read = copy_splice_read,

random_fops drivers/char/random.c: .splice_read = copy_splice_read,
random_read_iter checks
urandom_fops drivers/char/random.c: .splice_read = copy_splice_read,
urandom_read_iter returns instantly

if ((in->f_flags & O_DIRECT) || IS_DAX(in->f_mapping->host))
fs/splice.c: return copy_splice_read(in, ppos, pipe, len, flags);
FMODE_CAN_ODIRECT is set by filesystems and blockdevs, so trusted

fs/9p/vfs_file.c: return copy_splice_read(in, ppos, pipe, len, flags);
fs/ceph/file.c: return copy_splice_read(in, ppos, pipe, len, flags);
fs/ceph/file.c: return copy_splice_read(in, ppos, pipe, len, flags);
fs/gfs2/file.c: .splice_read = copy_splice_read,
fs/gfs2/file.c: .splice_read = copy_splice_read,
fs/kernfs/file.c: .splice_read = copy_splice_read,
fs/smb/client/cifsfs.c: .splice_read = copy_splice_read,
fs/smb/client/cifsfs.c: .splice_read = copy_splice_read,
fs/proc/inode.c: .splice_read = copy_splice_read,
fs/proc/inode.c: .splice_read = copy_splice_read,
fs/proc/proc_sysctl.c: .splice_read = copy_splice_read,
fs/proc_namespace.c: .splice_read = copy_splice_read,
fs/proc_namespace.c: .splice_read = copy_splice_read,
fs/proc_namespace.c: .splice_read = copy_splice_read,
filesystems => trusted

tracing_fops
kernel/trace/trace.c: .splice_read = copy_splice_read,
used in /sys/kernel/debug/tracing/per_cpu/cpu*/trace
and /sys/kernel/debug/tracing/trace
which are seq_read_iter and even if they did block,
it's in tracefs so same logic as tracing_buffers_splice_read applies.

net/socket.c: return copy_splice_read(file, ppos, pipe, len, flags);
this is the default implementation for protocols without explicit
splice_reads, and sock_read_iter translates IOCB_NOWAIT into
MSG_DONTAIT.


So I think I can, because the ~three implementations that we want
to constrain do support it. If anything, this hints to me that to
yield a more consistent API that doesn't arbitrarily distinguish
between O_DIRECT files with and without IOCB_NOWAIT support, something
to the effect of the following diff may be used.

diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c
index 11cd8d23f6f2..dc42837ee0af 100644
--- a/fs/9p/vfs_file.c
+++ b/fs/9p/vfs_file.c
@@ -392,7 +392,7 @@ static ssize_t v9fs_file_splice_read(struct file *in, loff_t *ppos,
fid->fid, len, *ppos);

if (fid->mode & P9L_DIRECT)
- return copy_splice_read(in, ppos, pipe, len, flags);
+ return copy_splice_read_sleepok(in, ppos, pipe, len, flags);
return filemap_splice_read(in, ppos, pipe, len, flags);
}

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 3b5aae29e944..9a4679013135 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -2209,7 +2209,7 @@ static ssize_t ceph_splice_read(struct file *in, loff_t *ppos,

if (ceph_has_inline_data(ci) ||
(fi->flags & CEPH_F_SYNC))
- return copy_splice_read(in, ppos, pipe, len, flags);
+ return copy_splice_read_sleepok(in, ppos, pipe, len, flags);

ceph_start_io_read(inode);

@@ -2228,7 +2228,7 @@ static ssize_t ceph_splice_read(struct file *in, loff_t *ppos,

ceph_put_cap_refs(ci, got);
ceph_end_io_read(inode);
- return copy_splice_read(in, ppos, pipe, len, flags);
+ return copy_splice_read_sleepok(in, ppos, pipe, len, flags);
}

dout("splice_read %p %llx.%llx %llu~%zu got cap refs on %s\n",
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 4b66efc1a82a..5b0cbb6b95c4 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -1581,7 +1581,7 @@ const struct file_operations gfs2_file_fops = {
.fsync = gfs2_fsync,
.lock = gfs2_lock,
.flock = gfs2_flock,
- .splice_read = copy_splice_read,
+ .splice_read = copy_splice_read_sleepok,
.splice_write = gfs2_file_splice_write,
.setlease = simple_nosetlease,
.fallocate = gfs2_fallocate,
@@ -1612,7 +1612,7 @@ const struct file_operations gfs2_file_fops_nolock = {
.open = gfs2_open,
.release = gfs2_release,
.fsync = gfs2_fsync,
- .splice_read = copy_splice_read,
+ .splice_read = copy_splice_read_sleepok,
.splice_write = gfs2_file_splice_write,
.setlease = generic_setlease,
.fallocate = gfs2_fallocate,
diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index f0cb729e9a97..f0b6e85b2c5b 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -989,7 +989,7 @@ const struct file_operations kernfs_file_fops = {
.release = kernfs_fop_release,
.poll = kernfs_fop_poll,
.fsync = noop_fsync,
- .splice_read = copy_splice_read,
+ .splice_read = copy_splice_read_sleepok,
.splice_write = iter_file_splice_write,
};

diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index b33e490e3fd9..7ec2f4653299 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -588,7 +588,7 @@ static const struct file_operations proc_iter_file_ops = {
.llseek = proc_reg_llseek,
.read_iter = proc_reg_read_iter,
.write = proc_reg_write,
- .splice_read = copy_splice_read,
+ .splice_read = copy_splice_read_sleepok,
.poll = proc_reg_poll,
.unlocked_ioctl = proc_reg_unlocked_ioctl,
.mmap = proc_reg_mmap,
@@ -614,7 +614,7 @@ static const struct file_operations proc_reg_file_ops_compat = {
static const struct file_operations proc_iter_file_ops_compat = {
.llseek = proc_reg_llseek,
.read_iter = proc_reg_read_iter,
- .splice_read = copy_splice_read,
+ .splice_read = copy_splice_read_sleepok,
.write = proc_reg_write,
.poll = proc_reg_poll,
.unlocked_ioctl = proc_reg_unlocked_ioctl,
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 8064ea76f80b..11d26fd14e7d 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -864,7 +864,7 @@ static const struct file_operations proc_sys_file_operations = {
.poll = proc_sys_poll,
.read_iter = proc_sys_read,
.write_iter = proc_sys_write,
- .splice_read = copy_splice_read,
+ .splice_read = copy_splice_read_sleepok,
.splice_write = iter_file_splice_write,
.llseek = default_llseek,
};
diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c
index 250eb5bf7b52..e9d19a856dd7 100644
--- a/fs/proc_namespace.c
+++ b/fs/proc_namespace.c
@@ -324,7 +324,7 @@ static int mountstats_open(struct inode *inode, struct file *file)
const struct file_operations proc_mounts_operations = {
.open = mounts_open,
.read_iter = seq_read_iter,
- .splice_read = copy_splice_read,
+ .splice_read = copy_splice_read_sleepok,
.llseek = seq_lseek,
.release = mounts_release,
.poll = mounts_poll,
@@ -333,7 +333,7 @@ const struct file_operations proc_mounts_operations = {
const struct file_operations proc_mountinfo_operations = {
.open = mountinfo_open,
.read_iter = seq_read_iter,
- .splice_read = copy_splice_read,
+ .splice_read = copy_splice_read_sleepok,
.llseek = seq_lseek,
.release = mounts_release,
.poll = mounts_poll,
@@ -342,7 +342,7 @@ const struct file_operations proc_mountinfo_operations = {
const struct file_operations proc_mountstats_operations = {
.open = mountstats_open,
.read_iter = seq_read_iter,
- .splice_read = copy_splice_read,
+ .splice_read = copy_splice_read_sleepok,
.llseek = seq_lseek,
.release = mounts_release,
};
diff --git a/fs/smb/client/cifsfs.c b/fs/smb/client/cifsfs.c
index 2131638f26d0..5f9fb3ce3bcb 100644
--- a/fs/smb/client/cifsfs.c
+++ b/fs/smb/client/cifsfs.c
@@ -1561,7 +1561,7 @@ const struct file_operations cifs_file_direct_ops = {
.fsync = cifs_fsync,
.flush = cifs_flush,
.mmap = cifs_file_mmap,
- .splice_read = copy_splice_read,
+ .splice_read = copy_splice_read_sleepok,
.splice_write = iter_file_splice_write,
.unlocked_ioctl = cifs_ioctl,
.copy_file_range = cifs_copy_file_range,
@@ -1615,7 +1615,7 @@ const struct file_operations cifs_file_direct_nobrl_ops = {
.fsync = cifs_fsync,
.flush = cifs_flush,
.mmap = cifs_file_mmap,
- .splice_read = copy_splice_read,
+ .splice_read = copy_splice_read_sleepok,
.splice_write = iter_file_splice_write,
.unlocked_ioctl = cifs_ioctl,
.copy_file_range = cifs_copy_file_range,
diff --git a/fs/splice.c b/fs/splice.c
index 2871c6f9366f..90ebcf236c05 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -298,12 +298,14 @@ void splice_shrink_spd(struct splice_pipe_desc *spd)
}

/**
- * copy_splice_read - Copy data from a file and splice the copy into a pipe
+ * __copy_splice_read - Copy data from a file and splice the copy into a pipe
* @in: The file to read from
* @ppos: Pointer to the file position to read from
* @pipe: The pipe to splice into
* @len: The amount to splice
* @flags: The SPLICE_F_* flags
+ * @sleepok: Set if splicing from a trusted filesystem,
+ * don't set if splicing from an IPC mechanism
*
* This function allocates a bunch of pages sufficient to hold the requested
* amount of data (but limited by the remaining pipe capacity), passes it to
@@ -317,10 +319,11 @@ void splice_shrink_spd(struct splice_pipe_desc *spd)
* if the pipe has insufficient space, we reach the end of the data or we hit a
* hole.
*/
-ssize_t copy_splice_read(struct file *in, loff_t *ppos,
- struct pipe_inode_info *pipe,
- size_t len, unsigned int flags)
+static ssize_t __copy_splice_read(struct file *in, loff_t *ppos,
+ struct pipe_inode_info *pipe,
+ size_t len, unsigned int flags, bool sleepok)
{
+ printk("__copy_splice_read(%d)\n", sleepok);
struct iov_iter to;
struct bio_vec *bv;
struct kiocb kiocb;
@@ -361,7 +364,8 @@ ssize_t copy_splice_read(struct file *in, loff_t *ppos,
iov_iter_bvec(&to, ITER_DEST, bv, npages, len);
init_sync_kiocb(&kiocb, in);
kiocb.ki_pos = *ppos;
- kiocb.ki_flags |= IOCB_NOWAIT;
+ if (!sleepok)
+ kiocb.ki_flags |= IOCB_NOWAIT;
ret = call_read_iter(in, &kiocb, &to);

if (ret > 0) {
@@ -399,8 +403,21 @@ ssize_t copy_splice_read(struct file *in, loff_t *ppos,
kfree(bv);
return ret;
}
+
+ssize_t copy_splice_read(struct file *in, loff_t *ppos,
+ struct pipe_inode_info *pipe,
+ size_t len, unsigned int flags) {
+ return __copy_splice_read(in, ppos, pipe, len, flags, false);
+}
EXPORT_SYMBOL(copy_splice_read);

+ssize_t copy_splice_read_sleepok(struct file *in, loff_t *ppos,
+ struct pipe_inode_info *pipe,
+ size_t len, unsigned int flags) {
+ return __copy_splice_read(in, ppos, pipe, len, flags, true);
+}
+EXPORT_SYMBOL(copy_splice_read_sleepok);
+
const struct pipe_buf_operations default_pipe_buf_ops = {
.release = generic_pipe_buf_release,
.try_steal = generic_pipe_buf_try_steal,
@@ -988,7 +1005,7 @@ long vfs_splice_read(struct file *in, loff_t *ppos,
* buffer, copy into it and splice that into the pipe.
*/
if ((in->f_flags & O_DIRECT) || IS_DAX(in->f_mapping->host))
- return copy_splice_read(in, ppos, pipe, len, flags);
+ return copy_splice_read_sleepok(in, ppos, pipe, len, flags);
return in->f_op->splice_read(in, ppos, pipe, len, flags);
}
EXPORT_SYMBOL_GPL(vfs_splice_read);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 98b7a7a8c42e..0980bf6ba8fd 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2989,6 +2989,9 @@ ssize_t filemap_splice_read(struct file *in, loff_t *ppos,
ssize_t copy_splice_read(struct file *in, loff_t *ppos,
struct pipe_inode_info *pipe,
size_t len, unsigned int flags);
+ssize_t copy_splice_read_sleepok(struct file *in, loff_t *ppos,
+ struct pipe_inode_info *pipe,
+ size_t len, unsigned int flags);
extern ssize_t iter_file_splice_write(struct pipe_inode_info *,
struct file *, loff_t *, size_t, unsigned int);
extern long do_splice_direct(struct file *in, loff_t *ppos, struct file *out,


Attachments:
(No filename) (11.95 kB)
signature.asc (849.00 B)
Download all attachments

2023-12-24 05:02:14

by Ahelenia Ziemiańska

[permalink] [raw]
Subject: [PATCH v2 14/11] fuse: allow splicing to trusted mounts only

FUSE tends to be installed suid 0: this allows normal users to mount
anything, including a program whose write implementation consists
of for(;;) sleep(1);, which, if splice were allowed, would sleep
forever with the pipe lock held.

Normal filesystems can only be mounted by root, and are thus deemed
safe. Extend this to when root mounts a FUSE filesystem and to
virtiofs, mirroring the splice_read "trusted" logic.

Signed-off-by: Ahelenia Ziemiańska <[email protected]>
---
fs/fuse/file.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 20bb16ddfcc9..62308af13396 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -3215,6 +3215,21 @@ static long fuse_splice_read(struct file *in, loff_t *ppos,
return -EINVAL;
}

+static ssize_t
+fuse_splice_write(struct pipe_inode_info *pipe, struct file *out,
+ loff_t *ppos, size_t len, unsigned int flags)
+{
+ struct inode *inode = file_inode(out);
+
+ if (fuse_is_bad(inode))
+ return -EIO;
+
+ if (get_fuse_conn(inode)->trusted)
+ return iter_file_splice_write(pipe, out, ppos, len, flags);
+
+ return -EINVAL;
+}
+
static const struct file_operations fuse_file_operations = {
.llseek = fuse_file_llseek,
.read_iter = fuse_file_read_iter,
@@ -3228,7 +3243,7 @@ static const struct file_operations fuse_file_operations = {
.get_unmapped_area = thp_get_unmapped_area,
.flock = fuse_file_flock,
.splice_read = fuse_splice_read,
- .splice_write = iter_file_splice_write,
+ .splice_write = fuse_splice_write,
.unlocked_ioctl = fuse_file_ioctl,
.compat_ioctl = fuse_file_compat_ioctl,
.poll = fuse_file_poll,
--
2.39.2


Attachments:
(No filename) (1.69 kB)
signature.asc (849.00 B)
Download all attachments

2023-12-24 05:02:19

by Ahelenia Ziemiańska

[permalink] [raw]
Subject: [PATCH v2 13/11] tty: splice_write: disable

Given:
cat > ttyW.c <<^D
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
int main()
{
int pt = posix_openpt(O_RDWR);
grantpt(pt);
unlockpt(pt);
int cl = open(ptsname(pt), O_WRONLY);
for (;;)
splice(0, 0, cl, 0, 128 * 1024 * 1024, 0);
}
^D
cc ttyW.c -o ttyW
mkfifo fifo
truncate 32M 32M
./ttyW < fifo &
cp 32M fifo &
sleep 0.1
read -r _ < fifo
ttyW used to sleep in splice and the shell used to enter an
uninterruptible sleep in open("fifo");
now the splice returns -EINVAL and the whole program completes.

This is also symmetric with the splice_read removal.

Signed-off-by: Ahelenia Ziemiańska <[email protected]>
---
It hit me that I should actually probably exhaustively re-evaluate
splice_write as well since re-evaluating splice_read went so well.

fs/fuse/dev.c: .splice_write = fuse_dev_splice_write,
drivers/char/virtio_console.c: .splice_write = port_fops_splice_write,
locks, takes some pages, unlocks, writes, so OK

drivers/char/mem.c: .splice_write = splice_write_null,
drivers/char/mem.c: .splice_write = splice_write_zero,
no-op

drivers/char/random.c: .splice_write = iter_file_splice_write,
drivers/char/random.c: .splice_write = iter_file_splice_write,
AFAICT write_pool_user is okay to invoke like this?

net/socket.c: .splice_write = splice_to_socket,
already dealt with in 11/11

drivers/tty/tty_io.c: .splice_write = iter_file_splice_write,
drivers/tty/tty_io.c: .splice_write = iter_file_splice_write,
they do lock the pipe and try the write with the lock held;
we already killed splice_read so just kill splice_write for symmetry
(13/11)

fs/fuse/file.c: .splice_write = iter_file_splice_write,
same logic as splice_read applies (14/11)

drivers/tty/tty_io.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index 50c2957a9c7f..d931c34ddcbf 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -465,7 +465,6 @@ static const struct file_operations tty_fops = {
.llseek = no_llseek,
.read_iter = tty_read,
.write_iter = tty_write,
- .splice_write = iter_file_splice_write,
.poll = tty_poll,
.unlocked_ioctl = tty_ioctl,
.compat_ioctl = tty_compat_ioctl,
@@ -479,7 +478,6 @@ static const struct file_operations console_fops = {
.llseek = no_llseek,
.read_iter = tty_read,
.write_iter = redirected_tty_write,
- .splice_write = iter_file_splice_write,
.poll = tty_poll,
.unlocked_ioctl = tty_ioctl,
.compat_ioctl = tty_compat_ioctl,
--
2.39.2


Attachments:
(No filename) (2.54 kB)
signature.asc (849.00 B)
Download all attachments

2024-01-03 11:37:12

by Jiri Slaby

[permalink] [raw]
Subject: Re: [PATCH v2 08/11] tty: splice_read: disable

On 21. 12. 23, 4:09, Ahelenia Ziemiańska wrote:
> We request non-blocking I/O in the generic copy_splice_read, but
> "the tty layer doesn't actually honor the IOCB_NOWAIT flag for
> various historical reasons.". This means that a tty->pipe splice
> will happily sleep with the pipe locked forever, and any process
> trying to take it (due to an open/read/write/&c.) will enter
> uninterruptible sleep.
>
> This also masks inconsistent wake-ups (usually every second line)
> when splicing from ttys in icanon mode.
>
> Link: https://lore.kernel.org/linux-fsdevel/CAHk-=wimmqG_wvSRtMiKPeGGDL816n65u=Mq2+H3-=uM2U6FmA@mail.gmail.com/
> Signed-off-by: Ahelenia Ziemiańska <[email protected]>
> ---
> drivers/tty/tty_io.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
> index 06414e43e0b5..50c2957a9c7f 100644
> --- a/drivers/tty/tty_io.c
> +++ b/drivers/tty/tty_io.c
> @@ -465,7 +465,6 @@ static const struct file_operations tty_fops = {
> .llseek = no_llseek,
> .read_iter = tty_read,
> .write_iter = tty_write,
> - .splice_read = copy_splice_read,
> .splice_write = iter_file_splice_write,

This and the other patch effectively reverts dd78b0c483e33 and
9bb48c82aced0. I.e. it breaks "things". Especially:

commit 9bb48c82aced07698a2d08ee0f1475a6c4f6b266
Author: Linus Torvalds <[email protected]>
Date: Tue Jan 19 11:41:16 2021 -0800

tty: implement write_iter

This makes the tty layer use the .write_iter() function instead of the
traditional .write() functionality.

That allows writev(), but more importantly also makes it possible to
enable .splice_write() for ttys, reinstating the "splice to tty"
functionality that was lost in commit 36e2c7421f02 ("fs: don't allow
splice read/write without explicit ops").

Fixes: 36e2c7421f02 ("fs: don't allow splice read/write without
explicit ops")


What are those "things" doing that "splice to tty", I don't recall and
the commit message above ^^^ does not spell that out. Linus?

thanks,
--
js


2024-01-03 19:15:30

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v2 08/11] tty: splice_read: disable

On Wed, 3 Jan 2024 at 03:36, Jiri Slaby <[email protected]> wrote:
>
> What are those "things" doing that "splice to tty", I don't recall and
> the commit message above ^^^ does not spell that out. Linus?

It's some annoying SSL VPN thing that splices to pppd:

https://lore.kernel.org/all/C8KER7U60WXE.25UFD8RE6QZQK@oguc/

and I'd be happy to try to limit splice to tty's to maybe just the one
case that pppd uses.

So I don't think we should remove splice_write for tty's entirely, but
maybe we can limit it to only the case that the VPN thing used.

I never saw the original issue personally, and I think only Oliver
reported it, so cc'ing Oliver.

Maybe that VPN thing already has the pty in non-blocking mode, for
example, and we could make the tty splicing fail for any blocking op?

Linus

2024-01-03 21:35:21

by Oliver Giles

[permalink] [raw]
Subject: Re: [PATCH v2 08/11] tty: splice_read: disable


On Wed, Jan 3 2024 at 11:14:59 -08:00:00, Linus Torvalds
<[email protected]> wrote:
>
> It's some annoying SSL VPN thing that splices to pppd:
>
> https://lore.kernel.org/all/C8KER7U60WXE.25UFD8RE6QZQK@oguc/

I'm happy to report that that particular SSL VPN tool is no longer
around.
And it had anyway grown a fall-back-to-read/write in case splice()
fails.
So at least from my perspective, no objections to splice-to-tty going
away
altogether.

> and I'd be happy to try to limit splice to tty's to maybe just the one
> case that pppd uses.

To be exact, pppd is just providing a pty with which other (now all
extinct?)
applications can do nefarious things.

> Maybe that VPN thing already has the pty in non-blocking mode, for
> example, and we could make the tty splicing fail for any blocking op?

FWIW, the SSL VPN tool did indeed have the pty in non-blocking mode.

Oliver




2024-01-03 22:06:15

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v2 08/11] tty: splice_read: disable

On Wed, 3 Jan 2024 at 13:34, Oliver Giles <[email protected]> wrote:
>
> I'm happy to report that that particular SSL VPN tool is no longer
> around.

Ahh, well that simplifies things and we can then just remove the tty
splice support again.

Of course, maybe then somebody else will report on some other odd
user, but ... fingers crossed.

Linus

2024-01-10 13:43:40

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH v2 09/11] fuse: file: limit splice_read to virtiofs

On Thu, 21 Dec 2023 at 04:09, Ahelenia Ziemiańska
<[email protected]> wrote:
>
> Potentially-blocking splice_reads are allowed for normal filesystems
> like NFS because they're blessed by root.
>
> FUSE is commonly used suid-root, and allows anyone to trivially create
> a file that, when spliced from, will just sleep forever with the pipe
> lock held.
>
> The only way IPC to the fusing process could be avoided is if
> !(ff->open_flags & FOPEN_DIRECT_IO) and the range was already cached
> and we weren't past the end. Just refuse it.

How is this not going to cause regressions out there?

We need to find an alternative to refusing splice, since this is not
going to fly, IMO.

Thanks,
Miklos

2024-01-10 15:19:48

by Ahelenia Ziemiańska

[permalink] [raw]
Subject: Re: [PATCH v2 09/11] fuse: file: limit splice_read to virtiofs

On Wed, Jan 10, 2024 at 02:43:04PM +0100, Miklos Szeredi wrote:
> On Thu, 21 Dec 2023 at 04:09, Ahelenia Ziemiańska
> <[email protected]> wrote:
> > Potentially-blocking splice_reads are allowed for normal filesystems
> > like NFS because they're blessed by root.
> >
> > FUSE is commonly used suid-root, and allows anyone to trivially create
> > a file that, when spliced from, will just sleep forever with the pipe
> > lock held.
> >
> > The only way IPC to the fusing process could be avoided is if
> > !(ff->open_flags & FOPEN_DIRECT_IO) and the range was already cached
> > and we weren't past the end. Just refuse it.
> How is this not going to cause regressions out there?
In "[PATCH v2 14/11] fuse: allow splicing to trusted mounts only"
splicing is re-enabled for mounts made by the real root.

> We need to find an alternative to refusing splice, since this is not
> going to fly, IMO.
The alternative is to not hold the lock. See the references in the
cover letter for why this wasn't done. IMO a potential slight perf
hit flies more than a total exclusion on the pipe.


Attachments:
(No filename) (1.09 kB)
signature.asc (849.00 B)
Download all attachments

2024-01-10 15:51:15

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH v2 09/11] fuse: file: limit splice_read to virtiofs

On Wed, 10 Jan 2024 at 16:19, Ahelenia Ziemiańska
<[email protected]> wrote:

> > We need to find an alternative to refusing splice, since this is not
> > going to fly, IMO.
> The alternative is to not hold the lock. See the references in the
> cover letter for why this wasn't done. IMO a potential slight perf
> hit flies more than a total exclusion on the pipe.

IDGI. This will make splice(2) return EINVAL for unprivileged fuse
files, right?

That would be a regression, not a perf hit, if the application is not
falling back to plain read; a reasonable scenario, considering splice
from files (including fuse) has worked on linux for a *long* time.

Thanks,
Mikos