REPO: https://github.com/smipi1/linux-tinification.git
BRANCH: tiny/config-syscall-splice
BACKGROUND: This patch-set forms part of the Linux Kernel Tinification effort (
https://tiny.wiki.kernel.org/).
GOAL: Support compiling out the splice family of syscalls (splice, vmsplice,
tee and sendfile) along with all supporting infrastructure if not needed.
Many embedded systems will not need the splice-family syscalls. Omitting them
saves space.
HISTORY:
PATCH v4:
- Drops __splice_p()
- Let nfsd fall back to non-splice support when splice is compiled out
- Style fixes
PATCH v3:
- Fixup commit logs so that they are consistent with patch strategy
- Style fixes
PATCH v2:
- Avoid the ifdef mess introduced in PATCH v1 by mocking out exported splice
functions.
STRATEGY:
a. With the goal of eventually compiling out fs/splice.c, several functions
that are only used in support of the the splice family of syscalls are moved
into fs/splice.c from fs/read_write.c. The kernel_write function that is not
used to support the splice syscalls is moved to fs/read_write.c.
b. Introduce an EXPERT kernel configuration option; CONFIG_SYSCALL_SPLICE; to
compile out the splice family of syscalls. This removes all userspace uses
of the splice infrastructure.
c. Splice exports an operations struct, nosteal_pipe_buf_ops. Eliminate the
use of this struct when CONFIG_SYSCALL_SPLICE is undefined, so that splice
can later be compiled out.
d. Let nfsd fall back to non-splice support when splice is compiled out.
e. Compile out fs/splice.c. Functions exported by fs/splice are mocked out with
failing static inlines. This is done so as to all but eliminate the
maintenance burden on file-system drivers.
RESULTS: A tinyconfig bloat-o-meter score for the entire patch-set:
add/remove: 0/41 grow/shrink: 5/7 up/down: 23/-8422 (-8399)
function old new delta
sys_pwritev 115 122 +7
sys_preadv 115 122 +7
fdput_pos 29 36 +7
sys_pwrite64 115 116 +1
sys_pread64 115 116 +1
pipe_to_null 4 - -4
generic_pipe_buf_nosteal 6 - -6
spd_release_page 10 - -10
fdput 11 - -11
PageUptodate 22 11 -11
lock_page 36 24 -12
signal_pending 39 26 -13
fdget 56 42 -14
page_cache_pipe_buf_release 16 - -16
user_page_pipe_buf_ops 20 - -20
splice_write_null 24 4 -20
page_cache_pipe_buf_ops 20 - -20
nosteal_pipe_buf_ops 20 - -20
default_pipe_buf_ops 20 - -20
generic_splice_sendpage 24 - -24
user_page_pipe_buf_steal 25 - -25
splice_shrink_spd 27 - -27
pipe_to_user 43 - -43
direct_splice_actor 47 - -47
default_file_splice_write 49 - -49
wakeup_pipe_writers 54 - -54
wakeup_pipe_readers 54 - -54
write_pipe_buf 71 - -71
page_cache_pipe_buf_confirm 80 - -80
splice_grow_spd 87 - -87
do_splice_to 87 - -87
ipipe_prep.part 92 - -92
splice_from_pipe 93 - -93
splice_from_pipe_next 107 - -107
pipe_to_sendpage 109 - -109
page_cache_pipe_buf_steal 114 - -114
opipe_prep.part 119 - -119
sys_sendfile 122 - -122
generic_file_splice_read 131 8 -123
sys_sendfile64 126 - -126
sys_vmsplice 137 - -137
do_splice_direct 148 - -148
vmsplice_to_user 205 - -205
__splice_from_pipe 246 - -246
splice_direct_to_actor 348 - -348
splice_to_pipe 371 - -371
do_sendfile 492 - -492
sys_tee 497 - -497
vmsplice_to_pipe 558 - -558
default_file_splice_read 688 - -688
iter_file_splice_write 702 4 -698
sys_splice 1075 - -1075
__generic_file_splice_read 1109 - -1109
Pieter Smith (7):
fs: move sendfile syscall into fs/splice
fs: moved kernel_write to fs/read_write
fs/splice: support compiling out splice-family syscalls
fs/fuse: support compiling out splice
fs/nfsd: support compiling out splice
net/core: support compiling out splice
fs/splice: full support for compiling out splice
fs/Makefile | 3 +-
fs/fuse/dev.c | 9 ++-
fs/read_write.c | 181 +++------------------------------------------
fs/splice.c | 194 +++++++++++++++++++++++++++++++++++++++++++++----
include/linux/fs.h | 26 +++++++
include/linux/skbuff.h | 10 +++
include/linux/splice.h | 42 +++++++++++
init/Kconfig | 10 +++
kernel/sys_ni.c | 8 ++
net/core/skbuff.c | 11 ++-
net/sunrpc/svc.c | 2 +-
11 files changed, 302 insertions(+), 194 deletions(-)
--
2.1.0
sendfile functionally forms part of the splice group of syscalls (splice,
vmsplice and tee). Grouping sendfile with splice paves the way to compiling out
the splice group of syscalls for embedded systems that do not need these.
add/remove: 0/0 grow/shrink: 7/2 up/down: 86/-61 (25)
function old new delta
file_start_write 34 68 +34
file_end_write 29 58 +29
sys_pwritev 115 122 +7
sys_preadv 115 122 +7
fdput_pos 29 36 +7
sys_pwrite64 115 116 +1
sys_pread64 115 116 +1
sys_tee 497 491 -6
sys_splice 1075 1020 -55
Signed-off-by: Pieter Smith <[email protected]>
---
fs/read_write.c | 175 -------------------------------------------------------
fs/splice.c | 178 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 178 insertions(+), 175 deletions(-)
diff --git a/fs/read_write.c b/fs/read_write.c
index 7d9318c..d9451ba 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1191,178 +1191,3 @@ COMPAT_SYSCALL_DEFINE5(pwritev, compat_ulong_t, fd,
}
#endif
-static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
- size_t count, loff_t max)
-{
- struct fd in, out;
- struct inode *in_inode, *out_inode;
- loff_t pos;
- loff_t out_pos;
- ssize_t retval;
- int fl;
-
- /*
- * Get input file, and verify that it is ok..
- */
- retval = -EBADF;
- in = fdget(in_fd);
- if (!in.file)
- goto out;
- if (!(in.file->f_mode & FMODE_READ))
- goto fput_in;
- retval = -ESPIPE;
- if (!ppos) {
- pos = in.file->f_pos;
- } else {
- pos = *ppos;
- if (!(in.file->f_mode & FMODE_PREAD))
- goto fput_in;
- }
- retval = rw_verify_area(READ, in.file, &pos, count);
- if (retval < 0)
- goto fput_in;
- count = retval;
-
- /*
- * Get output file, and verify that it is ok..
- */
- retval = -EBADF;
- out = fdget(out_fd);
- if (!out.file)
- goto fput_in;
- if (!(out.file->f_mode & FMODE_WRITE))
- goto fput_out;
- retval = -EINVAL;
- in_inode = file_inode(in.file);
- out_inode = file_inode(out.file);
- out_pos = out.file->f_pos;
- retval = rw_verify_area(WRITE, out.file, &out_pos, count);
- if (retval < 0)
- goto fput_out;
- count = retval;
-
- if (!max)
- max = min(in_inode->i_sb->s_maxbytes, out_inode->i_sb->s_maxbytes);
-
- if (unlikely(pos + count > max)) {
- retval = -EOVERFLOW;
- if (pos >= max)
- goto fput_out;
- count = max - pos;
- }
-
- fl = 0;
-#if 0
- /*
- * We need to debate whether we can enable this or not. The
- * man page documents EAGAIN return for the output at least,
- * and the application is arguably buggy if it doesn't expect
- * EAGAIN on a non-blocking file descriptor.
- */
- if (in.file->f_flags & O_NONBLOCK)
- fl = SPLICE_F_NONBLOCK;
-#endif
- file_start_write(out.file);
- retval = do_splice_direct(in.file, &pos, out.file, &out_pos, count, fl);
- file_end_write(out.file);
-
- if (retval > 0) {
- add_rchar(current, retval);
- add_wchar(current, retval);
- fsnotify_access(in.file);
- fsnotify_modify(out.file);
- out.file->f_pos = out_pos;
- if (ppos)
- *ppos = pos;
- else
- in.file->f_pos = pos;
- }
-
- inc_syscr(current);
- inc_syscw(current);
- if (pos > max)
- retval = -EOVERFLOW;
-
-fput_out:
- fdput(out);
-fput_in:
- fdput(in);
-out:
- return retval;
-}
-
-SYSCALL_DEFINE4(sendfile, int, out_fd, int, in_fd, off_t __user *, offset, size_t, count)
-{
- loff_t pos;
- off_t off;
- ssize_t ret;
-
- if (offset) {
- if (unlikely(get_user(off, offset)))
- return -EFAULT;
- pos = off;
- ret = do_sendfile(out_fd, in_fd, &pos, count, MAX_NON_LFS);
- if (unlikely(put_user(pos, offset)))
- return -EFAULT;
- return ret;
- }
-
- return do_sendfile(out_fd, in_fd, NULL, count, 0);
-}
-
-SYSCALL_DEFINE4(sendfile64, int, out_fd, int, in_fd, loff_t __user *, offset, size_t, count)
-{
- loff_t pos;
- ssize_t ret;
-
- if (offset) {
- if (unlikely(copy_from_user(&pos, offset, sizeof(loff_t))))
- return -EFAULT;
- ret = do_sendfile(out_fd, in_fd, &pos, count, 0);
- if (unlikely(put_user(pos, offset)))
- return -EFAULT;
- return ret;
- }
-
- return do_sendfile(out_fd, in_fd, NULL, count, 0);
-}
-
-#ifdef CONFIG_COMPAT
-COMPAT_SYSCALL_DEFINE4(sendfile, int, out_fd, int, in_fd,
- compat_off_t __user *, offset, compat_size_t, count)
-{
- loff_t pos;
- off_t off;
- ssize_t ret;
-
- if (offset) {
- if (unlikely(get_user(off, offset)))
- return -EFAULT;
- pos = off;
- ret = do_sendfile(out_fd, in_fd, &pos, count, MAX_NON_LFS);
- if (unlikely(put_user(pos, offset)))
- return -EFAULT;
- return ret;
- }
-
- return do_sendfile(out_fd, in_fd, NULL, count, 0);
-}
-
-COMPAT_SYSCALL_DEFINE4(sendfile64, int, out_fd, int, in_fd,
- compat_loff_t __user *, offset, compat_size_t, count)
-{
- loff_t pos;
- ssize_t ret;
-
- if (offset) {
- if (unlikely(copy_from_user(&pos, offset, sizeof(loff_t))))
- return -EFAULT;
- ret = do_sendfile(out_fd, in_fd, &pos, count, 0);
- if (unlikely(put_user(pos, offset)))
- return -EFAULT;
- return ret;
- }
-
- return do_sendfile(out_fd, in_fd, NULL, count, 0);
-}
-#endif
diff --git a/fs/splice.c b/fs/splice.c
index f5cb9ba..c1a2861 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -28,6 +28,7 @@
#include <linux/export.h>
#include <linux/syscalls.h>
#include <linux/uio.h>
+#include <linux/fsnotify.h>
#include <linux/security.h>
#include <linux/gfp.h>
#include <linux/socket.h>
@@ -2039,3 +2040,180 @@ SYSCALL_DEFINE4(tee, int, fdin, int, fdout, size_t, len, unsigned int, flags)
return error;
}
+
+static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
+ size_t count, loff_t max)
+{
+ struct fd in, out;
+ struct inode *in_inode, *out_inode;
+ loff_t pos;
+ loff_t out_pos;
+ ssize_t retval;
+ int fl;
+
+ /*
+ * Get input file, and verify that it is ok..
+ */
+ retval = -EBADF;
+ in = fdget(in_fd);
+ if (!in.file)
+ goto out;
+ if (!(in.file->f_mode & FMODE_READ))
+ goto fput_in;
+ retval = -ESPIPE;
+ if (!ppos) {
+ pos = in.file->f_pos;
+ } else {
+ pos = *ppos;
+ if (!(in.file->f_mode & FMODE_PREAD))
+ goto fput_in;
+ }
+ retval = rw_verify_area(READ, in.file, &pos, count);
+ if (retval < 0)
+ goto fput_in;
+ count = retval;
+
+ /*
+ * Get output file, and verify that it is ok..
+ */
+ retval = -EBADF;
+ out = fdget(out_fd);
+ if (!out.file)
+ goto fput_in;
+ if (!(out.file->f_mode & FMODE_WRITE))
+ goto fput_out;
+ retval = -EINVAL;
+ in_inode = file_inode(in.file);
+ out_inode = file_inode(out.file);
+ out_pos = out.file->f_pos;
+ retval = rw_verify_area(WRITE, out.file, &out_pos, count);
+ if (retval < 0)
+ goto fput_out;
+ count = retval;
+
+ if (!max)
+ max = min(in_inode->i_sb->s_maxbytes, out_inode->i_sb->s_maxbytes);
+
+ if (unlikely(pos + count > max)) {
+ retval = -EOVERFLOW;
+ if (pos >= max)
+ goto fput_out;
+ count = max - pos;
+ }
+
+ fl = 0;
+#if 0
+ /*
+ * We need to debate whether we can enable this or not. The
+ * man page documents EAGAIN return for the output at least,
+ * and the application is arguably buggy if it doesn't expect
+ * EAGAIN on a non-blocking file descriptor.
+ */
+ if (in.file->f_flags & O_NONBLOCK)
+ fl = SPLICE_F_NONBLOCK;
+#endif
+ file_start_write(out.file);
+ retval = do_splice_direct(in.file, &pos, out.file, &out_pos, count, fl);
+ file_end_write(out.file);
+
+ if (retval > 0) {
+ add_rchar(current, retval);
+ add_wchar(current, retval);
+ fsnotify_access(in.file);
+ fsnotify_modify(out.file);
+ out.file->f_pos = out_pos;
+ if (ppos)
+ *ppos = pos;
+ else
+ in.file->f_pos = pos;
+ }
+
+ inc_syscr(current);
+ inc_syscw(current);
+ if (pos > max)
+ retval = -EOVERFLOW;
+
+fput_out:
+ fdput(out);
+fput_in:
+ fdput(in);
+out:
+ return retval;
+}
+
+SYSCALL_DEFINE4(sendfile, int, out_fd, int, in_fd, off_t __user *, offset, size_t, count)
+{
+ loff_t pos;
+ off_t off;
+ ssize_t ret;
+
+ if (offset) {
+ if (unlikely(get_user(off, offset)))
+ return -EFAULT;
+ pos = off;
+ ret = do_sendfile(out_fd, in_fd, &pos, count, MAX_NON_LFS);
+ if (unlikely(put_user(pos, offset)))
+ return -EFAULT;
+ return ret;
+ }
+
+ return do_sendfile(out_fd, in_fd, NULL, count, 0);
+}
+
+SYSCALL_DEFINE4(sendfile64, int, out_fd, int, in_fd, loff_t __user *, offset, size_t, count)
+{
+ loff_t pos;
+ ssize_t ret;
+
+ if (offset) {
+ if (unlikely(copy_from_user(&pos, offset, sizeof(loff_t))))
+ return -EFAULT;
+ ret = do_sendfile(out_fd, in_fd, &pos, count, 0);
+ if (unlikely(put_user(pos, offset)))
+ return -EFAULT;
+ return ret;
+ }
+
+ return do_sendfile(out_fd, in_fd, NULL, count, 0);
+}
+
+#ifdef CONFIG_COMPAT
+COMPAT_SYSCALL_DEFINE4(sendfile, int, out_fd, int, in_fd,
+ compat_off_t __user *, offset, compat_size_t, count)
+{
+ loff_t pos;
+ off_t off;
+ ssize_t ret;
+
+ if (offset) {
+ if (unlikely(get_user(off, offset)))
+ return -EFAULT;
+ pos = off;
+ ret = do_sendfile(out_fd, in_fd, &pos, count, MAX_NON_LFS);
+ if (unlikely(put_user(pos, offset)))
+ return -EFAULT;
+ return ret;
+ }
+
+ return do_sendfile(out_fd, in_fd, NULL, count, 0);
+}
+
+COMPAT_SYSCALL_DEFINE4(sendfile64, int, out_fd, int, in_fd,
+ compat_loff_t __user *, offset, compat_size_t, count)
+{
+ loff_t pos;
+ ssize_t ret;
+
+ if (offset) {
+ if (unlikely(copy_from_user(&pos, offset, sizeof(loff_t))))
+ return -EFAULT;
+ ret = do_sendfile(out_fd, in_fd, &pos, count, 0);
+ if (unlikely(put_user(pos, offset)))
+ return -EFAULT;
+ return ret;
+ }
+
+ return do_sendfile(out_fd, in_fd, NULL, count, 0);
+}
+#endif
+
--
2.1.0
kernel_write shares infrastructure with the read_write translation unit but not
with the splice translation unit. Grouping kernel_write with the read_write
translation unit is more logical. It also paves the way to compiling out the
splice group of syscalls for embedded systems that do not need them.
Signed-off-by: Pieter Smith <[email protected]>
---
fs/read_write.c | 16 ++++++++++++++++
fs/splice.c | 16 ----------------
2 files changed, 16 insertions(+), 16 deletions(-)
diff --git a/fs/read_write.c b/fs/read_write.c
index d9451ba..f4c8d8b 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1191,3 +1191,19 @@ COMPAT_SYSCALL_DEFINE5(pwritev, compat_ulong_t, fd,
}
#endif
+ssize_t kernel_write(struct file *file, const char *buf, size_t count,
+ loff_t pos)
+{
+ mm_segment_t old_fs;
+ ssize_t res;
+
+ old_fs = get_fs();
+ set_fs(get_ds());
+ /* The cast to a user pointer is valid due to the set_fs() */
+ res = vfs_write(file, (__force const char __user *)buf, count, &pos);
+ set_fs(old_fs);
+
+ return res;
+}
+EXPORT_SYMBOL(kernel_write);
+
diff --git a/fs/splice.c b/fs/splice.c
index c1a2861..44b201b 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -583,22 +583,6 @@ static ssize_t kernel_readv(struct file *file, const struct iovec *vec,
return res;
}
-ssize_t kernel_write(struct file *file, const char *buf, size_t count,
- loff_t pos)
-{
- mm_segment_t old_fs;
- ssize_t res;
-
- old_fs = get_fs();
- set_fs(get_ds());
- /* The cast to a user pointer is valid due to the set_fs() */
- res = vfs_write(file, (__force const char __user *)buf, count, &pos);
- set_fs(old_fs);
-
- return res;
-}
-EXPORT_SYMBOL(kernel_write);
-
ssize_t default_file_splice_read(struct file *in, loff_t *ppos,
struct pipe_inode_info *pipe, size_t len,
unsigned int flags)
--
2.1.0
To implement splice support, fs/fuse makes use of nosteal_pipe_buf_ops. This
struct is exported by fs/splice. The goal of the larger patch set is to
completely compile out fs/splice, so uses of the exported struct need to be
compiled out along with fs/splice.
This patch therefore compiles out splice support in fs/fuse when
CONFIG_SYSCALL_SPLICE is undefined.
Signed-off-by: Pieter Smith <[email protected]>
---
fs/fuse/dev.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index ca88731..e984302 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1191,8 +1191,9 @@ __releases(fc->lock)
* request_end(). Otherwise add it to the processing list, and set
* the 'sent' flag.
*/
-static ssize_t fuse_dev_do_read(struct fuse_conn *fc, struct file *file,
- struct fuse_copy_state *cs, size_t nbytes)
+static ssize_t __maybe_unused
+fuse_dev_do_read(struct fuse_conn *fc, struct file *file,
+ struct fuse_copy_state *cs, size_t nbytes)
{
int err;
struct fuse_req *req;
@@ -1291,6 +1292,7 @@ static ssize_t fuse_dev_read(struct kiocb *iocb, const struct iovec *iov,
return fuse_dev_do_read(fc, file, &cs, iov_length(iov, nr_segs));
}
+#ifdef CONFIG_SYSCALL_SPLICE
static ssize_t fuse_dev_splice_read(struct file *in, loff_t *ppos,
struct pipe_inode_info *pipe,
size_t len, unsigned int flags)
@@ -1368,6 +1370,9 @@ out:
kfree(bufs);
return ret;
}
+#else /* CONFIG_SYSCALL_SPLICE */
+#define fuse_dev_splice_read NULL
+#endif
static int fuse_notify_poll(struct fuse_conn *fc, unsigned int size,
struct fuse_copy_state *cs)
--
2.1.0
The goal of the larger patch set is to completely compile out fs/splice, and
as a result, splice support for all file-systems. This patch ensures that
fs/nfsd falls back to non-splice fs support when CONFIG_SYSCALL_SPLICE is
undefined.
Signed-off-by: Pieter Smith <[email protected]>
---
net/sunrpc/svc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index ca8a795..6cacc37 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -1084,7 +1084,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
goto err_short_len;
/* Will be turned off only in gss privacy case: */
- rqstp->rq_splice_ok = true;
+ rqstp->rq_splice_ok = IS_ENABLED(CONFIG_SPLICE_SYSCALL);
/* Will be turned off only when NFSv4 Sessions are used */
rqstp->rq_usedeferral = true;
rqstp->rq_dropme = false;
--
2.1.0
Entirely compile out splice translation unit when the system is configured
without splice family of syscalls (i.e. CONFIG_SYSCALL_SPLICE is undefined).
Exported fs/splice functions are transparently mocked out with static inlines.
Because userspace support for splice has already been removed by this
patch-set, the exported functions cannot be called anyway. Mocking them out
prevents a maintenance burden on file system drivers.
The bloat score resulting from this patch given a tinyconfig is:
add/remove: 0/25 grow/shrink: 0/5 up/down: 0/-4845 (-4845)
function old new delta
pipe_to_null 4 - -4
generic_pipe_buf_nosteal 6 - -6
spd_release_page 10 - -10
PageUptodate 22 11 -11
lock_page 36 24 -12
page_cache_pipe_buf_release 16 - -16
splice_write_null 24 4 -20
page_cache_pipe_buf_ops 20 - -20
nosteal_pipe_buf_ops 20 - -20
default_pipe_buf_ops 20 - -20
generic_splice_sendpage 24 - -24
splice_shrink_spd 27 - -27
direct_splice_actor 47 - -47
default_file_splice_write 49 - -49
wakeup_pipe_writers 54 - -54
write_pipe_buf 71 - -71
page_cache_pipe_buf_confirm 80 - -80
splice_grow_spd 87 - -87
splice_from_pipe 93 - -93
splice_from_pipe_next 106 - -106
pipe_to_sendpage 109 - -109
page_cache_pipe_buf_steal 114 - -114
generic_file_splice_read 131 8 -123
do_splice_direct 148 - -148
__splice_from_pipe 246 - -246
splice_direct_to_actor 416 - -416
splice_to_pipe 417 - -417
default_file_splice_read 688 - -688
iter_file_splice_write 702 4 -698
__generic_file_splice_read 1109 - -1109
The bloat score for the entire CONFIG_SYSCALL_SPLICE patch-set is:
add/remove: 0/41 grow/shrink: 5/7 up/down: 23/-8422 (-8399)
function old new delta
sys_pwritev 115 122 +7
sys_preadv 115 122 +7
fdput_pos 29 36 +7
sys_pwrite64 115 116 +1
sys_pread64 115 116 +1
pipe_to_null 4 - -4
generic_pipe_buf_nosteal 6 - -6
spd_release_page 10 - -10
fdput 11 - -11
PageUptodate 22 11 -11
lock_page 36 24 -12
signal_pending 39 26 -13
fdget 56 42 -14
page_cache_pipe_buf_release 16 - -16
user_page_pipe_buf_ops 20 - -20
splice_write_null 24 4 -20
page_cache_pipe_buf_ops 20 - -20
nosteal_pipe_buf_ops 20 - -20
default_pipe_buf_ops 20 - -20
generic_splice_sendpage 24 - -24
user_page_pipe_buf_steal 25 - -25
splice_shrink_spd 27 - -27
pipe_to_user 43 - -43
direct_splice_actor 47 - -47
default_file_splice_write 49 - -49
wakeup_pipe_writers 54 - -54
wakeup_pipe_readers 54 - -54
write_pipe_buf 71 - -71
page_cache_pipe_buf_confirm 80 - -80
splice_grow_spd 87 - -87
do_splice_to 87 - -87
ipipe_prep.part 92 - -92
splice_from_pipe 93 - -93
splice_from_pipe_next 107 - -107
pipe_to_sendpage 109 - -109
page_cache_pipe_buf_steal 114 - -114
opipe_prep.part 119 - -119
sys_sendfile 122 - -122
generic_file_splice_read 131 8 -123
sys_sendfile64 126 - -126
sys_vmsplice 137 - -137
do_splice_direct 148 - -148
vmsplice_to_user 205 - -205
__splice_from_pipe 246 - -246
splice_direct_to_actor 348 - -348
splice_to_pipe 371 - -371
do_sendfile 492 - -492
sys_tee 497 - -497
vmsplice_to_pipe 558 - -558
default_file_splice_read 688 - -688
iter_file_splice_write 702 4 -698
sys_splice 1075 - -1075
__generic_file_splice_read 1109 - -1109
Signed-off-by: Pieter Smith <[email protected]>
---
fs/Makefile | 3 ++-
fs/splice.c | 2 --
include/linux/fs.h | 26 ++++++++++++++++++++++++++
include/linux/splice.h | 42 ++++++++++++++++++++++++++++++++++++++++++
4 files changed, 70 insertions(+), 3 deletions(-)
diff --git a/fs/Makefile b/fs/Makefile
index fb7646e..9395622 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -10,7 +10,7 @@ obj-y := open.o read_write.o file_table.o super.o \
ioctl.o readdir.o select.o dcache.o inode.o \
attr.o bad_inode.o file.o filesystems.o namespace.o \
seq_file.o xattr.o libfs.o fs-writeback.o \
- pnode.o splice.o sync.o utimes.o \
+ pnode.o sync.o utimes.o \
stack.o fs_struct.o statfs.o fs_pin.o
ifeq ($(CONFIG_BLOCK),y)
@@ -22,6 +22,7 @@ endif
obj-$(CONFIG_PROC_FS) += proc_namespace.o
obj-$(CONFIG_FSNOTIFY) += notify/
+obj-$(CONFIG_SYSCALL_SPLICE) += splice.o
obj-$(CONFIG_EPOLL) += eventpoll.o
obj-$(CONFIG_ANON_INODES) += anon_inodes.o
obj-$(CONFIG_SIGNALFD) += signalfd.o
diff --git a/fs/splice.c b/fs/splice.c
index 7c4c695..44b201b 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1316,7 +1316,6 @@ long do_splice_direct(struct file *in, loff_t *ppos, struct file *out,
return ret;
}
-#ifdef CONFIG_SYSCALL_SPLICE
static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe,
struct pipe_inode_info *opipe,
size_t len, unsigned int flags);
@@ -2201,5 +2200,4 @@ COMPAT_SYSCALL_DEFINE4(sendfile64, int, out_fd, int, in_fd,
return do_sendfile(out_fd, in_fd, NULL, count, 0);
}
#endif
-#endif
diff --git a/include/linux/fs.h b/include/linux/fs.h
index a957d43..138107e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2444,6 +2444,7 @@ extern int blkdev_fsync(struct file *filp, loff_t start, loff_t end,
extern void block_sync_page(struct page *page);
/* fs/splice.c */
+#ifdef CONFIG_SYSCALL_SPLICE
extern ssize_t generic_file_splice_read(struct file *, loff_t *,
struct pipe_inode_info *, size_t, unsigned int);
extern ssize_t default_file_splice_read(struct file *, loff_t *,
@@ -2452,6 +2453,31 @@ extern ssize_t iter_file_splice_write(struct pipe_inode_info *,
struct file *, loff_t *, size_t, unsigned int);
extern ssize_t generic_splice_sendpage(struct pipe_inode_info *pipe,
struct file *out, loff_t *, size_t len, unsigned int flags);
+#else
+static inline ssize_t generic_file_splice_read(struct file *in, loff_t *ppos,
+ struct pipe_inode_info *pipe, size_t len, unsigned int flags)
+{
+ return -EPERM;
+}
+
+static inline ssize_t default_file_splice_read(struct file *in, loff_t *ppos,
+ struct pipe_inode_info *pipe, size_t len, unsigned int flags)
+{
+ return -EPERM;
+}
+
+static inline ssize_t iter_file_splice_write(struct pipe_inode_info *pipe,
+ struct file *out, loff_t *ppos, size_t len, unsigned int flags)
+{
+ return -EPERM;
+}
+
+static inline ssize_t generic_splice_sendpage(struct pipe_inode_info *pipe,
+ struct file *out, loff_t *ppos, size_t len, unsigned int flags)
+{
+ return -EPERM;
+}
+#endif
extern void
file_ra_state_init(struct file_ra_state *ra, struct address_space *mapping);
diff --git a/include/linux/splice.h b/include/linux/splice.h
index da2751d..34570d8 100644
--- a/include/linux/splice.h
+++ b/include/linux/splice.h
@@ -65,6 +65,7 @@ typedef int (splice_actor)(struct pipe_inode_info *, struct pipe_buffer *,
typedef int (splice_direct_actor)(struct pipe_inode_info *,
struct splice_desc *);
+#ifdef CONFIG_SYSCALL_SPLICE
extern ssize_t splice_from_pipe(struct pipe_inode_info *, struct file *,
loff_t *, size_t, unsigned int,
splice_actor *);
@@ -74,13 +75,54 @@ extern ssize_t splice_to_pipe(struct pipe_inode_info *,
struct splice_pipe_desc *);
extern ssize_t splice_direct_to_actor(struct file *, struct splice_desc *,
splice_direct_actor *);
+#else
+static inline ssize_t splice_from_pipe(struct pipe_inode_info *pipe, struct file *out,
+ loff_t *ppos, size_t len, unsigned int flags,
+ splice_actor *actor)
+{
+ return -EPERM;
+}
+
+static inline ssize_t __splice_from_pipe(struct pipe_inode_info *pipe, struct splice_desc *sd,
+ splice_actor *actor)
+{
+ return -EPERM;
+}
+
+static inline ssize_t splice_to_pipe(struct pipe_inode_info *pipe,
+ struct splice_pipe_desc *spd)
+{
+ return -EPERM;
+}
+
+static inline ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
+ splice_direct_actor *actor)
+{
+ return -EPERM;
+}
+#endif
/*
* for dynamic pipe sizing
*/
+#ifdef CONFIG_SYSCALL_SPLICE
extern int splice_grow_spd(const struct pipe_inode_info *, struct splice_pipe_desc *);
extern void splice_shrink_spd(struct splice_pipe_desc *);
extern void spd_release_page(struct splice_pipe_desc *, unsigned int);
+#else
+static inline int splice_grow_spd(const struct pipe_inode_info *pipe, struct splice_pipe_desc *spd)
+{
+ return -EPERM;
+}
+
+static inline void splice_shrink_spd(struct splice_pipe_desc *spd)
+{
+}
+
+static inline void spd_release_page(struct splice_pipe_desc *spd, unsigned int i)
+{
+}
+#endif
extern const struct pipe_buf_operations page_cache_pipe_buf_ops;
#endif
--
2.1.0
To implement splice support, net/core makes use of nosteal_pipe_buf_ops. This
struct is exported by fs/splice. The goal of the larger patch set is to
completely compile out fs/splice, so uses of the exported struct need to be
compiled out along with fs/splice.
This patch therefore compiles out splice support in net/core when
CONFIG_SYSCALL_SPLICE is undefined. The compiled out function skb_splice_bits
is transparently mocked out with a static inline. The greater patch set removes
userspace splice support so it cannot be called anyway.
Signed-off-by: Pieter Smith <[email protected]>
---
include/linux/skbuff.h | 10 ++++++++++
net/core/skbuff.c | 11 +++++++----
2 files changed, 17 insertions(+), 4 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index a59d934..5cd636b 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2640,9 +2640,19 @@ int skb_copy_bits(const struct sk_buff *skb, int offset, void *to, int len);
int skb_store_bits(struct sk_buff *skb, int offset, const void *from, int len);
__wsum skb_copy_and_csum_bits(const struct sk_buff *skb, int offset, u8 *to,
int len, __wsum csum);
+#ifdef CONFIG_SYSCALL_SPLICE
int skb_splice_bits(struct sk_buff *skb, unsigned int offset,
struct pipe_inode_info *pipe, unsigned int len,
unsigned int flags);
+#else
+static inline int
+skb_splice_bits(struct sk_buff *skb, unsigned int offset,
+ struct pipe_inode_info *pipe, unsigned int len,
+ unsigned int flags)
+{
+ return -EPERM;
+}
+#endif
void skb_copy_and_csum_dev(const struct sk_buff *skb, u8 *to);
unsigned int skb_zerocopy_headlen(const struct sk_buff *from);
int skb_zerocopy(struct sk_buff *to, struct sk_buff *from,
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 61059a0..bb426d9 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1678,7 +1678,8 @@ EXPORT_SYMBOL(skb_copy_bits);
* Callback from splice_to_pipe(), if we need to release some pages
* at the end of the spd in case we error'ed out in filling the pipe.
*/
-static void sock_spd_release(struct splice_pipe_desc *spd, unsigned int i)
+static void __maybe_unused sock_spd_release(struct splice_pipe_desc *spd,
+ unsigned int i)
{
put_page(spd->pages[i]);
}
@@ -1781,9 +1782,9 @@ static bool __splice_segment(struct page *page, unsigned int poff,
* Map linear and fragment data from the skb to spd. It reports true if the
* pipe is full or if we already spliced the requested length.
*/
-static bool __skb_splice_bits(struct sk_buff *skb, struct pipe_inode_info *pipe,
- unsigned int *offset, unsigned int *len,
- struct splice_pipe_desc *spd, struct sock *sk)
+static bool __maybe_unused __skb_splice_bits(struct sk_buff *skb, struct pipe_inode_info *pipe,
+ unsigned int *offset, unsigned int *len,
+ struct splice_pipe_desc *spd, struct sock *sk)
{
int seg;
@@ -1821,6 +1822,7 @@ static bool __skb_splice_bits(struct sk_buff *skb, struct pipe_inode_info *pipe,
* the frag list, if such a thing exists. We'd probably need to recurse to
* handle that cleanly.
*/
+#ifdef CONFIG_SYSCALL_SPLICE
int skb_splice_bits(struct sk_buff *skb, unsigned int offset,
struct pipe_inode_info *pipe, unsigned int tlen,
unsigned int flags)
@@ -1876,6 +1878,7 @@ done:
return ret;
}
+#endif /* CONFIG_SYSCALL_SPLICE */
/**
* skb_store_bits - store bits from kernel buffer to skb
--
2.1.0
Many embedded systems will not need the splice-family syscalls (splice,
vmsplice, tee and sendfile). Omitting them saves space. This adds a new EXPERT
config option CONFIG_SYSCALL_SPLICE (default y) to support compiling them out.
The goal is to completely compile out fs/splice along with the syscalls. To
achieve this, the remaining patch-set will deal with fs/splice exports. As far
as possible, the impact on other device drivers will be minimized so as to
reduce the overal maintenance burden of CONFIG_SYSCALL_SPLICE.
The use of exported functions will be solved by transparently mocking them out
with static inlines. Uses of the exported pipe_buf_operations struct however
require direct modification in fs/fuse and net/core. The next two patches will
deal with this. A macro is defined that will assist with NULL'ing out callbacks
when CONFIG_SYSCALL_SPLICE is undefined: __splice_p().
Once all exports are solved, fs/splice can be compiled out.
The bloat benefit of this patch given a tinyconfig is:
add/remove: 0/16 grow/shrink: 2/5 up/down: 114/-3693 (-3579)
function old new delta
splice_direct_to_actor 348 416 +68
splice_to_pipe 371 417 +46
splice_from_pipe_next 107 106 -1
fdput 11 - -11
signal_pending 39 26 -13
fdget 56 42 -14
user_page_pipe_buf_ops 20 - -20
user_page_pipe_buf_steal 25 - -25
file_end_write 58 29 -29
file_start_write 68 34 -34
pipe_to_user 43 - -43
wakeup_pipe_readers 54 - -54
do_splice_to 87 - -87
ipipe_prep.part 92 - -92
opipe_prep.part 119 - -119
sys_sendfile 122 - -122
sys_sendfile64 126 - -126
sys_vmsplice 137 - -137
vmsplice_to_user 205 - -205
sys_tee 491 - -491
do_sendfile 492 - -492
vmsplice_to_pipe 558 - -558
sys_splice 1020 - -1020
Signed-off-by: Pieter Smith <[email protected]>
---
fs/splice.c | 2 ++
init/Kconfig | 10 ++++++++++
kernel/sys_ni.c | 8 ++++++++
3 files changed, 20 insertions(+)
diff --git a/fs/splice.c b/fs/splice.c
index 44b201b..7c4c695 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1316,6 +1316,7 @@ long do_splice_direct(struct file *in, loff_t *ppos, struct file *out,
return ret;
}
+#ifdef CONFIG_SYSCALL_SPLICE
static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe,
struct pipe_inode_info *opipe,
size_t len, unsigned int flags);
@@ -2200,4 +2201,5 @@ COMPAT_SYSCALL_DEFINE4(sendfile64, int, out_fd, int, in_fd,
return do_sendfile(out_fd, in_fd, NULL, count, 0);
}
#endif
+#endif
diff --git a/init/Kconfig b/init/Kconfig
index d811d5f..dec9819 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1571,6 +1571,16 @@ config NTP
system clock to an NTP server, you can disable this option to save
space.
+config SYSCALL_SPLICE
+ bool "Enable splice/vmsplice/tee/sendfile syscalls" if EXPERT
+ default y
+ help
+ This option enables the splice, vmsplice, tee and sendfile syscalls. These
+ are used by applications to: move data between buffers and arbitrary file
+ descriptors; "copy" data between buffers; or copy data from userspace into
+ buffers. If building an embedded system where no applications use these
+ syscalls, you can disable this option to save space.
+
config PCI_QUIRKS
default y
bool "Enable PCI quirk workarounds" if EXPERT
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index d2f5b00..25d5551 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -170,6 +170,14 @@ cond_syscall(sys_fstat);
cond_syscall(sys_stat);
cond_syscall(sys_uname);
cond_syscall(sys_olduname);
+cond_syscall(sys_vmsplice);
+cond_syscall(sys_splice);
+cond_syscall(sys_tee);
+cond_syscall(sys_sendfile);
+cond_syscall(sys_sendfile64);
+cond_syscall(compat_sys_vmsplice);
+cond_syscall(compat_sys_sendfile);
+cond_syscall(compat_sys_sendfile64);
/* arch-specific weak syscall entries */
cond_syscall(sys_pciconfig_read);
--
2.1.0
On Tue, Nov 25, 2014 at 12:01:02AM +0100, Pieter Smith wrote:
> Many embedded systems will not need the splice-family syscalls (splice,
> vmsplice, tee and sendfile). Omitting them saves space. This adds a new EXPERT
> config option CONFIG_SYSCALL_SPLICE (default y) to support compiling them out.
>
> The goal is to completely compile out fs/splice along with the syscalls. To
> achieve this, the remaining patch-set will deal with fs/splice exports. As far
> as possible, the impact on other device drivers will be minimized so as to
> reduce the overal maintenance burden of CONFIG_SYSCALL_SPLICE.
>
> The use of exported functions will be solved by transparently mocking them out
> with static inlines. Uses of the exported pipe_buf_operations struct however
> require direct modification in fs/fuse and net/core. The next two patches will
> deal with this. A macro is defined that will assist with NULL'ing out callbacks
> when CONFIG_SYSCALL_SPLICE is undefined: __splice_p().
This message needs updating, since the patch series doesn't introduce or
use __splice_p anymore.
> Once all exports are solved, fs/splice can be compiled out.
>
> The bloat benefit of this patch given a tinyconfig is:
>
> add/remove: 0/16 grow/shrink: 2/5 up/down: 114/-3693 (-3579)
> function old new delta
> splice_direct_to_actor 348 416 +68
> splice_to_pipe 371 417 +46
> splice_from_pipe_next 107 106 -1
> fdput 11 - -11
> signal_pending 39 26 -13
> fdget 56 42 -14
> user_page_pipe_buf_ops 20 - -20
> user_page_pipe_buf_steal 25 - -25
> file_end_write 58 29 -29
> file_start_write 68 34 -34
> pipe_to_user 43 - -43
> wakeup_pipe_readers 54 - -54
> do_splice_to 87 - -87
> ipipe_prep.part 92 - -92
> opipe_prep.part 119 - -119
> sys_sendfile 122 - -122
> sys_sendfile64 126 - -126
> sys_vmsplice 137 - -137
> vmsplice_to_user 205 - -205
> sys_tee 491 - -491
> do_sendfile 492 - -492
> vmsplice_to_pipe 558 - -558
> sys_splice 1020 - -1020
>
> Signed-off-by: Pieter Smith <[email protected]>
> ---
> fs/splice.c | 2 ++
> init/Kconfig | 10 ++++++++++
> kernel/sys_ni.c | 8 ++++++++
> 3 files changed, 20 insertions(+)
>
> diff --git a/fs/splice.c b/fs/splice.c
> index 44b201b..7c4c695 100644
> --- a/fs/splice.c
> +++ b/fs/splice.c
> @@ -1316,6 +1316,7 @@ long do_splice_direct(struct file *in, loff_t *ppos, struct file *out,
> return ret;
> }
>
> +#ifdef CONFIG_SYSCALL_SPLICE
> static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe,
> struct pipe_inode_info *opipe,
> size_t len, unsigned int flags);
> @@ -2200,4 +2201,5 @@ COMPAT_SYSCALL_DEFINE4(sendfile64, int, out_fd, int, in_fd,
> return do_sendfile(out_fd, in_fd, NULL, count, 0);
> }
> #endif
> +#endif
>
> diff --git a/init/Kconfig b/init/Kconfig
> index d811d5f..dec9819 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1571,6 +1571,16 @@ config NTP
> system clock to an NTP server, you can disable this option to save
> space.
>
> +config SYSCALL_SPLICE
> + bool "Enable splice/vmsplice/tee/sendfile syscalls" if EXPERT
> + default y
> + help
> + This option enables the splice, vmsplice, tee and sendfile syscalls. These
> + are used by applications to: move data between buffers and arbitrary file
> + descriptors; "copy" data between buffers; or copy data from userspace into
> + buffers. If building an embedded system where no applications use these
> + syscalls, you can disable this option to save space.
> +
> config PCI_QUIRKS
> default y
> bool "Enable PCI quirk workarounds" if EXPERT
> diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
> index d2f5b00..25d5551 100644
> --- a/kernel/sys_ni.c
> +++ b/kernel/sys_ni.c
> @@ -170,6 +170,14 @@ cond_syscall(sys_fstat);
> cond_syscall(sys_stat);
> cond_syscall(sys_uname);
> cond_syscall(sys_olduname);
> +cond_syscall(sys_vmsplice);
> +cond_syscall(sys_splice);
> +cond_syscall(sys_tee);
> +cond_syscall(sys_sendfile);
> +cond_syscall(sys_sendfile64);
> +cond_syscall(compat_sys_vmsplice);
> +cond_syscall(compat_sys_sendfile);
> +cond_syscall(compat_sys_sendfile64);
>
> /* arch-specific weak syscall entries */
> cond_syscall(sys_pciconfig_read);
> --
> 2.1.0
>
On Tue, Nov 25, 2014 at 12:00:59AM +0100, Pieter Smith wrote:
> REPO: https://github.com/smipi1/linux-tinification.git
>
> BRANCH: tiny/config-syscall-splice
>
> BACKGROUND: This patch-set forms part of the Linux Kernel Tinification effort (
> https://tiny.wiki.kernel.org/).
>
> GOAL: Support compiling out the splice family of syscalls (splice, vmsplice,
> tee and sendfile) along with all supporting infrastructure if not needed.
> Many embedded systems will not need the splice-family syscalls. Omitting them
> saves space.
>
> HISTORY:
> PATCH v4:
> - Drops __splice_p()
> - Let nfsd fall back to non-splice support when splice is compiled out
> - Style fixes
[...]
> RESULTS: A tinyconfig bloat-o-meter score for the entire patch-set:
>
> add/remove: 0/41 grow/shrink: 5/7 up/down: 23/-8422 (-8399)
I replied to one patch with a minor nit in the commit message. Other
than that, I don't see any obvious issues with this.
- Josh Triplett
From: Randy Dunlap <[email protected]>
Date: Tue, 25 Nov 2014 08:17:58 -0800
> Is the splice family of syscalls the only one that tiny has identified
> for optional building or can we expect similar treatment for other
> syscalls?
>
> Why will many embedded systems not need these syscalls? You know
> exactly what apps they run and you are positive that those apps do
> not use splice?
I think starting to compile out system calls is a very slippery
slope we should not begin the journey down.
This changes the forward facing interface to userspace.
On Tue, Nov 25, 2014 at 12:13:05PM -0500, David Miller wrote:
> From: Randy Dunlap <[email protected]>
> Date: Tue, 25 Nov 2014 08:17:58 -0800
>
> > Is the splice family of syscalls the only one that tiny has identified
> > for optional building or can we expect similar treatment for other
> > syscalls?
> >
> > Why will many embedded systems not need these syscalls? You know
> > exactly what apps they run and you are positive that those apps do
> > not use splice?
>
> I think starting to compile out system calls is a very slippery
> slope we should not begin the journey down.
>
> This changes the forward facing interface to userspace.
I certainly sympathize with this concern, given the importance of software
portability. However, the tiny-hardware alternative appears ot some sort
of special-purpose embedded OS, which most definitely will suffer from
software compatibility issues. I guess that the good news is that much
of the tiny hardware that used to be 8 or 16 bits is now 32 bits, which
means that it has at least some chance of running some form of Linux. ;-)
Thanx, Paul
From: "Paul E. McKenney" <[email protected]>
Date: Tue, 25 Nov 2014 10:10:32 -0800
> I certainly sympathize with this concern, given the importance of software
> portability. However, the tiny-hardware alternative appears ot some sort
> of special-purpose embedded OS, which most definitely will suffer from
> software compatibility issues. I guess that the good news is that much
> of the tiny hardware that used to be 8 or 16 bits is now 32 bits, which
> means that it has at least some chance of running some form of Linux. ;-)
And then if some fundamental part of userland (glibc, klibc, etc.) finds
a useful way to use splice for a fundamental operation, we're back to
square one.
I simply do not agree with modifying the user facing interface, especially
one with decades of precedence.
On Tue, Nov 25, 2014 at 12:13:05PM -0500, David Miller wrote:
> From: Randy Dunlap <[email protected]>
> Date: Tue, 25 Nov 2014 08:17:58 -0800
>
> > Is the splice family of syscalls the only one that tiny has identified
> > for optional building or can we expect similar treatment for other
> > syscalls?
> >
> > Why will many embedded systems not need these syscalls? You know
> > exactly what apps they run and you are positive that those apps do
> > not use splice?
>
> I think starting to compile out system calls is a very slippery
> slope we should not begin the journey down.
>
> This changes the forward facing interface to userspace.
It's not a "slippery slope"; it's been our standard practice for ages.
We started down that road long, long ago, when we first introduced
Kconfig and optional/modular features. /dev/* are user-facing
interfaces, yet you can compile them out or make them modular. /sys/*
and/proc/* are user-facing interfaces, yet you can compile part or all
of them out. Filesystem names passed to mount are user-facing
interfaces, yet you can compile them out. (Not just things like ext4;
think FUSE or overlayfs, which some applications will build upon and
require.) Some prctls are optional, new syscalls like BPF or inotify or
process_vm_{read,write}v are optional, hardware interfaces are optional,
control groups are optional, containers and namespaces are optional,
checkpoint/restart is optional, KVM is optional, kprobes are optional,
kmsg is optional, /dev/port is optional, ACL support is optional, USB
support (as used by libusb) is optional, sound interfaces are optional,
GPU interfaces are optional, even futexes are optional.
For every single one of those, userspace programs or libraries may
depend on that functionality, and summarily exit if it doesn't exist,
perhaps with a warning that you need to enable options in your kernel,
or perhaps with a simple "Function not implemented" or "No such file or
directory".
Out of the entire list above and the many more where that came from,
what makes syscalls unique? What's wildly different between
open("/dev/foo", ...) returning an error and sys_foo returning an error?
What makes syscalls so special out of the entire list above? We're not
breaking the ability to run old userspace on a new kernel, which *must*
be supported, and that includes not just syscalls but all user-facing
interfaces; we don't break userspace. But we've *never* guaranteed that
you can run old userspace on a new *allnoconfig* kernel.
All of these features will remain behind CONFIG_EXPERT, and all of them
warn that you can only use them if your userspace can cope.
I've actually been thinking of introducing a new CONFIG_ALL_SYSCALLS,
under which all the "enable support for foo syscall" can live, rather
than just piling all of them directly under CONFIG_EXPERT; that option
would then repeat in very clear terms the warning that if you disable
that option and then disable specific syscalls, you need to know exactly
what your target userspace uses. That would group together this whole
family of options, and make it clearer what the implications are.
- Josh Triplett
On Tue, Nov 25, 2014 at 01:24:45PM -0500, David Miller wrote:
>
> And then if some fundamental part of userland (glibc, klibc, etc.) finds
> a useful way to use splice for a fundamental operation, we're back to
> square one.
I'll note that the applications for these super-tiny kernels are
places where it's not likely they would be using glibc at all; think
very tiny embedded systems. The userspace tends to be highly
restricted for the same space reasons why there is an effort to make
the kernel as small as possible.
In these places, they are using Linux already, but they're using a 2.2
or 2.4 kernel because 3.0 is just too damned big. So the goal is to
try to provide them an alternative which allows them to use a modern,
but stripped down kernel. If glibc or klibc isn't going to work
without splice, then it's not going to work on a pre 2.6 kernel
anyway, so things are no worse with these systems anyway.
After all, if we can get these systems to using a 3.x kernel w/o
splice, that's surely better than their using a 2.2 or 2.4 kernel w/o
the splice system, isn't it?
Cheers,
- Ted
From: [email protected]
Date: Tue, 25 Nov 2014 10:53:10 -0800
> It's not a "slippery slope"; it's been our standard practice for ages.
We've never put an entire class of generic system calls behind
a config option.
From: Theodore Ts'o <[email protected]>
Date: Tue, 25 Nov 2014 13:58:06 -0500
> On Tue, Nov 25, 2014 at 01:24:45PM -0500, David Miller wrote:
>>
>> And then if some fundamental part of userland (glibc, klibc, etc.) finds
>> a useful way to use splice for a fundamental operation, we're back to
>> square one.
>
> I'll note that the applications for these super-tiny kernels are
> places where it's not likely they would be using glibc at all; think
> very tiny embedded systems. The userspace tends to be highly
> restricted for the same space reasons why there is an effort to make
> the kernel as small as possible.
This is why I mentioned klibc, in order to avoid replies like your's,
it seems I have failed.
David Miller <[email protected]> writes:
> From: [email protected]
> Date: Tue, 25 Nov 2014 10:53:10 -0800
>
>> It's not a "slippery slope"; it's been our standard practice for ages.
>
> We've never put an entire class of generic system calls behind
> a config option.
CONFIG_SYSVIPC has been in the kernel as long as I can remember.
I seem to remember a plan to remove that code once userspace had
finished migrating to more unixy interfaces to ipc. But in 20 years
that migration does does not seem to have finished, or even look
like it ever will.
But if we started a slippery slope it was long long ago.
Eric
From: [email protected] (Eric W. Biederman)
Date: Tue, 25 Nov 2014 13:16:44 -0600
> David Miller <[email protected]> writes:
>
>> From: [email protected]
>> Date: Tue, 25 Nov 2014 10:53:10 -0800
>>
>>> It's not a "slippery slope"; it's been our standard practice for ages.
>>
>> We've never put an entire class of generic system calls behind
>> a config option.
>
> CONFIG_SYSVIPC has been in the kernel as long as I can remember.
>
> I seem to remember a plan to remove that code once userspace had
> finished migrating to more unixy interfaces to ipc. But in 20 years
> that migration does does not seem to have finished, or even look
> like it ever will.
>
> But if we started a slippery slope it was long long ago.
Fair enough.
Would be amusing if these tiny systems have it enabled.
David Miller <[email protected]> writes:
> From: [email protected] (Eric W. Biederman)
> Date: Tue, 25 Nov 2014 13:16:44 -0600
>
>> David Miller <[email protected]> writes:
>>
>>> From: [email protected]
>>> Date: Tue, 25 Nov 2014 10:53:10 -0800
>>>
>>>> It's not a "slippery slope"; it's been our standard practice for ages.
>>>
>>> We've never put an entire class of generic system calls behind
>>> a config option.
>>
>> CONFIG_SYSVIPC has been in the kernel as long as I can remember.
>>
>> I seem to remember a plan to remove that code once userspace had
>> finished migrating to more unixy interfaces to ipc. But in 20 years
>> that migration does does not seem to have finished, or even look
>> like it ever will.
>>
>> But if we started a slippery slope it was long long ago.
>
> Fair enough.
>
> Would be amusing if these tiny systems have it enabled.
It would.
In practice when I was playing in that space I had a hard time
justifying CONFIG_NET and CONFIG_INET. Despite writing a network
bootloader to use with kexec.
Eric
On Tue, Nov 25, 2014 at 02:04:41PM -0500, David Miller wrote:
> From: [email protected]
> Date: Tue, 25 Nov 2014 10:53:10 -0800
>
> > It's not a "slippery slope"; it's been our standard practice for ages.
>
> We've never put an entire class of generic system calls behind
> a config option.
I would have loved to make them optional individually, but they all are
semantic variations of the same thing: Moving data between fd's without that
data passing through userspace. It therefore isn't surprising that these
syscalls share an underlying entanglement of code (which is where the bulk of
the space saving is to be had).
What a tiny product developer should be asking himself, is: "Do I really need
to efficiently move data between file descriptors?". If the answer no, he can
disable CONFIG_SYSCALL_SPLICE to squeeze an extra 8KB out of his kernel.
[Resending this mail due to some email encoding brokenness that
prevented it from reaching LKML the first time; sorry to anyone who
receives two copies.]
On Tue, Nov 25, 2014 at 08:17:58AM -0800, Randy Dunlap wrote:
> On 11/24/2014 03:00 PM, Pieter Smith wrote:
> >REPO: https://github.com/smipi1/linux-tinification.git
> >
> >BRANCH: tiny/config-syscall-splice
> >
> >BACKGROUND: This patch-set forms part of the Linux Kernel Tinification effort (
> > https://tiny.wiki.kernel.org/).
> >
> >GOAL: Support compiling out the splice family of syscalls (splice, vmsplice,
> > tee and sendfile) along with all supporting infrastructure if not needed.
> > Many embedded systems will not need the splice-family syscalls. Omitting them
> > saves space.
>
> Hi,
>
> Is the splice family of syscalls the only one that tiny has identified
> for optional building or can we expect similar treatment for other
> syscalls?
Pretty much any system call that you could conceive of writing a
userspace without.
There's a partial project list at https://tiny.wiki.kernel.org/projects.
> Why will many embedded systems not need these syscalls? You know
> exactly what apps they run and you are positive that those apps do
> not use splice?
Yes, precisely. We're talking about embedded systems small enough that
you're booting with init=/your/app and don't even call fork(), where you
know exactly what code you're putting in and what libraries you use.
And they're almost certainly not running glibc.
> >RESULTS: A tinyconfig bloat-o-meter score for the entire patch-set:
> >
> >add/remove: 0/41 grow/shrink: 5/7 up/down: 23/-8422 (-8399)
>
> The summary is that this patch saves around 8 KB of code space --
> is that correct?
Right. For reference, we're talking about kernels where the *total*
size is a few hundred kB.
> How much storage space do embedded systems have nowadays?
For the embedded systems we're targeting for the tinification effort, in
a first pass: 512k-2M of storage (often for an *uncompressed* kernel, to
support execute-in-place), and 128k-512k of memory. We've successfully
built useful kernels and userspaces for such environments, and we'd like
to go even smaller.
- Josh Triplett
On Tue, 25 Nov 2014 14:04:41 -0500 (EST)
David Miller <[email protected]> wrote:
> From: [email protected]
> Date: Tue, 25 Nov 2014 10:53:10 -0800
>
> > It's not a "slippery slope"; it's been our standard practice for ages.
>
> We've never put an entire class of generic system calls behind
> a config option.
Try running an original MCC Linux binary and C lib on a current kernel
We've put *entire binary formats* behind a config option. We've put older
syscalls behind it, we've put sysfs behind it, sysctl behind it, the
older microcode interfaces behind it, ISA bus as a concept behind
options. VDSO, IPC, even 32bit support ... the list goes on and on.
I'd say those were far more generic on the whole than splice/sendfile.
Alan