Hi, all
This series adds dead syscalls elimination (DSE) on top of dead code
elimination (DCE). It is the first revision after the RFC patchset [1].
The whole series consists of three parts, and this is Part1.
Part1 adds basic DCE based DSE support.
Part2 will further eliminate the unused syscalls that are forcibly kept by
the exception tables.
Part3 will add DSE test support with nolibc-test.c.
Changes from RFC patchset [1]:
- The DCE support [2] for RISC-V has been merged [3]
- The "nolibc: Record used syscalls in their own sections" [4] patch is
delayed to Part3
- Add debug support for DCE
- Allow CONFIG_USED_SYSCALLS to also accept a file that stores the used syscalls
- Accept only symbolic syscall names; integer syscall numbers are no longer supported
- Work with the newly added riscv syscall prefix: __riscv_
- Further trim the syscall tables by removing the trailing invalid parts
The nolibc-test based initrd runs well on a riscv64 kernel image with dead
syscalls eliminated:
$ nm build/riscv64/virt/linux/v6.6-rc2/vmlinux | grep "T __riscv_sys" | grep -v sys_ni_syscall | wc -l
48
These options should be enabled:
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION_DEBUG=y
CONFIG_TRIM_UNUSED_SYSCALLS=y
CONFIG_USED_SYSCALLS="sys_dup sys_dup3 sys_ioctl sys_mknodat sys_mkdirat sys_unlinkat sys_symlinkat sys_linkat sys_mount sys_chdir sys_chroot sys_fchmodat sys_fchownat sys_openat sys_close sys_pipe2 sys_getdents64 sys_lseek sys_read sys_write sys_pselect6 sys_ppoll sys_exit sys_sched_yield sys_kill sys_reboot sys_getpgid sys_prctl sys_gettimeofday sys_getpid sys_getppid sys_getuid sys_geteuid sys_brk sys_munmap sys_clone sys_execve sys_mmap sys_wait4 sys_statx"
The syscalls that are actually used:
$ echo "sys_dup sys_dup3 sys_ioctl sys_mknodat sys_mkdirat sys_unlinkat sys_symlinkat sys_linkat sys_mount sys_chdir sys_chroot sys_fchmodat sys_fchownat sys_openat sys_close sys_pipe2 sys_getdents64 sys_lseek sys_read sys_write sys_pselect6 sys_ppoll sys_exit sys_sched_yield sys_kill sys_reboot sys_getpgid sys_prctl sys_gettimeofday sys_getpid sys_getppid sys_getuid sys_geteuid sys_brk sys_munmap sys_clone sys_execve sys_mmap sys_wait4 sys_statx" | tr ' ' '\n' | wc -l
40
Thanks to Yuan Tan, who has researched and verified the elimination of
the unused syscalls forcibly kept by the exception tables: both the section
group and the section link order attributes of ld work. Part2 will be sent
out soon to remove another 8 unused syscalls, so that eventually an
application that does nothing but spin in an infinite loop can run on a
kernel image without any syscalls.
Best Regards,
Zhangjin Wu
---
[1]: https://lore.kernel.org/lkml/[email protected]/
[2]: https://lore.kernel.org/lkml/234017be6d06ef84844583230542e31068fa3685.1676594211.git.falcon@tinylab.org/
[3]: https://lore.kernel.org/lkml/CAFP8O3+41QFVyNTVJ2iZYkB0tqnvdLTAoGShgGy-qPP1PHjBEw@mail.gmail.com/
[4]: https://lore.kernel.org/lkml/cbcbfbb37cabfd9aed6088c75515e4ea86006cff.1676594211.git.falcon@tinylab.org/
Zhangjin Wu (7):
DCE: add debug support
DCE/DSE: add unused syscalls elimination configure support
DCE/DSE: Add a new scripts/Makefile.syscalls
DCE/DSE: mips: add HAVE_TRIM_UNUSED_SYSCALLS support
DCE/DSE: riscv: move syscall tables to syscalls/
DCE/DSE: riscv: add HAVE_TRIM_UNUSED_SYSCALLS support
DCE/DSE: riscv: trim syscall tables
Makefile | 3 +
arch/mips/Kconfig | 1 +
arch/mips/kernel/syscalls/Makefile | 23 ++++++-
arch/riscv/Kconfig | 1 +
arch/riscv/include/asm/unistd.h | 2 +
arch/riscv/kernel/Makefile | 7 +-
arch/riscv/kernel/syscalls/Makefile | 69 +++++++++++++++++++
.../{ => syscalls}/compat_syscall_table.c | 4 +-
.../kernel/{ => syscalls}/syscall_table.c | 4 +-
init/Kconfig | 49 +++++++++++++
scripts/Makefile.syscalls | 29 ++++++++
11 files changed, 182 insertions(+), 10 deletions(-)
create mode 100644 arch/riscv/kernel/syscalls/Makefile
rename arch/riscv/kernel/{ => syscalls}/compat_syscall_table.c (82%)
rename arch/riscv/kernel/{ => syscalls}/syscall_table.c (83%)
create mode 100644 scripts/Makefile.syscalls
--
2.25.1
When CONFIG_TRIM_UNUSED_SYSCALLS is enabled, get the used syscalls from
CONFIG_USED_SYSCALLS. CONFIG_USED_SYSCALLS may be either a list of used
syscalls or the name of a file that stores such a list.
If CONFIG_USED_SYSCALLS is a list of used syscalls, record it directly in
the used_syscalls variable. If it names a file, record the file name in the
used_syscalls_file variable and put the file's content into the
used_syscalls variable.
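The list-or-file decision described above can be sketched in plain shell.
This is only an illustration under assumptions: resolve_used_syscalls is a
hypothetical helper, not something the patch adds, and it skips the
empty-file check that the Makefile performs with "test -s".

```shell
# Hypothetical helper (not part of the patch): mirrors the list-or-file
# decision made in scripts/Makefile.syscalls and prints the used syscalls
# joined with '|', ready to be used as a regex alternation.
resolve_used_syscalls() {
    cfg=$1
    if [ -n "$cfg" ] && [ -f "$cfg" ]; then
        # The value names an existing file: join its entries with '|'.
        tr -s ' \t\n' '|' < "$cfg" | sed -e 's/^|//' -e 's/|$//'
    else
        # A value containing '/' looks like a path, so a missing file
        # is reported as an error, as the Makefile does with $(error).
        case $cfg in
        */*) echo "No such file: $cfg" >&2; return 1 ;;
        esac
        # Otherwise the value is an inline list: join its entries with '|'.
        printf '%s\n' "$cfg" | tr -s ' \t' '|' | sed -e 's/^|//' -e 's/|$//'
    fi
}

resolve_used_syscalls "sys_write sys_exit sys_reboot"
# prints: sys_write|sys_exit|sys_reboot
```

Joining with '|' matches what the Makefile does with $(subst $(space),|,...),
so the result can be dropped straight into the sed expressions used by the
architecture Makefiles.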
Signed-off-by: Zhangjin Wu <[email protected]>
---
scripts/Makefile.syscalls | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)
create mode 100644 scripts/Makefile.syscalls
diff --git a/scripts/Makefile.syscalls b/scripts/Makefile.syscalls
new file mode 100644
index 000000000000..5864d3a85996
--- /dev/null
+++ b/scripts/Makefile.syscalls
@@ -0,0 +1,29 @@
+# SPDX-License-Identifier: GPL-2.0
+
+ifndef SCRIPTS_MAKEFILE_SYSCALLS
+ SCRIPTS_MAKEFILE_SYSCALLS = 1
+
+ ifdef CONFIG_TRIM_UNUSED_SYSCALLS
+ ifneq ($(wildcard $(CONFIG_USED_SYSCALLS)),)
+ used_syscalls_file = $(CONFIG_USED_SYSCALLS)
+ ifeq ($(shell test -s $(used_syscalls_file); echo $$?),0)
+ used_syscalls != cat $(CONFIG_USED_SYSCALLS)
+ endif
+ else
+ ifeq ($(subst /,,$(CONFIG_USED_SYSCALLS)),$(CONFIG_USED_SYSCALLS))
+ used_syscalls = $(CONFIG_USED_SYSCALLS)
+ else
+ $(error No such file: $(CONFIG_USED_SYSCALLS))
+ endif
+ endif
+
+ ifneq ($(used_syscalls),)
+ used_syscalls := $(subst $(space),|,$(strip $(used_syscalls)))
+ endif
+
+ used_syscalls_deps = $(used_syscalls_file) $(objtree)/.config
+
+ export used_syscalls used_syscalls_deps
+ endif # CONFIG_TRIM_UNUSED_SYSCALLS
+
+endif # SCRIPTS_MAKEFILE_SYSCALLS
--
2.25.1
A minimal embedded Linux system may provide only a few functions and use
only a small subset of the POSIX syscalls. The remaining syscalls are never
called and are effectively dead, which wastes both disk storage and memory.
With dead code elimination support in place, these dead or unused syscalls
can be eliminated as well.
Firstly, a new common CONFIG_TRIM_UNUSED_SYSCALLS option and a new
architecture specific HAVE_TRIM_UNUSED_SYSCALLS option are added to enable
or disable this feature.
Secondly, a new CONFIG_USED_SYSCALLS option is added to configure the
syscalls used in a target system. CONFIG_USED_SYSCALLS can be a list of the
used syscalls or a file that stores such a list.
With these options, only the used syscalls are kept, and
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION trims the unused ones automatically.
Signed-off-by: Zhangjin Wu <[email protected]>
---
init/Kconfig | 42 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 42 insertions(+)
diff --git a/init/Kconfig b/init/Kconfig
index 4350d8ba7db4..aa648ce8bca1 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1457,6 +1457,11 @@ config BPF
bool
select CRYPTO_LIB_SHA1
+config HAVE_TRIM_UNUSED_SYSCALLS
+ bool
+ depends on HAVE_LD_DEAD_CODE_DATA_ELIMINATION
+ default n
+
menuconfig EXPERT
bool "Configure standard kernel features (expert users)"
# Unhide debug options, to make the on-by-default options visible
@@ -1683,6 +1688,43 @@ config MEMBARRIER
If unsure, say Y.
+config TRIM_UNUSED_SYSCALLS
+ bool "Trim unused syscalls (EXPERIMENTAL)" if EXPERT
+ default n
+ depends on HAVE_TRIM_UNUSED_SYSCALLS
+ depends on HAVE_LD_DEAD_CODE_DATA_ELIMINATION
+ select LD_DEAD_CODE_DATA_ELIMINATION
+ help
+ Say Y here to trim all of the unused syscalls for a target system.
+
+ Note, this is only for minimal embedded systems, please don't use it
+ for generic Linux distributions.
+
+ If unsure, say N.
+
+config USED_SYSCALLS
+ string "Configure used syscalls (EXPERIMENTAL)" if EXPERT
+ depends on TRIM_UNUSED_SYSCALLS
+ default ""
+ help
+ This option allows to configure the syscalls used in a target system,
+ the unused ones will be disabled and trimmed by TRIM_UNUSED_SYSCALLS.
+
+ The used syscalls should be listed one by one like this:
+
+ write exit reboot
+
+ Or put them into a file specified by this option, one syscall per
+ line is recommended for such a config file:
+
+ write
+ exit
+ reboot
+
+ Note, If keep this empty, all of the syscalls will be trimmed.
+
+ If unsure, please disable TRIM_UNUSED_SYSCALLS.
+
config KALLSYMS
bool "Load all symbols for debugging/ksymoops" if EXPERT
default y
--
2.25.1
Pass --print-gc-sections along with --gc-sections when
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION_DEBUG is enabled, so that the linker
reports which sections are actually eliminated.
Signed-off-by: Zhangjin Wu <[email protected]>
---
Makefile | 3 +++
init/Kconfig | 7 +++++++
2 files changed, 10 insertions(+)
diff --git a/Makefile b/Makefile
index 57698d048e2c..a4e522b747cb 100644
--- a/Makefile
+++ b/Makefile
@@ -938,6 +938,9 @@ ifdef CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
KBUILD_CFLAGS_KERNEL += -ffunction-sections -fdata-sections
KBUILD_RUSTFLAGS_KERNEL += -Zfunction-sections=y
LDFLAGS_vmlinux += --gc-sections
+ifdef CONFIG_LD_DEAD_CODE_DATA_ELIMINATION_DEBUG
+LDFLAGS_vmlinux += --print-gc-sections
+endif
endif
ifdef CONFIG_SHADOW_CALL_STACK
diff --git a/init/Kconfig b/init/Kconfig
index 6d35728b94b2..4350d8ba7db4 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1404,6 +1404,13 @@ config LD_DEAD_CODE_DATA_ELIMINATION
present. This option is not well tested yet, so use at your
own risk.
+config LD_DEAD_CODE_DATA_ELIMINATION_DEBUG
+ bool "Debug dead code and data elimination (EXPERIMENTAL)"
+ depends on LD_DEAD_CODE_DATA_ELIMINATION
+ default n
+ help
+ Enable --print-gc-sections for --gc-sections
+
config LD_ORPHAN_WARN
def_bool y
depends on ARCH_WANT_LD_ORPHAN_WARN
--
2.25.1
For HAVE_TRIM_UNUSED_SYSCALLS, the syscall tables are filtered with the
configured used syscalls.
Based on the used syscalls, a new version of each tbl file is generated
from the original tbl file, with 'used' added to its name
(syscall_used_*.tbl).
With the new tbl files, both the unistd_nr_*.h and the syscall_table_*.h
files are generated with only the used syscalls:
$ grep _Linux_syscalls -ur arch/mips/include/generated/asm/
arch/mips/include/generated/asm/unistd_nr_n64.h:#define __NR_64_Linux_syscalls 165
arch/mips/include/generated/asm/unistd_nr_n32.h:#define __NR_N32_Linux_syscalls 165
arch/mips/include/generated/asm/unistd_nr_o32.h:#define __NR_O32_Linux_syscalls 89
$ grep -vr sys_ni_syscall arch/mips/include/generated/asm/syscall_table_*.h
arch/mips/include/generated/asm/syscall_table_n32.h:__SYSCALL(58, sys_exit)
arch/mips/include/generated/asm/syscall_table_n32.h:__SYSCALL(164, sys_reboot)
arch/mips/include/generated/asm/syscall_table_n64.h:__SYSCALL(58, sys_exit)
arch/mips/include/generated/asm/syscall_table_n64.h:__SYSCALL(164, sys_reboot)
arch/mips/include/generated/asm/syscall_table_o32.h:__SYSCALL(1, sys_exit)
arch/mips/include/generated/asm/syscall_table_o32.h:__SYSCALL(88, sys_reboot)
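The tbl filtering can be approximated outside of kbuild as follows. This is
a sketch under assumptions, not the command from the patch: filter_tbl is a
hypothetical helper that uses awk instead of the sed expression in the
Makefile, and the sample rows are only illustrative.

```shell
# Hypothetical sketch: comment out every numbered syscall.tbl row whose
# fields do not include one of the used syscalls.
# $1 is a '|'-separated list, e.g. "sys_exit|sys_reboot".
filter_tbl() {
    awk -v used="$1" '
        BEGIN {
            n = split(used, names, "|")
            for (i = 1; i <= n; i++) keep[names[i]] = 1
        }
        # Only rows that start with a syscall number are candidates.
        /^[0-9]+[ \t]/ {
            hit = 0
            for (f = 1; f <= NF; f++) if ($f in keep) hit = 1
            if (!hit) { print "#" $0; next }
        }
        { print }
    '
}

printf '58\tn64\texit\tsys_exit\n59\tn64\twait4\tsys_wait4\n' |
    filter_tbl 'sys_exit|sys_reboot'
```

Rows that are commented out are skipped by the existing table generators, so
the generated headers only contain the syscalls that were kept.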
Signed-off-by: Zhangjin Wu <[email protected]>
---
arch/mips/Kconfig | 1 +
arch/mips/kernel/syscalls/Makefile | 23 +++++++++++++++++++++--
2 files changed, 22 insertions(+), 2 deletions(-)
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index bc8421859006..8a6927eff23d 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -89,6 +89,7 @@ config MIPS
select HAVE_SPARSE_SYSCALL_NR
select HAVE_STACKPROTECTOR
select HAVE_SYSCALL_TRACEPOINTS
+ select HAVE_TRIM_UNUSED_SYSCALLS if HAVE_LD_DEAD_CODE_DATA_ELIMINATION
select HAVE_VIRT_CPU_ACCOUNTING_GEN if 64BIT || !SMP
select IRQ_FORCED_THREADING
select ISA if EISA
diff --git a/arch/mips/kernel/syscalls/Makefile b/arch/mips/kernel/syscalls/Makefile
index e6b21de65cca..1e292a9f84a0 100644
--- a/arch/mips/kernel/syscalls/Makefile
+++ b/arch/mips/kernel/syscalls/Makefile
@@ -26,10 +26,29 @@ sysnr_pfx_unistd_nr_n32 := N32
sysnr_pfx_unistd_nr_n64 := 64
sysnr_pfx_unistd_nr_o32 := O32
-$(kapi)/unistd_nr_%.h: $(src)/syscall_%.tbl $(sysnr) FORCE
+ifndef CONFIG_TRIM_UNUSED_SYSCALLS
+tbl = $(src)/syscall_%.tbl
+else
+
+include $(srctree)/scripts/Makefile.syscalls
+
+orig_tbl = $(src)/syscall_%.tbl
+ tbl_dir = arch/$(SRCARCH)/include/generated/tbl
+ tbl = $(tbl_dir)/syscall_used_%.tbl
+
+quiet_cmd_used = USED $@
+ cmd_used = sed -E -e "/^[0-9]*[[:space:]]/{/(^($(used_syscalls))[[:space:]]|[[:space:]]($(used_syscalls))[[:space:]]|[[:space:]]($(used_syscalls))$$)/!{s/^/\#/g}}" $< > $@;
+
+$(tbl): $(orig_tbl) $(used_syscalls_deps) FORCE
+ $(Q)mkdir -p $(tbl_dir)
+ $(call cmd,used)
+
+endif
+
+$(kapi)/unistd_nr_%.h: $(tbl) $(sysnr) FORCE
$(call if_changed,sysnr)
-$(kapi)/syscall_table_%.h: $(src)/syscall_%.tbl $(systbl) FORCE
+$(kapi)/syscall_table_%.h: $(tbl) $(systbl) FORCE
$(call if_changed,systbl)
uapisyshdr-y += unistd_n32.h \
--
2.25.1
The syscall table and the compat syscall table share some dead syscalls
elimination code. To avoid cluttering the main RISC-V kernel Makefile,
move these tables and the corresponding compile settings to syscalls/.
Signed-off-by: Zhangjin Wu <[email protected]>
---
arch/riscv/kernel/Makefile | 5 +----
arch/riscv/kernel/syscalls/Makefile | 10 ++++++++++
.../riscv/kernel/{ => syscalls}/compat_syscall_table.c | 0
arch/riscv/kernel/{ => syscalls}/syscall_table.c | 0
4 files changed, 11 insertions(+), 4 deletions(-)
create mode 100644 arch/riscv/kernel/syscalls/Makefile
rename arch/riscv/kernel/{ => syscalls}/compat_syscall_table.c (100%)
rename arch/riscv/kernel/{ => syscalls}/syscall_table.c (100%)
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index 95cf25d48405..40aebbf06880 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -8,8 +8,6 @@ CFLAGS_REMOVE_ftrace.o = $(CC_FLAGS_FTRACE)
CFLAGS_REMOVE_patch.o = $(CC_FLAGS_FTRACE)
CFLAGS_REMOVE_sbi.o = $(CC_FLAGS_FTRACE)
endif
-CFLAGS_syscall_table.o += $(call cc-option,-Wno-override-init,)
-CFLAGS_compat_syscall_table.o += $(call cc-option,-Wno-override-init,)
ifdef CONFIG_KEXEC
AFLAGS_kexec_relocate.o := -mcmodel=medany $(call cc-option,-mno-relax)
@@ -48,7 +46,7 @@ obj-y += ptrace.o
obj-y += reset.o
obj-y += setup.o
obj-y += signal.o
-obj-y += syscall_table.o
+obj-y += syscalls/
obj-y += sys_riscv.o
obj-y += time.o
obj-y += traps.o
@@ -95,7 +93,6 @@ obj-$(CONFIG_JUMP_LABEL) += jump_label.o
obj-$(CONFIG_CFI_CLANG) += cfi.o
obj-$(CONFIG_EFI) += efi.o
-obj-$(CONFIG_COMPAT) += compat_syscall_table.o
obj-$(CONFIG_COMPAT) += compat_signal.o
obj-$(CONFIG_COMPAT) += compat_vdso/
diff --git a/arch/riscv/kernel/syscalls/Makefile b/arch/riscv/kernel/syscalls/Makefile
new file mode 100644
index 000000000000..65abd0871ee5
--- /dev/null
+++ b/arch/riscv/kernel/syscalls/Makefile
@@ -0,0 +1,10 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Makefile for the RISC-V syscall tables
+#
+
+CFLAGS_syscall_table.o += $(call cc-option,-Wno-override-init,)
+CFLAGS_compat_syscall_table.o += $(call cc-option,-Wno-override-init,)
+
+obj-y += syscall_table.o
+obj-$(CONFIG_COMPAT) += compat_syscall_table.o
diff --git a/arch/riscv/kernel/compat_syscall_table.c b/arch/riscv/kernel/syscalls/compat_syscall_table.c
similarity index 100%
rename from arch/riscv/kernel/compat_syscall_table.c
rename to arch/riscv/kernel/syscalls/compat_syscall_table.c
diff --git a/arch/riscv/kernel/syscall_table.c b/arch/riscv/kernel/syscalls/syscall_table.c
similarity index 100%
rename from arch/riscv/kernel/syscall_table.c
rename to arch/riscv/kernel/syscalls/syscall_table.c
--
2.25.1
For HAVE_TRIM_UNUSED_SYSCALLS, the syscall tables are filtered with the
input used_syscalls.
Firstly, the intermediate preprocessed .i files are generated from the
original C versions of the syscall tables and named with a
'used' suffix: syscall_table_used.i, compat_syscall_table_used.i.
Secondly, all of the unused syscalls are commented out.
Finally, two new object files suffixed with 'used' are generated from
the modified .i files and linked into the final kernel image.
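The "comment out" step can be tried on a few table lines without a kernel
build. The sed expression below is adapted from cmd_used in this patch, with
the make-level "$$" escaping removed and without "-i". The comment_unused
wrapper and the sample lines are assumptions made for illustration.

```shell
# Sketch only: comment_unused is a hypothetical wrapper, not part of the patch.
# $1 is a '|'-separated list such as "sys_write|sys_exit".
comment_unused() {
    sed -E -e '/^\[([0-9]+|\([0-9]+ \+ [0-9]+\))\] = /{
        /= *__riscv_(__sys_|sys_|compat_)*('"$1"'),/!{
            s%^%/* %
            s%$% */%
        }
    }'
}

printf '%s\n' '[63] = __riscv_sys_read,' '[64] = __riscv_sys_write,' |
    comment_unused 'sys_write'
# prints:
#   /* [63] = __riscv_sys_read, */
#   [64] = __riscv_sys_write,
```

Only designated initializers of the form "[nr] = ..." are touched, so the
"[0 ... N - 1] = __riscv_sys_ni_syscall," default entry stays in place and
every commented-out slot falls back to it.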
Signed-off-by: Zhangjin Wu <[email protected]>
---
arch/riscv/Kconfig | 1 +
arch/riscv/kernel/syscalls/Makefile | 37 +++++++++++++++++++++++++++++
2 files changed, 38 insertions(+)
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index d607ab0f7c6d..b5e726b49a6f 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -140,6 +140,7 @@ config RISCV
select HAVE_RSEQ
select HAVE_STACKPROTECTOR
select HAVE_SYSCALL_TRACEPOINTS
+ select HAVE_TRIM_UNUSED_SYSCALLS if HAVE_LD_DEAD_CODE_DATA_ELIMINATION
select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
select IRQ_DOMAIN
select IRQ_FORCED_THREADING
diff --git a/arch/riscv/kernel/syscalls/Makefile b/arch/riscv/kernel/syscalls/Makefile
index 65abd0871ee5..3b5969aaa9e8 100644
--- a/arch/riscv/kernel/syscalls/Makefile
+++ b/arch/riscv/kernel/syscalls/Makefile
@@ -3,8 +3,45 @@
# Makefile for the RISC-V syscall tables
#
+ifndef CONFIG_TRIM_UNUSED_SYSCALLS
+
CFLAGS_syscall_table.o += $(call cc-option,-Wno-override-init,)
CFLAGS_compat_syscall_table.o += $(call cc-option,-Wno-override-init,)
obj-y += syscall_table.o
obj-$(CONFIG_COMPAT) += compat_syscall_table.o
+else # CONFIG_TRIM_UNUSED_SYSCALLS
+
+include $(srctree)/scripts/Makefile.syscalls
+
+CFLAGS_syscall_table_used.o += $(call cc-option,-Wno-override-init,)
+CFLAGS_compat_syscall_table_used.o += $(call cc-option,-Wno-override-init,)
+
+obj-y += syscall_table_used.o
+obj-$(CONFIG_COMPAT) += compat_syscall_table_used.o
+
+# comment the unused syscalls
+quiet_cmd_used = USED $@
+ cmd_used = sed -E -e '/^\[([0-9]+|\([0-9]+ \+ [0-9]+\))\] = /{/= *__riscv_(__sys_|sys_|compat_)*($(used_syscalls)),/!{s%^%/* %g;s%$$% */%g}}' -i $@;
+
+$(obj)/syscall_table_used.c: $(src)/syscall_table.c
+ $(Q)cp $< $@
+
+$(obj)/syscall_table_used.i: $(src)/syscall_table_used.c $(used_syscalls_deps) FORCE
+ $(call if_changed_dep,cpp_i_c)
+ $(call cmd,used)
+
+$(obj)/syscall_table_used.o: $(obj)/syscall_table_used.i FORCE
+ $(call if_changed,cc_o_c)
+
+$(obj)/compat_syscall_table_used.c: $(src)/compat_syscall_table.c
+ $(Q)cp $< $@
+
+$(obj)/compat_syscall_table_used.i: $(src)/compat_syscall_table_used.c $(used_syscalls_deps) FORCE
+ $(call if_changed_dep,cpp_i_c)
+ $(call cmd,used)
+
+$(obj)/compat_syscall_table_used.o: $(obj)/compat_syscall_table_used.i FORCE
+ $(call if_changed,cc_o_c)
+
+endif # CONFIG_TRIM_UNUSED_SYSCALLS
--
2.25.1
When the maximum nr of the used syscalls is smaller than __NR_syscalls
(the original syscalls total), the syscalls total can be lowered to
(maximum nr + 1) and the '>= (maximum nr + 1)' part of the syscall tables
can be trimmed as well.
For example:
sys_call_table [143] = {
[0 ... 143 - 1] = sys_ni_syscall,
[64] = sys_write,
[93] = sys_exit,
[142] = sys_reboot,
}
The >= 143 part of the syscall tables can be trimmed.
At the same time, syscall numbers >= 143 coming from user space must be
ignored by do_trap_ecall_u() in traps.c.
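The new total is simply the largest surviving index plus one. A simplified
sketch of that calculation follows. It is an assumption-laden illustration:
nr_syscalls is a hypothetical helper, and unlike the Makefile rule in this
patch it only understands plain "[N]" indices, not the "(N + M)" form that
the patch evaluates with bc.

```shell
# Hypothetical helper: reads a filtered, preprocessed table on stdin and
# prints (largest live index + 1), or 0 when no entry survives.
nr_syscalls() {
    sed -E -n -e 's/^\[([0-9]+)\] = .*/\1/p' | sort -n | tail -n 1 |
        awk '{ max = $1 + 1 } END { print max + 0 }'
}

printf '%s\n' \
    '/* [63] = __riscv_sys_read, */' \
    '[64] = __riscv_sys_write,' \
    '[93] = __riscv_sys_exit,' \
    '[142] = __riscv_sys_reboot,' | nr_syscalls
# prints: 143
```

Entries that were commented out start with "/*" and are therefore not
counted, which is what lets the table shrink to the last live entry.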
Signed-off-by: Zhangjin Wu <[email protected]>
---
arch/riscv/include/asm/unistd.h | 2 ++
arch/riscv/kernel/Makefile | 2 ++
arch/riscv/kernel/syscalls/Makefile | 22 +++++++++++++++++++
.../kernel/syscalls/compat_syscall_table.c | 4 ++--
arch/riscv/kernel/syscalls/syscall_table.c | 4 ++--
5 files changed, 30 insertions(+), 4 deletions(-)
diff --git a/arch/riscv/include/asm/unistd.h b/arch/riscv/include/asm/unistd.h
index 221630bdbd07..4d8e41f446ff 100644
--- a/arch/riscv/include/asm/unistd.h
+++ b/arch/riscv/include/asm/unistd.h
@@ -23,4 +23,6 @@
#include <uapi/asm/unistd.h>
+#ifndef NR_syscalls
#define NR_syscalls (__NR_syscalls)
+#endif
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index 40aebbf06880..e75424c10729 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -49,7 +49,9 @@ obj-y += signal.o
obj-y += syscalls/
obj-y += sys_riscv.o
obj-y += time.o
+ifneq ($(CONFIG_TRIM_UNUSED_SYSCALLS),y)
obj-y += traps.o
+endif
obj-y += riscv_ksyms.o
obj-y += stacktrace.o
obj-y += cacheinfo.o
diff --git a/arch/riscv/kernel/syscalls/Makefile b/arch/riscv/kernel/syscalls/Makefile
index 3b5969aaa9e8..f1a0597c8b24 100644
--- a/arch/riscv/kernel/syscalls/Makefile
+++ b/arch/riscv/kernel/syscalls/Makefile
@@ -14,9 +14,18 @@ else # CONFIG_TRIM_UNUSED_SYSCALLS
include $(srctree)/scripts/Makefile.syscalls
+# calculate syscalls total from $(obj)/syscall_table_used.i
+ifneq ($(used_syscalls),)
+ NR_syscalls := $$(($$(sed -E -n -e '/^\[([0-9]+|\([0-9]+ \+ [0-9]+\))\] = /{s/^\[(.*)\].*/\1/gp}' $(obj)/syscall_table_used.i | bc | sort -g | tail -1 | grep '[0-9]' || echo -1) + 1))
+else
+ NR_syscalls := 0
+endif
+
+CFLAGS_traps_used.o += -DNR_syscalls=$(NR_syscalls)
CFLAGS_syscall_table_used.o += $(call cc-option,-Wno-override-init,)
CFLAGS_compat_syscall_table_used.o += $(call cc-option,-Wno-override-init,)
+obj-y += traps_used.o
obj-y += syscall_table_used.o
obj-$(CONFIG_COMPAT) += compat_syscall_table_used.o
@@ -24,15 +33,26 @@ obj-$(CONFIG_COMPAT) += compat_syscall_table_used.o
quiet_cmd_used = USED $@
cmd_used = sed -E -e '/^\[([0-9]+|\([0-9]+ \+ [0-9]+\))\] = /{/= *__riscv_(__sys_|sys_|compat_)*($(used_syscalls)),/!{s%^%/* %g;s%$$% */%g}}' -i $@;
+# update the syscalls total
+quiet_cmd_snr = SNR $@
+ cmd_snr = snr=$(NR_syscalls); if [ $$snr -ne 0 ]; then \
+ sed -i -e "s/sys_call_table\[.*\] =/sys_call_table[($$snr)] =/g;s/\[0 ... (.*) - 1\] = __riscv_sys_ni_syscall/[0 ... ($$snr) - 1] = __riscv_sys_ni_syscall/g" $@; \
+ fi;
+
+$(obj)/traps_used.c: $(src)/../traps.c $(obj)/syscall_table_used.i FORCE
+ $(Q)cp $< $@
+
$(obj)/syscall_table_used.c: $(src)/syscall_table.c
$(Q)cp $< $@
$(obj)/syscall_table_used.i: $(src)/syscall_table_used.c $(used_syscalls_deps) FORCE
$(call if_changed_dep,cpp_i_c)
$(call cmd,used)
+ $(call cmd,snr)
$(obj)/syscall_table_used.o: $(obj)/syscall_table_used.i FORCE
$(call if_changed,cc_o_c)
+ $(call cmd,force_checksrc)
$(obj)/compat_syscall_table_used.c: $(src)/compat_syscall_table.c
$(Q)cp $< $@
@@ -40,8 +60,10 @@ $(obj)/compat_syscall_table_used.c: $(src)/compat_syscall_table.c
$(obj)/compat_syscall_table_used.i: $(src)/compat_syscall_table_used.c $(used_syscalls_deps) FORCE
$(call if_changed_dep,cpp_i_c)
$(call cmd,used)
+ $(call cmd,snr)
$(obj)/compat_syscall_table_used.o: $(obj)/compat_syscall_table_used.i FORCE
$(call if_changed,cc_o_c)
+ $(call cmd,force_checksrc)
endif # CONFIG_TRIM_UNUSED_SYSCALLS
diff --git a/arch/riscv/kernel/syscalls/compat_syscall_table.c b/arch/riscv/kernel/syscalls/compat_syscall_table.c
index ad7f2d712f5f..4756b6858eac 100644
--- a/arch/riscv/kernel/syscalls/compat_syscall_table.c
+++ b/arch/riscv/kernel/syscalls/compat_syscall_table.c
@@ -17,7 +17,7 @@
asmlinkage long compat_sys_rt_sigreturn(void);
-void * const compat_sys_call_table[__NR_syscalls] = {
- [0 ... __NR_syscalls - 1] = __riscv_sys_ni_syscall,
+void * const compat_sys_call_table[NR_syscalls] = {
+ [0 ... NR_syscalls - 1] = __riscv_sys_ni_syscall,
#include <asm/unistd.h>
};
diff --git a/arch/riscv/kernel/syscalls/syscall_table.c b/arch/riscv/kernel/syscalls/syscall_table.c
index dda913764903..d2b3233ae5d4 100644
--- a/arch/riscv/kernel/syscalls/syscall_table.c
+++ b/arch/riscv/kernel/syscalls/syscall_table.c
@@ -16,7 +16,7 @@
#undef __SYSCALL
#define __SYSCALL(nr, call) [nr] = __riscv_##call,
-void * const sys_call_table[__NR_syscalls] = {
- [0 ... __NR_syscalls - 1] = __riscv_sys_ni_syscall,
+void * const sys_call_table[NR_syscalls] = {
+ [0 ... NR_syscalls - 1] = __riscv_sys_ni_syscall,
#include <asm/unistd.h>
};
--
2.25.1
On Tue, Sep 26, 2023, at 00:42, Zhangjin Wu wrote:
> For HAVE_TRIM_UNUSED_SYSCALLS, the syscall tables are hacked with the
> inputing unused_syscalls.
>
> Firstly, the intermediate preprocessed .i files are generated from the
> original C version of syscall tables respectively, and named with a
> 'used' suffix: syscall_table_used.i, compat_syscall_table_used.i.
>
> Secondly, all of the unused syscalls are commented.
>
> At last, two new objective files sufixed with 'used' are generated from
> the hacked .i files and they are linked into the eventual kernel image.
>
> Signed-off-by: Zhangjin Wu <[email protected]>
As mentioned in my comment on the mips patch, hacking the preprocessed
file here is too much strain on the old infrastructure, the
asm-generic/unistd.h file is already too hard to understand for
anyone and in need of an overhaul, so let's work together on fixing
it up first.
Arnd
On Tue, Sep 26, 2023, at 00:33, Zhangjin Wu wrote:
>
> This series aims to add DCE based DSE support, here is the first
> revision of the RFC patchset [1], the whole series includes three parts,
> here is the Part1.
>
> This Part1 adds basic DCE based DSE support.
>
> Part2 will further eliminate the unused syscalls forcely kept by the
> exception tables.
>
> Part3 will add DSE test support with nolibc-test.c.
I missed the RFC version, but I think this is a useful thing to
have overall, though it will probably need to go through a couple
of revisions and rewrites, mostly to ensure we are not adding
complexity that gets in the way of other improvements I would
like to see to the syscall entry handling.
It would be nice to include some size numbers here for at least
one practical use case. If you have a defconfig for a shipping
product with a small kernel, what is the 'size -B' output you
see when comparing without DCE, with DCE, and with DCE+DSE?
There is generally not much work going into micro-optimizing
the size of the kernel image any more, for a number of reasons,
but if you are able to show that this is a noticeable improvement,
we should be able to find a way to do it. Geert is doing statistics
about size bloat over time, and anything that undoes a couple
of years worth of bloat would clearly be significant here.
Another alternative would be to resume the work done by Nicolas
Pitre, who added Kconfig symbols for controlling groups of
system calls. Since we already have a number of those compile
time options, adding more of them should generally be
less controversial and more consistent, while bringing most
of the same benefits.
Arnd
On Tue, Sep 26, 2023, at 00:38, Zhangjin Wu wrote:
> When CONFIG_TRIM_UNUSED_SYSCALLS is enabled, get used syscalls from
> CONFIG_USED_SYSCALLS. CONFIG_USED_SYSCALLS may be a list of used
> syscalls or a file to store such a list.
>
> If CONFIG_USED_SYSCALLS is configured as a list of the used syscalls,
> directly record them in a used_syscalls variable, if it is a file to
> store the list, record the file name to the used_syscalls_file variable
> and put its content to the used_syscalls variable.
>
> Signed-off-by: Zhangjin Wu <[email protected]>
I like the idea of configuring the set of syscalls more, but we
should probably discuss the implementation of this here. You
introduce two new ways of doing this, on top of the existing
coarse-grained method (per syscall class Kconfig symbols).
Both methods seem a little awkward to me, but are doable
in principle if we can't come up with a better way. However,
I'd much prefer to not add both the Kconfig symbol and the
extra file here, since at least one of them is redundant.
Do you have automatic tooling to generate these lists from
a profile, or do you require manually writing them? Do you
have an example list?
Arnd
On Tue, Sep 26, 2023, at 00:43, Zhangjin Wu wrote:
> When the maximum nr of the used syscalls is smaller than __NR_syscalls
> (original syscalls total). It is able to update __NR_syscalls to
> (maximum nr + 1) and further trim the '>= (maximum nr + 1)' part of the
> syscall tables:
>
> For example:
>
> sys_call_table [143] = {
> [0 ... 143 - 1] = sys_ni_syscall,
> [64] = sys_write,
> [93] = sys_exit,
> [142] = sys_reboot,
> }
>
> The >= 143 part of the syscall tables can be trimmed.
>
> At the same time, the syscall >= 143 from user space must be ignored
> from do_trap_ecall_u() of traps.c.
>
> Signed-off-by: Zhangjin Wu <[email protected]>
> ---
> arch/riscv/include/asm/unistd.h | 2 ++
> arch/riscv/kernel/Makefile | 2 ++
> arch/riscv/kernel/syscalls/Makefile | 22 +++++++++++++++++++
> .../kernel/syscalls/compat_syscall_table.c | 4 ++--
> arch/riscv/kernel/syscalls/syscall_table.c | 4 ++--
> 5 files changed, 30 insertions(+), 4 deletions(-)
This bit feels like you are overoptimizing for a corner case:
there is not much to be gained in terms of memory savings, but
you add complexity in an area that I feel should be made common
between architectures.
I hope to get back to working on consolidating both the
syscall.tbl input files and the build infrastructure for them
across architectures, and you make that harder here, so I'd
prefer you to drop this part, at least until the code is
shared across all architectures.
Arnd
On Tue, Sep 26, 2023, at 00:40, Zhangjin Wu wrote:
> For HAVE_TRIM_UNUSED_SYSCALLS, the syscall tables are hacked with the
> input used syscalls.
>
> Based on the used syscalls information, a new version of tbl file is
> generated from the original tbl file and named with a 'used' suffix.
>
> With this new tbl file, both unistd_nr_*.h and syscall_table_*.h files
> are updated to only include the used syscalls.
>
> $ grep _Linux_syscalls -ur arch/mips/include/generated/asm/
> arch/mips/include/generated/asm/unistd_nr_n64.h:#define
> __NR_64_Linux_syscalls 165
> arch/mips/include/generated/asm/unistd_nr_n32.h:#define
> __NR_N32_Linux_syscalls 165
> arch/mips/include/generated/asm/unistd_nr_o32.h:#define
> __NR_O32_Linux_syscalls 89
>
> $ grep -vr sys_ni_syscall
> arch/mips/include/generated/asm/syscall_table_*.h
> arch/mips/include/generated/asm/syscall_table_n32.h:__SYSCALL(58,
> sys_exit)
> arch/mips/include/generated/asm/syscall_table_n32.h:__SYSCALL(164,
> sys_reboot)
> arch/mips/include/generated/asm/syscall_table_n64.h:__SYSCALL(58,
> sys_exit)
My feeling is that instead of postprocessing the generated files,
it would be much better to make the elimination part of the
existing infrastructure that generates the files from syscall.tbl,
and finally change the include/asm-generic/unistd.h to the
same format, as we had planned for a long time.
I should be able to help out with that part.
Arnd
On Tue, Sep 26, 2023, at 13:24, Arnd Bergmann wrote:
> $ size build/tmp/vmlinux-*
> text data bss dec hex filename
> 754772 220016 71841 1046629 ff865 vmlinux-tinyconfig
> 717500 223368 71841 1012709 f73e5 vmlinux-tiny+nosyscalls
> 567310 176200 71473 814983 c6f87 vmlinux-tiny+gc-sections
> 493278 170752 71433 735463 b38e7 vmlinux-tiny+gc-sections+nosyscalls
> 10120058 3572756 493701 14186515 d87813 vmlinux-defconfig
> 9953934 3529004 491525 13974463 d53bbf vmlinux-defconfig+gc
> 9709856 3500600 489221 13699677 d10a5d vmlinux-defconfig+gc+nosyscalls
>
> This would put us at an upper bound of 10% size savings (80kb) for
> tinyconfig, which is clearly significant. For defconfig, it's
> still 2.0% or 275kb size reduction when all syscalls are dropped.
I did one more test to see which syscalls actually cause bloat
when CONFIG_LD_DEAD_CODE_DATA_ELIMINATION is set in order to drop them
all. I built the above riscv tinyconfig with
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION and truncated the syscall
table before and after each syscall to see the size difference.
A lot of syscalls are already conditional, so those show up as
0, 4 or 8 bytes (not sure why they are not always 0). Others
could probably be made to fit within some category that can
be made optional (e.g. xattr or adjtimex). Having a Kconfig
option for those would also let users remove even more code that
is not useful without the syscalls but might be called from
somewhere else in the kernel.
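For reference, the savings figures in the size output quoted at the top of
this reply can be re-derived from the "dec" column. The saving helper below
is hypothetical and only repeats that arithmetic.

```shell
# Hypothetical helper: absolute and relative saving between two "dec" values.
saving() {
    awk -v a="$1" -v b="$2" \
        'BEGIN { printf "%d bytes (%.1f%%)\n", a - b, (a - b) * 100 / a }'
}

saving 814983 735463       # tiny+gc-sections vs. tiny+gc-sections+nosyscalls
# prints: 79520 bytes (9.8%)
saving 13974463 13699677   # defconfig+gc vs. defconfig+gc+nosyscalls
# prints: 274786 bytes (2.0%)
```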
Arnd
syscall size name
-------------------------
0 8 io_setup
1 4 io_destroy
2 8 io_submit
3 4 io_cancel
4 8 io_getevents
5 1496 setxattr
6 28 lsetxattr
7 148 fsetxattr
8 1404 getxattr
9 16 lgetxattr
10 80 fgetxattr
11 276 listxattr
12 16 llistxattr
13 68 flistxattr
14 460 removexattr
15 20 lremovexattr
16 92 fremovexattr
17 240 getcwd
18 4 lookup_dcookie
19 8 eventfd2
20 4 epoll_create1
21 8 epoll_ctl
22 4 epoll_pwait
23 64 dup
24 300 dup3
25 1684 fcntl
26 4 inotify_init1
27 8 inotify_add_watch
29 0 ioctl
28 4 inotify_rm_watch
30 8 ioprio_set
31 4 ioprio_get
32 8 flock
33 456 mknodat
34 192 mkdirat
35 64 unlinkat
36 208 symlinkat
38 0 renameat
37 324 linkat
40 0 mount
39 64 umount2
42 0 nfsservctl
41 708 pivot_root
43 424 statfs
44 132 fstatfs
45 272 truncate
46 216 ftruncate
47 88 fallocate
48 420 faccessat
49 120 chdir
50 112 fchdir
51 120 chroot
52 68 fchmod
53 164 fchmodat
54 184 fchownat
55 136 fchown
56 184 openat
57 204 close
58 4 vhangup
59 648 pipe2
61 0 getdents64
60 4 quotactl
62 148 lseek
63 328 read
64 356 write
65 952 readv
66 252 writev
67 92 pread64
68 92 pwrite64
69 100 preadv
71 0 sendfile
72 0 pselect6
70 100 pwritev
73 132 ppoll
74 4 signalfd4
75 2808 vmsplice
76 1388 splice
77 536 tee
78 424 readlinkat
79 244 fstatat
80 64 fstat
81 296 sync
82 100 fsync
83 20 fdatasync
84 448 sync_file_range
85 8 timerfd_create
86 4 timerfd_settime
87 8 timerfd_gettime
88 300 utimensat
89 4 acct
90 8 capget
91 4 capset
92 24 personality
93 24 exit
94 24 exit_group
95 16 waitid
96 28 set_tid_address
97 608 unshare
98 4 futex
99 8 set_robust_list
100 4 get_robust_list
101 276 nanosleep
103 0 setitimer
102 8 getitimer
104 4 kexec_load
105 8 init_module
107 0 timer_create
108 0 timer_gettime
109 0 timer_getoverrun
110 0 timer_settime
111 0 timer_delete
106 4 delete_module
112 44 clock_settime
113 88 clock_gettime
114 64 clock_getres
115 160 clock_nanosleep
116 8 syslog
117 740 ptrace
118 140 sched_setparam
119 36 sched_setscheduler
120 64 sched_getscheduler
121 88 sched_getparam
122 196 sched_setaffinity
123 180 sched_getaffinity
124 24 sched_yield
125 60 sched_get_priority_max
126 60 sched_get_priority_min
127 164 sched_rr_get_interval
128 12 restart_syscall
129 304 kill
130 212 tkill
131 40 tgkill
132 100 sigaltstack
133 104 rt_sigsuspend
134 396 rt_sigaction
135 180 rt_sigprocmask
136 76 rt_sigpending
137 336 rt_sigtimedwait
139 0 rt_sigreturn
138 120 rt_sigqueueinfo
140 396 setpriority
141 276 getpriority
142 1256 reboot
143 4 setregid
144 8 setgid
145 4 setreuid
146 8 setuid
147 4 setresuid
148 8 getresuid
149 4 setresgid
150 8 getresgid
151 4 setfsuid
152 8 setfsgid
153 152 times
154 252 setpgid
155 48 getpgid
156 48 getsid
157 140 setsid
158 8 getgroups
159 4 setgroups
160 172 uname
161 132 sethostname
162 136 setdomainname
163 156 getrlimit
164 52 setrlimit
165 88 getrusage
167 0 prctl
168 0 getcpu
169 0 gettimeofday
170 0 settimeofday
166 24 umask
171 1514 adjtimex
172 20 getpid
173 20 getppid
174 4 getuid
175 4 geteuid
176 4 getgid
177 4 getegid
178 20 gettid
179 276 sysinfo
180 4 mq_open
181 8 mq_unlink
182 4 mq_timedsend
183 8 mq_timedreceive
184 4 mq_notify
185 8 mq_getsetattr
186 4 msgget
187 8 msgctl
188 4 msgrcv
189 8 msgsnd
190 4 semget
191 8 semctl
192 4 semtimedop
193 8 semop
194 4 shmget
195 8 shmctl
196 4 shmat
197 8 shmdt
198 4 socket
199 8 socketpair
200 4 bind
201 8 listen
202 4 accept
203 8 connect
204 4 getsockname
205 8 getpeername
206 4 sendto
207 8 recvfrom
208 4 setsockopt
209 8 getsockopt
210 4 shutdown
211 8 sendmsg
212 4 recvmsg
213 460 readahead
214 2872 brk
215 288 munmap
216 4268 mremap
217 4 add_key
218 8 request_key
219 4 keyctl
220 100 clone
221 724 execve
222 2504 mmap
223 8 fadvise64
224 4 swapon
225 8 swapoff
226 2180 mprotect
227 320 msync
228 1140 mlock
229 84 munlock
230 304 mlockall
231 52 munlockall
232 828 mincore
233 4 madvise
234 324 remap_file_pages
235 4 mbind
236 8 get_mempolicy
237 4 set_mempolicy
238 8 migrate_pages
239 4 move_pages
240 132 rt_tgsigqueueinfo
241 8 perf_event_open
242 4 accept4
244 0 arch_specific_syscall
243 8 recvmmsg
260 100 wait4
261 252 prlimit64
262 8 fanotify_init
263 4 fanotify_mark
264 8 name_to_handle_at
266 0 clock_adjtime
265 4 open_by_handle_at
267 120 syncfs
268 624 setns
269 4 sendmmsg
270 8 process_vm_readv
271 4 process_vm_writev
272 8 kcmp
274 0 sched_setattr
273 4 finit_module
275 208 sched_getattr
276 2364 renameat2
277 4 seccomp
278 124 getrandom
279 4 memfd_create
280 8 bpf
281 52 execveat
282 4 userfaultfd
283 8 membarrier
284 40 mlock2
285 708 copy_file_range
286 32 preadv2
287 32 pwritev2
288 8 pkey_mprotect
289 4 pkey_alloc
290 8 pkey_free
291 356 statx
292 4 io_pgetevents
424 244 pidfd_send_signal
425 8 io_uring_setup
426 4 io_uring_enter
427 8 io_uring_register
428 368 open_tree
429 404 move_mount
430 556 fsopen
431 1056 fsconfig
432 484 fsmount
433 220 fspick
434 124 pidfd_open
435 516 clone3
436 240 close_range
437 120 openat2
438 304 pidfd_getfd
439 12 faccessat2
440 8 process_madvise
441 4 epoll_pwait2
442 1088 mount_setattr
443 8 quotactl_fd
444 4 landlock_create_ruleset
445 8 landlock_add_rule
446 4 landlock_restrict_self
447 8 memfd_secret
448 240 process_mrelease
449 4 futex_waitv
450 8 set_mempolicy_home_node
451 4 cachestat
452 28 fchmodat2
454 4 futex_wake
On Tue, 26 Sep 2023, Arnd Bergmann wrote:
> On Tue, Sep 26, 2023, at 09:14, Arnd Bergmann wrote:
> > On Tue, Sep 26, 2023, at 00:33, Zhangjin Wu wrote:
> >
> > It would be nice to include some size numbers here for at least
> > one practical use case. If you have a defconfig for a shipping
> > product with a small kernel, what is the 'size -B' output you
> > see comparing with and without DCE and, and with DCE+DSE?
>
> To follow up on this myself, for a very rough baseline,
> I tried a riscv tinyconfig build with and without
> CONFIG_LD_DEAD_CODE_DATA_ELIMINATION (this is currently
> not supported on arm, so I did not try it there), and
> then another build with simply *all* system calls stubbed
> out by hacking asm/syscall-wrapper.h:
>
> $ size build/tmp/vmlinux-*
> text data bss dec hex filename
> 754772 220016 71841 1046629 ff865 vmlinux-tinyconfig
> 717500 223368 71841 1012709 f73e5 vmlinux-tiny+nosyscalls
> 567310 176200 71473 814983 c6f87 vmlinux-tiny+gc-sections
> 493278 170752 71433 735463 b38e7 vmlinux-tiny+gc-sections+nosyscalls
> 10120058 3572756 493701 14186515 d87813 vmlinux-defconfig
> 9953934 3529004 491525 13974463 d53bbf vmlinux-defconfig+gc
> 9709856 3500600 489221 13699677 d10a5d vmlinux-defconfig+gc+nosyscalls
>
> This would put us at an upper bound of 10% size savings (80kb) for
> tinyconfig, which is clearly significant. For defconfig, it's
> still 2.0% or 275kb size reduction when all syscalls are dropped.
I did something similar a while ago. Results included here:
https://lwn.net/Articles/746780/
In my case, stubbing out all syscalls produced a 7.8% reduction which
was somewhat disappointing compared to other techniques. Of course it
all depends on what is your actual goal.
Nicolas
On Tue, Sep 26, 2023, at 09:14, Arnd Bergmann wrote:
> On Tue, Sep 26, 2023, at 00:33, Zhangjin Wu wrote:
>
> It would be nice to include some size numbers here for at least
> one practical use case. If you have a defconfig for a shipping
> product with a small kernel, what is the 'size -B' output you
> see comparing with and without DCE and, and with DCE+DSE?
To follow up on this myself, for a very rough baseline,
I tried a riscv tinyconfig build with and without
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION (this is currently
not supported on arm, so I did not try it there), and
then another build with simply *all* system calls stubbed
out by hacking asm/syscall-wrapper.h:
$ size build/tmp/vmlinux-*
text data bss dec hex filename
754772 220016 71841 1046629 ff865 vmlinux-tinyconfig
717500 223368 71841 1012709 f73e5 vmlinux-tiny+nosyscalls
567310 176200 71473 814983 c6f87 vmlinux-tiny+gc-sections
493278 170752 71433 735463 b38e7 vmlinux-tiny+gc-sections+nosyscalls
10120058 3572756 493701 14186515 d87813 vmlinux-defconfig
9953934 3529004 491525 13974463 d53bbf vmlinux-defconfig+gc
9709856 3500600 489221 13699677 d10a5d vmlinux-defconfig+gc+nosyscalls
This would put us at an upper bound of 10% size savings (80kb) for
tinyconfig, which is clearly significant. For defconfig, it's
still 2.0% or 275kb size reduction when all syscalls are dropped.
Arnd
On Tue, Sep 26, 2023, at 22:49, Nicolas Pitre wrote:
> On Tue, 26 Sep 2023, Arnd Bergmann wrote:
>
>> $ size build/tmp/vmlinux-*
>> text data bss dec hex filename
>> 754772 220016 71841 1046629 ff865 vmlinux-tinyconfig
>> 717500 223368 71841 1012709 f73e5 vmlinux-tiny+nosyscalls
>> 567310 176200 71473 814983 c6f87 vmlinux-tiny+gc-sections
>> 493278 170752 71433 735463 b38e7 vmlinux-tiny+gc-sections+nosyscalls
>> 10120058 3572756 493701 14186515 d87813 vmlinux-defconfig
>> 9953934 3529004 491525 13974463 d53bbf vmlinux-defconfig+gc
>> 9709856 3500600 489221 13699677 d10a5d vmlinux-defconfig+gc+nosyscalls
>>
>> This would put us at an upper bound of 10% size savings (80kb) for
>> tinyconfig, which is clearly significant. For defconfig, it's
>> still 2.0% or 275kb size reduction when all syscalls are dropped.
>
> I did something similar a while ago. Results included here:
>
> https://lwn.net/Articles/746780/
>
> In my case, stubbing out all syscalls produced a 7.8% reduction which
> was somewhat disappointing compared to other techniques. Of course it
> all depends on what is your actual goal.
Thanks for the link, I had forgotten about your article.
With all the findings combined, I guess the filtering
at the syscall table level is not all that promising
any more. Going through the list of saved space, I ended up
with 5.7% (47kb) in the best case after I left the 40 syscalls
from the example in this thread.
Removing entire groups of features using normal Kconfig symbols
based on the remaining syscalls that have the largest size
probably gives better results. I can see possible groups
of syscalls that could be disabled under CONFIG_EXPERT,
along with making their underlying infrastructure optional:
- xattr
- ptrace
- adjtimex
- splice/vmsplice/tee
- unshare/setns
- sched_*
After those, one would quickly hit diminishing returns.
Arnd
I don't know why [email protected] rejected the email I sent from
Thunderbird, so I am resending it with git send-email.
Here is a test result for DEAD_CODE_DATA_ELIMINATION (DCE) and dead syscalls
elimination (DSE). It's based on config[1] and a simple hello.c initramfs.
In the DSE test, we set CONFIG_USED_SYSCALLS="sys_write sys_exit
sys_reboot", which are the syscalls hello.c needs to print "Hello", exit,
and shut down qemu.
|                                    | syscalls remaining | vmlinux size     | vmlinux after strip |
| ---------------------------------- | ------------------ | ---------------- | ------------------- |
| disable DCE                        | 236                | 2559632          | 1963400             |
| enable DCE                         | 208                | 2037384 (-20.4%) | 1485776 (-24.3%)    |
| enable DCE and DSE(SHF_GROUP)      | 3                  | 1856640 (-27.5%) | 1354424 (-31.0%)    |
| enable DCE and DSE(SHF_LINK_ORDER) | 3                  | 1856664 (-27.5%) | 1354424 (-31.0%)    |
It shows that dead syscalls elimination saves about a further 7% of the
original vmlinux size on top of DCE.
[1]: https://pastebin.com/KG4fd7aT
I didn't test DSE with explicit KEEP() in the previous mail, so I am making up
for that now.
This test result is about DEAD_CODE_DATA_ELIMINATION (DCE) and dead syscalls
elimination (DSE). It's based on config[1] and a simple hello.c initramfs.
We set CONFIG_USED_SYSCALLS="sys_write sys_exit sys_reboot", which are the
syscalls hello.c needs to print "Hello", exit, and shut down qemu.
|                                                              | syscalls remaining | vmlinux size     | vmlinux after strip |
| ------------------------------------------------------------ | ------------------ | ---------------- | ------------------- |
| disable DCE                                                  | 236                | 2559632          | 1963400             |
| enable DCE                                                   | 208                | 2037384 (-20.4%) | 1485776 (-24.3%)    |
| enable DCE and DSE with explicit KEEP() of exception table   | 17                 | 1899208 (-25.8%) | 1387272 (-29.3%)    |
| enable DCE and DSE without KEEP() (By SHF_GROUP method)      | 3                  | 1856640 (-27.5%) | 1354424 (-31.0%)    |
| enable DCE and DSE without KEEP() (By SHF_LINK_ORDER method) | 3                  | 1856664 (-27.5%) | 1354424 (-31.0%)    |
It shows that dead syscalls elimination saves about a further 7% of the
original vmlinux size on top of DCE. Although dropping KEEP() saves only about
2% more space, it reduces the attack surface and eliminates the misuse of
KEEP(). It also ensures that every formerly orphan section is no longer
orphaned.
[1]: https://pastebin.com/KG4fd7aT
Hi Zhangjin,
> A minimal embedded Linux system may only has a very few of functions and
> only uses a minimal subset of the posix syscalls, the unused syscalls
> will never be used and eventually in a dead status, that also means disk
> storage and memory footprint waste.
>
> Based on dead code elimination support, it is able to further eliminate
> the above dead or unused syscalls.
>
> Firstly, both a new common CONFIG_TRIM_UNUSED_SYSCALLS option and a new
> architecture specific HAVE_TRIM_UNUSED_SYSCALLS are added to enable or
> disable such feature.
>
> Secondly, a new CONFIG_USED_SYSCALLS option is added to allow configure
> the syscalls used in a target system. CONFIG_USED_SYSCALLS can be a list
> of the used syscalls or a file to store such a list.
>
> Based on the above options, it is able to only reserve the used syscalls
> and let CONFIG_LD_DEAD_CODE_DATA_ELIMINATION trim the unused ones for us
> automatically.
>
> Signed-off-by: Zhangjin Wu <[email protected]>
> ---
> init/Kconfig | 42 ++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 42 insertions(+)
>
> diff --git a/init/Kconfig b/init/Kconfig
> index 4350d8ba7db4..aa648ce8bca1 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1457,6 +1457,11 @@ config BPF
> bool
> select CRYPTO_LIB_SHA1
>
> +config HAVE_TRIM_UNUSED_SYSCALLS
> + bool
> + depends on HAVE_LD_DEAD_CODE_DATA_ELIMINATION
> + default n
> +
> menuconfig EXPERT
> bool "Configure standard kernel features (expert users)"
> # Unhide debug options, to make the on-by-default options visible
> @@ -1683,6 +1688,43 @@ config MEMBARRIER
>
> If unsure, say Y.
>
> +config TRIM_UNUSED_SYSCALLS
> + bool "Trim unused syscalls (EXPERIMENTAL)" if EXPERT
> + default n
> + depends on HAVE_TRIM_UNUSED_SYSCALLS
> + depends on HAVE_LD_DEAD_CODE_DATA_ELIMINATION
> + select LD_DEAD_CODE_DATA_ELIMINATION
> + help
> + Say Y here to trim all of the unused syscalls for a target system.
I think changing this sentence to "Say Y here to trim all of the unused
syscalls, excluding those defined in USED_SYSCALLS." would be clearer.
By the way, consider adding the three files syscall_table_used.c,
compat_syscall_table_used.c, and traps_used.c to the .gitignore file.
> +
> + Note, this is only for minimal embedded systems, please don't use it
> + for generic Linux distributions.
> +
> + If unsure, say N.
> +
> +config USED_SYSCALLS
> + string "Configure used syscalls (EXPERIMENTAL)" if EXPERT
> + depends on TRIM_UNUSED_SYSCALLS
> + default ""
> + help
> + This option allows to configure the syscalls used in a target system,
> + the unused ones will be disabled and trimmed by TRIM_UNUSED_SYSCALLS.
> +
> + The used syscalls should be listed one by one like this:
> +
> + write exit reboot
> +
> + Or put them into a file specified by this option, one syscall per
> + line is recommended for such a config file:
> +
> + write
> + exit
> + reboot
> +
> + Note, If keep this empty, all of the syscalls will be trimmed.
> +
> + If unsure, please disable TRIM_UNUSED_SYSCALLS.
> +
> config KALLSYMS
> bool "Load all symbols for debugging/ksymoops" if EXPERT
> default y
> --
Hi, Arnd
> On Tue, Sep 26, 2023, at 00:40, Zhangjin Wu wrote:
> > For HAVE_TRIM_UNUSED_SYSCALLS, the syscall tables are hacked with the
> > input used syscalls.
> >
> > Based on the used syscalls information, a new version of tbl file is
> > generated from the original tbl file and named with a 'used' suffix.
> >
> > With this new tbl file, both unistd_nr_*.h and syscall_table_*.h files
> > are updated to only include the used syscalls.
> >
> > $ grep _Linux_syscalls -ur arch/mips/include/generated/asm/
> > arch/mips/include/generated/asm/unistd_nr_n64.h:#define
> > __NR_64_Linux_syscalls 165
> > arch/mips/include/generated/asm/unistd_nr_n32.h:#define
> > __NR_N32_Linux_syscalls 165
> > arch/mips/include/generated/asm/unistd_nr_o32.h:#define
> > __NR_O32_Linux_syscalls 89
> >
> > $ grep -vr sys_ni_syscall
> > arch/mips/include/generated/asm/syscall_table_*.h
> > arch/mips/include/generated/asm/syscall_table_n32.h:__SYSCALL(58,
> > sys_exit)
> > arch/mips/include/generated/asm/syscall_table_n32.h:__SYSCALL(164,
> > sys_reboot)
> > arch/mips/include/generated/asm/syscall_table_n64.h:__SYSCALL(58,
> > sys_exit)
>
> My feeling is that instead of postprocessing the generated files,
> it would be much better to make the elimination part of the
> existing infrastructure that generates the files from syscall.tbl,
> and finally change the include/asm-generic/unistd.h to the
> same format, as we had planned for a long time.
>
Agreed. Then we only need to touch the common files, with no need to
touch the arch-specific ones.
> I should be able to help out with that part.
>
Thanks. Is it enough to touch these?
$ ls scripts/syscall*
scripts/syscallhdr.sh scripts/syscallnr.sh scripts/syscalltbl.sh
One question is whether it is possible, or required, to share the
used-syscalls selection code among them.
Another question where I need your help is the compat part: the compat stuff
makes things harder (including the Kconfig symbol interface definition and the
selection logic). Perhaps we can simply limit the first DSE version to !COMPAT?
Best regards,
Zhangjin
> Arnd
Hi, Arnd
> On Tue, Sep 26, 2023, at 00:42, Zhangjin Wu wrote:
> > For HAVE_TRIM_UNUSED_SYSCALLS, the syscall tables are hacked with the
> > inputing unused_syscalls.
> >
> > Firstly, the intermediate preprocessed .i files are generated from the
> > original C version of syscall tables respectively, and named with a
> > 'used' suffix: syscall_table_used.i, compat_syscall_table_used.i.
> >
> > Secondly, all of the unused syscalls are commented.
> >
> > At last, two new objective files sufixed with 'used' are generated from
> > the hacked .i files and they are linked into the eventual kernel image.
> >
> > Signed-off-by: Zhangjin Wu <[email protected]>
>
> As mentioned in my comment on the mips patch, hacking the preprocessed
> file here is too much strain on the old infrastructure, the
> asm-generic/unistd.h file is already too hard to understand for
> anyone and in need of an overhaul, so let's work together on fixing
> it up first.
>
OK, I was thinking about using asm/syscall_table.h instead of asm/unistd.h, as mips does:
    void * const sys_call_table[NR_syscalls] = {
        [0 ... NR_syscalls - 1] = __riscv_sys_ni_syscall,
    #include <asm/syscall_table.h>
    };
Then we can generate syscall_table.h from asm/unistd.h with a tool like scripts/syscallused.sh.
Another solution may be to first generate a list of `#define __USED_NR_##call 1`
for the used syscalls from the Kconfig symbol, and then change the __SYSCALL()
macro to:
#define __SYSCALL(nr, call) [nr] = __is_defined(__USED_NR_##call) ? __riscv_##call : __riscv_sys_ni_syscall,
`__is_defined` is defined in `include/linux/kconfig.h`.
This method may work for the archs with .tbl files too.
Thanks,
Zhangjin
> Arnd
Hi, Arnd
> On Tue, Sep 26, 2023, at 00:43, Zhangjin Wu wrote:
> > When the maximum nr of the used syscalls is smaller than __NR_syscalls
> > (original syscalls total). It is able to update __NR_syscalls to
> > (maximum nr + 1) and further trim the '>= (maximum nr + 1)' part of the
> > syscall tables:
> >
> > For example:
> >
> > sys_call_table [143] = {
> > [0 ... 143 - 1] = sys_ni_syscall,
> > [64] = sys_write,
> > [93] = sys_exit,
> > [142] = sys_reboot,
> > }
> >
> > The >= 143 part of the syscall tables can be trimmed.
> >
> > At the same time, the syscall >= 143 from user space must be ignored
> > from do_trap_ecall_u() of traps.c.
> >
> > Signed-off-by: Zhangjin Wu <[email protected]>
> > ---
> > arch/riscv/include/asm/unistd.h | 2 ++
> > arch/riscv/kernel/Makefile | 2 ++
> > arch/riscv/kernel/syscalls/Makefile | 22 +++++++++++++++++++
> > .../kernel/syscalls/compat_syscall_table.c | 4 ++--
> > arch/riscv/kernel/syscalls/syscall_table.c | 4 ++--
> > 5 files changed, 30 insertions(+), 4 deletions(-)
>
> This bit feels like you are overoptimizing for a corner case:
> there is not much to be gained in terms of memory savings, but
> you add complexity in an area that I feel should be made common
> between architectures.
>
> I hope to get back to working on consolidating both the
> syscall.tbl input files and the build infrastructure for them
> across architectures, and you make that harder here, so I'd
> prefer you to drop this part, at least until the code is
> shared across all architectures.
>
Agree, let's drop it.
Thanks,
Zhangjin
> Arnd
On Sat, Oct 7, 2023, at 15:29, Zhangjin Wu wrote:
>> On Tue, Sep 26, 2023, at 00:42, Zhangjin Wu wrote:
>> > For HAVE_TRIM_UNUSED_SYSCALLS, the syscall tables are hacked with the
>> > inputing unused_syscalls.
>> >
>> > Firstly, the intermediate preprocessed .i files are generated from the
>> > original C version of syscall tables respectively, and named with a
>> > 'used' suffix: syscall_table_used.i, compat_syscall_table_used.i.
>> >
>> > Secondly, all of the unused syscalls are commented.
>> >
>> > At last, two new objective files sufixed with 'used' are generated from
>> > the hacked .i files and they are linked into the eventual kernel image.
>> >
>> > Signed-off-by: Zhangjin Wu <[email protected]>
>>
>> As mentioned in my comment on the mips patch, hacking the preprocessed
>> file here is too much strain on the old infrastructure, the
>> asm-generic/unistd.h file is already too hard to understand for
>> anyone and in need of an overhaul, so let's work together on fixing
>> it up first.
>>
>
> Ok, I was thinking about using asm/syscall_table.h instead of
> asm/unistd.h like mips.
>
> void * const sys_call_table[NR_syscalls] = {
> [0 ... NR_syscalls - 1] = __riscv_sys_ni_syscall,
> #include <asm/syscall_table.h>
> };
>
> Therefore, we can generate syscall_table.h from asm/unist.h with a tool
> like scripts/syscallused.sh
>
> Another solution may be firstly generate a list of `#define __USED_NR_##call 1`
> for the used syscalls from Kconfig symbol, and then change __SYSCALL() macro
> to:
>
> #define __SYSCALL(nr, call) [nr] =
> __is_defined(__USED_NR_##call) ? __riscv_##call :
> __riscv_sys_ni_syscall,
>
> `include/linux/kconfig.h` defined the '__is_defined'.
>
> This method may work for the archs with .tbl files too.
Right, either way would be much better than your first
approach. For the mips version (and all the other
traditional architectures that use the syscall.tbl method)
I think I'd integrate the filtering in scripts/syscalltbl.sh
if we decide to go that way. For the riscv version
(and all the others using asm-generic/unistd.h), the
__USED_NR_## macro would be fine as an intermediate
step, until we manage to convert those to syscall.tbl
parsing.
On the other hand, based on the earlier findings, my
overall feeling is that we're better off not adding
the extra indirection at all, but instead adding
more Kconfig symbols to control the largest groups
of syscalls, with the hope of conditionally removing
additional code for each of these symbols beyond the
automatic gc-section logic.
Arnd