2024-03-06 20:09:49

by Calvin Owens

[permalink] [raw]
Subject: [RFC][PATCH 0/4] Make bpf_jit and kprobes work with CONFIG_MODULES=n

Hello all,

This patchset makes it possible to use bpftrace with kprobes on kernels
built without loadable module support.

On a Raspberry Pi 4b, this saves about 700KB of memory where BPF is
needed but loadable module support is not. These two kernels had
identical configurations, except CONFIG_MODULE was off in the second:

- Linux version 6.8.0-rc7
- Memory: 3330672K/4050944K available (16576K kernel code, 2390K rwdata,
- 12364K rodata, 5632K init, 675K bss, 195984K reserved, 524288K cma-reserved)
+ Linux version 6.8.0-rc7-00003-g2af01251ca21
+ Memory: 3331400K/4050944K available (16512K kernel code, 2384K rwdata,
+ 11728K rodata, 5632K init, 673K bss, 195256K reserved, 524288K cma-reserved)

I don't intend to present an exhaustive list of !MODULES usecases, since
I'm sure there are many I'm not aware of. Performance is a common one,
the primary justification being that static text is mapped on hugepages
and module text is not. Security is another, since rootkits are much
harder to implement without modules.

The first patch is the interesting one: it moves module_alloc() into its
own file with its own Kconfig option, so it can be utilized even when
loadable module support is disabled. I got the idea from an unmerged
patch from a few years ago I found on lkml (see [1/4] for details). I
think this also has value in its own right, since I suspect there are
potential users beyond bpf, hopefully we will hear from some.

Patches 2-3 are proofs of concept to demonstrate the first patch is
sufficient to achieve my goal (full ebpf functionality without modules).

Patch 4 adds a new "-n" argument to vmtest.sh to run the BPF selftests
without modules, so the prior three patches can be rigorously tested.

If something like the first patch were to eventually be merged, the rest
could go through the normal bpf-next process as I clean them up: I've
only based them on Linus' tree and combined them into a series here to
introduce the idea.

If you prefer to fetch the patches via git:

[1/4] https://github.com/jcalvinowens/linux.git work/module-alloc
+[2/4]+[3/4] https://github.com/jcalvinowens/linux.git work/nomodule-bpf
+[4/4] https://github.com/jcalvinowens/linux.git testing/nomodule-bpf-ci

In addition to the automated BPF selftests, I've lightly tested this on
my laptop (x86_64), a Raspberry Pi 4b (arm64), and a Raspberry Pi Zero W
(arm). The other architectures have only been compile tested.

I didn't want to spam all the arch maintainers with what I expect will
be a discussion mostly about modules and bpf, so I've left them off this
first submission. I will be sure to add them on future submissions of
the first patch. Of course, feedback on the arch bits is welcome here.

In addition to feedback on the patches themselves, I'm interested in
hearing from anybody else who might find this functionality useful.

Thanks,
Calvin


Calvin Owens (4):
module: mm: Make module_alloc() generally available
bpf: Allow BPF_JIT with CONFIG_MODULES=n
kprobes: Allow kprobes with CONFIG_MODULES=n
selftests/bpf: Support testing the !MODULES case

arch/Kconfig | 4 +-
arch/arm/kernel/module.c | 35 -----
arch/arm/mm/Makefile | 2 +
arch/arm/mm/module_alloc.c | 40 ++++++
arch/arm64/kernel/module.c | 127 -----------------
arch/arm64/mm/Makefile | 1 +
arch/arm64/mm/module_alloc.c | 130 ++++++++++++++++++
arch/loongarch/kernel/module.c | 6 -
arch/loongarch/mm/Makefile | 2 +
arch/loongarch/mm/module_alloc.c | 10 ++
arch/mips/kernel/module.c | 10 --
arch/mips/mm/Makefile | 2 +
arch/mips/mm/module_alloc.c | 13 ++
arch/nios2/kernel/module.c | 20 ---
arch/nios2/mm/Makefile | 2 +
arch/nios2/mm/module_alloc.c | 22 +++
arch/parisc/kernel/module.c | 12 --
arch/parisc/mm/Makefile | 1 +
arch/parisc/mm/module_alloc.c | 15 ++
arch/powerpc/kernel/module.c | 36 -----
arch/powerpc/mm/Makefile | 1 +
arch/powerpc/mm/module_alloc.c | 41 ++++++
arch/riscv/kernel/module.c | 11 --
arch/riscv/mm/Makefile | 1 +
arch/riscv/mm/module_alloc.c | 17 +++
arch/s390/kernel/module.c | 37 -----
arch/s390/mm/Makefile | 1 +
arch/s390/mm/module_alloc.c | 42 ++++++
arch/sparc/kernel/module.c | 31 -----
arch/sparc/mm/Makefile | 2 +
arch/sparc/mm/module_alloc.c | 31 +++++
arch/x86/kernel/ftrace.c | 2 +-
arch/x86/kernel/module.c | 56 --------
arch/x86/mm/Makefile | 2 +
arch/x86/mm/module_alloc.c | 59 ++++++++
fs/proc/kcore.c | 2 +-
include/trace/events/bpf_testmod.h | 1 +
kernel/bpf/Kconfig | 11 +-
kernel/bpf/Makefile | 2 +
kernel/bpf/bpf_struct_ops.c | 28 +++-
kernel/bpf/bpf_testmod/Makefile | 1 +
kernel/bpf/bpf_testmod/bpf_testmod.c | 1 +
kernel/bpf/bpf_testmod/bpf_testmod.h | 1 +
kernel/bpf/bpf_testmod/bpf_testmod_kfunc.h | 1 +
kernel/kprobes.c | 22 +++
kernel/module/Kconfig | 1 +
kernel/module/main.c | 17 ---
kernel/trace/trace_kprobe.c | 11 ++
mm/Kconfig | 3 +
mm/Makefile | 1 +
mm/module_alloc.c | 21 +++
mm/vmalloc.c | 2 +-
net/bpf/test_run.c | 2 +
tools/testing/selftests/bpf/Makefile | 28 ++--
.../selftests/bpf/bpf_testmod/Makefile | 2 +-
.../bpf/bpf_testmod/bpf_testmod-events.h | 6 +
.../selftests/bpf/bpf_testmod/bpf_testmod.c | 4 +
.../bpf/bpf_testmod/bpf_testmod_kfunc.h | 2 +
tools/testing/selftests/bpf/config | 5 -
tools/testing/selftests/bpf/config.mods | 5 +
tools/testing/selftests/bpf/config.nomods | 1 +
.../selftests/bpf/progs/btf_type_tag_percpu.c | 2 +
.../selftests/bpf/progs/btf_type_tag_user.c | 2 +
tools/testing/selftests/bpf/progs/core_kern.c | 2 +
.../selftests/bpf/progs/iters_testmod_seq.c | 2 +
.../bpf/progs/test_core_reloc_module.c | 2 +
.../selftests/bpf/progs/test_ldsx_insn.c | 2 +
.../selftests/bpf/progs/test_module_attach.c | 3 +
.../selftests/bpf/progs/tracing_struct.c | 2 +
tools/testing/selftests/bpf/testing_helpers.c | 14 ++
tools/testing/selftests/bpf/vmtest.sh | 24 +++-
71 files changed, 636 insertions(+), 424 deletions(-)
create mode 100644 arch/arm/mm/module_alloc.c
create mode 100644 arch/arm64/mm/module_alloc.c
create mode 100644 arch/loongarch/mm/module_alloc.c
create mode 100644 arch/mips/mm/module_alloc.c
create mode 100644 arch/nios2/mm/module_alloc.c
create mode 100644 arch/parisc/mm/module_alloc.c
create mode 100644 arch/powerpc/mm/module_alloc.c
create mode 100644 arch/riscv/mm/module_alloc.c
create mode 100644 arch/s390/mm/module_alloc.c
create mode 100644 arch/sparc/mm/module_alloc.c
create mode 100644 arch/x86/mm/module_alloc.c
create mode 120000 include/trace/events/bpf_testmod.h
create mode 100644 kernel/bpf/bpf_testmod/Makefile
create mode 120000 kernel/bpf/bpf_testmod/bpf_testmod.c
create mode 120000 kernel/bpf/bpf_testmod/bpf_testmod.h
create mode 120000 kernel/bpf/bpf_testmod/bpf_testmod_kfunc.h
create mode 100644 mm/module_alloc.c
create mode 100644 tools/testing/selftests/bpf/config.mods
create mode 100644 tools/testing/selftests/bpf/config.nomods

--
2.43.0



2024-03-06 20:10:30

by Calvin Owens

[permalink] [raw]
Subject: [RFC][PATCH 4/4] selftests/bpf: Support testing the !MODULES case

This symlinks bpf_testmod into the main source, so it can be built-in
for running selftests in the new !MODULES case.

To be clear, no changes to the existing selftests are required: this
only exists to enable testing the new case which was not previously
possible. I'm sure somebody will be able to suggest a less ugly way I
can do this...

Signed-off-by: Calvin Owens <[email protected]>
---
include/trace/events/bpf_testmod.h | 1 +
kernel/bpf/Kconfig | 9 ++++++
kernel/bpf/Makefile | 2 ++
kernel/bpf/bpf_testmod/Makefile | 1 +
kernel/bpf/bpf_testmod/bpf_testmod.c | 1 +
kernel/bpf/bpf_testmod/bpf_testmod.h | 1 +
kernel/bpf/bpf_testmod/bpf_testmod_kfunc.h | 1 +
net/bpf/test_run.c | 2 ++
tools/testing/selftests/bpf/Makefile | 28 +++++++++++++------
.../selftests/bpf/bpf_testmod/Makefile | 2 +-
.../bpf/bpf_testmod/bpf_testmod-events.h | 6 ++++
.../selftests/bpf/bpf_testmod/bpf_testmod.c | 4 +++
.../bpf/bpf_testmod/bpf_testmod_kfunc.h | 2 ++
tools/testing/selftests/bpf/config | 5 ----
tools/testing/selftests/bpf/config.mods | 5 ++++
tools/testing/selftests/bpf/config.nomods | 1 +
.../selftests/bpf/progs/btf_type_tag_percpu.c | 2 ++
.../selftests/bpf/progs/btf_type_tag_user.c | 2 ++
tools/testing/selftests/bpf/progs/core_kern.c | 2 ++
.../selftests/bpf/progs/iters_testmod_seq.c | 2 ++
.../bpf/progs/test_core_reloc_module.c | 2 ++
.../selftests/bpf/progs/test_ldsx_insn.c | 2 ++
.../selftests/bpf/progs/test_module_attach.c | 3 ++
.../selftests/bpf/progs/tracing_struct.c | 2 ++
tools/testing/selftests/bpf/testing_helpers.c | 14 ++++++++++
tools/testing/selftests/bpf/vmtest.sh | 24 ++++++++++++++--
26 files changed, 110 insertions(+), 16 deletions(-)
create mode 120000 include/trace/events/bpf_testmod.h
create mode 100644 kernel/bpf/bpf_testmod/Makefile
create mode 120000 kernel/bpf/bpf_testmod/bpf_testmod.c
create mode 120000 kernel/bpf/bpf_testmod/bpf_testmod.h
create mode 120000 kernel/bpf/bpf_testmod/bpf_testmod_kfunc.h
create mode 100644 tools/testing/selftests/bpf/config.mods
create mode 100644 tools/testing/selftests/bpf/config.nomods

diff --git a/include/trace/events/bpf_testmod.h b/include/trace/events/bpf_testmod.h
new file mode 120000
index 000000000000..ae237a90d381
--- /dev/null
+++ b/include/trace/events/bpf_testmod.h
@@ -0,0 +1 @@
+../../../tools/testing/selftests/bpf/bpf_testmod/bpf_testmod-events.h
\ No newline at end of file
diff --git a/kernel/bpf/Kconfig b/kernel/bpf/Kconfig
index 77df483a8925..d5ba795182e5 100644
--- a/kernel/bpf/Kconfig
+++ b/kernel/bpf/Kconfig
@@ -100,4 +100,13 @@ config BPF_LSM

If you are unsure how to answer this question, answer N.

+config BPF_TEST_MODULE
+ bool "Build the module for BPF selftests as a built-in"
+ depends on BPF_SYSCALL
+ depends on BPF_JIT
+ depends on !MODULES
+ default n
+ help
+ This allows most of the bpf selftests to run without modules.
+
endmenu # "BPF subsystem"
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index f526b7573e97..04b3e50ff940 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -46,3 +46,5 @@ obj-$(CONFIG_BPF_PRELOAD) += preload/
obj-$(CONFIG_BPF_SYSCALL) += relo_core.o
$(obj)/relo_core.o: $(srctree)/tools/lib/bpf/relo_core.c FORCE
$(call if_changed_rule,cc_o_c)
+
+obj-$(CONFIG_BPF_TEST_MODULE) += bpf_testmod/
diff --git a/kernel/bpf/bpf_testmod/Makefile b/kernel/bpf/bpf_testmod/Makefile
new file mode 100644
index 000000000000..55a73fd8443e
--- /dev/null
+++ b/kernel/bpf/bpf_testmod/Makefile
@@ -0,0 +1 @@
+obj-y += bpf_testmod.o
diff --git a/kernel/bpf/bpf_testmod/bpf_testmod.c b/kernel/bpf/bpf_testmod/bpf_testmod.c
new file mode 120000
index 000000000000..ca3baca5d9c4
--- /dev/null
+++ b/kernel/bpf/bpf_testmod/bpf_testmod.c
@@ -0,0 +1 @@
+../../../tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
\ No newline at end of file
diff --git a/kernel/bpf/bpf_testmod/bpf_testmod.h b/kernel/bpf/bpf_testmod/bpf_testmod.h
new file mode 120000
index 000000000000..f8d3df98b6a5
--- /dev/null
+++ b/kernel/bpf/bpf_testmod/bpf_testmod.h
@@ -0,0 +1 @@
+../../../tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h
\ No newline at end of file
diff --git a/kernel/bpf/bpf_testmod/bpf_testmod_kfunc.h b/kernel/bpf/bpf_testmod/bpf_testmod_kfunc.h
new file mode 120000
index 000000000000..fdf42f5eaeb0
--- /dev/null
+++ b/kernel/bpf/bpf_testmod/bpf_testmod_kfunc.h
@@ -0,0 +1 @@
+../../../tools/testing/selftests/bpf/bpf_testmod/bpf_testmod_kfunc.h
\ No newline at end of file
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index dfd919374017..33029c91bf92 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -573,10 +573,12 @@ __bpf_kfunc int bpf_modify_return_test2(int a, int *b, short c, int d,
return a + *b + c + d + (long)e + f + g;
}

+#if !IS_ENABLED(CONFIG_BPF_TEST_MODULE)
int noinline bpf_fentry_shadow_test(int a)
{
return a + 1;
}
+#endif

struct prog_test_member1 {
int a;
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index fd15017ed3b1..12da018c9fc3 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -108,9 +108,16 @@ TEST_PROGS_EXTENDED := with_addr.sh \
# Compile but not part of 'make run_tests'
TEST_GEN_PROGS_EXTENDED = test_sock_addr test_skb_cgroup_id_user \
flow_dissector_load test_flow_dissector test_tcp_check_syncookie_user \
- test_lirc_mode2_user xdping test_cpp runqslower bench bpf_testmod.ko \
- xskxceiver xdp_redirect_multi xdp_synproxy veristat xdp_hw_metadata \
- xdp_features
+ test_lirc_mode2_user xdping test_cpp runqslower bench xskxceiver \
+ xdp_redirect_multi xdp_synproxy veristat xdp_hw_metadata xdp_features
+
+RUN_TESTS_WITHOUT_MODULES ?= 0
+TRUNNER_EXTRA_CFLAGS ?=
+
+ifeq ($(RUN_TESTS_WITHOUT_MODULES),0)
+TEST_GEN_PROGS_EXTENDED += bpf_testmod.ko
+TRUNNER_EXTRA_CFLAGS += -DBPF_TESTMOD_EXTERNAL
+endif

TEST_GEN_FILES += liburandom_read.so urandom_read sign-file uprobe_multi

@@ -400,22 +407,22 @@ $(OUTPUT)/cgroup_getset_retval_hooks.o: cgroup_getset_retval_hooks.h
# $3 - CFLAGS
define CLANG_BPF_BUILD_RULE
$(call msg,CLNG-BPF,$(TRUNNER_BINARY),$2)
- $(Q)$(CLANG) $3 -O2 --target=bpf -c $1 -mcpu=v3 -o $2
+ $(Q)$(CLANG) $3 $(TRUNNER_EXTRA_CFLAGS) -O2 --target=bpf -c $1 -mcpu=v3 -o $2
endef
# Similar to CLANG_BPF_BUILD_RULE, but with disabled alu32
define CLANG_NOALU32_BPF_BUILD_RULE
$(call msg,CLNG-BPF,$(TRUNNER_BINARY),$2)
- $(Q)$(CLANG) $3 -O2 --target=bpf -c $1 -mcpu=v2 -o $2
+ $(Q)$(CLANG) $3 $(TRUNNER_EXTRA_CFLAGS) -O2 --target=bpf -c $1 -mcpu=v2 -o $2
endef
# Similar to CLANG_BPF_BUILD_RULE, but with cpu-v4
define CLANG_CPUV4_BPF_BUILD_RULE
$(call msg,CLNG-BPF,$(TRUNNER_BINARY),$2)
- $(Q)$(CLANG) $3 -O2 --target=bpf -c $1 -mcpu=v4 -o $2
+ $(Q)$(CLANG) $3 $(TRUNNER_EXTRA_CFLAGS) -O2 --target=bpf -c $1 -mcpu=v4 -o $2
endef
# Build BPF object using GCC
define GCC_BPF_BUILD_RULE
$(call msg,GCC-BPF,$(TRUNNER_BINARY),$2)
- $(Q)$(BPF_GCC) $3 -O2 -c $1 -o $2
+ $(Q)$(BPF_GCC) $3 $(TRUNNER_EXTRA_CFLAGS) -O2 -c $1 -o $2
endef

SKEL_BLACKLIST := btf__% test_pinning_invalid.c test_sk_assign.c
@@ -605,7 +612,7 @@ TRUNNER_EXTRA_SOURCES := test_progs.c \
json_writer.c \
flow_dissector_load.h \
ip_check_defrag_frags.h
-TRUNNER_EXTRA_FILES := $(OUTPUT)/urandom_read $(OUTPUT)/bpf_testmod.ko \
+TRUNNER_EXTRA_FILES := $(OUTPUT)/urandom_read \
$(OUTPUT)/liburandom_read.so \
$(OUTPUT)/xdp_synproxy \
$(OUTPUT)/sign-file \
@@ -614,6 +621,11 @@ TRUNNER_EXTRA_FILES := $(OUTPUT)/urandom_read $(OUTPUT)/bpf_testmod.ko \
verify_sig_setup.sh \
$(wildcard progs/btf_dump_test_case_*.c) \
$(wildcard progs/*.bpf.o)
+
+ifeq ($(RUN_TESTS_WITHOUT_MODULES),0)
+TRUNNER_EXTRA_FILES += $(OUTPUT)/bpf_testmod.ko
+endif
+
TRUNNER_BPF_BUILD_RULE := CLANG_BPF_BUILD_RULE
TRUNNER_BPF_CFLAGS := $(BPF_CFLAGS) $(CLANG_CFLAGS) -DENABLE_ATOMICS_TESTS
$(eval $(call DEFINE_TEST_RUNNER,test_progs))
diff --git a/tools/testing/selftests/bpf/bpf_testmod/Makefile b/tools/testing/selftests/bpf/bpf_testmod/Makefile
index 15cb36c4483a..123f161339e4 100644
--- a/tools/testing/selftests/bpf/bpf_testmod/Makefile
+++ b/tools/testing/selftests/bpf/bpf_testmod/Makefile
@@ -10,7 +10,7 @@ endif
MODULES = bpf_testmod.ko

obj-m += bpf_testmod.o
-CFLAGS_bpf_testmod.o = -I$(src)
+CFLAGS_bpf_testmod.o = -I$(src) -DBPF_TESTMOD_EXTERNAL

all:
+$(Q)make -C $(KDIR) M=$(BPF_TESTMOD_DIR) modules
diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod-events.h b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod-events.h
index 11ee801e75e7..57a9795d814a 100644
--- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod-events.h
+++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod-events.h
@@ -1,5 +1,6 @@
/* SPDX-License-Identifier: GPL-2.0 */
/* Copyright (c) 2020 Facebook */
+
#undef TRACE_SYSTEM
#define TRACE_SYSTEM bpf_testmod

@@ -7,7 +8,10 @@
#define _BPF_TESTMOD_EVENTS_H

#include <linux/tracepoint.h>
+
+#ifdef BPF_TESTMOD_EXTERNAL
#include "bpf_testmod.h"
+#endif

TRACE_EVENT(bpf_testmod_test_read,
TP_PROTO(struct task_struct *task, struct bpf_testmod_test_read_ctx *ctx),
@@ -51,7 +55,9 @@ BPF_TESTMOD_DECLARE_TRACE(bpf_testmod_test_writable_bare,

#endif /* _BPF_TESTMOD_EVENTS_H */

+#ifdef BPF_TESTMOD_EXTERNAL
#undef TRACE_INCLUDE_PATH
#define TRACE_INCLUDE_PATH .
#define TRACE_INCLUDE_FILE bpf_testmod-events
+#endif
#include <trace/define_trace.h>
diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
index 91907b321f91..78769fe1c66b 100644
--- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
+++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
@@ -12,7 +12,11 @@
#include "bpf_testmod_kfunc.h"

#define CREATE_TRACE_POINTS
+#ifdef BPF_TESTMOD_EXTERNAL
#include "bpf_testmod-events.h"
+#else
+#include "trace/events/bpf_testmod.h"
+#endif

typedef int (*func_proto_typedef)(long);
typedef int (*func_proto_typedef_nested1)(func_proto_typedef);
diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod_kfunc.h b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod_kfunc.h
index 7c664dd61059..fe4a67cf04cb 100644
--- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod_kfunc.h
+++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod_kfunc.h
@@ -26,6 +26,7 @@ struct prog_test_ref_kfunc {
};
#endif

+#if defined(BPF_TESTMOD_EXTERNAL) || defined(__KERNEL__)
struct prog_test_pass1 {
int x0;
struct {
@@ -63,6 +64,7 @@ struct prog_test_fail3 {
char arr1[2];
char arr2[];
};
+#endif

struct prog_test_ref_kfunc *
bpf_kfunc_call_test_acquire(unsigned long *scalar_ptr) __ksym;
diff --git a/tools/testing/selftests/bpf/config b/tools/testing/selftests/bpf/config
index c125c441abc7..b26e79e42fb7 100644
--- a/tools/testing/selftests/bpf/config
+++ b/tools/testing/selftests/bpf/config
@@ -44,11 +44,6 @@ CONFIG_IPV6_TUNNEL=y
CONFIG_KEYS=y
CONFIG_LIRC=y
CONFIG_LWTUNNEL=y
-CONFIG_MODULE_SIG=y
-CONFIG_MODULE_SRCVERSION_ALL=y
-CONFIG_MODULE_UNLOAD=y
-CONFIG_MODULES=y
-CONFIG_MODVERSIONS=y
CONFIG_MPLS=y
CONFIG_MPLS_IPTUNNEL=y
CONFIG_MPLS_ROUTING=y
diff --git a/tools/testing/selftests/bpf/config.mods b/tools/testing/selftests/bpf/config.mods
new file mode 100644
index 000000000000..7fc4edb66b35
--- /dev/null
+++ b/tools/testing/selftests/bpf/config.mods
@@ -0,0 +1,5 @@
+CONFIG_MODULE_SIG=y
+CONFIG_MODULE_SRCVERSION_ALL=y
+CONFIG_MODULE_UNLOAD=y
+CONFIG_MODULES=y
+CONFIG_MODVERSIONS=y
diff --git a/tools/testing/selftests/bpf/config.nomods b/tools/testing/selftests/bpf/config.nomods
new file mode 100644
index 000000000000..aea6afdc0a0b
--- /dev/null
+++ b/tools/testing/selftests/bpf/config.nomods
@@ -0,0 +1 @@
+CONFIG_BPF_TEST_MODULE=y
diff --git a/tools/testing/selftests/bpf/progs/btf_type_tag_percpu.c b/tools/testing/selftests/bpf/progs/btf_type_tag_percpu.c
index 38f78d9345de..b3b52934dd37 100644
--- a/tools/testing/selftests/bpf/progs/btf_type_tag_percpu.c
+++ b/tools/testing/selftests/bpf/progs/btf_type_tag_percpu.c
@@ -4,6 +4,7 @@
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

+#ifdef BPF_TESTMOD_EXTERNAL
struct bpf_testmod_btf_type_tag_1 {
int a;
};
@@ -11,6 +12,7 @@ struct bpf_testmod_btf_type_tag_1 {
struct bpf_testmod_btf_type_tag_2 {
struct bpf_testmod_btf_type_tag_1 *p;
};
+#endif

__u64 g;

diff --git a/tools/testing/selftests/bpf/progs/btf_type_tag_user.c b/tools/testing/selftests/bpf/progs/btf_type_tag_user.c
index 5523f77c5a44..a41cf28ef437 100644
--- a/tools/testing/selftests/bpf/progs/btf_type_tag_user.c
+++ b/tools/testing/selftests/bpf/progs/btf_type_tag_user.c
@@ -4,6 +4,7 @@
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

+#ifdef BPF_TESTMOD_EXTERNAL
struct bpf_testmod_btf_type_tag_1 {
int a;
};
@@ -11,6 +12,7 @@ struct bpf_testmod_btf_type_tag_1 {
struct bpf_testmod_btf_type_tag_2 {
struct bpf_testmod_btf_type_tag_1 *p;
};
+#endif

int g;

diff --git a/tools/testing/selftests/bpf/progs/core_kern.c b/tools/testing/selftests/bpf/progs/core_kern.c
index 004f2acef2eb..82deb60ef672 100644
--- a/tools/testing/selftests/bpf/progs/core_kern.c
+++ b/tools/testing/selftests/bpf/progs/core_kern.c
@@ -67,9 +67,11 @@ struct __sk_bUfF /* it will not exist in vmlinux */ {
int len;
} __attribute__((preserve_access_index));

+#ifdef BPF_TESTMOD_EXTERNAL
struct bpf_testmod_test_read_ctx /* it exists in bpf_testmod */ {
size_t len;
} __attribute__((preserve_access_index));
+#endif

SEC("tc")
int balancer_ingress(struct __sk_buff *ctx)
diff --git a/tools/testing/selftests/bpf/progs/iters_testmod_seq.c b/tools/testing/selftests/bpf/progs/iters_testmod_seq.c
index 3873fb6c292a..39658b05ac1e 100644
--- a/tools/testing/selftests/bpf/progs/iters_testmod_seq.c
+++ b/tools/testing/selftests/bpf/progs/iters_testmod_seq.c
@@ -5,10 +5,12 @@
#include <bpf/bpf_helpers.h>
#include "bpf_misc.h"

+#ifdef BPF_TESTMOD_EXTERNAL
struct bpf_iter_testmod_seq {
u64 :64;
u64 :64;
};
+#endif

extern int bpf_iter_testmod_seq_new(struct bpf_iter_testmod_seq *it, s64 value, int cnt) __ksym;
extern s64 *bpf_iter_testmod_seq_next(struct bpf_iter_testmod_seq *it) __ksym;
diff --git a/tools/testing/selftests/bpf/progs/test_core_reloc_module.c b/tools/testing/selftests/bpf/progs/test_core_reloc_module.c
index bcb31ff92dcc..77b2dae54dd5 100644
--- a/tools/testing/selftests/bpf/progs/test_core_reloc_module.c
+++ b/tools/testing/selftests/bpf/progs/test_core_reloc_module.c
@@ -8,12 +8,14 @@

char _license[] SEC("license") = "GPL";

+#ifdef BPF_TESTMOD_EXTERNAL
struct bpf_testmod_test_read_ctx {
/* field order is mixed up */
size_t len;
char *buf;
loff_t off;
} __attribute__((preserve_access_index));
+#endif

struct {
char in[256];
diff --git a/tools/testing/selftests/bpf/progs/test_ldsx_insn.c b/tools/testing/selftests/bpf/progs/test_ldsx_insn.c
index 2a2a942737d7..f1d7276c6629 100644
--- a/tools/testing/selftests/bpf/progs/test_ldsx_insn.c
+++ b/tools/testing/selftests/bpf/progs/test_ldsx_insn.c
@@ -48,9 +48,11 @@ int map_val_prog(const void *ctx)

}

+#ifdef BPF_TESTMOD_EXTERNAL
struct bpf_testmod_struct_arg_1 {
int a;
};
+#endif

long long int_member;

diff --git a/tools/testing/selftests/bpf/progs/test_module_attach.c b/tools/testing/selftests/bpf/progs/test_module_attach.c
index 8a1b50f3a002..772cff1190b1 100644
--- a/tools/testing/selftests/bpf/progs/test_module_attach.c
+++ b/tools/testing/selftests/bpf/progs/test_module_attach.c
@@ -5,7 +5,10 @@
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>
+
+#ifdef BPF_TESTMOD_EXTERNAL
#include "../bpf_testmod/bpf_testmod.h"
+#endif

__u32 raw_tp_read_sz = 0;

diff --git a/tools/testing/selftests/bpf/progs/tracing_struct.c b/tools/testing/selftests/bpf/progs/tracing_struct.c
index 515daef3c84b..3b5c69858feb 100644
--- a/tools/testing/selftests/bpf/progs/tracing_struct.c
+++ b/tools/testing/selftests/bpf/progs/tracing_struct.c
@@ -5,6 +5,7 @@
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_helpers.h>

+#ifdef BPF_TESTMOD_EXTERNAL
struct bpf_testmod_struct_arg_1 {
int a;
};
@@ -22,6 +23,7 @@ struct bpf_testmod_struct_arg_4 {
u64 a;
int b;
};
+#endif

long t1_a_a, t1_a_b, t1_b, t1_c, t1_ret, t1_nregs;
__u64 t1_reg0, t1_reg1, t1_reg2, t1_reg3;
diff --git a/tools/testing/selftests/bpf/testing_helpers.c b/tools/testing/selftests/bpf/testing_helpers.c
index d2458c1b1671..331be87d74d5 100644
--- a/tools/testing/selftests/bpf/testing_helpers.c
+++ b/tools/testing/selftests/bpf/testing_helpers.c
@@ -342,6 +342,12 @@ int unload_bpf_testmod(bool verbose)
{
if (kern_sync_rcu())
fprintf(stdout, "Failed to trigger kernel-side RCU sync!\n");
+
+ if (access("/proc/modules", F_OK)) {
+ fprintf(stdout, "Modules are disabled, fake unload success\n");
+ return 0;
+ }
+
if (delete_module("bpf_testmod", 0)) {
if (errno == ENOENT) {
if (verbose)
@@ -363,6 +369,14 @@ int load_bpf_testmod(bool verbose)
if (verbose)
fprintf(stdout, "Loading bpf_testmod.ko...\n");

+ if (access("/proc/modules", F_OK)) {
+ if (!access("/sys/kernel/debug/tracing/events/bpf_testmod", F_OK))
+ return 0;
+
+ fprintf(stdout, "Modules are disabled, testmod not built-in\n");
+ return -ENOENT;
+ }
+
fd = open("bpf_testmod.ko", O_RDONLY);
if (fd < 0) {
fprintf(stdout, "Can't find bpf_testmod.ko kernel module: %d\n", -errno);
diff --git a/tools/testing/selftests/bpf/vmtest.sh b/tools/testing/selftests/bpf/vmtest.sh
index 65d14f3bbe30..27e0b1241b16 100755
--- a/tools/testing/selftests/bpf/vmtest.sh
+++ b/tools/testing/selftests/bpf/vmtest.sh
@@ -44,11 +44,12 @@ NUM_COMPILE_JOBS="$(nproc)"
LOG_FILE_BASE="$(date +"bpf_selftests.%Y-%m-%d_%H-%M-%S")"
LOG_FILE="${LOG_FILE_BASE}.log"
EXIT_STATUS_FILE="${LOG_FILE_BASE}.exit_status"
+MODULES="yes"

usage()
{
cat <<EOF
-Usage: $0 [-i] [-s] [-d <output_dir>] -- [<command>]
+Usage: $0 [-i] [-s] [-n] [-d <output_dir>] -- [<command>]

<command> is the command you would normally run when you are in
tools/testing/selftests/bpf. e.g:
@@ -76,6 +77,7 @@ Options:
-s) Instead of powering off the VM, start an interactive
shell. If <command> is specified, the shell runs after
the command finishes executing
+ -n) Run tests with CONFIG_MODULES=n
EOF
}

@@ -341,7 +343,7 @@ main()
local exit_command="poweroff -f"
local debug_shell="no"

- while getopts ':hskid:j:' opt; do
+ while getopts ':hskid:j:n' opt; do
case ${opt} in
i)
update_image="yes"
@@ -357,6 +359,9 @@ main()
debug_shell="yes"
exit_command="bash"
;;
+ n)
+ MODULES="no"
+ ;;
h)
usage
exit 0
@@ -409,12 +414,27 @@ main()

echo "Output directory: ${OUTPUT_DIR}"

+ if [[ "${MODULES}" == "yes" ]]; then
+ KCONFIG_REL_PATHS+=("tools/testing/selftests/bpf/config.mods")
+ else
+ make_command="${make_command} RUN_TESTS_WITHOUT_MODULES=1"
+ KCONFIG_REL_PATHS+=("tools/testing/selftests/bpf/config.nomods")
+ fi
+
mkdir -p "${OUTPUT_DIR}"
mkdir -p "${mount_dir}"
update_kconfig "${kernel_checkout}" "${kconfig_file}"

recompile_kernel "${kernel_checkout}" "${make_command}"

+ # Touch the opposite mods/nomods config we used to ensure the
+ # kernel is rebuilt when the user adds or drops the -n flag.
+ if [[ "${MODULES}" == "yes" ]]; then
+ touch -m "tools/testing/selftests/bpf/config.nomods"
+ else
+ touch -m "tools/testing/selftests/bpf/config.mods"
+ fi
+
if [[ "${update_image}" == "no" && ! -f "${rootfs_img}" ]]; then
echo "rootfs image not found in ${rootfs_img}"
update_image="yes"
--
2.43.0


2024-03-06 20:12:16

by Calvin Owens

[permalink] [raw]
Subject: [RFC][PATCH 3/4] kprobes: Allow kprobes with CONFIG_MODULES=n

If something like this is merged down the road, it can go in at leisure
once the module_alloc change is in: it's a one-way dependency.

Signed-off-by: Calvin Owens <[email protected]>
---
arch/Kconfig | 2 +-
kernel/kprobes.c | 22 ++++++++++++++++++++++
kernel/trace/trace_kprobe.c | 11 +++++++++++
3 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index cfc24ced16dd..e60ce984d095 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -52,8 +52,8 @@ config GENERIC_ENTRY

config KPROBES
bool "Kprobes"
- depends on MODULES
depends on HAVE_KPROBES
+ select MODULE_ALLOC
select KALLSYMS
select TASKS_RCU if PREEMPTION
help
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 9d9095e81792..194270e17d57 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1556,8 +1556,12 @@ static bool is_cfi_preamble_symbol(unsigned long addr)
str_has_prefix("__pfx_", symbuf);
}

+#if IS_ENABLED(CONFIG_MODULES)
static int check_kprobe_address_safe(struct kprobe *p,
struct module **probed_mod)
+#else
+static int check_kprobe_address_safe(struct kprobe *p)
+#endif
{
int ret;

@@ -1580,6 +1584,7 @@ static int check_kprobe_address_safe(struct kprobe *p,
goto out;
}

+#if IS_ENABLED(CONFIG_MODULES)
/* Check if 'p' is probing a module. */
*probed_mod = __module_text_address((unsigned long) p->addr);
if (*probed_mod) {
@@ -1603,6 +1608,8 @@ static int check_kprobe_address_safe(struct kprobe *p,
ret = -ENOENT;
}
}
+#endif
+
out:
preempt_enable();
jump_label_unlock();
@@ -1614,7 +1621,9 @@ int register_kprobe(struct kprobe *p)
{
int ret;
struct kprobe *old_p;
+#if IS_ENABLED(CONFIG_MODULES)
struct module *probed_mod;
+#endif
kprobe_opcode_t *addr;
bool on_func_entry;

@@ -1633,7 +1642,11 @@ int register_kprobe(struct kprobe *p)
p->nmissed = 0;
INIT_LIST_HEAD(&p->list);

+#if IS_ENABLED(CONFIG_MODULES)
ret = check_kprobe_address_safe(p, &probed_mod);
+#else
+ ret = check_kprobe_address_safe(p);
+#endif
if (ret)
return ret;

@@ -1676,8 +1689,10 @@ int register_kprobe(struct kprobe *p)
out:
mutex_unlock(&kprobe_mutex);

+#if IS_ENABLED(CONFIG_MODULES)
if (probed_mod)
module_put(probed_mod);
+#endif

return ret;
}
@@ -2482,6 +2497,7 @@ int kprobe_add_area_blacklist(unsigned long start, unsigned long end)
return 0;
}

+#if IS_ENABLED(CONFIG_MODULES)
/* Remove all symbols in given area from kprobe blacklist */
static void kprobe_remove_area_blacklist(unsigned long start, unsigned long end)
{
@@ -2499,6 +2515,7 @@ static void kprobe_remove_ksym_blacklist(unsigned long entry)
{
kprobe_remove_area_blacklist(entry, entry + 1);
}
+#endif

int __weak arch_kprobe_get_kallsym(unsigned int *symnum, unsigned long *value,
char *type, char *sym)
@@ -2564,6 +2581,7 @@ static int __init populate_kprobe_blacklist(unsigned long *start,
return ret ? : arch_populate_kprobe_blacklist();
}

+#if IS_ENABLED(CONFIG_MODULES)
static void add_module_kprobe_blacklist(struct module *mod)
{
unsigned long start, end;
@@ -2665,6 +2683,7 @@ static struct notifier_block kprobe_module_nb = {
.notifier_call = kprobes_module_callback,
.priority = 0
};
+#endif /* IS_ENABLED(CONFIG_MODULES) */

void kprobe_free_init_mem(void)
{
@@ -2724,8 +2743,11 @@ static int __init init_kprobes(void)
err = arch_init_kprobes();
if (!err)
err = register_die_notifier(&kprobe_exceptions_nb);
+
+#if IS_ENABLED(CONFIG_MODULES)
if (!err)
err = register_module_notifier(&kprobe_module_nb);
+#endif

kprobes_initialized = (err == 0);
kprobe_sysctls_init();
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index c4c6e0e0068b..dd4598f775b9 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -102,6 +102,7 @@ static nokprobe_inline bool trace_kprobe_has_gone(struct trace_kprobe *tk)
return kprobe_gone(&tk->rp.kp);
}

+#if IS_ENABLED(CONFIG_MODULES)
static nokprobe_inline bool trace_kprobe_within_module(struct trace_kprobe *tk,
struct module *mod)
{
@@ -129,6 +130,12 @@ static nokprobe_inline bool trace_kprobe_module_exist(struct trace_kprobe *tk)

return ret;
}
+#else
+static nokprobe_inline bool trace_kprobe_module_exist(struct trace_kprobe *tk)
+{
+ return true;
+}
+#endif

static bool trace_kprobe_is_busy(struct dyn_event *ev)
{
@@ -670,6 +677,7 @@ static int register_trace_kprobe(struct trace_kprobe *tk)
return ret;
}

+#if IS_ENABLED(CONFIG_MODULES)
/* Module notifier call back, checking event on the module */
static int trace_kprobe_module_callback(struct notifier_block *nb,
unsigned long val, void *data)
@@ -704,6 +712,7 @@ static struct notifier_block trace_kprobe_module_nb = {
.notifier_call = trace_kprobe_module_callback,
.priority = 1 /* Invoked after kprobe module callback */
};
+#endif /* IS_ENABLED(CONFIG_MODULES) */

static int count_symbols(void *data, unsigned long unused)
{
@@ -1897,8 +1906,10 @@ static __init int init_kprobe_trace_early(void)
if (ret)
return ret;

+#if IS_ENABLED(CONFIG_MODULES)
if (register_module_notifier(&trace_kprobe_module_nb))
return -EINVAL;
+#endif

return 0;
}
--
2.43.0


2024-03-06 23:23:45

by Calvin Owens

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/4] Make bpf_jit and kprobes work with CONFIG_MODULES=n

On Wednesday 03/06 at 13:34 -0800, Luis Chamberlain wrote:
> On Wed, Mar 06, 2024 at 12:05:07PM -0800, Calvin Owens wrote:
> > Hello all,
> >
> > This patchset makes it possible to use bpftrace with kprobes on kernels
> > built without loadable module support.
>
> This is a step in the right direction for another reason: clearly the
> module_alloc() is not about modules, and we have special reasons for it
> now beyond modules. The effort to share a generalize a huge page for
> these things is also another reason for some of this but that is more
> long term.
>
> I'm all for minor changes here so to avoid regressions but it seems a
> rename is in order -- if we're going to all this might as well do it
> now. And for that I'd just like to ask you paint the bikeshed with
> Song Liu as he's been the one slowly making way to help us get there
> with the "module: replace module_layout with module_memory",
> and Mike Rapoport as he's had some follow up attempts [0]. As I see it,
> the EXECMEM stuff would be what we use instead then. Mike kept the
> module_alloc() and the execmem was just a wrapper but your move of the
> arch stuff makes sense as well and I think would complement his series
> nicely.

I apologize for missing that. I think these are the four most recent
versions of the different series referenced from that LWN link:

a) https://lore.kernel.org/all/[email protected]/
b) https://lore.kernel.org/all/[email protected]/
c) https://lore.kernel.org/all/[email protected]/
d) https://lore.kernel.org/all/[email protected]/

Song and Mike, please correct me if I'm wrong, but I think what I've
done here (see [1], sorry for not adding you initially) is compatible
with everything both of you have recently proposed above. How do you
feel about this as a first step?

For naming, execmem_alloc() seems reasonable to me? I have no strong
feelings at all, I'll just use that going forward unless somebody else
expresses an opinion.

[1] https://lore.kernel.org/lkml/[email protected]/T/#m337096e158a5f771d0c7c2fb15a3b80a4443226a

> If you're gonna split code up to move to another place, it'd be nice
> if you can add copyright headers as was done with the kernel/module.c
> split into kernel/module/*.c

Silly question: should it be the same copyright header as the original
corresponding module.c, or a new one? I tried to preserve the license
header because I wasn't sure what to do about it.

Thanks,
Calvin

> Can we start with some small basic stuff we can all agree on?
>
> [0] https://lwn.net/Articles/944857/
>
> Luis

2024-03-07 01:39:55

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/4] Make bpf_jit and kprobes work with CONFIG_MODULES=n

On Wed, Mar 06, 2024 at 12:05:07PM -0800, Calvin Owens wrote:
> Hello all,
>
> This patchset makes it possible to use bpftrace with kprobes on kernels
> built without loadable module support.

This is a step in the right direction for another reason: clearly the
module_alloc() is not about modules, and we have special reasons for it
now beyond modules. The effort to share a generalize a huge page for
these things is also another reason for some of this but that is more
long term.

I'm all for minor changes here so to avoid regressions but it seems a
rename is in order -- if we're going to all this might as well do it
now. And for that I'd just like to ask you paint the bikeshed with
Song Liu as he's been the one slowly making way to help us get there
with the "module: replace module_layout with module_memory",
and Mike Rapoport as he's had some follow up attempts [0]. As I see it,
the EXECMEM stuff would be what we use instead then. Mike kept the
module_alloc() and the execmem was just a wrapper but your move of the
arch stuff makes sense as well and I think would complement his series
nicely.

If you're gonna split code up to move to another place, it'd be nice
if you can add copyright headers as was done with the kernel/module.c
split into kernel/module/*.c

Can we start with some small basic stuff we can all agree on?

[0] https://lwn.net/Articles/944857/

Luis

2024-03-07 01:58:50

by Song Liu

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/4] Make bpf_jit and kprobes work with CONFIG_MODULES=n

Hi Calvin,

It is great to hear from you! :)

On Wed, Mar 6, 2024 at 3:23 PM Calvin Owens <[email protected]> wrote:
>
> On Wednesday 03/06 at 13:34 -0800, Luis Chamberlain wrote:
> > On Wed, Mar 06, 2024 at 12:05:07PM -0800, Calvin Owens wrote:
> > > Hello all,
> > >
> > > This patchset makes it possible to use bpftrace with kprobes on kernels
> > > built without loadable module support.
> >
> > This is a step in the right direction for another reason: clearly the
> > module_alloc() is not about modules, and we have special reasons for it
> > now beyond modules. The effort to share a generalize a huge page for
> > these things is also another reason for some of this but that is more
> > long term.
> >
> > I'm all for minor changes here so to avoid regressions but it seems a
> > rename is in order -- if we're going to all this might as well do it
> > now. And for that I'd just like to ask you paint the bikeshed with
> > Song Liu as he's been the one slowly making way to help us get there
> > with the "module: replace module_layout with module_memory",
> > and Mike Rapoport as he's had some follow up attempts [0]. As I see it,
> > the EXECMEM stuff would be what we use instead then. Mike kept the
> > module_alloc() and the execmem was just a wrapper but your move of the
> > arch stuff makes sense as well and I think would complement his series
> > nicely.
>
> I apologize for missing that. I think these are the four most recent
> versions of the different series referenced from that LWN link:
>
> a) https://lore.kernel.org/all/[email protected]/
> b) https://lore.kernel.org/all/[email protected]/
> c) https://lore.kernel.org/all/[email protected]/
> d) https://lore.kernel.org/all/[email protected]/
>
> Song and Mike, please correct me if I'm wrong, but I think what I've
> done here (see [1], sorry for not adding you initially) is compatible
> with everything both of you have recently proposed above. How do you
> feel about this as a first step?

I agree that the work here is compatible with other efforts. I have no
objection to making this the first step.

>
> For naming, execmem_alloc() seems reasonable to me? I have no strong
> feelings at all, I'll just use that going forward unless somebody else
> expresses an opinion.

I am not good at naming things. No objection from me to "execmem_alloc".

Thanks,
Song

2024-03-07 07:14:48

by Mike Rapoport

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/4] Make bpf_jit and kprobes work with CONFIG_MODULES=n

Hi Calvin,

On Wed, Mar 06, 2024 at 03:23:22PM -0800, Calvin Owens wrote:
> On Wednesday 03/06 at 13:34 -0800, Luis Chamberlain wrote:
> > On Wed, Mar 06, 2024 at 12:05:07PM -0800, Calvin Owens wrote:
> > > Hello all,
> > >
> > > This patchset makes it possible to use bpftrace with kprobes on kernels
> > > built without loadable module support.
> >
> > This is a step in the right direction for another reason: clearly the
> > module_alloc() is not about modules, and we have special reasons for it
> > now beyond modules. The effort to share a generalize a huge page for
> > these things is also another reason for some of this but that is more
> > long term.
> >
> > I'm all for minor changes here so to avoid regressions but it seems a
> > rename is in order -- if we're going to all this might as well do it
> > now. And for that I'd just like to ask you paint the bikeshed with
> > Song Liu as he's been the one slowly making way to help us get there
> > with the "module: replace module_layout with module_memory",
> > and Mike Rapoport as he's had some follow up attempts [0]. As I see it,
> > the EXECMEM stuff would be what we use instead then. Mike kept the
> > module_alloc() and the execmem was just a wrapper but your move of the
> > arch stuff makes sense as well and I think would complement his series
> > nicely.

Actually I've dropped module_alloc() in favor of execmem_alloc() ;-)

> I apologize for missing that. I think these are the four most recent
> versions of the different series referenced from that LWN link:
>
> a) https://lore.kernel.org/all/[email protected]/
> b) https://lore.kernel.org/all/[email protected]/
> c) https://lore.kernel.org/all/[email protected]/
> d) https://lore.kernel.org/all/[email protected]/
>
> Song and Mike, please correct me if I'm wrong, but I think what I've
> done here (see [1], sorry for not adding you initially) is compatible
> with everything both of you have recently proposed above. How do you
> feel about this as a first step?

No objections from me.

> For naming, execmem_alloc() seems reasonable to me? I have no strong
> feelings at all, I'll just use that going forward unless somebody else
> expresses an opinion.

I like execmem_alloc() and CONFIG_EXECMEM.

--
Sincerely yours,
Mike.

2024-03-07 07:23:12

by Mike Rapoport

[permalink] [raw]
Subject: Re: [RFC][PATCH 3/4] kprobes: Allow kprobes with CONFIG_MODULES=n

On Wed, Mar 06, 2024 at 12:05:10PM -0800, Calvin Owens wrote:
> If something like this is merged down the road, it can go in at leisure
> once the module_alloc change is in: it's a one-way dependency.
>
> Signed-off-by: Calvin Owens <[email protected]>
> ---
> arch/Kconfig | 2 +-
> kernel/kprobes.c | 22 ++++++++++++++++++++++
> kernel/trace/trace_kprobe.c | 11 +++++++++++
> 3 files changed, 34 insertions(+), 1 deletion(-)

When I did this in my last execmem posting, I think I've got slightly less
ugly ifdery, you may want to take a look at that:

https://lore.kernel.org/all/[email protected]

> diff --git a/arch/Kconfig b/arch/Kconfig
> index cfc24ced16dd..e60ce984d095 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -52,8 +52,8 @@ config GENERIC_ENTRY
>
> config KPROBES
> bool "Kprobes"
> - depends on MODULES
> depends on HAVE_KPROBES
> + select MODULE_ALLOC
> select KALLSYMS
> select TASKS_RCU if PREEMPTION
> help
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 9d9095e81792..194270e17d57 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1556,8 +1556,12 @@ static bool is_cfi_preamble_symbol(unsigned long addr)
> str_has_prefix("__pfx_", symbuf);
> }
>
> +#if IS_ENABLED(CONFIG_MODULES)
> static int check_kprobe_address_safe(struct kprobe *p,
> struct module **probed_mod)
> +#else
> +static int check_kprobe_address_safe(struct kprobe *p)
> +#endif
> {
> int ret;
>
> @@ -1580,6 +1584,7 @@ static int check_kprobe_address_safe(struct kprobe *p,
> goto out;
> }
>
> +#if IS_ENABLED(CONFIG_MODULES)

Plain #ifdef will do here and below. IS_ENABLED is for usage withing the
code, like

if (IS_ENABLED(CONFIG_MODULES))
;

> /* Check if 'p' is probing a module. */
> *probed_mod = __module_text_address((unsigned long) p->addr);
> if (*probed_mod) {

--
Sincerely yours,
Mike.

2024-03-07 22:17:51

by Christophe Leroy

[permalink] [raw]
Subject: Re: [RFC][PATCH 3/4] kprobes: Allow kprobes with CONFIG_MODULES=n



Le 06/03/2024 à 21:05, Calvin Owens a écrit :
> [Vous ne recevez pas souvent de courriers de [email protected]. Découvrez pourquoi ceci est important à https://aka.ms/LearnAboutSenderIdentification ]
>
> If something like this is merged down the road, it can go in at leisure
> once the module_alloc change is in: it's a one-way dependency.

Too many #ifdef, please reorganise stuff to avoid that and avoid
changing prototypes based of CONFIG_MODULES.

Other few comments below.

>
> Signed-off-by: Calvin Owens <[email protected]>
> ---
> arch/Kconfig | 2 +-
> kernel/kprobes.c | 22 ++++++++++++++++++++++
> kernel/trace/trace_kprobe.c | 11 +++++++++++
> 3 files changed, 34 insertions(+), 1 deletion(-)
>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index cfc24ced16dd..e60ce984d095 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -52,8 +52,8 @@ config GENERIC_ENTRY
>
> config KPROBES
> bool "Kprobes"
> - depends on MODULES
> depends on HAVE_KPROBES
> + select MODULE_ALLOC
> select KALLSYMS
> select TASKS_RCU if PREEMPTION
> help
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 9d9095e81792..194270e17d57 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1556,8 +1556,12 @@ static bool is_cfi_preamble_symbol(unsigned long addr)
> str_has_prefix("__pfx_", symbuf);
> }
>
> +#if IS_ENABLED(CONFIG_MODULES)
> static int check_kprobe_address_safe(struct kprobe *p,
> struct module **probed_mod)
> +#else
> +static int check_kprobe_address_safe(struct kprobe *p)
> +#endif

A bit ugly to have to change the prototype, why not just keep probed_mod
at all time ?

When CONFIG_MODULES is not selected, __module_text_address() returns
NULL so it should work without that many #ifdefs.

> {
> int ret;
>
> @@ -1580,6 +1584,7 @@ static int check_kprobe_address_safe(struct kprobe *p,
> goto out;
> }
>
> +#if IS_ENABLED(CONFIG_MODULES)
> /* Check if 'p' is probing a module. */
> *probed_mod = __module_text_address((unsigned long) p->addr);
> if (*probed_mod) {
> @@ -1603,6 +1608,8 @@ static int check_kprobe_address_safe(struct kprobe *p,
> ret = -ENOENT;
> }
> }
> +#endif
> +
> out:
> preempt_enable();
> jump_label_unlock();
> @@ -1614,7 +1621,9 @@ int register_kprobe(struct kprobe *p)
> {
> int ret;
> struct kprobe *old_p;
> +#if IS_ENABLED(CONFIG_MODULES)
> struct module *probed_mod;
> +#endif
> kprobe_opcode_t *addr;
> bool on_func_entry;
>
> @@ -1633,7 +1642,11 @@ int register_kprobe(struct kprobe *p)
> p->nmissed = 0;
> INIT_LIST_HEAD(&p->list);
>
> +#if IS_ENABLED(CONFIG_MODULES)
> ret = check_kprobe_address_safe(p, &probed_mod);
> +#else
> + ret = check_kprobe_address_safe(p);
> +#endif
> if (ret)
> return ret;
>
> @@ -1676,8 +1689,10 @@ int register_kprobe(struct kprobe *p)
> out:
> mutex_unlock(&kprobe_mutex);
>
> +#if IS_ENABLED(CONFIG_MODULES)
> if (probed_mod)
> module_put(probed_mod);
> +#endif
>
> return ret;
> }
> @@ -2482,6 +2497,7 @@ int kprobe_add_area_blacklist(unsigned long start, unsigned long end)
> return 0;
> }
>
> +#if IS_ENABLED(CONFIG_MODULES)
> /* Remove all symbols in given area from kprobe blacklist */
> static void kprobe_remove_area_blacklist(unsigned long start, unsigned long end)
> {
> @@ -2499,6 +2515,7 @@ static void kprobe_remove_ksym_blacklist(unsigned long entry)
> {
> kprobe_remove_area_blacklist(entry, entry + 1);
> }
> +#endif
>
> int __weak arch_kprobe_get_kallsym(unsigned int *symnum, unsigned long *value,
> char *type, char *sym)
> @@ -2564,6 +2581,7 @@ static int __init populate_kprobe_blacklist(unsigned long *start,
> return ret ? : arch_populate_kprobe_blacklist();
> }
>
> +#if IS_ENABLED(CONFIG_MODULES)
> static void add_module_kprobe_blacklist(struct module *mod)
> {
> unsigned long start, end;
> @@ -2665,6 +2683,7 @@ static struct notifier_block kprobe_module_nb = {
> .notifier_call = kprobes_module_callback,
> .priority = 0
> };
> +#endif /* IS_ENABLED(CONFIG_MODULES) */
>
> void kprobe_free_init_mem(void)
> {
> @@ -2724,8 +2743,11 @@ static int __init init_kprobes(void)
> err = arch_init_kprobes();
> if (!err)
> err = register_die_notifier(&kprobe_exceptions_nb);
> +
> +#if IS_ENABLED(CONFIG_MODULES)
> if (!err)
> err = register_module_notifier(&kprobe_module_nb);
> +#endif
>
> kprobes_initialized = (err == 0);
> kprobe_sysctls_init();
> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> index c4c6e0e0068b..dd4598f775b9 100644
> --- a/kernel/trace/trace_kprobe.c
> +++ b/kernel/trace/trace_kprobe.c
> @@ -102,6 +102,7 @@ static nokprobe_inline bool trace_kprobe_has_gone(struct trace_kprobe *tk)
> return kprobe_gone(&tk->rp.kp);
> }
>
> +#if IS_ENABLED(CONFIG_MODULES)
> static nokprobe_inline bool trace_kprobe_within_module(struct trace_kprobe *tk,
> struct module *mod)
> {
> @@ -129,6 +130,12 @@ static nokprobe_inline bool trace_kprobe_module_exist(struct trace_kprobe *tk)
>
> return ret;
> }
> +#else
> +static nokprobe_inline bool trace_kprobe_module_exist(struct trace_kprobe *tk)
> +{
> + return true;
> +}
> +#endif
>
> static bool trace_kprobe_is_busy(struct dyn_event *ev)
> {
> @@ -670,6 +677,7 @@ static int register_trace_kprobe(struct trace_kprobe *tk)
> return ret;
> }
>
> +#if IS_ENABLED(CONFIG_MODULES)
> /* Module notifier call back, checking event on the module */
> static int trace_kprobe_module_callback(struct notifier_block *nb,
> unsigned long val, void *data)
> @@ -704,6 +712,7 @@ static struct notifier_block trace_kprobe_module_nb = {
> .notifier_call = trace_kprobe_module_callback,
> .priority = 1 /* Invoked after kprobe module callback */
> };
> +#endif /* IS_ENABLED(CONFIG_MODULES) */
>
> static int count_symbols(void *data, unsigned long unused)
> {
> @@ -1897,8 +1906,10 @@ static __init int init_kprobe_trace_early(void)
> if (ret)
> return ret;
>
> +#if IS_ENABLED(CONFIG_MODULES)
> if (register_module_notifier(&trace_kprobe_module_nb))
> return -EINVAL;
Why a #if here ?

If CONFIG_MODULES is not selected, register_module_notifier() return
always 0.

> +#endif
>
> return 0;
> }
> --
> 2.43.0
>
>

2024-03-08 02:46:15

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/4] Make bpf_jit and kprobes work with CONFIG_MODULES=n

Hi,

On Wed, 6 Mar 2024 13:34:40 -0800
Luis Chamberlain <[email protected]> wrote:

> On Wed, Mar 06, 2024 at 12:05:07PM -0800, Calvin Owens wrote:
> > Hello all,
> >
> > This patchset makes it possible to use bpftrace with kprobes on kernels
> > built without loadable module support.
>
> This is a step in the right direction for another reason: clearly the
> module_alloc() is not about modules, and we have special reasons for it
> now beyond modules. The effort to share a generalize a huge page for
> these things is also another reason for some of this but that is more
> long term.

Indeed. If it works without CONFIG_MODULES, it may be exec_alloc() or
something like that. Anyway, thanks for great job on this item!

>
> I'm all for minor changes here so to avoid regressions but it seems a
> rename is in order -- if we're going to all this might as well do it
> now. And for that I'd just like to ask you paint the bikeshed with
> Song Liu as he's been the one slowly making way to help us get there
> with the "module: replace module_layout with module_memory",
> and Mike Rapoport as he's had some follow up attempts [0]. As I see it,
> the EXECMEM stuff would be what we use instead then. Mike kept the
> module_alloc() and the execmem was just a wrapper but your move of the
> arch stuff makes sense as well and I think would complement his series
> nicely.

yeah, it is better to work with Mike.

Thank you,

>
> If you're gonna split code up to move to another place, it'd be nice
> if you can add copyright headers as was done with the kernel/module.c
> split into kernel/module/*.c
>
> Can we start with some small basic stuff we can all agree on?
>
> [0] https://lwn.net/Articles/944857/
>
> Luis


--
Masami Hiramatsu (Google) <[email protected]>

2024-03-08 02:47:23

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [RFC][PATCH 3/4] kprobes: Allow kprobes with CONFIG_MODULES=n

On Wed, 6 Mar 2024 12:05:10 -0800
Calvin Owens <[email protected]> wrote:

> If something like this is merged down the road, it can go in at leisure
> once the module_alloc change is in: it's a one-way dependency.
>
> Signed-off-by: Calvin Owens <[email protected]>
> ---
> arch/Kconfig | 2 +-
> kernel/kprobes.c | 22 ++++++++++++++++++++++
> kernel/trace/trace_kprobe.c | 11 +++++++++++
> 3 files changed, 34 insertions(+), 1 deletion(-)
>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index cfc24ced16dd..e60ce984d095 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -52,8 +52,8 @@ config GENERIC_ENTRY
>
> config KPROBES
> bool "Kprobes"
> - depends on MODULES
> depends on HAVE_KPROBES
> + select MODULE_ALLOC

OK, if we use EXEC_ALLOC,

config EXEC_ALLOC
depends on HAVE_EXEC_ALLOC

And

config KPROBES
bool "Kprobes"
depends on MODULES || EXEC_ALLOC
select EXEC_ALLOC if HAVE_EXEC_ALLOC

then kprobes can be enabled either modules supported or exec_alloc is supported.
(new arch does not need to implement exec_alloc)

Maybe we also need something like

#ifdef CONFIG_EXEC_ALLOC
#define module_alloc(size) exec_alloc(size)
#endif

in kprobes.h, or just add `replacing module_alloc with exec_alloc` patch.

Thank you,

> select KALLSYMS
> select TASKS_RCU if PREEMPTION
> help
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 9d9095e81792..194270e17d57 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1556,8 +1556,12 @@ static bool is_cfi_preamble_symbol(unsigned long addr)
> str_has_prefix("__pfx_", symbuf);
> }
>
> +#if IS_ENABLED(CONFIG_MODULES)
> static int check_kprobe_address_safe(struct kprobe *p,
> struct module **probed_mod)
> +#else
> +static int check_kprobe_address_safe(struct kprobe *p)
> +#endif
> {
> int ret;
>
> @@ -1580,6 +1584,7 @@ static int check_kprobe_address_safe(struct kprobe *p,
> goto out;
> }
>
> +#if IS_ENABLED(CONFIG_MODULES)
> /* Check if 'p' is probing a module. */
> *probed_mod = __module_text_address((unsigned long) p->addr);
> if (*probed_mod) {
> @@ -1603,6 +1608,8 @@ static int check_kprobe_address_safe(struct kprobe *p,
> ret = -ENOENT;
> }
> }
> +#endif
> +
> out:
> preempt_enable();
> jump_label_unlock();
> @@ -1614,7 +1621,9 @@ int register_kprobe(struct kprobe *p)
> {
> int ret;
> struct kprobe *old_p;
> +#if IS_ENABLED(CONFIG_MODULES)
> struct module *probed_mod;
> +#endif
> kprobe_opcode_t *addr;
> bool on_func_entry;
>
> @@ -1633,7 +1642,11 @@ int register_kprobe(struct kprobe *p)
> p->nmissed = 0;
> INIT_LIST_HEAD(&p->list);
>
> +#if IS_ENABLED(CONFIG_MODULES)
> ret = check_kprobe_address_safe(p, &probed_mod);
> +#else
> + ret = check_kprobe_address_safe(p);
> +#endif
> if (ret)
> return ret;
>
> @@ -1676,8 +1689,10 @@ int register_kprobe(struct kprobe *p)
> out:
> mutex_unlock(&kprobe_mutex);
>
> +#if IS_ENABLED(CONFIG_MODULES)
> if (probed_mod)
> module_put(probed_mod);
> +#endif
>
> return ret;
> }
> @@ -2482,6 +2497,7 @@ int kprobe_add_area_blacklist(unsigned long start, unsigned long end)
> return 0;
> }
>
> +#if IS_ENABLED(CONFIG_MODULES)
> /* Remove all symbols in given area from kprobe blacklist */
> static void kprobe_remove_area_blacklist(unsigned long start, unsigned long end)
> {
> @@ -2499,6 +2515,7 @@ static void kprobe_remove_ksym_blacklist(unsigned long entry)
> {
> kprobe_remove_area_blacklist(entry, entry + 1);
> }
> +#endif
>
> int __weak arch_kprobe_get_kallsym(unsigned int *symnum, unsigned long *value,
> char *type, char *sym)
> @@ -2564,6 +2581,7 @@ static int __init populate_kprobe_blacklist(unsigned long *start,
> return ret ? : arch_populate_kprobe_blacklist();
> }
>
> +#if IS_ENABLED(CONFIG_MODULES)
> static void add_module_kprobe_blacklist(struct module *mod)
> {
> unsigned long start, end;
> @@ -2665,6 +2683,7 @@ static struct notifier_block kprobe_module_nb = {
> .notifier_call = kprobes_module_callback,
> .priority = 0
> };
> +#endif /* IS_ENABLED(CONFIG_MODULES) */
>
> void kprobe_free_init_mem(void)
> {
> @@ -2724,8 +2743,11 @@ static int __init init_kprobes(void)
> err = arch_init_kprobes();
> if (!err)
> err = register_die_notifier(&kprobe_exceptions_nb);
> +
> +#if IS_ENABLED(CONFIG_MODULES)
> if (!err)
> err = register_module_notifier(&kprobe_module_nb);
> +#endif
>
> kprobes_initialized = (err == 0);
> kprobe_sysctls_init();
> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> index c4c6e0e0068b..dd4598f775b9 100644
> --- a/kernel/trace/trace_kprobe.c
> +++ b/kernel/trace/trace_kprobe.c
> @@ -102,6 +102,7 @@ static nokprobe_inline bool trace_kprobe_has_gone(struct trace_kprobe *tk)
> return kprobe_gone(&tk->rp.kp);
> }
>
> +#if IS_ENABLED(CONFIG_MODULES)
> static nokprobe_inline bool trace_kprobe_within_module(struct trace_kprobe *tk,
> struct module *mod)
> {
> @@ -129,6 +130,12 @@ static nokprobe_inline bool trace_kprobe_module_exist(struct trace_kprobe *tk)
>
> return ret;
> }
> +#else
> +static nokprobe_inline bool trace_kprobe_module_exist(struct trace_kprobe *tk)
> +{
> + return true;
> +}
> +#endif
>
> static bool trace_kprobe_is_busy(struct dyn_event *ev)
> {
> @@ -670,6 +677,7 @@ static int register_trace_kprobe(struct trace_kprobe *tk)
> return ret;
> }
>
> +#if IS_ENABLED(CONFIG_MODULES)
> /* Module notifier call back, checking event on the module */
> static int trace_kprobe_module_callback(struct notifier_block *nb,
> unsigned long val, void *data)
> @@ -704,6 +712,7 @@ static struct notifier_block trace_kprobe_module_nb = {
> .notifier_call = trace_kprobe_module_callback,
> .priority = 1 /* Invoked after kprobe module callback */
> };
> +#endif /* IS_ENABLED(CONFIG_MODULES) */
>
> static int count_symbols(void *data, unsigned long unused)
> {
> @@ -1897,8 +1906,10 @@ static __init int init_kprobe_trace_early(void)
> if (ret)
> return ret;
>
> +#if IS_ENABLED(CONFIG_MODULES)
> if (register_module_notifier(&trace_kprobe_module_nb))
> return -EINVAL;
> +#endif
>
> return 0;
> }
> --
> 2.43.0
>


--
Masami Hiramatsu (Google) <[email protected]>

2024-03-08 02:47:42

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [RFC][PATCH 3/4] kprobes: Allow kprobes with CONFIG_MODULES=n

On Thu, 7 Mar 2024 09:22:07 +0200
Mike Rapoport <[email protected]> wrote:

> On Wed, Mar 06, 2024 at 12:05:10PM -0800, Calvin Owens wrote:
> > If something like this is merged down the road, it can go in at leisure
> > once the module_alloc change is in: it's a one-way dependency.
> >
> > Signed-off-by: Calvin Owens <[email protected]>
> > ---
> > arch/Kconfig | 2 +-
> > kernel/kprobes.c | 22 ++++++++++++++++++++++
> > kernel/trace/trace_kprobe.c | 11 +++++++++++
> > 3 files changed, 34 insertions(+), 1 deletion(-)
>
> When I did this in my last execmem posting, I think I've got slightly less
> ugly ifdery, you may want to take a look at that:
>
> https://lore.kernel.org/all/[email protected]

Good catch, and sorry I missed that series last year.
But it seems your patch seems less ugly.

Calvin, can you follow his patch?

Thank you,

>
> > diff --git a/arch/Kconfig b/arch/Kconfig
> > index cfc24ced16dd..e60ce984d095 100644
> > --- a/arch/Kconfig
> > +++ b/arch/Kconfig
> > @@ -52,8 +52,8 @@ config GENERIC_ENTRY
> >
> > config KPROBES
> > bool "Kprobes"
> > - depends on MODULES
> > depends on HAVE_KPROBES
> > + select MODULE_ALLOC
> > select KALLSYMS
> > select TASKS_RCU if PREEMPTION
> > help
> > diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> > index 9d9095e81792..194270e17d57 100644
> > --- a/kernel/kprobes.c
> > +++ b/kernel/kprobes.c
> > @@ -1556,8 +1556,12 @@ static bool is_cfi_preamble_symbol(unsigned long addr)
> > str_has_prefix("__pfx_", symbuf);
> > }
> >
> > +#if IS_ENABLED(CONFIG_MODULES)
> > static int check_kprobe_address_safe(struct kprobe *p,
> > struct module **probed_mod)
> > +#else
> > +static int check_kprobe_address_safe(struct kprobe *p)
> > +#endif
> > {
> > int ret;
> >
> > @@ -1580,6 +1584,7 @@ static int check_kprobe_address_safe(struct kprobe *p,
> > goto out;
> > }
> >
> > +#if IS_ENABLED(CONFIG_MODULES)
>
> Plain #ifdef will do here and below. IS_ENABLED is for usage withing the
> code, like
>
> if (IS_ENABLED(CONFIG_MODULES))
> ;
>
> > /* Check if 'p' is probing a module. */
> > *probed_mod = __module_text_address((unsigned long) p->addr);
> > if (*probed_mod) {
>
> --
> Sincerely yours,
> Mike.


--
Masami Hiramatsu (Google) <[email protected]>

2024-03-08 02:56:21

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/4] Make bpf_jit and kprobes work with CONFIG_MODULES=n

On Thu, Mar 7, 2024 at 6:50 PM Masami Hiramatsu <[email protected]> wrote:
>
> On Wed, 6 Mar 2024 17:58:14 -0800
> Song Liu <[email protected]> wrote:
>
> > Hi Calvin,
> >
> > It is great to hear from you! :)
> >
> > On Wed, Mar 6, 2024 at 3:23 PM Calvin Owens <jcalvinowens@gmailcom> wrote:
> > >
> > > On Wednesday 03/06 at 13:34 -0800, Luis Chamberlain wrote:
> > > > On Wed, Mar 06, 2024 at 12:05:07PM -0800, Calvin Owens wrote:
> > > > > Hello all,
> > > > >
> > > > > This patchset makes it possible to use bpftrace with kprobes on kernels
> > > > > built without loadable module support.
> > > >
> > > > This is a step in the right direction for another reason: clearly the
> > > > module_alloc() is not about modules, and we have special reasons for it
> > > > now beyond modules. The effort to share a generalize a huge page for
> > > > these things is also another reason for some of this but that is more
> > > > long term.
> > > >
> > > > I'm all for minor changes here so to avoid regressions but it seems a
> > > > rename is in order -- if we're going to all this might as well do it
> > > > now. And for that I'd just like to ask you paint the bikeshed with
> > > > Song Liu as he's been the one slowly making way to help us get there
> > > > with the "module: replace module_layout with module_memory",
> > > > and Mike Rapoport as he's had some follow up attempts [0]. As I see it,
> > > > the EXECMEM stuff would be what we use instead then. Mike kept the
> > > > module_alloc() and the execmem was just a wrapper but your move of the
> > > > arch stuff makes sense as well and I think would complement his series
> > > > nicely.
> > >
> > > I apologize for missing that. I think these are the four most recent
> > > versions of the different series referenced from that LWN link:
> > >
> > > a) https://lore.kernel.org/all/20230918072955.2507221-1-rppt@kernelorg/
> > > b) https://lore.kernel.org/all/20230526051529.3387103-1-song@kernelorg/
> > > c) https://lore.kernel.org/all/20221107223921.3451913-1-song@kernelorg/
> > > d) https://lore.kernel.org/all/[email protected]/
> > >
> > > Song and Mike, please correct me if I'm wrong, but I think what I've
> > > done here (see [1], sorry for not adding you initially) is compatible
> > > with everything both of you have recently proposed above. How do you
> > > feel about this as a first step?
> >
> > I agree that the work here is compatible with other efforts. I have no
> > objection to making this the first step.
> >
> > >
> > > For naming, execmem_alloc() seems reasonable to me? I have no strong
> > > feelings at all, I'll just use that going forward unless somebody else
> > > expresses an opinion.
> >
> > I am not good at naming things. No objection from me to "execmem_alloc".
>
> Hm, it sounds good to me too. I think we should add a patch which just
> rename the module_alloc/module_memfree with execmem_alloc/free first.

I think that would be cleaner, yes. Leaving the possible move to a
secondary patch and placing the testing more on the later part.

Luis

2024-03-08 02:57:36

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/4] Make bpf_jit and kprobes work with CONFIG_MODULES=n

On Wed, 6 Mar 2024 17:58:14 -0800
Song Liu <[email protected]> wrote:

> Hi Calvin,
>
> It is great to hear from you! :)
>
> On Wed, Mar 6, 2024 at 3:23 PM Calvin Owens <[email protected]> wrote:
> >
> > On Wednesday 03/06 at 13:34 -0800, Luis Chamberlain wrote:
> > > On Wed, Mar 06, 2024 at 12:05:07PM -0800, Calvin Owens wrote:
> > > > Hello all,
> > > >
> > > > This patchset makes it possible to use bpftrace with kprobes on kernels
> > > > built without loadable module support.
> > >
> > > This is a step in the right direction for another reason: clearly the
> > > module_alloc() is not about modules, and we have special reasons for it
> > > now beyond modules. The effort to share a generalize a huge page for
> > > these things is also another reason for some of this but that is more
> > > long term.
> > >
> > > I'm all for minor changes here so to avoid regressions but it seems a
> > > rename is in order -- if we're going to all this might as well do it
> > > now. And for that I'd just like to ask you paint the bikeshed with
> > > Song Liu as he's been the one slowly making way to help us get there
> > > with the "module: replace module_layout with module_memory",
> > > and Mike Rapoport as he's had some follow up attempts [0]. As I see it,
> > > the EXECMEM stuff would be what we use instead then. Mike kept the
> > > module_alloc() and the execmem was just a wrapper but your move of the
> > > arch stuff makes sense as well and I think would complement his series
> > > nicely.
> >
> > I apologize for missing that. I think these are the four most recent
> > versions of the different series referenced from that LWN link:
> >
> > a) https://lore.kernel.org/all/[email protected]/
> > b) https://lore.kernel.org/all/[email protected]/
> > c) https://lore.kernel.org/all/[email protected]/
> > d) https://lore.kernel.org/all/[email protected]/
> >
> > Song and Mike, please correct me if I'm wrong, but I think what I've
> > done here (see [1], sorry for not adding you initially) is compatible
> > with everything both of you have recently proposed above. How do you
> > feel about this as a first step?
>
> I agree that the work here is compatible with other efforts. I have no
> objection to making this the first step.
>
> >
> > For naming, execmem_alloc() seems reasonable to me? I have no strong
> > feelings at all, I'll just use that going forward unless somebody else
> > expresses an opinion.
>
> I am not good at naming things. No objection from me to "execmem_alloc".

Hm, it sounds good to me too. I think we should add a patch which just
rename the module_alloc/module_memfree with execmem_alloc/free first.

Thanks,

--
Masami Hiramatsu (Google) <[email protected]>

2024-03-08 20:28:26

by Calvin Owens

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/4] Make bpf_jit and kprobes work with CONFIG_MODULES=n

On Thursday 03/07 at 18:55 -0800, Luis Chamberlain wrote:
> On Thu, Mar 7, 2024 at 6:50 PM Masami Hiramatsu <[email protected]> wrote:
> >
> > On Wed, 6 Mar 2024 17:58:14 -0800
> > Song Liu <[email protected]> wrote:
> >
> > > Hi Calvin,
> > >
> > > It is great to hear from you! :)
> > >
> > > On Wed, Mar 6, 2024 at 3:23 PM Calvin Owens <[email protected]> wrote:
> > > >
> > > > On Wednesday 03/06 at 13:34 -0800, Luis Chamberlain wrote:
> > > > > On Wed, Mar 06, 2024 at 12:05:07PM -0800, Calvin Owens wrote:
> > > > > > Hello all,
> > > > > >
> > > > > > This patchset makes it possible to use bpftrace with kprobes on kernels
> > > > > > built without loadable module support.
> > > > >
> > > > > This is a step in the right direction for another reason: clearly the
> > > > > module_alloc() is not about modules, and we have special reasons for it
> > > > > now beyond modules. The effort to share a generalize a huge page for
> > > > > these things is also another reason for some of this but that is more
> > > > > long term.
> > > > >
> > > > > I'm all for minor changes here so to avoid regressions but it seems a
> > > > > rename is in order -- if we're going to all this might as well do it
> > > > > now. And for that I'd just like to ask you paint the bikeshed with
> > > > > Song Liu as he's been the one slowly making way to help us get there
> > > > > with the "module: replace module_layout with module_memory",
> > > > > and Mike Rapoport as he's had some follow up attempts [0]. As I see it,
> > > > > the EXECMEM stuff would be what we use instead then. Mike kept the
> > > > > module_alloc() and the execmem was just a wrapper but your move of the
> > > > > arch stuff makes sense as well and I think would complement his series
> > > > > nicely.
> > > >
> > > > I apologize for missing that. I think these are the four most recent
> > > > versions of the different series referenced from that LWN link:
> > > >
> > > > a) https://lore.kernel.org/all/[email protected]/
> > > > b) https://lore.kernel.org/all/[email protected]/
> > > > c) https://lore.kernel.org/all/[email protected]/
> > > > d) https://lore.kernel.org/all/[email protected]/
> > > >
> > > > Song and Mike, please correct me if I'm wrong, but I think what I've
> > > > done here (see [1], sorry for not adding you initially) is compatible
> > > > with everything both of you have recently proposed above. How do you
> > > > feel about this as a first step?
> > >
> > > I agree that the work here is compatible with other efforts. I have no
> > > objection to making this the first step.
> > >
> > > >
> > > > For naming, execmem_alloc() seems reasonable to me? I have no strong
> > > > feelings at all, I'll just use that going forward unless somebody else
> > > > expresses an opinion.
> > >
> > > I am not good at naming things. No objection from me to "execmem_alloc".
> >
> > Hm, it sounds good to me too. I think we should add a patch which just
> > rename the module_alloc/module_memfree with execmem_alloc/free first.
>
> I think that would be cleaner, yes. Leaving the possible move to a
> secondary patch and placing the testing more on the later part.

Makes sense to me.

2024-03-08 20:36:28

by Calvin Owens

[permalink] [raw]
Subject: Re: [RFC][PATCH 3/4] kprobes: Allow kprobes with CONFIG_MODULES=n

On Thursday 03/07 at 09:22 +0200, Mike Rapoport wrote:
> On Wed, Mar 06, 2024 at 12:05:10PM -0800, Calvin Owens wrote:
> > If something like this is merged down the road, it can go in at leisure
> > once the module_alloc change is in: it's a one-way dependency.
> >
> > Signed-off-by: Calvin Owens <[email protected]>
> > ---
> > arch/Kconfig | 2 +-
> > kernel/kprobes.c | 22 ++++++++++++++++++++++
> > kernel/trace/trace_kprobe.c | 11 +++++++++++
> > 3 files changed, 34 insertions(+), 1 deletion(-)
>
> When I did this in my last execmem posting, I think I've got slightly less
> ugly ifdery, you may want to take a look at that:
>
> https://lore.kernel.org/all/[email protected]

Thanks Mike, I definitely agree. I'm annoyed at myself for not finding
your patches, I spent some time looking for prior work and I really
don't know how I missed it...

> > diff --git a/arch/Kconfig b/arch/Kconfig
> > index cfc24ced16dd..e60ce984d095 100644
> > --- a/arch/Kconfig
> > +++ b/arch/Kconfig
> > @@ -52,8 +52,8 @@ config GENERIC_ENTRY
> >
> > config KPROBES
> > bool "Kprobes"
> > - depends on MODULES
> > depends on HAVE_KPROBES
> > + select MODULE_ALLOC
> > select KALLSYMS
> > select TASKS_RCU if PREEMPTION
> > help
> > diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> > index 9d9095e81792..194270e17d57 100644
> > --- a/kernel/kprobes.c
> > +++ b/kernel/kprobes.c
> > @@ -1556,8 +1556,12 @@ static bool is_cfi_preamble_symbol(unsigned long addr)
> > str_has_prefix("__pfx_", symbuf);
> > }
> >
> > +#if IS_ENABLED(CONFIG_MODULES)
> > static int check_kprobe_address_safe(struct kprobe *p,
> > struct module **probed_mod)
> > +#else
> > +static int check_kprobe_address_safe(struct kprobe *p)
> > +#endif
> > {
> > int ret;
> >
> > @@ -1580,6 +1584,7 @@ static int check_kprobe_address_safe(struct kprobe *p,
> > goto out;
> > }
> >
> > +#if IS_ENABLED(CONFIG_MODULES)
>
> Plain #ifdef will do here and below. IS_ENABLED is for usage withing the
> code, like
>
> if (IS_ENABLED(CONFIG_MODULES))
> ;
>
> > /* Check if 'p' is probing a module. */
> > *probed_mod = __module_text_address((unsigned long) p->addr);
> > if (*probed_mod) {
>
> --
> Sincerely yours,
> Mike.

2024-03-08 20:57:16

by Calvin Owens

[permalink] [raw]
Subject: Re: [RFC][PATCH 3/4] kprobes: Allow kprobes with CONFIG_MODULES=n

On Friday 03/08 at 11:46 +0900, Masami Hiramatsu wrote:
> On Wed, 6 Mar 2024 12:05:10 -0800
> Calvin Owens <[email protected]> wrote:
>
> > If something like this is merged down the road, it can go in at leisure
> > once the module_alloc change is in: it's a one-way dependency.
> >
> > Signed-off-by: Calvin Owens <[email protected]>
> > ---
> > arch/Kconfig | 2 +-
> > kernel/kprobes.c | 22 ++++++++++++++++++++++
> > kernel/trace/trace_kprobe.c | 11 +++++++++++
> > 3 files changed, 34 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/Kconfig b/arch/Kconfig
> > index cfc24ced16dd..e60ce984d095 100644
> > --- a/arch/Kconfig
> > +++ b/arch/Kconfig
> > @@ -52,8 +52,8 @@ config GENERIC_ENTRY
> >
> > config KPROBES
> > bool "Kprobes"
> > - depends on MODULES
> > depends on HAVE_KPROBES
> > + select MODULE_ALLOC
>
> OK, if we use EXEC_ALLOC,
>
> config EXEC_ALLOC
> depends on HAVE_EXEC_ALLOC
>
> And
>
> config KPROBES
> bool "Kprobes"
> depends on MODULES || EXEC_ALLOC
> select EXEC_ALLOC if HAVE_EXEC_ALLOC
>
> then kprobes can be enabled either modules supported or exec_alloc is supported.
> (new arch does not need to implement exec_alloc)
>
> Maybe we also need something like
>
> #ifdef CONFIG_EXEC_ALLOC
> #define module_alloc(size) exec_alloc(size)
> #endif
>
> in kprobes.h, or just add `replacing module_alloc with exec_alloc` patch.
>
> Thank you,

The example was helpful, thanks. I see what you mean with
HAVE_EXEC_ALLOC, I'll implement it like that in the next verison.

> > select KALLSYMS
> > select TASKS_RCU if PREEMPTION
> > help
> > diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> > index 9d9095e81792..194270e17d57 100644
> > --- a/kernel/kprobes.c
> > +++ b/kernel/kprobes.c
> > @@ -1556,8 +1556,12 @@ static bool is_cfi_preamble_symbol(unsigned long addr)
> > str_has_prefix("__pfx_", symbuf);
> > }
> >
> > +#if IS_ENABLED(CONFIG_MODULES)
> > static int check_kprobe_address_safe(struct kprobe *p,
> > struct module **probed_mod)
> > +#else
> > +static int check_kprobe_address_safe(struct kprobe *p)
> > +#endif
> > {
> > int ret;
> >
> > @@ -1580,6 +1584,7 @@ static int check_kprobe_address_safe(struct kprobe *p,
> > goto out;
> > }
> >
> > +#if IS_ENABLED(CONFIG_MODULES)
> > /* Check if 'p' is probing a module. */
> > *probed_mod = __module_text_address((unsigned long) p->addr);
> > if (*probed_mod) {
> > @@ -1603,6 +1608,8 @@ static int check_kprobe_address_safe(struct kprobe *p,
> > ret = -ENOENT;
> > }
> > }
> > +#endif
> > +
> > out:
> > preempt_enable();
> > jump_label_unlock();
> > @@ -1614,7 +1621,9 @@ int register_kprobe(struct kprobe *p)
> > {
> > int ret;
> > struct kprobe *old_p;
> > +#if IS_ENABLED(CONFIG_MODULES)
> > struct module *probed_mod;
> > +#endif
> > kprobe_opcode_t *addr;
> > bool on_func_entry;
> >
> > @@ -1633,7 +1642,11 @@ int register_kprobe(struct kprobe *p)
> > p->nmissed = 0;
> > INIT_LIST_HEAD(&p->list);
> >
> > +#if IS_ENABLED(CONFIG_MODULES)
> > ret = check_kprobe_address_safe(p, &probed_mod);
> > +#else
> > + ret = check_kprobe_address_safe(p);
> > +#endif
> > if (ret)
> > return ret;
> >
> > @@ -1676,8 +1689,10 @@ int register_kprobe(struct kprobe *p)
> > out:
> > mutex_unlock(&kprobe_mutex);
> >
> > +#if IS_ENABLED(CONFIG_MODULES)
> > if (probed_mod)
> > module_put(probed_mod);
> > +#endif
> >
> > return ret;
> > }
> > @@ -2482,6 +2497,7 @@ int kprobe_add_area_blacklist(unsigned long start, unsigned long end)
> > return 0;
> > }
> >
> > +#if IS_ENABLED(CONFIG_MODULES)
> > /* Remove all symbols in given area from kprobe blacklist */
> > static void kprobe_remove_area_blacklist(unsigned long start, unsigned long end)
> > {
> > @@ -2499,6 +2515,7 @@ static void kprobe_remove_ksym_blacklist(unsigned long entry)
> > {
> > kprobe_remove_area_blacklist(entry, entry + 1);
> > }
> > +#endif
> >
> > int __weak arch_kprobe_get_kallsym(unsigned int *symnum, unsigned long *value,
> > char *type, char *sym)
> > @@ -2564,6 +2581,7 @@ static int __init populate_kprobe_blacklist(unsigned long *start,
> > return ret ? : arch_populate_kprobe_blacklist();
> > }
> >
> > +#if IS_ENABLED(CONFIG_MODULES)
> > static void add_module_kprobe_blacklist(struct module *mod)
> > {
> > unsigned long start, end;
> > @@ -2665,6 +2683,7 @@ static struct notifier_block kprobe_module_nb = {
> > .notifier_call = kprobes_module_callback,
> > .priority = 0
> > };
> > +#endif /* IS_ENABLED(CONFIG_MODULES) */
> >
> > void kprobe_free_init_mem(void)
> > {
> > @@ -2724,8 +2743,11 @@ static int __init init_kprobes(void)
> > err = arch_init_kprobes();
> > if (!err)
> > err = register_die_notifier(&kprobe_exceptions_nb);
> > +
> > +#if IS_ENABLED(CONFIG_MODULES)
> > if (!err)
> > err = register_module_notifier(&kprobe_module_nb);
> > +#endif
> >
> > kprobes_initialized = (err == 0);
> > kprobe_sysctls_init();
> > diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> > index c4c6e0e0068b..dd4598f775b9 100644
> > --- a/kernel/trace/trace_kprobe.c
> > +++ b/kernel/trace/trace_kprobe.c
> > @@ -102,6 +102,7 @@ static nokprobe_inline bool trace_kprobe_has_gone(struct trace_kprobe *tk)
> > return kprobe_gone(&tk->rp.kp);
> > }
> >
> > +#if IS_ENABLED(CONFIG_MODULES)
> > static nokprobe_inline bool trace_kprobe_within_module(struct trace_kprobe *tk,
> > struct module *mod)
> > {
> > @@ -129,6 +130,12 @@ static nokprobe_inline bool trace_kprobe_module_exist(struct trace_kprobe *tk)
> >
> > return ret;
> > }
> > +#else
> > +static nokprobe_inline bool trace_kprobe_module_exist(struct trace_kprobe *tk)
> > +{
> > + return true;
> > +}
> > +#endif
> >
> > static bool trace_kprobe_is_busy(struct dyn_event *ev)
> > {
> > @@ -670,6 +677,7 @@ static int register_trace_kprobe(struct trace_kprobe *tk)
> > return ret;
> > }
> >
> > +#if IS_ENABLED(CONFIG_MODULES)
> > /* Module notifier call back, checking event on the module */
> > static int trace_kprobe_module_callback(struct notifier_block *nb,
> > unsigned long val, void *data)
> > @@ -704,6 +712,7 @@ static struct notifier_block trace_kprobe_module_nb = {
> > .notifier_call = trace_kprobe_module_callback,
> > .priority = 1 /* Invoked after kprobe module callback */
> > };
> > +#endif /* IS_ENABLED(CONFIG_MODULES) */
> >
> > static int count_symbols(void *data, unsigned long unused)
> > {
> > @@ -1897,8 +1906,10 @@ static __init int init_kprobe_trace_early(void)
> > if (ret)
> > return ret;
> >
> > +#if IS_ENABLED(CONFIG_MODULES)
> > if (register_module_notifier(&trace_kprobe_module_nb))
> > return -EINVAL;
> > +#endif
> >
> > return 0;
> > }
> > --
> > 2.43.0
> >
>
>
> --
> Masami Hiramatsu (Google) <[email protected]>

2024-03-08 21:02:48

by Calvin Owens

[permalink] [raw]
Subject: Re: [RFC][PATCH 3/4] kprobes: Allow kprobes with CONFIG_MODULES=n

On Thursday 03/07 at 22:16 +0000, Christophe Leroy wrote:
>
>
> Le 06/03/2024 à 21:05, Calvin Owens a écrit :
> > [Vous ne recevez pas souvent de courriers de [email protected]. Découvrez pourquoi ceci est important à https://aka.ms/LearnAboutSenderIdentification ]
> >
> > If something like this is merged down the road, it can go in at leisure
> > once the module_alloc change is in: it's a one-way dependency.
>
> Too many #ifdef, please reorganise stuff to avoid that and avoid
> changing prototypes based of CONFIG_MODULES.
>
> Other few comments below.

TBH the ugliness here was just me trying not to trigger -Wunused, but
that was silly: as you point out below, it's unncessary. I'll clean it
up.

> >
> > Signed-off-by: Calvin Owens <[email protected]>
> > ---
> > arch/Kconfig | 2 +-
> > kernel/kprobes.c | 22 ++++++++++++++++++++++
> > kernel/trace/trace_kprobe.c | 11 +++++++++++
> > 3 files changed, 34 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/Kconfig b/arch/Kconfig
> > index cfc24ced16dd..e60ce984d095 100644
> > --- a/arch/Kconfig
> > +++ b/arch/Kconfig
> > @@ -52,8 +52,8 @@ config GENERIC_ENTRY
> >
> > config KPROBES
> > bool "Kprobes"
> > - depends on MODULES
> > depends on HAVE_KPROBES
> > + select MODULE_ALLOC
> > select KALLSYMS
> > select TASKS_RCU if PREEMPTION
> > help
> > diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> > index 9d9095e81792..194270e17d57 100644
> > --- a/kernel/kprobes.c
> > +++ b/kernel/kprobes.c
> > @@ -1556,8 +1556,12 @@ static bool is_cfi_preamble_symbol(unsigned long addr)
> > str_has_prefix("__pfx_", symbuf);
> > }
> >
> > +#if IS_ENABLED(CONFIG_MODULES)
> > static int check_kprobe_address_safe(struct kprobe *p,
> > struct module **probed_mod)
> > +#else
> > +static int check_kprobe_address_safe(struct kprobe *p)
> > +#endif
>
> A bit ugly to have to change the prototype, why not just keep probed_mod
> at all time ?
>
> When CONFIG_MODULES is not selected, __module_text_address() returns
> NULL so it should work without that many #ifdefs.
>
> > {
> > int ret;
> >
> > @@ -1580,6 +1584,7 @@ static int check_kprobe_address_safe(struct kprobe *p,
> > goto out;
> > }
> >
> > +#if IS_ENABLED(CONFIG_MODULES)
> > /* Check if 'p' is probing a module. */
> > *probed_mod = __module_text_address((unsigned long) p->addr);
> > if (*probed_mod) {
> > @@ -1603,6 +1608,8 @@ static int check_kprobe_address_safe(struct kprobe *p,
> > ret = -ENOENT;
> > }
> > }
> > +#endif
> > +
> > out:
> > preempt_enable();
> > jump_label_unlock();
> > @@ -1614,7 +1621,9 @@ int register_kprobe(struct kprobe *p)
> > {
> > int ret;
> > struct kprobe *old_p;
> > +#if IS_ENABLED(CONFIG_MODULES)
> > struct module *probed_mod;
> > +#endif
> > kprobe_opcode_t *addr;
> > bool on_func_entry;
> >
> > @@ -1633,7 +1642,11 @@ int register_kprobe(struct kprobe *p)
> > p->nmissed = 0;
> > INIT_LIST_HEAD(&p->list);
> >
> > +#if IS_ENABLED(CONFIG_MODULES)
> > ret = check_kprobe_address_safe(p, &probed_mod);
> > +#else
> > + ret = check_kprobe_address_safe(p);
> > +#endif
> > if (ret)
> > return ret;
> >
> > @@ -1676,8 +1689,10 @@ int register_kprobe(struct kprobe *p)
> > out:
> > mutex_unlock(&kprobe_mutex);
> >
> > +#if IS_ENABLED(CONFIG_MODULES)
> > if (probed_mod)
> > module_put(probed_mod);
> > +#endif
> >
> > return ret;
> > }
> > @@ -2482,6 +2497,7 @@ int kprobe_add_area_blacklist(unsigned long start, unsigned long end)
> > return 0;
> > }
> >
> > +#if IS_ENABLED(CONFIG_MODULES)
> > /* Remove all symbols in given area from kprobe blacklist */
> > static void kprobe_remove_area_blacklist(unsigned long start, unsigned long end)
> > {
> > @@ -2499,6 +2515,7 @@ static void kprobe_remove_ksym_blacklist(unsigned long entry)
> > {
> > kprobe_remove_area_blacklist(entry, entry + 1);
> > }
> > +#endif
> >
> > int __weak arch_kprobe_get_kallsym(unsigned int *symnum, unsigned long *value,
> > char *type, char *sym)
> > @@ -2564,6 +2581,7 @@ static int __init populate_kprobe_blacklist(unsigned long *start,
> > return ret ? : arch_populate_kprobe_blacklist();
> > }
> >
> > +#if IS_ENABLED(CONFIG_MODULES)
> > static void add_module_kprobe_blacklist(struct module *mod)
> > {
> > unsigned long start, end;
> > @@ -2665,6 +2683,7 @@ static struct notifier_block kprobe_module_nb = {
> > .notifier_call = kprobes_module_callback,
> > .priority = 0
> > };
> > +#endif /* IS_ENABLED(CONFIG_MODULES) */
> >
> > void kprobe_free_init_mem(void)
> > {
> > @@ -2724,8 +2743,11 @@ static int __init init_kprobes(void)
> > err = arch_init_kprobes();
> > if (!err)
> > err = register_die_notifier(&kprobe_exceptions_nb);
> > +
> > +#if IS_ENABLED(CONFIG_MODULES)
> > if (!err)
> > err = register_module_notifier(&kprobe_module_nb);
> > +#endif
> >
> > kprobes_initialized = (err == 0);
> > kprobe_sysctls_init();
> > diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> > index c4c6e0e0068b..dd4598f775b9 100644
> > --- a/kernel/trace/trace_kprobe.c
> > +++ b/kernel/trace/trace_kprobe.c
> > @@ -102,6 +102,7 @@ static nokprobe_inline bool trace_kprobe_has_gone(struct trace_kprobe *tk)
> > return kprobe_gone(&tk->rp.kp);
> > }
> >
> > +#if IS_ENABLED(CONFIG_MODULES)
> > static nokprobe_inline bool trace_kprobe_within_module(struct trace_kprobe *tk,
> > struct module *mod)
> > {
> > @@ -129,6 +130,12 @@ static nokprobe_inline bool trace_kprobe_module_exist(struct trace_kprobe *tk)
> >
> > return ret;
> > }
> > +#else
> > +static nokprobe_inline bool trace_kprobe_module_exist(struct trace_kprobe *tk)
> > +{
> > + return true;
> > +}
> > +#endif
> >
> > static bool trace_kprobe_is_busy(struct dyn_event *ev)
> > {
> > @@ -670,6 +677,7 @@ static int register_trace_kprobe(struct trace_kprobe *tk)
> > return ret;
> > }
> >
> > +#if IS_ENABLED(CONFIG_MODULES)
> > /* Module notifier call back, checking event on the module */
> > static int trace_kprobe_module_callback(struct notifier_block *nb,
> > unsigned long val, void *data)
> > @@ -704,6 +712,7 @@ static struct notifier_block trace_kprobe_module_nb = {
> > .notifier_call = trace_kprobe_module_callback,
> > .priority = 1 /* Invoked after kprobe module callback */
> > };
> > +#endif /* IS_ENABLED(CONFIG_MODULES) */
> >
> > static int count_symbols(void *data, unsigned long unused)
> > {
> > @@ -1897,8 +1906,10 @@ static __init int init_kprobe_trace_early(void)
> > if (ret)
> > return ret;
> >
> > +#if IS_ENABLED(CONFIG_MODULES)
> > if (register_module_notifier(&trace_kprobe_module_nb))
> > return -EINVAL;
> Why a #if here ?
>
> If CONFIG_MODULES is not selected, register_module_notifier() return
> always 0.
>
> > +#endif
> >
> > return 0;
> > }
> > --
> > 2.43.0
> >
> >

2024-03-25 22:46:34

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/4] Make bpf_jit and kprobes work with CONFIG_MODULES=n

On Wed Mar 6, 2024 at 10:05 PM EET, Calvin Owens wrote:
> Hello all,
>
> This patchset makes it possible to use bpftrace with kprobes on kernels
> built without loadable module support.
>
> On a Raspberry Pi 4b, this saves about 700KB of memory where BPF is
> needed but loadable module support is not. These two kernels had
> identical configurations, except CONFIG_MODULE was off in the second:
>
> - Linux version 6.8.0-rc7
> - Memory: 3330672K/4050944K available (16576K kernel code, 2390K rwdata,
> - 12364K rodata, 5632K init, 675K bss, 195984K reserved, 524288K cma-reserved)
> + Linux version 6.8.0-rc7-00003-g2af01251ca21
> + Memory: 3331400K/4050944K available (16512K kernel code, 2384K rwdata,
> + 11728K rodata, 5632K init, 673K bss, 195256K reserved, 524288K cma-reserved)
>
> I don't intend to present an exhaustive list of !MODULES usecases, since
> I'm sure there are many I'm not aware of. Performance is a common one,
> the primary justification being that static text is mapped on hugepages
> and module text is not. Security is another, since rootkits are much
> harder to implement without modules.
>
> The first patch is the interesting one: it moves module_alloc() into its
> own file with its own Kconfig option, so it can be utilized even when
> loadable module support is disabled. I got the idea from an unmerged
> patch from a few years ago I found on lkml (see [1/4] for details). I
> think this also has value in its own right, since I suspect there are
> potential users beyond bpf, hopefully we will hear from some.
>
> Patches 2-3 are proofs of concept to demonstrate the first patch is
> sufficient to achieve my goal (full ebpf functionality without modules).
>
> Patch 4 adds a new "-n" argument to vmtest.sh to run the BPF selftests
> without modules, so the prior three patches can be rigorously tested.
>
> If something like the first patch were to eventually be merged, the rest
> could go through the normal bpf-next process as I clean them up: I've
> only based them on Linus' tree and combined them into a series here to
> introduce the idea.
>
> If you prefer to fetch the patches via git:
>
> [1/4] https://github.com/jcalvinowens/linux.git work/module-alloc
> +[2/4]+[3/4] https://github.com/jcalvinowens/linux.git work/nomodule-bpf
> +[4/4] https://github.com/jcalvinowens/linux.git testing/nomodule-bpf-ci
>
> In addition to the automated BPF selftests, I've lightly tested this on
> my laptop (x86_64), a Raspberry Pi 4b (arm64), and a Raspberry Pi Zero W
> (arm). The other architectures have only been compile tested.
>
> I didn't want to spam all the arch maintainers with what I expect will
> be a discussion mostly about modules and bpf, so I've left them off this
> first submission. I will be sure to add them on future submissions of
> the first patch. Of course, feedback on the arch bits is welcome here.
>
> In addition to feedback on the patches themselves, I'm interested in
> hearing from anybody else who might find this functionality useful.
>
> Thanks,
> Calvin
>
>
> Calvin Owens (4):
> module: mm: Make module_alloc() generally available
> bpf: Allow BPF_JIT with CONFIG_MODULES=n
> kprobes: Allow kprobes with CONFIG_MODULES=n
> selftests/bpf: Support testing the !MODULES case
>
> arch/Kconfig | 4 +-
> arch/arm/kernel/module.c | 35 -----
> arch/arm/mm/Makefile | 2 +
> arch/arm/mm/module_alloc.c | 40 ++++++
> arch/arm64/kernel/module.c | 127 -----------------
> arch/arm64/mm/Makefile | 1 +
> arch/arm64/mm/module_alloc.c | 130 ++++++++++++++++++
> arch/loongarch/kernel/module.c | 6 -
> arch/loongarch/mm/Makefile | 2 +
> arch/loongarch/mm/module_alloc.c | 10 ++
> arch/mips/kernel/module.c | 10 --
> arch/mips/mm/Makefile | 2 +
> arch/mips/mm/module_alloc.c | 13 ++
> arch/nios2/kernel/module.c | 20 ---
> arch/nios2/mm/Makefile | 2 +
> arch/nios2/mm/module_alloc.c | 22 +++
> arch/parisc/kernel/module.c | 12 --
> arch/parisc/mm/Makefile | 1 +
> arch/parisc/mm/module_alloc.c | 15 ++
> arch/powerpc/kernel/module.c | 36 -----
> arch/powerpc/mm/Makefile | 1 +
> arch/powerpc/mm/module_alloc.c | 41 ++++++
> arch/riscv/kernel/module.c | 11 --
> arch/riscv/mm/Makefile | 1 +
> arch/riscv/mm/module_alloc.c | 17 +++
> arch/s390/kernel/module.c | 37 -----
> arch/s390/mm/Makefile | 1 +
> arch/s390/mm/module_alloc.c | 42 ++++++
> arch/sparc/kernel/module.c | 31 -----
> arch/sparc/mm/Makefile | 2 +
> arch/sparc/mm/module_alloc.c | 31 +++++
> arch/x86/kernel/ftrace.c | 2 +-
> arch/x86/kernel/module.c | 56 --------
> arch/x86/mm/Makefile | 2 +
> arch/x86/mm/module_alloc.c | 59 ++++++++
> fs/proc/kcore.c | 2 +-
> include/trace/events/bpf_testmod.h | 1 +
> kernel/bpf/Kconfig | 11 +-
> kernel/bpf/Makefile | 2 +
> kernel/bpf/bpf_struct_ops.c | 28 +++-
> kernel/bpf/bpf_testmod/Makefile | 1 +
> kernel/bpf/bpf_testmod/bpf_testmod.c | 1 +
> kernel/bpf/bpf_testmod/bpf_testmod.h | 1 +
> kernel/bpf/bpf_testmod/bpf_testmod_kfunc.h | 1 +
> kernel/kprobes.c | 22 +++
> kernel/module/Kconfig | 1 +
> kernel/module/main.c | 17 ---
> kernel/trace/trace_kprobe.c | 11 ++
> mm/Kconfig | 3 +
> mm/Makefile | 1 +
> mm/module_alloc.c | 21 +++
> mm/vmalloc.c | 2 +-
> net/bpf/test_run.c | 2 +
> tools/testing/selftests/bpf/Makefile | 28 ++--
> .../selftests/bpf/bpf_testmod/Makefile | 2 +-
> .../bpf/bpf_testmod/bpf_testmod-events.h | 6 +
> .../selftests/bpf/bpf_testmod/bpf_testmod.c | 4 +
> .../bpf/bpf_testmod/bpf_testmod_kfunc.h | 2 +
> tools/testing/selftests/bpf/config | 5 -
> tools/testing/selftests/bpf/config.mods | 5 +
> tools/testing/selftests/bpf/config.nomods | 1 +
> .../selftests/bpf/progs/btf_type_tag_percpu.c | 2 +
> .../selftests/bpf/progs/btf_type_tag_user.c | 2 +
> tools/testing/selftests/bpf/progs/core_kern.c | 2 +
> .../selftests/bpf/progs/iters_testmod_seq.c | 2 +
> .../bpf/progs/test_core_reloc_module.c | 2 +
> .../selftests/bpf/progs/test_ldsx_insn.c | 2 +
> .../selftests/bpf/progs/test_module_attach.c | 3 +
> .../selftests/bpf/progs/tracing_struct.c | 2 +
> tools/testing/selftests/bpf/testing_helpers.c | 14 ++
> tools/testing/selftests/bpf/vmtest.sh | 24 +++-
> 71 files changed, 636 insertions(+), 424 deletions(-)
> create mode 100644 arch/arm/mm/module_alloc.c
> create mode 100644 arch/arm64/mm/module_alloc.c
> create mode 100644 arch/loongarch/mm/module_alloc.c
> create mode 100644 arch/mips/mm/module_alloc.c
> create mode 100644 arch/nios2/mm/module_alloc.c
> create mode 100644 arch/parisc/mm/module_alloc.c
> create mode 100644 arch/powerpc/mm/module_alloc.c
> create mode 100644 arch/riscv/mm/module_alloc.c
> create mode 100644 arch/s390/mm/module_alloc.c
> create mode 100644 arch/sparc/mm/module_alloc.c
> create mode 100644 arch/x86/mm/module_alloc.c
> create mode 120000 include/trace/events/bpf_testmod.h
> create mode 100644 kernel/bpf/bpf_testmod/Makefile
> create mode 120000 kernel/bpf/bpf_testmod/bpf_testmod.c
> create mode 120000 kernel/bpf/bpf_testmod/bpf_testmod.h
> create mode 120000 kernel/bpf/bpf_testmod/bpf_testmod_kfunc.h
> create mode 100644 mm/module_alloc.c
> create mode 100644 tools/testing/selftests/bpf/config.mods
> create mode 100644 tools/testing/selftests/bpf/config.nomods

I think with eBPF focus in the patch set should be only on arch's
that you use regulary, i.e. repeating same mistake I did couple of
years ago:

https://lore.kernel.org/all/[email protected]/

I don't see my patch set conflict with this work, and it adds needed
shenanigans to realize eBPF patches. I refined the shenanigans to match
Masami's suggestions:

https://lore.kernel.org/all/[email protected]/

And it this requires properly working kprobes anyway.

BR, Jarkko