hi,
we see broken access to user space with bpf_probe_read/bpf_probe_read_str
helpers on arm64 with 5.4 kernel. The problem is that both helpers try to
read user memory by calling probe_kernel_read, which seems to work on x86
but fails on arm64.
There are several fixes after v5.4 to address this issue.
There was an attempt to fix that in past [1], but it deviated far from
upstream changes.
This patchset tries to follow the upstream changes with 2 notable exceptions:
1) bpf: Add probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers
- this upsgream patch adds new helpers, which we don't need to do,
we just need following functions (and related helper's glue):
bpf_probe_read_kernel_common
bpf_probe_read_kernel_str_common
that implement bpf_probe_read* helpers and receive fix in next
patch below, ommiting any other hunks
2) bpf: rework the compat kernel probe handling
- taking only fixes for functions and realted helper's glue that we took
from patch above, ommiting any other hunks
It's possible to add new helpers and keep the patches closer to upstream,
but I thought trying this way first as RFC without adding any new helpers
into stable kernel, which might possibly end up later with additional fixes.
Also I'm sending this as RFC, because I might be missing some mm related
dependency change, that I'm not aware of.
We tested new kernel with our use case on arm64 and x86.
thanks,
jirka
[1] https://yhbt.net/lore/all/[email protected]/t/
---
Andrii Nakryiko (1):
bpf: bpf_probe_read_kernel_str() has to return amount of data read on success
Christoph Hellwig (4):
maccess: clarify kerneldoc comments
maccess: rename strncpy_from_unsafe_user to strncpy_from_user_nofault
maccess: rename strncpy_from_unsafe_strict to strncpy_from_kernel_nofault
bpf: rework the compat kernel probe handling
Daniel Borkmann (3):
uaccess: Add strict non-pagefault kernel-space read function
bpf: Add probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers
bpf: Restrict bpf_probe_read{, str}() only to archs where they work
arch/arm/Kconfig | 1 +
arch/arm64/Kconfig | 1 +
arch/x86/Kconfig | 1 +
arch/x86/mm/Makefile | 2 +-
arch/x86/mm/maccess.c | 43 ++++++++++++++++++++++++++++++++++++++
include/linux/uaccess.h | 8 +++++--
init/Kconfig | 3 +++
kernel/trace/bpf_trace.c | 140 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----------------------------------------
kernel/trace/trace_kprobe.c | 2 +-
mm/maccess.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++-------
10 files changed, 207 insertions(+), 57 deletions(-)
create mode 100644 arch/x86/mm/maccess.c
From: Daniel Borkmann <[email protected]>
commit 0ebeea8ca8a4d1d453ad299aef0507dab04f6e8d upstream.
[Small context conflicts due to not bckported changes in previous patch]
Given the legacy bpf_probe_read{,str}() BPF helpers are broken on archs
with overlapping address ranges, we should really take the next step to
disable them from BPF use there.
To generally fix the situation, we've recently added new helper variants
bpf_probe_read_{user,kernel}() and bpf_probe_read_{user,kernel}_str().
For details on them, see 6ae08ae3dea2 ("bpf: Add probe_read_{user, kernel}
and probe_read_{user,kernel}_str helpers").
Given bpf_probe_read{,str}() have been around for ~5 years by now, there
are plenty of users at least on x86 still relying on them today, so we
cannot remove them entirely w/o breaking the BPF tracing ecosystem.
However, their use should be restricted to archs with non-overlapping
address ranges where they are working in their current form. Therefore,
move this behind a CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE and
have x86, arm64, arm select it (other archs supporting it can follow-up
on it as well).
For the remaining archs, they can workaround easily by relying on the
feature probe from bpftool which spills out defines that can be used out
of BPF C code to implement the drop-in replacement for old/new kernels
via: bpftool feature probe macro
Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Reviewed-by: Masami Hiramatsu <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
---
arch/arm/Kconfig | 1 +
arch/arm64/Kconfig | 1 +
arch/x86/Kconfig | 1 +
init/Kconfig | 3 +++
kernel/trace/bpf_trace.c | 2 ++
5 files changed, 8 insertions(+)
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index a70696a95b79..7c1cb0ebdb18 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -14,6 +14,7 @@ config ARM
select ARCH_HAS_KEEPINITRD
select ARCH_HAS_KCOV
select ARCH_HAS_MEMBARRIER_SYNC_CORE
+ select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
select ARCH_HAS_PTE_SPECIAL if ARM_LPAE
select ARCH_HAS_PHYS_TO_DMA
select ARCH_HAS_SETUP_DMA_OPS
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 384b1bf56667..0d96acb2ca3e 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -22,6 +22,7 @@ config ARM64
select ARCH_HAS_KCOV
select ARCH_HAS_KEEPINITRD
select ARCH_HAS_MEMBARRIER_SYNC_CORE
+ select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
select ARCH_HAS_PTE_DEVMAP
select ARCH_HAS_PTE_SPECIAL
select ARCH_HAS_SETUP_DMA_OPS
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 6002252692af..7be388116732 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -70,6 +70,7 @@ config X86
select ARCH_HAS_KCOV if X86_64
select ARCH_HAS_MEM_ENCRYPT
select ARCH_HAS_MEMBARRIER_SYNC_CORE
+ select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
select ARCH_HAS_PMEM_API if X86_64
select ARCH_HAS_PTE_DEVMAP if X86_64
select ARCH_HAS_PTE_SPECIAL
diff --git a/init/Kconfig b/init/Kconfig
index f641518f4ac5..2297b7ce6665 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -2231,6 +2231,9 @@ config ASN1
source "kernel/Kconfig.locks"
+config ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
+ bool
+
config ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
bool
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 9ac27d48cc8e..61c81c38202b 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -737,10 +737,12 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
return &bpf_current_task_under_cgroup_proto;
case BPF_FUNC_get_prandom_u32:
return &bpf_get_prandom_u32_proto;
+#ifdef CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
case BPF_FUNC_probe_read:
return &bpf_probe_read_compat_proto;
case BPF_FUNC_probe_read_str:
return &bpf_probe_read_compat_str_proto;
+#endif
#ifdef CONFIG_CGROUPS
case BPF_FUNC_get_current_cgroup_id:
return &bpf_get_current_cgroup_id_proto;
--
2.40.1
From: Andrii Nakryiko <[email protected]>
commit 02553b91da5deb63c8562b47529b09b734659af0 upstream.
During recent refactorings, bpf_probe_read_kernel_str() started returning 0 on
success, instead of amount of data successfully read. This majorly breaks
applications relying on bpf_probe_read_kernel_str() and bpf_probe_read_str()
and their results. Fix this by returning actual number of bytes read.
Fixes: 8d92db5c04d1 ("bpf: rework the compat kernel probe handling")
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Acked-by: John Fastabend <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
---
kernel/trace/bpf_trace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index a46256f99229..c4c825dcdef8 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -198,7 +198,7 @@ bpf_probe_read_kernel_str_common(void *dst, u32 size, const void *unsafe_ptr)
if (unlikely(ret < 0))
goto fail;
- return 0;
+ return ret;
fail:
memset(dst, 0, size);
return ret;
--
2.40.1
On Mon, May 22, 2023 at 10:33:44PM +0200, Jiri Olsa wrote:
> hi,
> we see broken access to user space with bpf_probe_read/bpf_probe_read_str
> helpers on arm64 with 5.4 kernel. The problem is that both helpers try to
> read user memory by calling probe_kernel_read, which seems to work on x86
> but fails on arm64.
Has this ever worked on arm64 for the 5.4 kernel tree? If not, it's not
really a regression, and so, why not use a newer kernel that has this
new feature added to it there?
In other words, what requires you to use the 5.4.y tree and requires
feature parity across architectures?
thanks,
greg k-h
On Fri, May 26, 2023 at 07:54:17PM +0100, Greg KH wrote:
> On Mon, May 22, 2023 at 10:33:44PM +0200, Jiri Olsa wrote:
> > hi,
> > we see broken access to user space with bpf_probe_read/bpf_probe_read_str
> > helpers on arm64 with 5.4 kernel. The problem is that both helpers try to
> > read user memory by calling probe_kernel_read, which seems to work on x86
> > but fails on arm64.
>
> Has this ever worked on arm64 for the 5.4 kernel tree? If not, it's not
> really a regression, and so, why not use a newer kernel that has this
> new feature added to it there?
>
> In other words, what requires you to use the 5.4.y tree and requires
> feature parity across architectures?
we have a customer running ok on x86 v5.4, but arm64 is broken with
the same bpf/user space code
upgrade is an option of course, but it's not a big change and we can
have 5.4 working on arm64 as well
I can send out the change that will be closer to upstream changes,
if that's a concern.. with adding the new probe helpers, which I
guess is not a problem, because it does not change current API
jirka
On Sun, May 28, 2023 at 10:02:49PM +0200, Jiri Olsa wrote:
> On Fri, May 26, 2023 at 07:54:17PM +0100, Greg KH wrote:
> > On Mon, May 22, 2023 at 10:33:44PM +0200, Jiri Olsa wrote:
> > > hi,
> > > we see broken access to user space with bpf_probe_read/bpf_probe_read_str
> > > helpers on arm64 with 5.4 kernel. The problem is that both helpers try to
> > > read user memory by calling probe_kernel_read, which seems to work on x86
> > > but fails on arm64.
> >
> > Has this ever worked on arm64 for the 5.4 kernel tree? If not, it's not
> > really a regression, and so, why not use a newer kernel that has this
> > new feature added to it there?
> >
> > In other words, what requires you to use the 5.4.y tree and requires
> > feature parity across architectures?
>
> we have a customer running ok on x86 v5.4, but arm64 is broken with
> the same bpf/user space code
Again why can they not use a newer kernel version? What forces this
customer to be stuck with a specific kernel version that spans different
processor types?
> upgrade is an option of course, but it's not a big change and we can
> have 5.4 working on arm64 as well
For loads of other reasons, I'd recommend 5.15 or newer for arm64, so
why not use that?
> I can send out the change that will be closer to upstream changes,
> if that's a concern.. with adding the new probe helpers, which I
> guess is not a problem, because it does not change current API
You are trying to add features to a stable kernel that are not
regression fixes, which is something that we generally do not accept
into stable kernels.
thnaks,
greg k-h