2024-02-02 10:38:57

by Hou Tao

Subject: [PATCH bpf v3 0/3] Fix the read of vsyscall page through bpf

From: Hou Tao <[email protected]>

Hi,

As reported by syzbot [1] and [2], an oops may occur when trying to read
the vsyscall page using bpf_probe_read_kernel() or bpf_probe_read().
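
For illustration, a minimal tracing program along the following lines is
enough to hit the problem on an unfixed kernel. This is only a hedged
sketch, not the syzbot reproducer; the program name and attach point are
illustrative:

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  char _license[] SEC("license") = "GPL";

  /* illustrative only: read the x86-64 vsyscall page (0xffffffffff600000)
   * from a tracing program; before this series the kernel-read helper
   * could oops instead of returning an error.
   */
  SEC("tracepoint/syscalls/sys_enter_nanosleep")
  int dump_vsyscall(void *ctx)
  {
          char buf[8];

          bpf_probe_read_kernel(buf, sizeof(buf), (void *)0xffffffffff600000UL);
          return 0;
  }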

Thomas Gleixner proposed a test patch [3], but no formal patch has been
posted after about one month [4], so I am posting it instead and have
added an Originally-by tag to patch #2.

Patch #1 turns is_vsyscall_vaddr() into a common helper. Patch #2 fixes
the problem by disallowing vsyscall page reads for
copy_from_kernel_nofault(). Patch #3 adds a test case to ensure that
reads of the vsyscall page through bpf are rejected. Please see the
individual patches for more details.

Comments are always welcome.

[1]: https://lore.kernel.org/bpf/CAG48ez06TZft=ATH1qh2c5mpS5BT8UakwNkzi6nvK5_djC-4Nw@mail.gmail.com/
[2]: https://lore.kernel.org/bpf/CABOYnLynjBoFZOf3Z4BhaZkc5hx_kHfsjiW+UWLoB=w33LvScw@mail.gmail.com/
[3]: https://lore.kernel.org/bpf/87r0jwquhv.ffs@tglx/
[4]: https://lore.kernel.org/bpf/[email protected]/

Change Log:
v3:
* rephrase the commit messages of patches #1 and #2 (Sohil)
* reword the comment in copy_from_kernel_nofault_allowed() (Sohil)
* add a Reviewed-by tag to patch #1 and an Acked-by tag to patch #3 (Sohil, Yonghong)

v2: https://lore.kernel.org/bpf/[email protected]/
* move is_vsyscall_vaddr() to asm/vsyscall.h instead (Sohil)
* elaborate on the reason for disallowing vsyscall page reads in
copy_from_kernel_nofault_allowed() (Sohil)
* update the commit message of patch #2 to explain more clearly how
the oops occurs (Sohil)
* update the commit message of patch #3 to explain the expected return
values of the various bpf helpers (Yonghong)

v1: https://lore.kernel.org/bpf/[email protected]/

Hou Tao (3):
x86/mm: Move is_vsyscall_vaddr() into asm/vsyscall.h
x86/mm: Disallow vsyscall page read for copy_from_kernel_nofault()
selftest/bpf: Test the read of vsyscall page under x86-64

arch/x86/include/asm/vsyscall.h | 10 ++++
arch/x86/mm/fault.c | 9 ---
arch/x86/mm/maccess.c | 10 ++++
.../selftests/bpf/prog_tests/read_vsyscall.c | 57 +++++++++++++++++++
.../selftests/bpf/progs/read_vsyscall.c | 45 +++++++++++++++
5 files changed, 122 insertions(+), 9 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/read_vsyscall.c
create mode 100644 tools/testing/selftests/bpf/progs/read_vsyscall.c

--
2.29.2



2024-02-02 10:39:13

by Hou Tao

Subject: [PATCH bpf v3 1/3] x86/mm: Move is_vsyscall_vaddr() into asm/vsyscall.h

From: Hou Tao <[email protected]>

Move is_vsyscall_vaddr() into asm/vsyscall.h to make it available for
copy_from_kernel_nofault_allowed() in arch/x86/mm/maccess.c.
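
For illustration, the next patch in this series uses the helper roughly
as follows in arch/x86/mm/maccess.c (a sketch of the upcoming change;
see patch #2 for the exact hunk):

  #include <asm/vsyscall.h>

  /* inside copy_from_kernel_nofault_allowed(), on x86-64 */
  if (is_vsyscall_vaddr(vaddr))
          return false;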

Reviewed-by: Sohil Mehta <[email protected]>
Signed-off-by: Hou Tao <[email protected]>
---
arch/x86/include/asm/vsyscall.h | 10 ++++++++++
arch/x86/mm/fault.c | 9 ---------
2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/vsyscall.h b/arch/x86/include/asm/vsyscall.h
index ab60a71a8dcb9..472f0263dbc61 100644
--- a/arch/x86/include/asm/vsyscall.h
+++ b/arch/x86/include/asm/vsyscall.h
@@ -4,6 +4,7 @@

#include <linux/seqlock.h>
#include <uapi/asm/vsyscall.h>
+#include <asm/page_types.h>

#ifdef CONFIG_X86_VSYSCALL_EMULATION
extern void map_vsyscall(void);
@@ -24,4 +25,13 @@ static inline bool emulate_vsyscall(unsigned long error_code,
}
#endif

+/*
+ * The (legacy) vsyscall page is the long page in the kernel portion
+ * of the address space that has user-accessible permissions.
+ */
+static inline bool is_vsyscall_vaddr(unsigned long vaddr)
+{
+ return unlikely((vaddr & PAGE_MASK) == VSYSCALL_ADDR);
+}
+
#endif /* _ASM_X86_VSYSCALL_H */
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 679b09cfe241c..d6375b3c633bc 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -798,15 +798,6 @@ show_signal_msg(struct pt_regs *regs, unsigned long error_code,
show_opcodes(regs, loglvl);
}

-/*
- * The (legacy) vsyscall page is the long page in the kernel portion
- * of the address space that has user-accessible permissions.
- */
-static bool is_vsyscall_vaddr(unsigned long vaddr)
-{
- return unlikely((vaddr & PAGE_MASK) == VSYSCALL_ADDR);
-}
-
static void
__bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
unsigned long address, u32 pkey, int si_code)
--
2.29.2


2024-02-02 10:39:15

by Hou Tao

Subject: [PATCH bpf v3 2/3] x86/mm: Disallow vsyscall page read for copy_from_kernel_nofault()

From: Hou Tao <[email protected]>

When trying to use copy_from_kernel_nofault() to read the vsyscall page
through a bpf program, the following oops was reported:

BUG: unable to handle page fault for address: ffffffffff600000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 3231067 P4D 3231067 PUD 3233067 PMD 3235067 PTE 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 1 PID: 20390 Comm: test_progs ...... 6.7.0+ #58
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ......
RIP: 0010:copy_from_kernel_nofault+0x6f/0x110
......
Call Trace:
<TASK>
? copy_from_kernel_nofault+0x6f/0x110
bpf_probe_read_kernel+0x1d/0x50
bpf_prog_2061065e56845f08_do_probe_read+0x51/0x8d
trace_call_bpf+0xc5/0x1c0
perf_call_bpf_enter.isra.0+0x69/0xb0
perf_syscall_enter+0x13e/0x200
syscall_trace_enter+0x188/0x1c0
do_syscall_64+0xb5/0xe0
entry_SYSCALL_64_after_hwframe+0x6e/0x76
</TASK>
......
---[ end trace 0000000000000000 ]---

The oops is triggered when:

1) A bpf program uses bpf_probe_read_kernel() to read from the vsyscall
page and invokes copy_from_kernel_nofault() which in turn calls
__get_user_asm().

2) Because the vsyscall page address is not readable from kernel space,
a page fault exception is triggered accordingly.

3) handle_page_fault() considers the vsyscall page address to be a user
space address rather than a kernel space address. As a result, the
exception fixup set up by bpf is not applied, and page_fault_oops() is
invoked due to SMAP.

Since handle_page_fault() already treats the vsyscall page address as a
userspace address, fix the problem by disallowing vsyscall page reads
in copy_from_kernel_nofault().
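
With this change, a nofault kernel read of the vsyscall page fails
cleanly instead of oopsing. Conceptually (a sketch, not code from this
patch; VSYSCALL_ADDR is the page's fixed address):

  char buf[8];
  long ret;

  /* copy_from_kernel_nofault_allowed() now rejects the vsyscall address,
   * so the copy bails out with -ERANGE before touching the page.
   */
  ret = copy_from_kernel_nofault(buf, (void *)VSYSCALL_ADDR, sizeof(buf));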

Originally-by: Thomas Gleixner <[email protected]>
Reported-by: [email protected]
Closes: https://lore.kernel.org/bpf/CAG48ez06TZft=ATH1qh2c5mpS5BT8UakwNkzi6nvK5_djC-4Nw@mail.gmail.com
Reported-by: xingwei lee <[email protected]>
Closes: https://lore.kernel.org/bpf/CABOYnLynjBoFZOf3Z4BhaZkc5hx_kHfsjiW+UWLoB=w33LvScw@mail.gmail.com
Signed-off-by: Hou Tao <[email protected]>
---
arch/x86/mm/maccess.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/arch/x86/mm/maccess.c b/arch/x86/mm/maccess.c
index 6993f026adec9..42115ac079cfe 100644
--- a/arch/x86/mm/maccess.c
+++ b/arch/x86/mm/maccess.c
@@ -3,6 +3,8 @@
#include <linux/uaccess.h>
#include <linux/kernel.h>

+#include <asm/vsyscall.h>
+
#ifdef CONFIG_X86_64
bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size)
{
@@ -15,6 +17,14 @@ bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size)
if (vaddr < TASK_SIZE_MAX + PAGE_SIZE)
return false;

+ /*
+ * Reading from the vsyscall page may cause an unhandled fault in
+ * certain cases. Though it is at an address above TASK_SIZE_MAX, it is
+ * usually considered as a user space address.
+ */
+ if (is_vsyscall_vaddr(vaddr))
+ return false;
+
/*
* Allow everything during early boot before 'x86_virt_bits'
* is initialized. Needed for instruction decoding in early
--
2.29.2


2024-02-02 11:15:52

by Hou Tao

Subject: [PATCH bpf v3 3/3] selftest/bpf: Test the read of vsyscall page under x86-64

From: Hou Tao <[email protected]>

Under x86-64, using bpf_probe_read_kernel{_str}() or bpf_probe_read{_str}()
to read the vsyscall page may trigger an oops, so add a test case to
ensure that the problem is fixed. Besides the four bpf helpers mentioned
above, the test also reads the vsyscall page using
bpf_probe_read_user{_str}() and bpf_copy_from_user{_task}().

The test case passes the address of the vsyscall page to these eight
helpers and checks whether the returned values are as expected (a short
userspace sketch of these checks follows the list):

1) For bpf_probe_read_kernel{_str}()/bpf_probe_read{_str}(), the
expected return value is -ERANGE as shown below:

  bpf_probe_read_kernel_common
    copy_from_kernel_nofault
      // false, return -ERANGE
      copy_from_kernel_nofault_allowed

2) For bpf_probe_read_user{_str}(), the expected return value is -EFAULT
as shown below:

  bpf_probe_read_user_common
    copy_from_user_nofault
      // false, return -EFAULT
      __access_ok

3) For bpf_copy_from_user(), the expected return value is -EFAULT:

  // return -EFAULT
  bpf_copy_from_user
    copy_from_user
      _copy_from_user
        // return false
        access_ok

4) For bpf_copy_from_user_task(), the expected return value is -EFAULT:

  // return -EFAULT
  bpf_copy_from_user_task
    access_process_vm
      // return 0
      vma_lookup()
      // return 0
      expand_stack()
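
For reference, the userspace side of the test boils down to comparisons
like the following (a simplified sketch; the actual selftest below uses
the ASSERT_EQ() macros and a table of expected values):

  /* read_ret[] is filled in by the bpf programs in the selftest */
  assert(read_ret[0] == -ERANGE);   /* bpf_probe_read_kernel()   */
  assert(read_ret[4] == -EFAULT);   /* bpf_probe_read_user()     */
  assert(read_ret[6] == -EFAULT);   /* bpf_copy_from_user()      */
  assert(read_ret[7] == -EFAULT);   /* bpf_copy_from_user_task() */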

Whether the oops occurs depends on the availability of the CPU SMAP [1]
feature, and there are three possible configurations of the vsyscall
page selected by the boot command line (vsyscall={xonly|none|emulate}),
so there are six possible combinations in total. The test case runs
successfully under all of these combinations.

[1]: https://en.wikipedia.org/wiki/Supervisor_Mode_Access_Prevention

Acked-by: Yonghong Song <[email protected]>
Signed-off-by: Hou Tao <[email protected]>
---
.../selftests/bpf/prog_tests/read_vsyscall.c | 57 +++++++++++++++++++
.../selftests/bpf/progs/read_vsyscall.c | 45 +++++++++++++++
2 files changed, 102 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/read_vsyscall.c
create mode 100644 tools/testing/selftests/bpf/progs/read_vsyscall.c

diff --git a/tools/testing/selftests/bpf/prog_tests/read_vsyscall.c b/tools/testing/selftests/bpf/prog_tests/read_vsyscall.c
new file mode 100644
index 0000000000000..3405923fe4e65
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/read_vsyscall.c
@@ -0,0 +1,57 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2024. Huawei Technologies Co., Ltd */
+#include "test_progs.h"
+#include "read_vsyscall.skel.h"
+
+#if defined(__x86_64__)
+/* For VSYSCALL_ADDR */
+#include <asm/vsyscall.h>
+#else
+/* To prevent build failure on non-x86 arch */
+#define VSYSCALL_ADDR 0UL
+#endif
+
+struct read_ret_desc {
+ const char *name;
+ int ret;
+} all_read[] = {
+ { .name = "probe_read_kernel", .ret = -ERANGE },
+ { .name = "probe_read_kernel_str", .ret = -ERANGE },
+ { .name = "probe_read", .ret = -ERANGE },
+ { .name = "probe_read_str", .ret = -ERANGE },
+ { .name = "probe_read_user", .ret = -EFAULT },
+ { .name = "probe_read_user_str", .ret = -EFAULT },
+ { .name = "copy_from_user", .ret = -EFAULT },
+ { .name = "copy_from_user_task", .ret = -EFAULT },
+};
+
+void test_read_vsyscall(void)
+{
+ struct read_vsyscall *skel;
+ unsigned int i;
+ int err;
+
+#if !defined(__x86_64__)
+ test__skip();
+ return;
+#endif
+ skel = read_vsyscall__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "read_vsyscall open_load"))
+ return;
+
+ skel->bss->target_pid = getpid();
+ err = read_vsyscall__attach(skel);
+ if (!ASSERT_EQ(err, 0, "read_vsyscall attach"))
+ goto out;
+
+ /* userspace may not have the vsyscall page due to LEGACY_VSYSCALL_NONE,
+ * but it doesn't affect the returned error codes.
+ */
+ skel->bss->user_ptr = (void *)VSYSCALL_ADDR;
+ usleep(1);
+
+ for (i = 0; i < ARRAY_SIZE(all_read); i++)
+ ASSERT_EQ(skel->bss->read_ret[i], all_read[i].ret, all_read[i].name);
+out:
+ read_vsyscall__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/read_vsyscall.c b/tools/testing/selftests/bpf/progs/read_vsyscall.c
new file mode 100644
index 0000000000000..986f96687ae15
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/read_vsyscall.c
@@ -0,0 +1,45 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2024. Huawei Technologies Co., Ltd */
+#include <linux/types.h>
+#include <bpf/bpf_helpers.h>
+
+#include "bpf_misc.h"
+
+int target_pid = 0;
+void *user_ptr = 0;
+int read_ret[8];
+
+char _license[] SEC("license") = "GPL";
+
+SEC("fentry/" SYS_PREFIX "sys_nanosleep")
+int do_probe_read(void *ctx)
+{
+ char buf[8];
+
+ if ((bpf_get_current_pid_tgid() >> 32) != target_pid)
+ return 0;
+
+ read_ret[0] = bpf_probe_read_kernel(buf, sizeof(buf), user_ptr);
+ read_ret[1] = bpf_probe_read_kernel_str(buf, sizeof(buf), user_ptr);
+ read_ret[2] = bpf_probe_read(buf, sizeof(buf), user_ptr);
+ read_ret[3] = bpf_probe_read_str(buf, sizeof(buf), user_ptr);
+ read_ret[4] = bpf_probe_read_user(buf, sizeof(buf), user_ptr);
+ read_ret[5] = bpf_probe_read_user_str(buf, sizeof(buf), user_ptr);
+
+ return 0;
+}
+
+SEC("fentry.s/" SYS_PREFIX "sys_nanosleep")
+int do_copy_from_user(void *ctx)
+{
+ char buf[8];
+
+ if ((bpf_get_current_pid_tgid() >> 32) != target_pid)
+ return 0;
+
+ read_ret[6] = bpf_copy_from_user(buf, sizeof(buf), user_ptr);
+ read_ret[7] = bpf_copy_from_user_task(buf, sizeof(buf), user_ptr,
+ bpf_get_current_task_btf(), 0);
+
+ return 0;
+}
--
2.29.2


2024-02-02 19:04:05

by Sohil Mehta

Subject: Re: [PATCH bpf v3 2/3] x86/mm: Disallow vsyscall page read for copy_from_kernel_nofault()

On 2/2/2024 2:39 AM, Hou Tao wrote:
> From: Hou Tao <[email protected]>
>
> When trying to use copy_from_kernel_nofault() to read the vsyscall page
> through a bpf program, the following oops was reported:
>
> BUG: unable to handle page fault for address: ffffffffff600000
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 3231067 P4D 3231067 PUD 3233067 PMD 3235067 PTE 0
> Oops: 0000 [#1] PREEMPT SMP PTI
> CPU: 1 PID: 20390 Comm: test_progs ...... 6.7.0+ #58
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ......
> RIP: 0010:copy_from_kernel_nofault+0x6f/0x110
> ......
> Call Trace:
> <TASK>
> ? copy_from_kernel_nofault+0x6f/0x110
> bpf_probe_read_kernel+0x1d/0x50
> bpf_prog_2061065e56845f08_do_probe_read+0x51/0x8d
> trace_call_bpf+0xc5/0x1c0
> perf_call_bpf_enter.isra.0+0x69/0xb0
> perf_syscall_enter+0x13e/0x200
> syscall_trace_enter+0x188/0x1c0
> do_syscall_64+0xb5/0xe0
> entry_SYSCALL_64_after_hwframe+0x6e/0x76
> </TASK>
> ......
> ---[ end trace 0000000000000000 ]---
>
> The oops is triggered when:
>
> 1) A bpf program uses bpf_probe_read_kernel() to read from the vsyscall
> page and invokes copy_from_kernel_nofault() which in turn calls
> __get_user_asm().
>
> 2) Because the vsyscall page address is not readable from kernel space,
> a page fault exception is triggered accordingly.
>
> 3) handle_page_fault() considers the vsyscall page address to be a user
> space address rather than a kernel space address. As a result, the
> exception fixup set up by bpf is not applied, and page_fault_oops() is
> invoked due to SMAP.
>
> Since handle_page_fault() already treats the vsyscall page address as a
> userspace address, fix the problem by disallowing vsyscall page reads
> in copy_from_kernel_nofault().
>
> Originally-by: Thomas Gleixner <[email protected]>
> Reported-by: [email protected]
> Closes: https://lore.kernel.org/bpf/CAG48ez06TZft=ATH1qh2c5mpS5BT8UakwNkzi6nvK5_djC-4Nw@mail.gmail.com
> Reported-by: xingwei lee <[email protected]>
> Closes: https://lore.kernel.org/bpf/CABOYnLynjBoFZOf3Z4BhaZkc5hx_kHfsjiW+UWLoB=w33LvScw@mail.gmail.com
> Signed-off-by: Hou Tao <[email protected]>
> ---
> arch/x86/mm/maccess.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>

Reviewed-by: Sohil Mehta <[email protected]>

2024-02-14 22:52:18

by Alexei Starovoitov

Subject: Re: [PATCH bpf v3 2/3] x86/mm: Disallow vsyscall page read for copy_from_kernel_nofault()

On Fri, Feb 2, 2024 at 11:03 AM Sohil Mehta <[email protected]> wrote:
>
> On 2/2/2024 2:39 AM, Hou Tao wrote:
> > From: Hou Tao <[email protected]>
> >
> > When trying to use copy_from_kernel_nofault() to read the vsyscall page
> > through a bpf program, the following oops was reported:
> >
> > BUG: unable to handle page fault for address: ffffffffff600000
> > #PF: supervisor read access in kernel mode
> > #PF: error_code(0x0000) - not-present page
> > PGD 3231067 P4D 3231067 PUD 3233067 PMD 3235067 PTE 0
> > Oops: 0000 [#1] PREEMPT SMP PTI
> > CPU: 1 PID: 20390 Comm: test_progs ...... 6.7.0+ #58
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ......
> > RIP: 0010:copy_from_kernel_nofault+0x6f/0x110
> > ......
> > Call Trace:
> > <TASK>
> > ? copy_from_kernel_nofault+0x6f/0x110
> > bpf_probe_read_kernel+0x1d/0x50
> > bpf_prog_2061065e56845f08_do_probe_read+0x51/0x8d
> > trace_call_bpf+0xc5/0x1c0
> > perf_call_bpf_enter.isra.0+0x69/0xb0
> > perf_syscall_enter+0x13e/0x200
> > syscall_trace_enter+0x188/0x1c0
> > do_syscall_64+0xb5/0xe0
> > entry_SYSCALL_64_after_hwframe+0x6e/0x76
> > </TASK>
> > ......
> > ---[ end trace 0000000000000000 ]---
> >
> > The oops is triggered when:
> >
> > 1) A bpf program uses bpf_probe_read_kernel() to read from the vsyscall
> > page and invokes copy_from_kernel_nofault() which in turn calls
> > __get_user_asm().
> >
> > 2) Because the vsyscall page address is not readable from kernel space,
> > a page fault exception is triggered accordingly.
> >
> > 3) handle_page_fault() considers the vsyscall page address to be a user
> > space address rather than a kernel space address. As a result, the
> > exception fixup set up by bpf is not applied, and page_fault_oops() is
> > invoked due to SMAP.
> >
> > Since handle_page_fault() already treats the vsyscall page address as a
> > userspace address, fix the problem by disallowing vsyscall page reads
> > in copy_from_kernel_nofault().
> >
> > Originally-by: Thomas Gleixner <[email protected]>

Thomas,

could you please Ack the patch if you're still ok with it,
so we can take through the bpf tree to Linus soon ?

Not only syzbot, but real users are hitting this bug.

Thanks!

> > Reported-by: [email protected]
> > Closes: https://lore.kernel.org/bpf/CAG48ez06TZft=ATH1qh2c5mpS5BT8UakwNkzi6nvK5_djC-4Nw@mail.gmail.com
> > Reported-by: xingwei lee <[email protected]>
> > Closes: https://lore.kernel.org/bpf/CABOYnLynjBoFZOf3Z4BhaZkc5hx_kHfsjiW+UWLoB=w33LvScw@mail.gmail.com
> > Signed-off-by: Hou Tao <[email protected]>
> > ---
> > arch/x86/mm/maccess.c | 10 ++++++++++
> > 1 file changed, 10 insertions(+)
> >
>
> Reviewed-by: Sohil Mehta <[email protected]>
>

2024-02-16 03:31:57

by patchwork-bot+netdevbpf

Subject: Re: [PATCH bpf v3 0/3] Fix the read of vsyscall page through bpf

Hello:

This series was applied to bpf/bpf.git (master)
by Alexei Starovoitov <[email protected]>:

On Fri, 2 Feb 2024 18:39:32 +0800 you wrote:
> From: Hou Tao <[email protected]>
>
> Hi,
>
> As reported by syzbot [1] and [2], an oops may occur when trying to read
> the vsyscall page using bpf_probe_read_kernel() or bpf_probe_read().
>
> [...]

Here is the summary with links:
- [bpf,v3,1/3] x86/mm: Move is_vsyscall_vaddr() into asm/vsyscall.h
https://git.kernel.org/bpf/bpf/c/ee0e39a63b78
- [bpf,v3,2/3] x86/mm: Disallow vsyscall page read for copy_from_kernel_nofault()
https://git.kernel.org/bpf/bpf/c/32019c659ecf
- [bpf,v3,3/3] selftest/bpf: Test the read of vsyscall page under x86-64
https://git.kernel.org/bpf/bpf/c/be66d79189ec

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html