2020-05-21 15:25:11

by Christoph Hellwig

[permalink] [raw]
Subject: clean up and streamline probe_kernel_* and friends v4

Hi all,

this series start cleaning up the safe kernel and user memory probing
helpers in mm/maccess.c, and then allows architectures to implement
the kernel probing without overriding the address space limit and
temporarily allowing access to user memory. It then switches x86
over to this new mechanism by reusing the unsafe_* uaccess logic.

This version also switches to the saner copy_{from,to}_kernel_nofault
naming suggested by Linus.

I kept the x86 helpers as-is without calling unsage_{get,put}_user as
that avoids a number of hard to trace casts, and it will still work
with the asm-goto based version easily.

Changes since v3:
- cleanup how bpf and trace_kprobe perform the TASK_SIZE checks
- remove the unused dst argument to probe_kernel_read_allowed
- document the -ERANGE return value

Changes since v2:
- rebased on 5.7-rc6 with the bpf trace format string changes
- rename arch_kernel_read to __get_kernel_nofault and arch_kernel_write
to __put_kernel_nofault
- clean up the tracers to only allowd "mixed" reads when the kernel
has non-overlapping address spaces


2020-05-21 15:25:14

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 02/23] maccess: remove various unused weak aliases

maccess tends to define lots of underscore prefixed symbols that then
have other weak aliases. But except for two cases they are never
actually used, so remove them.

Signed-off-by: Christoph Hellwig <[email protected]>
---
include/linux/uaccess.h | 3 ---
mm/maccess.c | 19 +++----------------
2 files changed, 3 insertions(+), 19 deletions(-)

diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 67f016010aad5..a2c606a403745 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -324,7 +324,6 @@ extern long __probe_kernel_read(void *dst, const void *src, size_t size);
* happens, handle that and return -EFAULT.
*/
extern long probe_user_read(void *dst, const void __user *src, size_t size);
-extern long __probe_user_read(void *dst, const void __user *src, size_t size);

/*
* probe_kernel_write(): safely attempt to write to a location
@@ -336,7 +335,6 @@ extern long __probe_user_read(void *dst, const void __user *src, size_t size);
* happens, handle that and return -EFAULT.
*/
extern long notrace probe_kernel_write(void *dst, const void *src, size_t size);
-extern long notrace __probe_kernel_write(void *dst, const void *src, size_t size);

/*
* probe_user_write(): safely attempt to write to a location in user space
@@ -348,7 +346,6 @@ extern long notrace __probe_kernel_write(void *dst, const void *src, size_t size
* happens, handle that and return -EFAULT.
*/
extern long notrace probe_user_write(void __user *dst, const void *src, size_t size);
-extern long notrace __probe_user_write(void __user *dst, const void *src, size_t size);

extern long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count);
extern long strncpy_from_unsafe_strict(char *dst, const void *unsafe_addr,
diff --git a/mm/maccess.c b/mm/maccess.c
index cf21e604f78cb..4e7f3b6eb05ae 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -79,11 +79,7 @@ EXPORT_SYMBOL_GPL(probe_kernel_read);
* Safely read from user address @src to the buffer at @dst. If a kernel fault
* happens, handle that and return -EFAULT.
*/
-
-long __weak probe_user_read(void *dst, const void __user *src, size_t size)
- __attribute__((alias("__probe_user_read")));
-
-long __probe_user_read(void *dst, const void __user *src, size_t size)
+long probe_user_read(void *dst, const void __user *src, size_t size)
{
long ret = -EFAULT;
mm_segment_t old_fs = get_fs();
@@ -106,11 +102,7 @@ EXPORT_SYMBOL_GPL(probe_user_read);
* Safely write to address @dst from the buffer at @src. If a kernel fault
* happens, handle that and return -EFAULT.
*/
-
-long __weak probe_kernel_write(void *dst, const void *src, size_t size)
- __attribute__((alias("__probe_kernel_write")));
-
-long __probe_kernel_write(void *dst, const void *src, size_t size)
+long probe_kernel_write(void *dst, const void *src, size_t size)
{
long ret;
mm_segment_t old_fs = get_fs();
@@ -131,11 +123,7 @@ long __probe_kernel_write(void *dst, const void *src, size_t size)
* Safely write to address @dst from the buffer at @src. If a kernel fault
* happens, handle that and return -EFAULT.
*/
-
-long __weak probe_user_write(void __user *dst, const void *src, size_t size)
- __attribute__((alias("__probe_user_write")));
-
-long __probe_user_write(void __user *dst, const void *src, size_t size)
+long probe_user_write(void __user *dst, const void *src, size_t size)
{
long ret = -EFAULT;
mm_segment_t old_fs = get_fs();
@@ -171,7 +159,6 @@ long __probe_user_write(void __user *dst, const void *src, size_t size)
* probing memory on a user address range where strncpy_from_unsafe_user() is
* supposed to be used instead.
*/
-
long __weak strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
__attribute__((alias("__strncpy_from_unsafe")));

--
2.26.2

2020-05-21 15:25:29

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 04/23] maccess: clarify kerneldoc comments

Add proper kerneldoc comments for probe_kernel_read_strict and
probe_kernel_read strncpy_from_unsafe_strict and explain the different
versus the non-strict version.

Signed-off-by: Christoph Hellwig <[email protected]>
---
mm/maccess.c | 61 ++++++++++++++++++++++++++++++++++++----------------
1 file changed, 43 insertions(+), 18 deletions(-)

diff --git a/mm/maccess.c b/mm/maccess.c
index 4e7f3b6eb05ae..747581ac50dc9 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -31,29 +31,35 @@ probe_write_common(void __user *dst, const void *src, size_t size)
}

/**
- * probe_kernel_read(): safely attempt to read from a kernel-space location
+ * probe_kernel_read(): safely attempt to read from any location
* @dst: pointer to the buffer that shall take the data
* @src: address to read from
* @size: size of the data chunk
*
- * Safely read from address @src to the buffer at @dst. If a kernel fault
- * happens, handle that and return -EFAULT.
+ * Same as probe_kernel_read_strict() except that for architectures with
+ * not fully separated user and kernel address spaces this function also works
+ * for user address tanges.
+ *
+ * DO NOT USE THIS FUNCTION - it is broken on architectures with entirely
+ * separate kernel and user address spaces, and also a bad idea otherwise.
+ */
+long __weak probe_kernel_read(void *dst, const void *src, size_t size)
+ __attribute__((alias("__probe_kernel_read")));
+
+/**
+ * probe_kernel_read_strict(): safely attempt to read from kernel-space
+ * @dst: pointer to the buffer that shall take the data
+ * @src: address to read from
+ * @size: size of the data chunk
+ *
+ * Safely read from kernel address @src to the buffer at @dst. If a kernel
+ * fault happens, handle that and return -EFAULT.
*
* We ensure that the copy_from_user is executed in atomic context so that
* do_page_fault() doesn't attempt to take mmap_sem. This makes
* probe_kernel_read() suitable for use within regions where the caller
* already holds mmap_sem, or other locks which nest inside mmap_sem.
- *
- * probe_kernel_read_strict() is the same as probe_kernel_read() except for
- * the case where architectures have non-overlapping user and kernel address
- * ranges: probe_kernel_read_strict() will additionally return -EFAULT for
- * probing memory on a user address range where probe_user_read() is supposed
- * to be used instead.
*/
-
-long __weak probe_kernel_read(void *dst, const void *src, size_t size)
- __attribute__((alias("__probe_kernel_read")));
-
long __weak probe_kernel_read_strict(void *dst, const void *src, size_t size)
__attribute__((alias("__probe_kernel_read")));

@@ -153,15 +159,34 @@ long probe_user_write(void __user *dst, const void *src, size_t size)
* If @count is smaller than the length of the string, copies @count-1 bytes,
* sets the last byte of @dst buffer to NUL and returns @count.
*
- * strncpy_from_unsafe_strict() is the same as strncpy_from_unsafe() except
- * for the case where architectures have non-overlapping user and kernel address
- * ranges: strncpy_from_unsafe_strict() will additionally return -EFAULT for
- * probing memory on a user address range where strncpy_from_unsafe_user() is
- * supposed to be used instead.
+ * Same as strncpy_from_unsafe_strict() except that for architectures with
+ * not fully separated user and kernel address spaces this function also works
+ * for user address tanges.
+ *
+ * DO NOT USE THIS FUNCTION - it is broken on architectures with entirely
+ * separate kernel and user address spaces, and also a bad idea otherwise.
*/
long __weak strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
__attribute__((alias("__strncpy_from_unsafe")));

+/**
+ * strncpy_from_unsafe_strict: - Copy a NUL terminated string from unsafe
+ * address.
+ * @dst: Destination address, in kernel space. This buffer must be at
+ * least @count bytes long.
+ * @unsafe_addr: Unsafe address.
+ * @count: Maximum number of bytes to copy, including the trailing NUL.
+ *
+ * Copies a NUL-terminated string from unsafe address to kernel buffer.
+ *
+ * On success, returns the length of the string INCLUDING the trailing NUL.
+ *
+ * If access fails, returns -EFAULT (some data may have been copied
+ * and the trailing NUL added).
+ *
+ * If @count is smaller than the length of the string, copies @count-1 bytes,
+ * sets the last byte of @dst buffer to NUL and returns @count.
+ */
long __weak strncpy_from_unsafe_strict(char *dst, const void *unsafe_addr,
long count)
__attribute__((alias("__strncpy_from_unsafe")));
--
2.26.2

2020-05-21 15:25:38

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 07/23] maccess: rename strncpy_from_unsafe_strict to strncpy_from_kernel_nofault

This matches the naming of strncpy_from_user_nofault, and also makes it
more clear what the function is supposed to do.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/x86/mm/maccess.c | 2 +-
include/linux/uaccess.h | 4 ++--
kernel/trace/bpf_trace.c | 4 ++--
mm/maccess.c | 6 +++---
4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/mm/maccess.c b/arch/x86/mm/maccess.c
index f5b85bdc0535c..62c4017a2473d 100644
--- a/arch/x86/mm/maccess.c
+++ b/arch/x86/mm/maccess.c
@@ -34,7 +34,7 @@ long probe_kernel_read_strict(void *dst, const void *src, size_t size)
return __probe_kernel_read(dst, src, size);
}

-long strncpy_from_unsafe_strict(char *dst, const void *unsafe_addr, long count)
+long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr, long count)
{
if (unlikely(invalid_probe_range((unsigned long)unsafe_addr)))
return -EFAULT;
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index b983cb1c1216a..134ff9c1c151b 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -310,8 +310,8 @@ extern long notrace probe_kernel_write(void *dst, const void *src, size_t size);
extern long notrace probe_user_write(void __user *dst, const void *src, size_t size);

extern long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count);
-extern long strncpy_from_unsafe_strict(char *dst, const void *unsafe_addr,
- long count);
+long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr,
+ long count);
extern long __strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count);
long strncpy_from_user_nofault(char *dst, const void __user *unsafe_addr,
long count);
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 4e20bf1d95832..f5231ffea85b9 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -240,7 +240,7 @@ bpf_probe_read_kernel_str_common(void *dst, u32 size, const void *unsafe_ptr,
* is returned that can be used for bpf_perf_event_output() et al.
*/
ret = compat ? strncpy_from_unsafe(dst, unsafe_ptr, size) :
- strncpy_from_unsafe_strict(dst, unsafe_ptr, size);
+ strncpy_from_kernel_nofault(dst, unsafe_ptr, size);
if (unlikely(ret < 0))
out:
memset(dst, 0, size);
@@ -412,7 +412,7 @@ BPF_CALL_5(bpf_trace_printk, char *, fmt, u32, fmt_size, u64, arg1,
break;
#endif
case 'k':
- strncpy_from_unsafe_strict(buf, unsafe_ptr,
+ strncpy_from_kernel_nofault(buf, unsafe_ptr,
sizeof(buf));
break;
case 'u':
diff --git a/mm/maccess.c b/mm/maccess.c
index 457d8f9bf714f..c8748c2809096 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -159,7 +159,7 @@ long probe_user_write(void __user *dst, const void *src, size_t size)
* If @count is smaller than the length of the string, copies @count-1 bytes,
* sets the last byte of @dst buffer to NUL and returns @count.
*
- * Same as strncpy_from_unsafe_strict() except that for architectures with
+ * Same as strncpy_from_kernel_nofault() except that for architectures with
* not fully separated user and kernel address spaces this function also works
* for user address tanges.
*
@@ -170,7 +170,7 @@ long __weak strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
__attribute__((alias("__strncpy_from_unsafe")));

/**
- * strncpy_from_unsafe_strict: - Copy a NUL terminated string from unsafe
+ * strncpy_from_kernel_nofault: - Copy a NUL terminated string from unsafe
* address.
* @dst: Destination address, in kernel space. This buffer must be at
* least @count bytes long.
@@ -187,7 +187,7 @@ long __weak strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
* If @count is smaller than the length of the string, copies @count-1 bytes,
* sets the last byte of @dst buffer to NUL and returns @count.
*/
-long __weak strncpy_from_unsafe_strict(char *dst, const void *unsafe_addr,
+long __weak strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr,
long count)
__attribute__((alias("__strncpy_from_unsafe")));

--
2.26.2

2020-05-21 15:25:41

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 09/23] maccess: remove probe_read_common and probe_write_common

Each of the helpers has just two callers, which also different in
dealing with kernel or userspace pointers. Just open code the logic
in the callers.

Signed-off-by: Christoph Hellwig <[email protected]>
---
mm/maccess.c | 63 ++++++++++++++++++++++++----------------------------
1 file changed, 29 insertions(+), 34 deletions(-)

diff --git a/mm/maccess.c b/mm/maccess.c
index e783ebfccd542..31cf6604e7fff 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -6,30 +6,6 @@
#include <linux/mm.h>
#include <linux/uaccess.h>

-static __always_inline long
-probe_read_common(void *dst, const void __user *src, size_t size)
-{
- long ret;
-
- pagefault_disable();
- ret = __copy_from_user_inatomic(dst, src, size);
- pagefault_enable();
-
- return ret ? -EFAULT : 0;
-}
-
-static __always_inline long
-probe_write_common(void __user *dst, const void *src, size_t size)
-{
- long ret;
-
- pagefault_disable();
- ret = __copy_to_user_inatomic(dst, src, size);
- pagefault_enable();
-
- return ret ? -EFAULT : 0;
-}
-
/**
* probe_kernel_read(): safely attempt to read from any location
* @dst: pointer to the buffer that shall take the data
@@ -69,10 +45,15 @@ long __probe_kernel_read(void *dst, const void *src, size_t size)
mm_segment_t old_fs = get_fs();

set_fs(KERNEL_DS);
- ret = probe_read_common(dst, (__force const void __user *)src, size);
+ pagefault_disable();
+ ret = __copy_from_user_inatomic(dst, (__force const void __user *)src,
+ size);
+ pagefault_enable();
set_fs(old_fs);

- return ret;
+ if (ret)
+ return -EFAULT;
+ return 0;
}
EXPORT_SYMBOL_GPL(probe_kernel_read);

@@ -91,11 +72,16 @@ long probe_user_read(void *dst, const void __user *src, size_t size)
mm_segment_t old_fs = get_fs();

set_fs(USER_DS);
- if (access_ok(src, size))
- ret = probe_read_common(dst, src, size);
+ if (access_ok(src, size)) {
+ pagefault_disable();
+ ret = __copy_from_user_inatomic(dst, src, size);
+ pagefault_enable();
+ }
set_fs(old_fs);

- return ret;
+ if (ret)
+ return -EFAULT;
+ return 0;
}
EXPORT_SYMBOL_GPL(probe_user_read);

@@ -114,10 +100,14 @@ long probe_kernel_write(void *dst, const void *src, size_t size)
mm_segment_t old_fs = get_fs();

set_fs(KERNEL_DS);
- ret = probe_write_common((__force void __user *)dst, src, size);
+ pagefault_disable();
+ ret = __copy_to_user_inatomic((__force void __user *)dst, src, size);
+ pagefault_enable();
set_fs(old_fs);

- return ret;
+ if (ret)
+ return -EFAULT;
+ return 0;
}

/**
@@ -135,11 +125,16 @@ long probe_user_write(void __user *dst, const void *src, size_t size)
mm_segment_t old_fs = get_fs();

set_fs(USER_DS);
- if (access_ok(dst, size))
- ret = probe_write_common(dst, src, size);
+ if (access_ok(dst, size)) {
+ pagefault_disable();
+ ret = __copy_to_user_inatomic(dst, src, size);
+ pagefault_enable();
+ }
set_fs(old_fs);

- return ret;
+ if (ret)
+ return -EFAULT;
+ return 0;
}

/**
--
2.26.2

2020-05-21 15:25:46

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 08/23] maccess: rename strnlen_unsafe_user to strnlen_user_nofault

This matches the naming of strnlen_user, and also makes it more clear
what the function is supposed to do.

Signed-off-by: Christoph Hellwig <[email protected]>
---
include/linux/uaccess.h | 2 +-
kernel/trace/trace_kprobe.c | 2 +-
mm/maccess.c | 4 ++--
3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 134ff9c1c151b..d8366f8468664 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -315,7 +315,7 @@ long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr,
extern long __strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count);
long strncpy_from_user_nofault(char *dst, const void __user *unsafe_addr,
long count);
-extern long strnlen_unsafe_user(const void __user *unsafe_addr, long count);
+long strnlen_user_nofault(const void __user *unsafe_addr, long count);

/**
* probe_kernel_address(): safely attempt to read from a location
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index d600f41fda1ca..4325f9e7fadaa 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1221,7 +1221,7 @@ fetch_store_strlen_user(unsigned long addr)
{
const void __user *uaddr = (__force const void __user *)addr;

- return strnlen_unsafe_user(uaddr, MAX_STRING_SIZE);
+ return strnlen_user_nofault(uaddr, MAX_STRING_SIZE);
}

/*
diff --git a/mm/maccess.c b/mm/maccess.c
index c8748c2809096..e783ebfccd542 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -258,7 +258,7 @@ long strncpy_from_user_nofault(char *dst, const void __user *unsafe_addr,
}

/**
- * strnlen_unsafe_user: - Get the size of a user string INCLUDING final NUL.
+ * strnlen_user_nofault: - Get the size of a user string INCLUDING final NUL.
* @unsafe_addr: The string to measure.
* @count: Maximum count (including NUL)
*
@@ -273,7 +273,7 @@ long strncpy_from_user_nofault(char *dst, const void __user *unsafe_addr,
* Unlike strnlen_user, this can be used from IRQ handler etc. because
* it disables pagefaults.
*/
-long strnlen_unsafe_user(const void __user *unsafe_addr, long count)
+long strnlen_user_nofault(const void __user *unsafe_addr, long count)
{
mm_segment_t old_fs = get_fs();
int ret;
--
2.26.2

2020-05-21 15:25:52

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 10/23] maccess: unify the probe kernel arch hooks

Currently architectures have to override every routine that probes
kernel memory, which includes a pure read and strcpy, both in strict
and not strict variants. Just provide a single arch hooks instead to
make sure all architectures cover all the cases.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/parisc/lib/memcpy.c | 12 ++++------
arch/um/kernel/maccess.c | 10 ++++----
arch/x86/mm/maccess.c | 33 ++++++++++-----------------
include/linux/uaccess.h | 6 +++--
mm/maccess.c | 49 ++++++++++++++++++++++++++++++----------
5 files changed, 61 insertions(+), 49 deletions(-)

diff --git a/arch/parisc/lib/memcpy.c b/arch/parisc/lib/memcpy.c
index beceaab34ecb7..5b75c35d1da0d 100644
--- a/arch/parisc/lib/memcpy.c
+++ b/arch/parisc/lib/memcpy.c
@@ -57,14 +57,10 @@ void * memcpy(void * dst,const void *src, size_t count)
EXPORT_SYMBOL(raw_copy_in_user);
EXPORT_SYMBOL(memcpy);

-long probe_kernel_read(void *dst, const void *src, size_t size)
+bool probe_kernel_read_allowed(const void *unsafe_src, size_t size, bool strict)
{
- unsigned long addr = (unsigned long)src;
-
- if (addr < PAGE_SIZE)
- return -EFAULT;
-
+ if ((unsigned long)unsafe_src < PAGE_SIZE)
+ return false;
/* check for I/O space F_EXTEND(0xfff00000) access as well? */
-
- return __probe_kernel_read(dst, src, size);
+ return true;
}
diff --git a/arch/um/kernel/maccess.c b/arch/um/kernel/maccess.c
index 67b2e0fa92bba..ad2c538ce497c 100644
--- a/arch/um/kernel/maccess.c
+++ b/arch/um/kernel/maccess.c
@@ -7,15 +7,13 @@
#include <linux/kernel.h>
#include <os.h>

-long probe_kernel_read(void *dst, const void *src, size_t size)
+bool probe_kernel_read_allowed(const void *src, size_t size, bool strict)
{
void *psrc = (void *)rounddown((unsigned long)src, PAGE_SIZE);

if ((unsigned long)src < PAGE_SIZE || size <= 0)
- return -EFAULT;
-
+ return false;
if (os_mincore(psrc, size + src - psrc) <= 0)
- return -EFAULT;
-
- return __probe_kernel_read(dst, src, size);
+ return false;
+ return true;
}
diff --git a/arch/x86/mm/maccess.c b/arch/x86/mm/maccess.c
index 62c4017a2473d..a96a56ff16109 100644
--- a/arch/x86/mm/maccess.c
+++ b/arch/x86/mm/maccess.c
@@ -9,35 +9,26 @@ static __always_inline u64 canonical_address(u64 vaddr, u8 vaddr_bits)
return ((s64)vaddr << (64 - vaddr_bits)) >> (64 - vaddr_bits);
}

-static __always_inline bool invalid_probe_range(u64 vaddr)
+bool probe_kernel_read_allowed(const void *unsafe_src, size_t size, bool strict)
{
+ unsigned long vaddr = (unsigned long)unsafe_src;
+
+ if (!strict)
+ return true;
+
/*
* Range covering the highest possible canonical userspace address
* as well as non-canonical address range. For the canonical range
* we also need to include the userspace guard page.
*/
- return vaddr < TASK_SIZE_MAX + PAGE_SIZE ||
- canonical_address(vaddr, boot_cpu_data.x86_virt_bits) != vaddr;
+ return vaddr >= TASK_SIZE_MAX + PAGE_SIZE &&
+ canonical_address(vaddr, boot_cpu_data.x86_virt_bits) == vaddr;
}
#else
-static __always_inline bool invalid_probe_range(u64 vaddr)
+bool probe_kernel_read_allowed(const void *unsafe_src, size_t size, bool strict)
{
- return vaddr < TASK_SIZE_MAX;
+ if (!strict)
+ return true;
+ return (unsigned long)vaddr >= TASK_SIZE_MAX;
}
#endif
-
-long probe_kernel_read_strict(void *dst, const void *src, size_t size)
-{
- if (unlikely(invalid_probe_range((unsigned long)src)))
- return -EFAULT;
-
- return __probe_kernel_read(dst, src, size);
-}
-
-long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr, long count)
-{
- if (unlikely(invalid_probe_range((unsigned long)unsafe_addr)))
- return -EFAULT;
-
- return __strncpy_from_unsafe(dst, unsafe_addr, count);
-}
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index d8366f8468664..65a37ae3b8871 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -301,9 +301,11 @@ copy_struct_from_user(void *dst, size_t ksize, const void __user *src,
return 0;
}

+bool probe_kernel_read_allowed(const void *unsafe_src, size_t size,
+ bool strict);
+
extern long probe_kernel_read(void *dst, const void *src, size_t size);
extern long probe_kernel_read_strict(void *dst, const void *src, size_t size);
-extern long __probe_kernel_read(void *dst, const void *src, size_t size);
extern long probe_user_read(void *dst, const void __user *src, size_t size);

extern long notrace probe_kernel_write(void *dst, const void *src, size_t size);
@@ -312,7 +314,7 @@ extern long notrace probe_user_write(void __user *dst, const void *src, size_t s
extern long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count);
long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr,
long count);
-extern long __strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count);
+
long strncpy_from_user_nofault(char *dst, const void __user *unsafe_addr,
long count);
long strnlen_user_nofault(const void __user *unsafe_addr, long count);
diff --git a/mm/maccess.c b/mm/maccess.c
index 31cf6604e7fff..6116742608217 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -6,6 +6,17 @@
#include <linux/mm.h>
#include <linux/uaccess.h>

+static long __probe_kernel_read(void *dst, const void *src, size_t size,
+ bool strict);
+static long __strncpy_from_unsafe(char *dst, const void *unsafe_addr,
+ long count, bool strict);
+
+bool __weak probe_kernel_read_allowed(const void *unsafe_src, size_t size,
+ bool strict)
+{
+ return true;
+}
+
/**
* probe_kernel_read(): safely attempt to read from any location
* @dst: pointer to the buffer that shall take the data
@@ -19,8 +30,11 @@
* DO NOT USE THIS FUNCTION - it is broken on architectures with entirely
* separate kernel and user address spaces, and also a bad idea otherwise.
*/
-long __weak probe_kernel_read(void *dst, const void *src, size_t size)
- __attribute__((alias("__probe_kernel_read")));
+long probe_kernel_read(void *dst, const void *src, size_t size)
+{
+ return __probe_kernel_read(dst, src, size, false);
+}
+EXPORT_SYMBOL_GPL(probe_kernel_read);

/**
* probe_kernel_read_strict(): safely attempt to read from kernel-space
@@ -36,14 +50,20 @@ long __weak probe_kernel_read(void *dst, const void *src, size_t size)
* probe_kernel_read() suitable for use within regions where the caller
* already holds mmap_sem, or other locks which nest inside mmap_sem.
*/
-long __weak probe_kernel_read_strict(void *dst, const void *src, size_t size)
- __attribute__((alias("__probe_kernel_read")));
+long probe_kernel_read_strict(void *dst, const void *src, size_t size)
+{
+ return __probe_kernel_read(dst, src, size, true);
+}

-long __probe_kernel_read(void *dst, const void *src, size_t size)
+static long __probe_kernel_read(void *dst, const void *src, size_t size,
+ bool strict)
{
long ret;
mm_segment_t old_fs = get_fs();

+ if (!probe_kernel_read_allowed(src, size, strict))
+ return -EFAULT;
+
set_fs(KERNEL_DS);
pagefault_disable();
ret = __copy_from_user_inatomic(dst, (__force const void __user *)src,
@@ -55,7 +75,6 @@ long __probe_kernel_read(void *dst, const void *src, size_t size)
return -EFAULT;
return 0;
}
-EXPORT_SYMBOL_GPL(probe_kernel_read);

/**
* probe_user_read(): safely attempt to read from a user-space location
@@ -161,8 +180,10 @@ long probe_user_write(void __user *dst, const void *src, size_t size)
* DO NOT USE THIS FUNCTION - it is broken on architectures with entirely
* separate kernel and user address spaces, and also a bad idea otherwise.
*/
-long __weak strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
- __attribute__((alias("__strncpy_from_unsafe")));
+long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
+{
+ return __strncpy_from_unsafe(dst, unsafe_addr, count, false);
+}

/**
* strncpy_from_kernel_nofault: - Copy a NUL terminated string from unsafe
@@ -182,11 +203,13 @@ long __weak strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
* If @count is smaller than the length of the string, copies @count-1 bytes,
* sets the last byte of @dst buffer to NUL and returns @count.
*/
-long __weak strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr,
- long count)
- __attribute__((alias("__strncpy_from_unsafe")));
+long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr, long count)
+{
+ return __strncpy_from_unsafe(dst, unsafe_addr, count, true);
+}

-long __strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
+static long __strncpy_from_unsafe(char *dst, const void *unsafe_addr,
+ long count, bool strict)
{
mm_segment_t old_fs = get_fs();
const void *src = unsafe_addr;
@@ -194,6 +217,8 @@ long __strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)

if (unlikely(count <= 0))
return 0;
+ if (!probe_kernel_read_allowed(unsafe_addr, count, strict))
+ return -EFAULT;

set_fs(KERNEL_DS);
pagefault_disable();
--
2.26.2

2020-05-21 15:26:16

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 12/23] bpf: handle the compat string in bpf_trace_copy_string better

User the proper helper for kernel or userspace addresses based on
TASK_SIZE instead of the dangerous strncpy_from_unsafe function.

Signed-off-by: Christoph Hellwig <[email protected]>
---
kernel/trace/bpf_trace.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 9d4080590f711..737d739230a6b 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -331,8 +331,11 @@ static void bpf_trace_copy_string(char *buf, void *unsafe_ptr, char fmt_ptype,
switch (fmt_ptype) {
case 's':
#ifdef CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
- strncpy_from_unsafe(buf, unsafe_ptr, bufsz);
- break;
+ if ((unsigned long)unsafe_ptr < TASK_SIZE) {
+ strncpy_from_user_nofault(buf, user_ptr, bufsz);
+ break;
+ }
+ fallthrough;
#endif
case 'k':
strncpy_from_kernel_nofault(buf, unsafe_ptr, bufsz);
--
2.26.2

2020-05-21 15:26:16

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 14/23] tracing/kprobes: handle mixed kernel/userspace probes better

Instead of using the dangerous probe_kernel_read and strncpy_from_unsafe
helpers, rework probes to try a user probe based on the address if the
architecture has a common address space for kernel and userspace.

Signed-off-by: Christoph Hellwig <[email protected]>
---
kernel/trace/trace_kprobe.c | 72 ++++++++++++++++++++++---------------
1 file changed, 43 insertions(+), 29 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 4325f9e7fadaa..4aeaef53ba970 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1200,6 +1200,15 @@ static const struct file_operations kprobe_profile_ops = {

/* Kprobe specific fetch functions */

+/* Return the length of string -- including null terminal byte */
+static nokprobe_inline int
+fetch_store_strlen_user(unsigned long addr)
+{
+ const void __user *uaddr = (__force const void __user *)addr;
+
+ return strnlen_user_nofault(uaddr, MAX_STRING_SIZE);
+}
+
/* Return the length of string -- including null terminal byte */
static nokprobe_inline int
fetch_store_strlen(unsigned long addr)
@@ -1207,30 +1216,27 @@ fetch_store_strlen(unsigned long addr)
int ret, len = 0;
u8 c;

+#ifdef CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
+ if (addr < TASK_SIZE)
+ return fetch_store_strlen_user(addr);
+#endif
+
do {
- ret = probe_kernel_read(&c, (u8 *)addr + len, 1);
+ ret = probe_kernel_read_strict(&c, (u8 *)addr + len, 1);
len++;
} while (c && ret == 0 && len < MAX_STRING_SIZE);

return (ret < 0) ? ret : len;
}

-/* Return the length of string -- including null terminal byte */
-static nokprobe_inline int
-fetch_store_strlen_user(unsigned long addr)
-{
- const void __user *uaddr = (__force const void __user *)addr;
-
- return strnlen_user_nofault(uaddr, MAX_STRING_SIZE);
-}
-
/*
- * Fetch a null-terminated string. Caller MUST set *(u32 *)buf with max
- * length and relative data location.
+ * Fetch a null-terminated string from user. Caller MUST set *(u32 *)buf
+ * with max length and relative data location.
*/
static nokprobe_inline int
-fetch_store_string(unsigned long addr, void *dest, void *base)
+fetch_store_string_user(unsigned long addr, void *dest, void *base)
{
+ const void __user *uaddr = (__force const void __user *)addr;
int maxlen = get_loc_len(*(u32 *)dest);
void *__dest;
long ret;
@@ -1240,11 +1246,7 @@ fetch_store_string(unsigned long addr, void *dest, void *base)

__dest = get_loc_data(dest, base);

- /*
- * Try to get string again, since the string can be changed while
- * probing.
- */
- ret = strncpy_from_unsafe(__dest, (void *)addr, maxlen);
+ ret = strncpy_from_user_nofault(__dest, uaddr, maxlen);
if (ret >= 0)
*(u32 *)dest = make_data_loc(ret, __dest - base);

@@ -1252,35 +1254,37 @@ fetch_store_string(unsigned long addr, void *dest, void *base)
}

/*
- * Fetch a null-terminated string from user. Caller MUST set *(u32 *)buf
- * with max length and relative data location.
+ * Fetch a null-terminated string. Caller MUST set *(u32 *)buf with max
+ * length and relative data location.
*/
static nokprobe_inline int
-fetch_store_string_user(unsigned long addr, void *dest, void *base)
+fetch_store_string(unsigned long addr, void *dest, void *base)
{
- const void __user *uaddr = (__force const void __user *)addr;
int maxlen = get_loc_len(*(u32 *)dest);
void *__dest;
long ret;

+#ifdef CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
+ if ((unsigned long)addr < TASK_SIZE)
+ return fetch_store_string_user(addr, dest, base);
+#endif
+
if (unlikely(!maxlen))
return -ENOMEM;

__dest = get_loc_data(dest, base);

- ret = strncpy_from_user_nofault(__dest, uaddr, maxlen);
+ /*
+ * Try to get string again, since the string can be changed while
+ * probing.
+ */
+ ret = strncpy_from_user_nofault(__dest, (void *)addr, maxlen);
if (ret >= 0)
*(u32 *)dest = make_data_loc(ret, __dest - base);

return ret;
}

-static nokprobe_inline int
-probe_mem_read(void *dest, void *src, size_t size)
-{
- return probe_kernel_read(dest, src, size);
-}
-
static nokprobe_inline int
probe_mem_read_user(void *dest, void *src, size_t size)
{
@@ -1289,6 +1293,16 @@ probe_mem_read_user(void *dest, void *src, size_t size)
return probe_user_read(dest, uaddr, size);
}

+static nokprobe_inline int
+probe_mem_read(void *dest, void *src, size_t size)
+{
+#ifdef CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
+ if ((unsigned long)src < TASK_SIZE)
+ return probe_mem_read_user(dest, src, size);
+#endif
+ return probe_kernel_read_strict(dest, src, size);
+}
+
/* Note that we don't verify it, since the code does not come from user space */
static int
process_fetch_insn(struct fetch_insn *code, struct pt_regs *regs, void *dest,
--
2.26.2

2020-05-21 15:26:22

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 17/23] maccess: move user access routines together

Move kernel access vs user access routines together to ease upcoming
ifdefs.

Signed-off-by: Christoph Hellwig <[email protected]>
---
mm/maccess.c | 110 +++++++++++++++++++++++++--------------------------
1 file changed, 55 insertions(+), 55 deletions(-)

diff --git a/mm/maccess.c b/mm/maccess.c
index 81a85c1e71165..ffde1ae86ec51 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -46,34 +46,6 @@ long probe_kernel_read(void *dst, const void *src, size_t size)
}
EXPORT_SYMBOL_GPL(probe_kernel_read);

-/**
- * probe_user_read(): safely attempt to read from a user-space location
- * @dst: pointer to the buffer that shall take the data
- * @src: address to read from. This must be a user address.
- * @size: size of the data chunk
- *
- * Safely read from user address @src to the buffer at @dst. If a kernel fault
- * happens, handle that and return -EFAULT.
- */
-long probe_user_read(void *dst, const void __user *src, size_t size)
-{
- long ret = -EFAULT;
- mm_segment_t old_fs = get_fs();
-
- set_fs(USER_DS);
- if (access_ok(src, size)) {
- pagefault_disable();
- ret = __copy_from_user_inatomic(dst, src, size);
- pagefault_enable();
- }
- set_fs(old_fs);
-
- if (ret)
- return -EFAULT;
- return 0;
-}
-EXPORT_SYMBOL_GPL(probe_user_read);
-
/**
* probe_kernel_write(): safely attempt to write to a location
* @dst: address to write to
@@ -99,33 +71,6 @@ long probe_kernel_write(void *dst, const void *src, size_t size)
return 0;
}

-/**
- * probe_user_write(): safely attempt to write to a user-space location
- * @dst: address to write to
- * @src: pointer to the data that shall be written
- * @size: size of the data chunk
- *
- * Safely write to address @dst from the buffer at @src. If a kernel fault
- * happens, handle that and return -EFAULT.
- */
-long probe_user_write(void __user *dst, const void *src, size_t size)
-{
- long ret = -EFAULT;
- mm_segment_t old_fs = get_fs();
-
- set_fs(USER_DS);
- if (access_ok(dst, size)) {
- pagefault_disable();
- ret = __copy_to_user_inatomic(dst, src, size);
- pagefault_enable();
- }
- set_fs(old_fs);
-
- if (ret)
- return -EFAULT;
- return 0;
-}
-
/**
* strncpy_from_kernel_nofault: - Copy a NUL terminated string from unsafe
* address.
@@ -169,6 +114,61 @@ long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr, long count)
return ret ? -EFAULT : src - unsafe_addr;
}

+/**
+ * probe_user_read(): safely attempt to read from a user-space location
+ * @dst: pointer to the buffer that shall take the data
+ * @src: address to read from. This must be a user address.
+ * @size: size of the data chunk
+ *
+ * Safely read from user address @src to the buffer at @dst. If a kernel fault
+ * happens, handle that and return -EFAULT.
+ */
+long probe_user_read(void *dst, const void __user *src, size_t size)
+{
+ long ret = -EFAULT;
+ mm_segment_t old_fs = get_fs();
+
+ set_fs(USER_DS);
+ if (access_ok(src, size)) {
+ pagefault_disable();
+ ret = __copy_from_user_inatomic(dst, src, size);
+ pagefault_enable();
+ }
+ set_fs(old_fs);
+
+ if (ret)
+ return -EFAULT;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(probe_user_read);
+
+/**
+ * probe_user_write(): safely attempt to write to a user-space location
+ * @dst: address to write to
+ * @src: pointer to the data that shall be written
+ * @size: size of the data chunk
+ *
+ * Safely write to address @dst from the buffer at @src. If a kernel fault
+ * happens, handle that and return -EFAULT.
+ */
+long probe_user_write(void __user *dst, const void *src, size_t size)
+{
+ long ret = -EFAULT;
+ mm_segment_t old_fs = get_fs();
+
+ set_fs(USER_DS);
+ if (access_ok(dst, size)) {
+ pagefault_disable();
+ ret = __copy_to_user_inatomic(dst, src, size);
+ pagefault_enable();
+ }
+ set_fs(old_fs);
+
+ if (ret)
+ return -EFAULT;
+ return 0;
+}
+
/**
* strncpy_from_user_nofault: - Copy a NUL terminated string from unsafe user
* address.
--
2.26.2

2020-05-21 15:26:24

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 18/23] maccess: allow architectures to provide kernel probing directly

Provide alternative versions of probe_kernel_read, probe_kernel_write
and strncpy_from_kernel_unsafe that don't need set_fs magic, but instead
use arch hooks that are modelled after unsafe_{get,put}_user to access
kernel memory in an exception safe way.

Signed-off-by: Christoph Hellwig <[email protected]>
---
mm/maccess.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 76 insertions(+)

diff --git a/mm/maccess.c b/mm/maccess.c
index ffde1ae86ec51..e19d19cd5382c 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -11,6 +11,81 @@ bool __weak probe_kernel_read_allowed(const void *unsafe_src, size_t size)
return true;
}

+#ifdef HAVE_GET_KERNEL_NOFAULT
+
+#define probe_kernel_read_loop(dst, src, len, type, err_label) \
+ while (len >= sizeof(type)) { \
+ __get_kernel_nofault(dst, src, type, err_label); \
+ dst += sizeof(type); \
+ src += sizeof(type); \
+ len -= sizeof(type); \
+ }
+
+long probe_kernel_read(void *dst, const void *src, size_t size)
+{
+ if (!probe_kernel_read_allowed(src, size))
+ return -EFAULT;
+
+ pagefault_disable();
+ probe_kernel_read_loop(dst, src, size, u64, Efault);
+ probe_kernel_read_loop(dst, src, size, u32, Efault);
+ probe_kernel_read_loop(dst, src, size, u16, Efault);
+ probe_kernel_read_loop(dst, src, size, u8, Efault);
+ pagefault_enable();
+ return 0;
+Efault:
+ pagefault_enable();
+ return -EFAULT;
+}
+EXPORT_SYMBOL_GPL(probe_kernel_read);
+
+#define probe_kernel_write_loop(dst, src, len, type, err_label) \
+ while (len >= sizeof(type)) { \
+ __put_kernel_nofault(dst, src, type, err_label); \
+ dst += sizeof(type); \
+ src += sizeof(type); \
+ len -= sizeof(type); \
+ }
+
+long probe_kernel_write(void *dst, const void *src, size_t size)
+{
+ pagefault_disable();
+ probe_kernel_write_loop(dst, src, size, u64, Efault);
+ probe_kernel_write_loop(dst, src, size, u32, Efault);
+ probe_kernel_write_loop(dst, src, size, u16, Efault);
+ probe_kernel_write_loop(dst, src, size, u8, Efault);
+ pagefault_enable();
+ return 0;
+Efault:
+ pagefault_enable();
+ return -EFAULT;
+}
+
+long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr, long count)
+{
+ const void *src = unsafe_addr;
+
+ if (unlikely(count <= 0))
+ return 0;
+ if (!probe_kernel_read_allowed(unsafe_addr, count))
+ return -EFAULT;
+
+ pagefault_disable();
+ do {
+ __get_kernel_nofault(dst, src, u8, Efault);
+ dst++;
+ src++;
+ } while (dst[-1] && src - unsafe_addr < count);
+ pagefault_enable();
+
+ dst[-1] = '\0';
+ return src - unsafe_addr;
+Efault:
+ pagefault_enable();
+ dst[-1] = '\0';
+ return -EFAULT;
+}
+#else /* HAVE_GET_KERNEL_NOFAULT */
/**
* probe_kernel_read(): safely attempt to read from kernel-space
* @dst: pointer to the buffer that shall take the data
@@ -113,6 +188,7 @@ long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr, long count)

return ret ? -EFAULT : src - unsafe_addr;
}
+#endif /* HAVE_GET_KERNEL_NOFAULT */

/**
* probe_user_read(): safely attempt to read from a user-space location
--
2.26.2

2020-05-21 15:26:26

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 19/23] x86: use non-set_fs based maccess routines

Provide arch_kernel_read and arch_kernel_write routines to implement the
maccess routines without messing with set_fs and without stac/clac that
opens up access to user space.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/x86/include/asm/uaccess.h | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index d8f283b9a569c..b3e0a56ae961a 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -523,5 +523,21 @@ do { \
unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u8, label); \
} while (0)

+#define HAVE_GET_KERNEL_NOFAULT
+
+#define __get_kernel_nofault(dst, src, type, err_label) \
+do { \
+ int __kr_err; \
+ \
+ __get_user_size(*((type *)dst), (__force type __user *)src, \
+ sizeof(type), __kr_err); \
+ if (unlikely(__kr_err)) \
+ goto err_label; \
+} while (0)
+
+#define __put_kernel_nofault(dst, src, type, err_label) \
+ __put_user_size(*((type *)(src)), (__force type __user *)(dst), \
+ sizeof(type), err_label)
+
#endif /* _ASM_X86_UACCESS_H */

--
2.26.2

2020-05-21 15:26:33

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 21/23] maccess: rename probe_user_{read,write} to copy_{from,to}_user_nofault

Better describe what these functions do.

Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/powerpc/kernel/process.c | 3 ++-
arch/powerpc/kvm/book3s_64_mmu_radix.c | 4 ++--
arch/powerpc/mm/fault.c | 2 +-
arch/powerpc/oprofile/backtrace.c | 6 ++++--
arch/powerpc/perf/callchain_32.c | 2 +-
arch/powerpc/perf/callchain_64.c | 2 +-
arch/powerpc/perf/core-book3s.c | 3 ++-
arch/powerpc/sysdev/fsl_pci.c | 4 ++--
include/linux/uaccess.h | 4 ++--
kernel/trace/bpf_trace.c | 4 ++--
kernel/trace/trace_kprobe.c | 2 +-
mm/maccess.c | 10 +++++-----
12 files changed, 25 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 9c21288f86455..d5d6136b13480 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1294,7 +1294,8 @@ void show_user_instructions(struct pt_regs *regs)
for (i = 0; i < 8 && n; i++, n--, pc += sizeof(int)) {
int instr;

- if (probe_user_read(&instr, (void __user *)pc, sizeof(instr))) {
+ if (copy_from_user_nofault(&instr, (void __user *)pc,
+ sizeof(instr))) {
seq_buf_printf(&s, "XXXXXXXX ");
continue;
}
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index aa12cd4078b32..9d25f2eb5a33a 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -64,9 +64,9 @@ unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int pid,
isync();

if (is_load)
- ret = probe_user_read(to, (const void __user *)from, n);
+ ret = copy_from_user_nofault(to, (const void __user *)from, n);
else
- ret = probe_user_write((void __user *)to, from, n);
+ ret = copy_to_user_nofault((void __user *)to, from, n);

/* switch the pid first to avoid running host with unallocated pid */
if (quadrant == 1 && pid != old_pid)
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 84af6c8eecf71..231664fe9d126 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -280,7 +280,7 @@ static bool bad_stack_expansion(struct pt_regs *regs, unsigned long address,
access_ok(nip, sizeof(*nip))) {
unsigned int inst;

- if (!probe_user_read(&inst, nip, sizeof(inst)))
+ if (!copy_from_user_nofault(&inst, nip, sizeof(inst)))
return !store_updates_sp(inst);
*must_retry = true;
}
diff --git a/arch/powerpc/oprofile/backtrace.c b/arch/powerpc/oprofile/backtrace.c
index 6f347fa29f41e..9db7ada79d10d 100644
--- a/arch/powerpc/oprofile/backtrace.c
+++ b/arch/powerpc/oprofile/backtrace.c
@@ -33,7 +33,8 @@ static unsigned int user_getsp32(unsigned int sp, int is_first)
* which means that we've done all that we can do from
* interrupt context.
*/
- if (probe_user_read(stack_frame, (void __user *)p, sizeof(stack_frame)))
+ if (copy_from_user_nofault(stack_frame, (void __user *)p,
+ sizeof(stack_frame)))
return 0;

if (!is_first)
@@ -51,7 +52,8 @@ static unsigned long user_getsp64(unsigned long sp, int is_first)
{
unsigned long stack_frame[3];

- if (probe_user_read(stack_frame, (void __user *)sp, sizeof(stack_frame)))
+ if (copy_from_user_nofault(stack_frame, (void __user *)sp,
+ sizeof(stack_frame)))
return 0;

if (!is_first)
diff --git a/arch/powerpc/perf/callchain_32.c b/arch/powerpc/perf/callchain_32.c
index 8aa9510031415..2e21849f82b18 100644
--- a/arch/powerpc/perf/callchain_32.c
+++ b/arch/powerpc/perf/callchain_32.c
@@ -45,7 +45,7 @@ static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
((unsigned long)ptr & 3))
return -EFAULT;

- rc = probe_user_read(ret, ptr, sizeof(*ret));
+ rc = copy_from_user_nofault(ret, ptr, sizeof(*ret));

if (IS_ENABLED(CONFIG_PPC64) && rc)
return read_user_stack_slow(ptr, ret, 4);
diff --git a/arch/powerpc/perf/callchain_64.c b/arch/powerpc/perf/callchain_64.c
index df1ffd8b20f21..7b0121694ebb7 100644
--- a/arch/powerpc/perf/callchain_64.c
+++ b/arch/powerpc/perf/callchain_64.c
@@ -71,7 +71,7 @@ static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret)
((unsigned long)ptr & 7))
return -EFAULT;

- if (!probe_user_read(ret, ptr, sizeof(*ret)))
+ if (!copy_from_user_nofault(ret, ptr, sizeof(*ret)))
return 0;

return read_user_stack_slow(ptr, ret, 8);
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 50bc9f0eb6be3..f8072d1e5d172 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -426,7 +426,8 @@ static __u64 power_pmu_bhrb_to(u64 addr)
}

/* Userspace: need copy instruction here then translate it */
- if (probe_user_read(&instr, (unsigned int __user *)addr, sizeof(instr)))
+ if (copy_from_user_nofault(&instr, (unsigned int __user *)addr,
+ sizeof(instr)))
return 0;

target = branch_target(&instr);
diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
index 4a8874bc10574..73fa37ca40ef9 100644
--- a/arch/powerpc/sysdev/fsl_pci.c
+++ b/arch/powerpc/sysdev/fsl_pci.c
@@ -1066,8 +1066,8 @@ int fsl_pci_mcheck_exception(struct pt_regs *regs)

if (is_in_pci_mem_space(addr)) {
if (user_mode(regs))
- ret = probe_user_read(&inst, (void __user *)regs->nip,
- sizeof(inst));
+ ret = copy_from_user_nofault(&inst,
+ (void __user *)regs->nip, sizeof(inst));
else
ret = probe_kernel_address((void *)regs->nip, inst);

diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 329e8479798de..5214d2f1f2cef 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -306,8 +306,8 @@ bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size);
long copy_from_kernel_nofault(void *dst, const void *src, size_t size);
long notrace copy_to_kernel_nofault(void *dst, const void *src, size_t size);

-extern long probe_user_read(void *dst, const void __user *src, size_t size);
-extern long notrace probe_user_write(void __user *dst, const void *src,
+long copy_from_user_nofault(void *dst, const void __user *src, size_t size);
+long notrace copy_to_user_nofault(void __user *dst, const void *src,
size_t size);

long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr,
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 59fec6e226db7..580ceefe23059 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -141,7 +141,7 @@ bpf_probe_read_user_common(void *dst, u32 size, const void __user *unsafe_ptr)
{
int ret;

- ret = probe_user_read(dst, unsafe_ptr, size);
+ ret = copy_from_user_nofault(dst, unsafe_ptr, size);
if (unlikely(ret < 0))
memset(dst, 0, size);
return ret;
@@ -326,7 +326,7 @@ BPF_CALL_3(bpf_probe_write_user, void __user *, unsafe_ptr, const void *, src,
if (unlikely(!nmi_uaccess_okay()))
return -EPERM;

- return probe_user_write(unsafe_ptr, src, size);
+ return copy_to_user_nofault(unsafe_ptr, src, size);
}

static const struct bpf_func_proto bpf_probe_write_user_proto = {
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 430e64c4117ee..af4dd69bcc398 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1290,7 +1290,7 @@ probe_mem_read_user(void *dest, void *src, size_t size)
{
const void __user *uaddr = (__force const void __user *)src;

- return probe_user_read(dest, uaddr, size);
+ return copy_from_user_nofault(dest, uaddr, size);
}

static nokprobe_inline int
diff --git a/mm/maccess.c b/mm/maccess.c
index 43ed419f38401..349b6cb14426c 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -192,7 +192,7 @@ long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr, long count)
#endif /* HAVE_GET_KERNEL_NOFAULT */

/**
- * probe_user_read(): safely attempt to read from a user-space location
+ * copy_from_user_nofault(): safely attempt to read from a user-space location
* @dst: pointer to the buffer that shall take the data
* @src: address to read from. This must be a user address.
* @size: size of the data chunk
@@ -200,7 +200,7 @@ long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr, long count)
* Safely read from user address @src to the buffer at @dst. If a kernel fault
* happens, handle that and return -EFAULT.
*/
-long probe_user_read(void *dst, const void __user *src, size_t size)
+long copy_from_user_nofault(void *dst, const void __user *src, size_t size)
{
long ret = -EFAULT;
mm_segment_t old_fs = get_fs();
@@ -217,10 +217,10 @@ long probe_user_read(void *dst, const void __user *src, size_t size)
return -EFAULT;
return 0;
}
-EXPORT_SYMBOL_GPL(probe_user_read);
+EXPORT_SYMBOL_GPL(copy_from_user_nofault);

/**
- * probe_user_write(): safely attempt to write to a user-space location
+ * copy_to_user_nofault(): safely attempt to write to a user-space location
* @dst: address to write to
* @src: pointer to the data that shall be written
* @size: size of the data chunk
@@ -228,7 +228,7 @@ EXPORT_SYMBOL_GPL(probe_user_read);
* Safely write to address @dst from the buffer at @src. If a kernel fault
* happens, handle that and return -EFAULT.
*/
-long probe_user_write(void __user *dst, const void *src, size_t size)
+long copy_to_user_nofault(void __user *dst, const void *src, size_t size)
{
long ret = -EFAULT;
mm_segment_t old_fs = get_fs();
--
2.26.2

2020-05-21 15:26:44

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 23/23] maccess: return -ERANGE when copy_from_kernel_nofault_allowed fails

Allow the callers to distinguish a real unmapped address vs a range
that can't be probed.

Suggested-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Masami Hiramatsu <[email protected]>
---
mm/maccess.c | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/mm/maccess.c b/mm/maccess.c
index 349b6cb14426c..d317f8b8095ca 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -25,7 +25,7 @@ bool __weak copy_from_kernel_nofault_allowed(const void *unsafe_src,
long copy_from_kernel_nofault(void *dst, const void *src, size_t size)
{
if (!copy_from_kernel_nofault_allowed(src, size))
- return -EFAULT;
+ return -ERANGE;

pagefault_disable();
copy_from_kernel_nofault_loop(dst, src, size, u64, Efault);
@@ -69,7 +69,7 @@ long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr, long count)
if (unlikely(count <= 0))
return 0;
if (!copy_from_kernel_nofault_allowed(unsafe_addr, count))
- return -EFAULT;
+ return -ERANGE;

pagefault_disable();
do {
@@ -94,7 +94,8 @@ long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr, long count)
* @size: size of the data chunk
*
* Safely read from kernel address @src to the buffer at @dst. If a kernel
- * fault happens, handle that and return -EFAULT.
+ * fault happens, handle that and return -EFAULT. If @src is not a valid kernel
+ * address, return -ERANGE.
*
* We ensure that the copy_from_user is executed in atomic context so that
* do_page_fault() doesn't attempt to take mmap_sem. This makes
@@ -107,7 +108,7 @@ long copy_from_kernel_nofault(void *dst, const void *src, size_t size)
mm_segment_t old_fs = get_fs();

if (!copy_from_kernel_nofault_allowed(src, size))
- return -EFAULT;
+ return -ERANGE;

set_fs(KERNEL_DS);
pagefault_disable();
@@ -159,8 +160,9 @@ long copy_to_kernel_nofault(void *dst, const void *src, size_t size)
*
* On success, returns the length of the string INCLUDING the trailing NUL.
*
- * If access fails, returns -EFAULT (some data may have been copied
- * and the trailing NUL added).
+ * If access fails, returns -EFAULT (some data may have been copied and the
+ * trailing NUL added). If @unsafe_addr is not a valid kernel address, return
+ * -ERANGE.
*
* If @count is smaller than the length of the string, copies @count-1 bytes,
* sets the last byte of @dst buffer to NUL and returns @count.
@@ -174,7 +176,7 @@ long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr, long count)
if (unlikely(count <= 0))
return 0;
if (!copy_from_kernel_nofault_allowed(unsafe_addr, count))
- return -EFAULT;
+ return -ERANGE;

set_fs(KERNEL_DS);
pagefault_disable();
--
2.26.2

2020-05-21 15:27:22

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 03/23] maccess: remove duplicate kerneldoc comments

Many of the maccess routines have a copy of the kerneldoc comment
in the header. Remove it as it is not useful and will get out of
sync sooner or later.

Signed-off-by: Christoph Hellwig <[email protected]>
---
include/linux/uaccess.h | 38 --------------------------------------
1 file changed, 38 deletions(-)

diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index a2c606a403745..5a36a298a85f8 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -301,50 +301,12 @@ copy_struct_from_user(void *dst, size_t ksize, const void __user *src,
return 0;
}

-/*
- * probe_kernel_read(): safely attempt to read from a location
- * @dst: pointer to the buffer that shall take the data
- * @src: address to read from
- * @size: size of the data chunk
- *
- * Safely read from address @src to the buffer at @dst. If a kernel fault
- * happens, handle that and return -EFAULT.
- */
extern long probe_kernel_read(void *dst, const void *src, size_t size);
extern long probe_kernel_read_strict(void *dst, const void *src, size_t size);
extern long __probe_kernel_read(void *dst, const void *src, size_t size);
-
-/*
- * probe_user_read(): safely attempt to read from a location in user space
- * @dst: pointer to the buffer that shall take the data
- * @src: address to read from
- * @size: size of the data chunk
- *
- * Safely read from address @src to the buffer at @dst. If a kernel fault
- * happens, handle that and return -EFAULT.
- */
extern long probe_user_read(void *dst, const void __user *src, size_t size);

-/*
- * probe_kernel_write(): safely attempt to write to a location
- * @dst: address to write to
- * @src: pointer to the data that shall be written
- * @size: size of the data chunk
- *
- * Safely write to address @dst from the buffer at @src. If a kernel fault
- * happens, handle that and return -EFAULT.
- */
extern long notrace probe_kernel_write(void *dst, const void *src, size_t size);
-
-/*
- * probe_user_write(): safely attempt to write to a location in user space
- * @dst: address to write to
- * @src: pointer to the data that shall be written
- * @size: size of the data chunk
- *
- * Safely write to address @dst from the buffer at @src. If a kernel fault
- * happens, handle that and return -EFAULT.
- */
extern long notrace probe_user_write(void __user *dst, const void *src, size_t size);

extern long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count);
--
2.26.2

2020-05-21 15:27:34

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 05/23] maccess: update the top of file comment

This file now also contains several helpers for accessing user memory.

Signed-off-by: Christoph Hellwig <[email protected]>
---
mm/maccess.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/maccess.c b/mm/maccess.c
index 747581ac50dc9..65880ba2ca376 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -1,6 +1,6 @@
// SPDX-License-Identifier: GPL-2.0-only
/*
- * Access kernel memory without faulting.
+ * Access kernel or user memory without faulting.
*/
#include <linux/export.h>
#include <linux/mm.h>
--
2.26.2

2020-05-21 15:27:58

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 11/23] bpf: factor out a bpf_trace_copy_string helper

Split out a helper to do the fault free access to the string pointer
to get it out of a crazy indentation level.

Signed-off-by: Christoph Hellwig <[email protected]>
---
kernel/trace/bpf_trace.c | 42 +++++++++++++++++++++++-----------------
1 file changed, 24 insertions(+), 18 deletions(-)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index f5231ffea85b9..9d4080590f711 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -321,6 +321,28 @@ static const struct bpf_func_proto *bpf_get_probe_write_proto(void)
return &bpf_probe_write_user_proto;
}

+static void bpf_trace_copy_string(char *buf, void *unsafe_ptr, char fmt_ptype,
+ size_t bufsz)
+{
+ void __user *user_ptr = (__force void __user *)unsafe_ptr;
+
+ buf[0] = 0;
+
+ switch (fmt_ptype) {
+ case 's':
+#ifdef CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
+ strncpy_from_unsafe(buf, unsafe_ptr, bufsz);
+ break;
+#endif
+ case 'k':
+ strncpy_from_kernel_nofault(buf, unsafe_ptr, bufsz);
+ break;
+ case 'u':
+ strncpy_from_user_nofault(buf, user_ptr, bufsz);
+ break;
+ }
+}
+
/*
* Only limited trace_printk() conversion specifiers allowed:
* %d %i %u %x %ld %li %lu %lx %lld %lli %llu %llx %p %pks %pus %s
@@ -403,24 +425,8 @@ BPF_CALL_5(bpf_trace_printk, char *, fmt, u32, fmt_size, u64, arg1,
break;
}

- buf[0] = 0;
- switch (fmt_ptype) {
- case 's':
-#ifdef CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
- strncpy_from_unsafe(buf, unsafe_ptr,
- sizeof(buf));
- break;
-#endif
- case 'k':
- strncpy_from_kernel_nofault(buf, unsafe_ptr,
- sizeof(buf));
- break;
- case 'u':
- strncpy_from_user_nofault(buf,
- (__force void __user *)unsafe_ptr,
- sizeof(buf));
- break;
- }
+ bpf_trace_copy_string(buf, unsafe_ptr, fmt_ptype,
+ sizeof(buf));
goto fmt_next;
}

--
2.26.2

2020-05-21 15:28:02

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 15/23] maccess: remove strncpy_from_unsafe

All users are gone now.

Signed-off-by: Christoph Hellwig <[email protected]>
---
include/linux/uaccess.h | 1 -
mm/maccess.c | 39 +--------------------------------------
2 files changed, 1 insertion(+), 39 deletions(-)

diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 65a37ae3b8871..d7d98ff345b3d 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -311,7 +311,6 @@ extern long probe_user_read(void *dst, const void __user *src, size_t size);
extern long notrace probe_kernel_write(void *dst, const void *src, size_t size);
extern long notrace probe_user_write(void __user *dst, const void *src, size_t size);

-extern long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count);
long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr,
long count);

diff --git a/mm/maccess.c b/mm/maccess.c
index 6116742608217..df82fde34307f 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -8,8 +8,6 @@

static long __probe_kernel_read(void *dst, const void *src, size_t size,
bool strict);
-static long __strncpy_from_unsafe(char *dst, const void *unsafe_addr,
- long count, bool strict);

bool __weak probe_kernel_read_allowed(const void *unsafe_src, size_t size,
bool strict)
@@ -156,35 +154,6 @@ long probe_user_write(void __user *dst, const void *src, size_t size)
return 0;
}

-/**
- * strncpy_from_unsafe: - Copy a NUL terminated string from unsafe address.
- * @dst: Destination address, in kernel space. This buffer must be at
- * least @count bytes long.
- * @unsafe_addr: Unsafe address.
- * @count: Maximum number of bytes to copy, including the trailing NUL.
- *
- * Copies a NUL-terminated string from unsafe address to kernel buffer.
- *
- * On success, returns the length of the string INCLUDING the trailing NUL.
- *
- * If access fails, returns -EFAULT (some data may have been copied
- * and the trailing NUL added).
- *
- * If @count is smaller than the length of the string, copies @count-1 bytes,
- * sets the last byte of @dst buffer to NUL and returns @count.
- *
- * Same as strncpy_from_kernel_nofault() except that for architectures with
- * not fully separated user and kernel address spaces this function also works
- * for user address tanges.
- *
- * DO NOT USE THIS FUNCTION - it is broken on architectures with entirely
- * separate kernel and user address spaces, and also a bad idea otherwise.
- */
-long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
-{
- return __strncpy_from_unsafe(dst, unsafe_addr, count, false);
-}
-
/**
* strncpy_from_kernel_nofault: - Copy a NUL terminated string from unsafe
* address.
@@ -204,12 +173,6 @@ long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
* sets the last byte of @dst buffer to NUL and returns @count.
*/
long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr, long count)
-{
- return __strncpy_from_unsafe(dst, unsafe_addr, count, true);
-}
-
-static long __strncpy_from_unsafe(char *dst, const void *unsafe_addr,
- long count, bool strict)
{
mm_segment_t old_fs = get_fs();
const void *src = unsafe_addr;
@@ -217,7 +180,7 @@ static long __strncpy_from_unsafe(char *dst, const void *unsafe_addr,

if (unlikely(count <= 0))
return 0;
- if (!probe_kernel_read_allowed(unsafe_addr, count, strict))
+ if (!probe_kernel_read_allowed(unsafe_addr, count, true))
return -EFAULT;

set_fs(KERNEL_DS);
--
2.26.2

2020-05-21 15:28:39

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 06/23] maccess: rename strncpy_from_unsafe_user to strncpy_from_user_nofault

This matches the naming of strncpy_from_user, and also makes it more
clear what the function is supposed to do.

Signed-off-by: Christoph Hellwig <[email protected]>
---
include/linux/uaccess.h | 4 ++--
kernel/trace/bpf_trace.c | 4 ++--
kernel/trace/trace_kprobe.c | 2 +-
mm/maccess.c | 4 ++--
4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 5a36a298a85f8..b983cb1c1216a 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -313,8 +313,8 @@ extern long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count);
extern long strncpy_from_unsafe_strict(char *dst, const void *unsafe_addr,
long count);
extern long __strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count);
-extern long strncpy_from_unsafe_user(char *dst, const void __user *unsafe_addr,
- long count);
+long strncpy_from_user_nofault(char *dst, const void __user *unsafe_addr,
+ long count);
extern long strnlen_unsafe_user(const void __user *unsafe_addr, long count);

/**
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index a010edc37ee02..4e20bf1d95832 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -159,7 +159,7 @@ static const struct bpf_func_proto bpf_probe_read_user_proto = {
BPF_CALL_3(bpf_probe_read_user_str, void *, dst, u32, size,
const void __user *, unsafe_ptr)
{
- int ret = strncpy_from_unsafe_user(dst, unsafe_ptr, size);
+ int ret = strncpy_from_user_nofault(dst, unsafe_ptr, size);

if (unlikely(ret < 0))
memset(dst, 0, size);
@@ -416,7 +416,7 @@ BPF_CALL_5(bpf_trace_printk, char *, fmt, u32, fmt_size, u64, arg1,
sizeof(buf));
break;
case 'u':
- strncpy_from_unsafe_user(buf,
+ strncpy_from_user_nofault(buf,
(__force void __user *)unsafe_ptr,
sizeof(buf));
break;
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 35989383ae113..d600f41fda1ca 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1268,7 +1268,7 @@ fetch_store_string_user(unsigned long addr, void *dest, void *base)

__dest = get_loc_data(dest, base);

- ret = strncpy_from_unsafe_user(__dest, uaddr, maxlen);
+ ret = strncpy_from_user_nofault(__dest, uaddr, maxlen);
if (ret >= 0)
*(u32 *)dest = make_data_loc(ret, __dest - base);

diff --git a/mm/maccess.c b/mm/maccess.c
index 65880ba2ca376..457d8f9bf714f 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -215,7 +215,7 @@ long __strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
}

/**
- * strncpy_from_unsafe_user: - Copy a NUL terminated string from unsafe user
+ * strncpy_from_user_nofault: - Copy a NUL terminated string from unsafe user
* address.
* @dst: Destination address, in kernel space. This buffer must be at
* least @count bytes long.
@@ -232,7 +232,7 @@ long __strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
* If @count is smaller than the length of the string, copies @count-1 bytes,
* sets the last byte of @dst buffer to NUL and returns @count.
*/
-long strncpy_from_unsafe_user(char *dst, const void __user *unsafe_addr,
+long strncpy_from_user_nofault(char *dst, const void __user *unsafe_addr,
long count)
{
mm_segment_t old_fs = get_fs();
--
2.26.2

2020-05-21 15:28:41

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 22/23] maccess: rename probe_kernel_address to get_kernel_nofault

Better describe what this helper does, and match the naming of
copy_from_kernel_nofault.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/arm/kernel/traps.c | 2 +-
arch/arm/mm/alignment.c | 4 ++--
arch/arm64/kernel/traps.c | 2 +-
arch/ia64/include/asm/sections.h | 2 +-
arch/parisc/kernel/process.c | 2 +-
arch/powerpc/include/asm/sections.h | 2 +-
arch/powerpc/kernel/kgdb.c | 2 +-
arch/powerpc/kernel/process.c | 2 +-
arch/powerpc/sysdev/fsl_pci.c | 2 +-
arch/riscv/kernel/traps.c | 4 ++--
arch/s390/mm/fault.c | 2 +-
arch/sh/kernel/traps.c | 2 +-
arch/x86/kernel/probe_roms.c | 20 ++++++++++----------
arch/x86/kernel/traps.c | 2 +-
arch/x86/mm/fault.c | 6 +++---
arch/x86/pci/pcbios.c | 2 +-
include/linux/uaccess.h | 4 ++--
lib/test_lockup.c | 6 +++---
18 files changed, 34 insertions(+), 34 deletions(-)

diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index 1e70e7227f0ff..61b91872b3e74 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -391,7 +391,7 @@ int is_valid_bugaddr(unsigned long pc)
u32 insn = __opcode_to_mem_arm(BUG_INSTR_VALUE);
#endif

- if (probe_kernel_address((unsigned *)pc, bkpt))
+ if (get_kernel_nofault((unsigned *)pc, bkpt))
return 0;

return bkpt == insn;
diff --git a/arch/arm/mm/alignment.c b/arch/arm/mm/alignment.c
index 84718eddae603..236016f13bcd3 100644
--- a/arch/arm/mm/alignment.c
+++ b/arch/arm/mm/alignment.c
@@ -774,7 +774,7 @@ static int alignment_get_arm(struct pt_regs *regs, u32 *ip, u32 *inst)
if (user_mode(regs))
fault = get_user(instr, ip);
else
- fault = probe_kernel_address(ip, instr);
+ fault = get_kernel_nofault(ip, instr);

*inst = __mem_to_opcode_arm(instr);

@@ -789,7 +789,7 @@ static int alignment_get_thumb(struct pt_regs *regs, u16 *ip, u16 *inst)
if (user_mode(regs))
fault = get_user(instr, ip);
else
- fault = probe_kernel_address(ip, instr);
+ fault = get_kernel_nofault(ip, instr);

*inst = __mem_to_opcode_thumb16(instr);

diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index cf402be5c573f..2245af23973a2 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -315,7 +315,7 @@ static int call_undef_hook(struct pt_regs *regs)

if (!user_mode(regs)) {
__le32 instr_le;
- if (probe_kernel_address((__force __le32 *)pc, instr_le))
+ if (get_kernel_nofault((__force __le32 *)pc, instr_le))
goto exit;
instr = le32_to_cpu(instr_le);
} else if (compat_thumb_mode(regs)) {
diff --git a/arch/ia64/include/asm/sections.h b/arch/ia64/include/asm/sections.h
index cea15f2dd38df..ef03eec8666f8 100644
--- a/arch/ia64/include/asm/sections.h
+++ b/arch/ia64/include/asm/sections.h
@@ -35,7 +35,7 @@ static inline void *dereference_function_descriptor(void *ptr)
struct fdesc *desc = ptr;
void *p;

- if (!probe_kernel_address(&desc->ip, p))
+ if (!get_kernel_nofault(&desc->ip, p))
ptr = p;
return ptr;
}
diff --git a/arch/parisc/kernel/process.c b/arch/parisc/kernel/process.c
index 230a6422b99f3..c3b0f03bd0bf1 100644
--- a/arch/parisc/kernel/process.c
+++ b/arch/parisc/kernel/process.c
@@ -293,7 +293,7 @@ void *dereference_function_descriptor(void *ptr)
Elf64_Fdesc *desc = ptr;
void *p;

- if (!probe_kernel_address(&desc->addr, p))
+ if (!get_kernel_nofault(&desc->addr, p))
ptr = p;
return ptr;
}
diff --git a/arch/powerpc/include/asm/sections.h b/arch/powerpc/include/asm/sections.h
index d19871763ed4a..636bb1633de18 100644
--- a/arch/powerpc/include/asm/sections.h
+++ b/arch/powerpc/include/asm/sections.h
@@ -85,7 +85,7 @@ static inline void *dereference_function_descriptor(void *ptr)
struct ppc64_opd_entry *desc = ptr;
void *p;

- if (!probe_kernel_address(&desc->funcaddr, p))
+ if (!get_kernel_nofault(&desc->funcaddr, p))
ptr = p;
return ptr;
}
diff --git a/arch/powerpc/kernel/kgdb.c b/arch/powerpc/kernel/kgdb.c
index 7dd55eb1259dc..f0045a74e7cea 100644
--- a/arch/powerpc/kernel/kgdb.c
+++ b/arch/powerpc/kernel/kgdb.c
@@ -420,7 +420,7 @@ int kgdb_arch_set_breakpoint(struct kgdb_bkpt *bpt)
unsigned int instr;
unsigned int *addr = (unsigned int *)bpt->bpt_addr;

- err = probe_kernel_address(addr, instr);
+ err = get_kernel_nofault(addr, instr);
if (err)
return err;

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index d5d6136b13480..b77f97073200c 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1260,7 +1260,7 @@ static void show_instructions(struct pt_regs *regs)
#endif

if (!__kernel_text_address(pc) ||
- probe_kernel_address((const void *)pc, instr)) {
+ get_kernel_nofault((const void *)pc, instr)) {
pr_cont("XXXXXXXX ");
} else {
if (regs->nip == pc)
diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
index 73fa37ca40ef9..483412d5a1973 100644
--- a/arch/powerpc/sysdev/fsl_pci.c
+++ b/arch/powerpc/sysdev/fsl_pci.c
@@ -1069,7 +1069,7 @@ int fsl_pci_mcheck_exception(struct pt_regs *regs)
ret = copy_from_user_nofault(&inst,
(void __user *)regs->nip, sizeof(inst));
else
- ret = probe_kernel_address((void *)regs->nip, inst);
+ ret = get_kernel_nofault((void *)regs->nip, inst);

if (!ret && mcheck_handle_load(regs, inst)) {
regs->nip += 4;
diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
index 7f58fa53033f6..d807d507bd95a 100644
--- a/arch/riscv/kernel/traps.c
+++ b/arch/riscv/kernel/traps.c
@@ -137,7 +137,7 @@ static inline unsigned long get_break_insn_length(unsigned long pc)
{
bug_insn_t insn;

- if (probe_kernel_address((bug_insn_t *)pc, insn))
+ if (get_kernel_nofault((bug_insn_t *)pc, insn))
return 0;

return GET_INSN_LENGTH(insn);
@@ -160,7 +160,7 @@ int is_valid_bugaddr(unsigned long pc)

if (pc < VMALLOC_START)
return 0;
- if (probe_kernel_address((bug_insn_t *)pc, insn))
+ if (get_kernel_nofault((bug_insn_t *)pc, insn))
return 0;
if ((insn & __INSN_LENGTH_MASK) == __INSN_LENGTH_32)
return (insn == __BUG_INSN_32);
diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index dedc28be27ab4..ab68eca6b2480 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -106,7 +106,7 @@ static int bad_address(void *p)
{
unsigned long dummy;

- return probe_kernel_address((unsigned long *)p, dummy);
+ return get_kernel_nofault((unsigned long *)p, dummy);
}

static void dump_pagetable(unsigned long asce, unsigned long address)
diff --git a/arch/sh/kernel/traps.c b/arch/sh/kernel/traps.c
index 63cf17bc760da..27c53c5b84e6b 100644
--- a/arch/sh/kernel/traps.c
+++ b/arch/sh/kernel/traps.c
@@ -118,7 +118,7 @@ int is_valid_bugaddr(unsigned long addr)

if (addr < PAGE_OFFSET)
return 0;
- if (probe_kernel_address((insn_size_t *)addr, opcode))
+ if (get_kernel_nofault((insn_size_t *)addr, opcode))
return 0;
if (opcode == TRAPA_BUG_OPCODE)
return 1;
diff --git a/arch/x86/kernel/probe_roms.c b/arch/x86/kernel/probe_roms.c
index ee0286390a4c1..77f3341570e84 100644
--- a/arch/x86/kernel/probe_roms.c
+++ b/arch/x86/kernel/probe_roms.c
@@ -99,7 +99,7 @@ static bool probe_list(struct pci_dev *pdev, unsigned short vendor,
unsigned short device;

do {
- if (probe_kernel_address(rom_list, device) != 0)
+ if (get_kernel_nofault(rom_list, device) != 0)
device = 0;

if (device && match_id(pdev, vendor, device))
@@ -125,13 +125,13 @@ static struct resource *find_oprom(struct pci_dev *pdev)
break;

rom = isa_bus_to_virt(res->start);
- if (probe_kernel_address(rom + 0x18, offset) != 0)
+ if (get_kernel_nofault(rom + 0x18, offset) != 0)
continue;

- if (probe_kernel_address(rom + offset + 0x4, vendor) != 0)
+ if (get_kernel_nofault(rom + offset + 0x4, vendor) != 0)
continue;

- if (probe_kernel_address(rom + offset + 0x6, device) != 0)
+ if (get_kernel_nofault(rom + offset + 0x6, device) != 0)
continue;

if (match_id(pdev, vendor, device)) {
@@ -139,8 +139,8 @@ static struct resource *find_oprom(struct pci_dev *pdev)
break;
}

- if (probe_kernel_address(rom + offset + 0x8, list) == 0 &&
- probe_kernel_address(rom + offset + 0xc, rev) == 0 &&
+ if (get_kernel_nofault(rom + offset + 0x8, list) == 0 &&
+ get_kernel_nofault(rom + offset + 0xc, rev) == 0 &&
rev >= 3 && list &&
probe_list(pdev, vendor, rom + offset + list)) {
oprom = res;
@@ -183,14 +183,14 @@ static int __init romsignature(const unsigned char *rom)
const unsigned short * const ptr = (const unsigned short *)rom;
unsigned short sig;

- return probe_kernel_address(ptr, sig) == 0 && sig == ROMSIGNATURE;
+ return get_kernel_nofault(ptr, sig) == 0 && sig == ROMSIGNATURE;
}

static int __init romchecksum(const unsigned char *rom, unsigned long length)
{
unsigned char sum, c;

- for (sum = 0; length && probe_kernel_address(rom++, c) == 0; length--)
+ for (sum = 0; length && get_kernel_nofault(rom++, c) == 0; length--)
sum += c;
return !length && !sum;
}
@@ -211,7 +211,7 @@ void __init probe_roms(void)

video_rom_resource.start = start;

- if (probe_kernel_address(rom + 2, c) != 0)
+ if (get_kernel_nofault(rom + 2, c) != 0)
continue;

/* 0 < length <= 0x7f * 512, historically */
@@ -249,7 +249,7 @@ void __init probe_roms(void)
if (!romsignature(rom))
continue;

- if (probe_kernel_address(rom + 2, c) != 0)
+ if (get_kernel_nofault(rom + 2, c) != 0)
continue;

/* 0 < length <= 0x7f * 512, historically */
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 96809f6aad67d..07739b67da335 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -161,7 +161,7 @@ int is_valid_bugaddr(unsigned long addr)
if (addr < TASK_SIZE_MAX)
return 0;

- if (probe_kernel_address((unsigned short *)addr, ud))
+ if (get_kernel_nofault((unsigned short *)addr, ud))
return 0;

return ud == INSN_UD0 || ud == INSN_UD2;
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 994e207abdf64..7c5e23e7407d8 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -98,7 +98,7 @@ check_prefetch_opcode(struct pt_regs *regs, unsigned char *instr,
return !instr_lo || (instr_lo>>1) == 1;
case 0x00:
/* Prefetch instruction is 0x0F0D or 0x0F18 */
- if (probe_kernel_address(instr, opcode))
+ if (get_kernel_nofault(instr, opcode))
return 0;

*prefetch = (instr_lo == 0xF) &&
@@ -132,7 +132,7 @@ is_prefetch(struct pt_regs *regs, unsigned long error_code, unsigned long addr)
while (instr < max_instr) {
unsigned char opcode;

- if (probe_kernel_address(instr, opcode))
+ if (get_kernel_nofault(instr, opcode))
break;

instr++;
@@ -441,7 +441,7 @@ static int bad_address(void *p)
{
unsigned long dummy;

- return probe_kernel_address((unsigned long *)p, dummy);
+ return get_kernel_nofault((unsigned long *)p, dummy);
}

static void dump_pagetable(unsigned long address)
diff --git a/arch/x86/pci/pcbios.c b/arch/x86/pci/pcbios.c
index 9c97d814125eb..b7f8699b18c1f 100644
--- a/arch/x86/pci/pcbios.c
+++ b/arch/x86/pci/pcbios.c
@@ -302,7 +302,7 @@ static const struct pci_raw_ops *__init pci_find_bios(void)
check <= (union bios32 *) __va(0xffff0);
++check) {
long sig;
- if (probe_kernel_address(&check->fields.signature, sig))
+ if (get_kernel_nofault(&check->fields.signature, sig))
continue;

if (check->fields.signature != BIOS32_SIGNATURE)
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 5214d2f1f2cef..73b79f06cad8b 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -318,13 +318,13 @@ long strncpy_from_user_nofault(char *dst, const void __user *unsafe_addr,
long strnlen_user_nofault(const void __user *unsafe_addr, long count);

/**
- * probe_kernel_address(): safely attempt to read from a location
+ * get_kernel_nofault(): safely attempt to read from a location
* @addr: address to read from
* @retval: read into this variable
*
* Returns 0 on success, or -EFAULT.
*/
-#define probe_kernel_address(addr, retval) \
+#define get_kernel_nofault(addr, retval) \
copy_from_kernel_nofault(&retval, addr, sizeof(retval))

#ifndef user_access_begin
diff --git a/lib/test_lockup.c b/lib/test_lockup.c
index ea09ca335b214..71f7a4c89ed4f 100644
--- a/lib/test_lockup.c
+++ b/lib/test_lockup.c
@@ -419,8 +419,8 @@ static bool test_kernel_ptr(unsigned long addr, int size)
/* should be at least readable kernel address */
if (access_ok(ptr, 1) ||
access_ok(ptr + size - 1, 1) ||
- probe_kernel_address(ptr, buf) ||
- probe_kernel_address(ptr + size - 1, buf)) {
+ get_kernel_nofault(ptr, buf) ||
+ get_kernel_nofault(ptr + size - 1, buf)) {
pr_err("invalid kernel ptr: %#lx\n", addr);
return true;
}
@@ -437,7 +437,7 @@ static bool __maybe_unused test_magic(unsigned long addr, int offset,
if (!addr)
return false;

- if (probe_kernel_address(ptr, magic) || magic != expected) {
+ if (get_kernel_nofault(ptr, magic) || magic != expected) {
pr_err("invalid magic at %#lx + %#x = %#x, expected %#x\n",
addr, offset, magic, expected);
return true;
--
2.26.2

2020-05-21 15:28:50

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 13/23] bpf: rework the compat kernel probe handling

Instead of using the dangerous probe_kernel_read and strncpy_from_unsafe
helpers, rework the compat probes to check if an address is a kernel or
userspace one, and then use the low-level kernel or user probe helper
shared by the proper kernel and user probe helpers. This slightly
changes behavior as the compat probe on a user address doesn't check
the lockdown flags, just as the pure user probes do.

Signed-off-by: Christoph Hellwig <[email protected]>
---
kernel/trace/bpf_trace.c | 109 ++++++++++++++++++++++++---------------
1 file changed, 67 insertions(+), 42 deletions(-)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 737d739230a6b..43566cd2a8180 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -136,17 +136,23 @@ static const struct bpf_func_proto bpf_override_return_proto = {
};
#endif

-BPF_CALL_3(bpf_probe_read_user, void *, dst, u32, size,
- const void __user *, unsafe_ptr)
+static __always_inline int
+bpf_probe_read_user_common(void *dst, u32 size, const void __user *unsafe_ptr)
{
- int ret = probe_user_read(dst, unsafe_ptr, size);
+ int ret;

+ ret = probe_user_read(dst, unsafe_ptr, size);
if (unlikely(ret < 0))
memset(dst, 0, size);
-
return ret;
}

+BPF_CALL_3(bpf_probe_read_user, void *, dst, u32, size,
+ const void __user *, unsafe_ptr)
+{
+ return bpf_probe_read_user_common(dst, size, unsafe_ptr);
+}
+
static const struct bpf_func_proto bpf_probe_read_user_proto = {
.func = bpf_probe_read_user,
.gpl_only = true,
@@ -156,17 +162,24 @@ static const struct bpf_func_proto bpf_probe_read_user_proto = {
.arg3_type = ARG_ANYTHING,
};

-BPF_CALL_3(bpf_probe_read_user_str, void *, dst, u32, size,
- const void __user *, unsafe_ptr)
+static __always_inline int
+bpf_probe_read_user_str_common(void *dst, u32 size,
+ const void __user *unsafe_ptr)
{
- int ret = strncpy_from_user_nofault(dst, unsafe_ptr, size);
+ int ret;

+ ret = strncpy_from_user_nofault(dst, unsafe_ptr, size);
if (unlikely(ret < 0))
memset(dst, 0, size);
-
return ret;
}

+BPF_CALL_3(bpf_probe_read_user_str, void *, dst, u32, size,
+ const void __user *, unsafe_ptr)
+{
+ return bpf_probe_read_user_str_common(dst, size, unsafe_ptr);
+}
+
static const struct bpf_func_proto bpf_probe_read_user_str_proto = {
.func = bpf_probe_read_user_str,
.gpl_only = true,
@@ -177,25 +190,25 @@ static const struct bpf_func_proto bpf_probe_read_user_str_proto = {
};

static __always_inline int
-bpf_probe_read_kernel_common(void *dst, u32 size, const void *unsafe_ptr,
- const bool compat)
+bpf_probe_read_kernel_common(void *dst, u32 size, const void *unsafe_ptr)
{
int ret = security_locked_down(LOCKDOWN_BPF_READ);

if (unlikely(ret < 0))
- goto out;
- ret = compat ? probe_kernel_read(dst, unsafe_ptr, size) :
- probe_kernel_read_strict(dst, unsafe_ptr, size);
+ goto fail;
+ ret = probe_kernel_read_strict(dst, unsafe_ptr, size);
if (unlikely(ret < 0))
-out:
- memset(dst, 0, size);
+ goto fail;
+ return ret;
+fail:
+ memset(dst, 0, size);
return ret;
}

BPF_CALL_3(bpf_probe_read_kernel, void *, dst, u32, size,
const void *, unsafe_ptr)
{
- return bpf_probe_read_kernel_common(dst, size, unsafe_ptr, false);
+ return bpf_probe_read_kernel_common(dst, size, unsafe_ptr);
}

static const struct bpf_func_proto bpf_probe_read_kernel_proto = {
@@ -207,50 +220,37 @@ static const struct bpf_func_proto bpf_probe_read_kernel_proto = {
.arg3_type = ARG_ANYTHING,
};

-BPF_CALL_3(bpf_probe_read_compat, void *, dst, u32, size,
- const void *, unsafe_ptr)
-{
- return bpf_probe_read_kernel_common(dst, size, unsafe_ptr, true);
-}
-
-static const struct bpf_func_proto bpf_probe_read_compat_proto = {
- .func = bpf_probe_read_compat,
- .gpl_only = true,
- .ret_type = RET_INTEGER,
- .arg1_type = ARG_PTR_TO_UNINIT_MEM,
- .arg2_type = ARG_CONST_SIZE_OR_ZERO,
- .arg3_type = ARG_ANYTHING,
-};
-
static __always_inline int
-bpf_probe_read_kernel_str_common(void *dst, u32 size, const void *unsafe_ptr,
- const bool compat)
+bpf_probe_read_kernel_str_common(void *dst, u32 size, const void *unsafe_ptr)
{
int ret = security_locked_down(LOCKDOWN_BPF_READ);

if (unlikely(ret < 0))
- goto out;
+ goto fail;
+
/*
- * The strncpy_from_unsafe_*() call will likely not fill the entire
- * buffer, but that's okay in this circumstance as we're probing
+ * The strncpy_from_kernel_nofault() call will likely not fill the
+ * entire buffer, but that's okay in this circumstance as we're probing
* arbitrary memory anyway similar to bpf_probe_read_*() and might
* as well probe the stack. Thus, memory is explicitly cleared
* only in error case, so that improper users ignoring return
* code altogether don't copy garbage; otherwise length of string
* is returned that can be used for bpf_perf_event_output() et al.
*/
- ret = compat ? strncpy_from_unsafe(dst, unsafe_ptr, size) :
- strncpy_from_kernel_nofault(dst, unsafe_ptr, size);
+ ret = strncpy_from_kernel_nofault(dst, unsafe_ptr, size);
if (unlikely(ret < 0))
-out:
- memset(dst, 0, size);
+ goto fail;
+
+ return 0;
+fail:
+ memset(dst, 0, size);
return ret;
}

BPF_CALL_3(bpf_probe_read_kernel_str, void *, dst, u32, size,
const void *, unsafe_ptr)
{
- return bpf_probe_read_kernel_str_common(dst, size, unsafe_ptr, false);
+ return bpf_probe_read_kernel_str_common(dst, size, unsafe_ptr);
}

static const struct bpf_func_proto bpf_probe_read_kernel_str_proto = {
@@ -262,10 +262,34 @@ static const struct bpf_func_proto bpf_probe_read_kernel_str_proto = {
.arg3_type = ARG_ANYTHING,
};

+#ifdef CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
+BPF_CALL_3(bpf_probe_read_compat, void *, dst, u32, size,
+ const void *, unsafe_ptr)
+{
+ if ((unsigned long)unsafe_ptr < TASK_SIZE) {
+ return bpf_probe_read_user_common(dst, size,
+ (__force void __user *)unsafe_ptr);
+ }
+ return bpf_probe_read_kernel_common(dst, size, unsafe_ptr);
+}
+
+static const struct bpf_func_proto bpf_probe_read_compat_proto = {
+ .func = bpf_probe_read_compat,
+ .gpl_only = true,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_UNINIT_MEM,
+ .arg2_type = ARG_CONST_SIZE_OR_ZERO,
+ .arg3_type = ARG_ANYTHING,
+};
+
BPF_CALL_3(bpf_probe_read_compat_str, void *, dst, u32, size,
const void *, unsafe_ptr)
{
- return bpf_probe_read_kernel_str_common(dst, size, unsafe_ptr, true);
+ if ((unsigned long)unsafe_ptr < TASK_SIZE) {
+ return bpf_probe_read_user_str_common(dst, size,
+ (__force void __user *)unsafe_ptr);
+ }
+ return bpf_probe_read_kernel_str_common(dst, size, unsafe_ptr);
}

static const struct bpf_func_proto bpf_probe_read_compat_str_proto = {
@@ -276,6 +300,7 @@ static const struct bpf_func_proto bpf_probe_read_compat_str_proto = {
.arg2_type = ARG_CONST_SIZE_OR_ZERO,
.arg3_type = ARG_ANYTHING,
};
+#endif /* CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE */

BPF_CALL_3(bpf_probe_write_user, void __user *, unsafe_ptr, const void *, src,
u32, size)
--
2.26.2

2020-05-21 15:29:22

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 20/23] maccess: rename probe_kernel_{read,write} to copy_{from,to}_kernel_nofault

Better describe what these functions do.

Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/arm/kernel/ftrace.c | 3 +-
arch/arm/kernel/kgdb.c | 2 +-
arch/arm64/kernel/insn.c | 4 +--
arch/csky/kernel/ftrace.c | 5 +--
arch/ia64/kernel/ftrace.c | 6 ++--
arch/mips/kernel/kprobes.c | 6 ++--
arch/nds32/kernel/ftrace.c | 5 +--
arch/parisc/kernel/ftrace.c | 2 +-
arch/parisc/kernel/kgdb.c | 4 +--
arch/parisc/lib/memcpy.c | 2 +-
arch/powerpc/kernel/module_64.c | 5 +--
arch/powerpc/kernel/trace/ftrace.c | 24 +++++++--------
arch/powerpc/perf/core-book3s.c | 3 +-
arch/riscv/kernel/ftrace.c | 3 +-
arch/riscv/kernel/patch.c | 4 +--
arch/s390/kernel/ftrace.c | 4 +--
arch/sh/kernel/ftrace.c | 6 ++--
arch/um/kernel/maccess.c | 2 +-
arch/x86/include/asm/ptrace.h | 4 +--
arch/x86/kernel/dumpstack.c | 2 +-
arch/x86/kernel/ftrace.c | 8 ++---
arch/x86/kernel/kgdb.c | 6 ++--
arch/x86/kernel/kprobes/core.c | 5 +--
arch/x86/kernel/kprobes/opt.c | 2 +-
arch/x86/kernel/traps.c | 3 +-
arch/x86/mm/fault.c | 2 +-
arch/x86/mm/init_32.c | 2 +-
arch/x86/mm/maccess.c | 4 +--
arch/x86/xen/enlighten_pv.c | 2 +-
drivers/char/mem.c | 2 +-
drivers/dio/dio.c | 6 ++--
drivers/input/serio/hp_sdc.c | 2 +-
drivers/misc/kgdbts.c | 6 ++--
drivers/video/fbdev/hpfb.c | 2 +-
fs/proc/kcore.c | 3 +-
include/linux/uaccess.h | 13 ++++----
kernel/debug/debug_core.c | 6 ++--
kernel/debug/gdbstub.c | 6 ++--
kernel/debug/kdb/kdb_main.c | 3 +-
kernel/debug/kdb/kdb_support.c | 7 +++--
kernel/kthread.c | 2 +-
kernel/trace/bpf_trace.c | 2 +-
kernel/trace/trace_kprobe.c | 4 +--
kernel/workqueue.c | 10 +++---
mm/maccess.c | 49 +++++++++++++++---------------
mm/rodata_test.c | 2 +-
mm/slub.c | 2 +-
47 files changed, 136 insertions(+), 121 deletions(-)

diff --git a/arch/arm/kernel/ftrace.c b/arch/arm/kernel/ftrace.c
index 10499d44964a2..9a79ef6b1876c 100644
--- a/arch/arm/kernel/ftrace.c
+++ b/arch/arm/kernel/ftrace.c
@@ -84,7 +84,8 @@ static int ftrace_modify_code(unsigned long pc, unsigned long old,
old = __opcode_to_mem_arm(old);

if (validate) {
- if (probe_kernel_read(&replaced, (void *)pc, MCOUNT_INSN_SIZE))
+ if (copy_from_kernel_nofault(&replaced, (void *)pc,
+ MCOUNT_INSN_SIZE))
return -EFAULT;

if (replaced != old)
diff --git a/arch/arm/kernel/kgdb.c b/arch/arm/kernel/kgdb.c
index 6a95b92966406..7bd30c0a4280d 100644
--- a/arch/arm/kernel/kgdb.c
+++ b/arch/arm/kernel/kgdb.c
@@ -236,7 +236,7 @@ int kgdb_arch_set_breakpoint(struct kgdb_bkpt *bpt)
/* patch_text() only supports int-sized breakpoints */
BUILD_BUG_ON(sizeof(int) != BREAK_INSTR_SIZE);

- err = probe_kernel_read(bpt->saved_instr, (char *)bpt->bpt_addr,
+ err = copy_from_kernel_nofault(bpt->saved_instr, (char *)bpt->bpt_addr,
BREAK_INSTR_SIZE);
if (err)
return err;
diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
index 4a9e773a177f0..c5f7a7b8d2c3e 100644
--- a/arch/arm64/kernel/insn.c
+++ b/arch/arm64/kernel/insn.c
@@ -123,7 +123,7 @@ int __kprobes aarch64_insn_read(void *addr, u32 *insnp)
int ret;
__le32 val;

- ret = probe_kernel_read(&val, addr, AARCH64_INSN_SIZE);
+ ret = copy_from_kernel_nofault(&val, addr, AARCH64_INSN_SIZE);
if (!ret)
*insnp = le32_to_cpu(val);

@@ -139,7 +139,7 @@ static int __kprobes __aarch64_insn_write(void *addr, __le32 insn)
raw_spin_lock_irqsave(&patch_lock, flags);
waddr = patch_map(addr, FIX_TEXT_POKE0);

- ret = probe_kernel_write(waddr, &insn, AARCH64_INSN_SIZE);
+ ret = copy_to_kernel_nofault(waddr, &insn, AARCH64_INSN_SIZE);

patch_unmap(FIX_TEXT_POKE0);
raw_spin_unlock_irqrestore(&patch_lock, flags);
diff --git a/arch/csky/kernel/ftrace.c b/arch/csky/kernel/ftrace.c
index 3c425b84e3be6..b4a7ec1517ff7 100644
--- a/arch/csky/kernel/ftrace.c
+++ b/arch/csky/kernel/ftrace.c
@@ -72,7 +72,8 @@ static int ftrace_check_current_nop(unsigned long hook)
uint16_t olds[7];
unsigned long hook_pos = hook - 2;

- if (probe_kernel_read((void *)olds, (void *)hook_pos, sizeof(nops)))
+ if (copy_from_kernel_nofault((void *)olds, (void *)hook_pos,
+ sizeof(nops)))
return -EFAULT;

if (memcmp((void *)nops, (void *)olds, sizeof(nops))) {
@@ -97,7 +98,7 @@ static int ftrace_modify_code(unsigned long hook, unsigned long target,

make_jbsr(target, hook, call, nolr);

- ret = probe_kernel_write((void *)hook_pos, enable ? call : nops,
+ ret = copy_to_kernel_nofault((void *)hook_pos, enable ? call : nops,
sizeof(nops));
if (ret)
return -EPERM;
diff --git a/arch/ia64/kernel/ftrace.c b/arch/ia64/kernel/ftrace.c
index cee411e647ca0..b2ab2d58fb30c 100644
--- a/arch/ia64/kernel/ftrace.c
+++ b/arch/ia64/kernel/ftrace.c
@@ -108,7 +108,7 @@ ftrace_modify_code(unsigned long ip, unsigned char *old_code,
goto skip_check;

/* read the text we want to modify */
- if (probe_kernel_read(replaced, (void *)ip, MCOUNT_INSN_SIZE))
+ if (copy_from_kernel_nofault(replaced, (void *)ip, MCOUNT_INSN_SIZE))
return -EFAULT;

/* Make sure it is what we expect it to be */
@@ -117,7 +117,7 @@ ftrace_modify_code(unsigned long ip, unsigned char *old_code,

skip_check:
/* replace the text with the new text */
- if (probe_kernel_write(((void *)ip), new_code, MCOUNT_INSN_SIZE))
+ if (copy_to_kernel_nofault(((void *)ip), new_code, MCOUNT_INSN_SIZE))
return -EPERM;
flush_icache_range(ip, ip + MCOUNT_INSN_SIZE);

@@ -129,7 +129,7 @@ static int ftrace_make_nop_check(struct dyn_ftrace *rec, unsigned long addr)
unsigned char __attribute__((aligned(8))) replaced[MCOUNT_INSN_SIZE];
unsigned long ip = rec->ip;

- if (probe_kernel_read(replaced, (void *)ip, MCOUNT_INSN_SIZE))
+ if (copy_from_kernel_nofault(replaced, (void *)ip, MCOUNT_INSN_SIZE))
return -EFAULT;
if (rec->flags & FTRACE_FL_CONVERTED) {
struct ftrace_call_insn *call_insn, *tmp_call;
diff --git a/arch/mips/kernel/kprobes.c b/arch/mips/kernel/kprobes.c
index 6cfae2411c044..d043c2f897fc2 100644
--- a/arch/mips/kernel/kprobes.c
+++ b/arch/mips/kernel/kprobes.c
@@ -86,9 +86,9 @@ int __kprobes arch_prepare_kprobe(struct kprobe *p)
goto out;
}

- if ((probe_kernel_read(&prev_insn, p->addr - 1,
- sizeof(mips_instruction)) == 0) &&
- insn_has_delayslot(prev_insn)) {
+ if (copy_from_kernel_nofault(&prev_insn, p->addr - 1,
+ sizeof(mips_instruction)) == 0 &&
+ insn_has_delayslot(prev_insn)) {
pr_notice("Kprobes for branch delayslot are not supported\n");
ret = -EINVAL;
goto out;
diff --git a/arch/nds32/kernel/ftrace.c b/arch/nds32/kernel/ftrace.c
index 22ab77ea27ad3..3763b3f8c3db5 100644
--- a/arch/nds32/kernel/ftrace.c
+++ b/arch/nds32/kernel/ftrace.c
@@ -131,13 +131,14 @@ static int __ftrace_modify_code(unsigned long pc, unsigned long *old_insn,
unsigned long orig_insn[3];

if (validate) {
- if (probe_kernel_read(orig_insn, (void *)pc, MCOUNT_INSN_SIZE))
+ if (copy_from_kernel_nofault(orig_insn, (void *)pc,
+ MCOUNT_INSN_SIZE))
return -EFAULT;
if (memcmp(orig_insn, old_insn, MCOUNT_INSN_SIZE))
return -EINVAL;
}

- if (probe_kernel_write((void *)pc, new_insn, MCOUNT_INSN_SIZE))
+ if (copy_to_kernel_nofault((void *)pc, new_insn, MCOUNT_INSN_SIZE))
return -EPERM;

return 0;
diff --git a/arch/parisc/kernel/ftrace.c b/arch/parisc/kernel/ftrace.c
index b836fc61a24f4..1df0f67ed6671 100644
--- a/arch/parisc/kernel/ftrace.c
+++ b/arch/parisc/kernel/ftrace.c
@@ -172,7 +172,7 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)

ip = (void *)(rec->ip + 4 - size);

- ret = probe_kernel_read(insn, ip, size);
+ ret = copy_from_kernel_nofault(insn, ip, size);
if (ret)
return ret;

diff --git a/arch/parisc/kernel/kgdb.c b/arch/parisc/kernel/kgdb.c
index 664278db9b977..c4554ac13eac7 100644
--- a/arch/parisc/kernel/kgdb.c
+++ b/arch/parisc/kernel/kgdb.c
@@ -154,8 +154,8 @@ void kgdb_arch_set_pc(struct pt_regs *regs, unsigned long ip)

int kgdb_arch_set_breakpoint(struct kgdb_bkpt *bpt)
{
- int ret = probe_kernel_read(bpt->saved_instr, (char *)bpt->bpt_addr,
- BREAK_INSTR_SIZE);
+ int ret = copy_from_kernel_nofault(bpt->saved_instr,
+ (char *)bpt->bpt_addr, BREAK_INSTR_SIZE);
if (ret)
return ret;

diff --git a/arch/parisc/lib/memcpy.c b/arch/parisc/lib/memcpy.c
index 94a9fe2702c2f..4b75388190b4e 100644
--- a/arch/parisc/lib/memcpy.c
+++ b/arch/parisc/lib/memcpy.c
@@ -57,7 +57,7 @@ void * memcpy(void * dst,const void *src, size_t count)
EXPORT_SYMBOL(raw_copy_in_user);
EXPORT_SYMBOL(memcpy);

-bool probe_kernel_read_allowed(const void *unsafe_src, size_t size)
+bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size)
{
if ((unsigned long)unsafe_src < PAGE_SIZE)
return false;
diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 007606a48fd98..29067ef485411 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -159,7 +159,7 @@ int module_trampoline_target(struct module *mod, unsigned long addr,

stub = (struct ppc64_stub_entry *)addr;

- if (probe_kernel_read(&magic, &stub->magic, sizeof(magic))) {
+ if (copy_from_kernel_nofault(&magic, &stub->magic, sizeof(magic))) {
pr_err("%s: fault reading magic for stub %lx for %s\n", __func__, addr, mod->name);
return -EFAULT;
}
@@ -169,7 +169,8 @@ int module_trampoline_target(struct module *mod, unsigned long addr,
return -EFAULT;
}

- if (probe_kernel_read(&funcdata, &stub->funcdata, sizeof(funcdata))) {
+ if (copy_from_kernel_nofault(&funcdata, &stub->funcdata,
+ sizeof(funcdata))) {
pr_err("%s: fault reading funcdata for stub %lx for %s\n", __func__, addr, mod->name);
return -EFAULT;
}
diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index 7ea0ca044b650..60ee6813830c3 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -67,7 +67,7 @@ ftrace_modify_code(unsigned long ip, unsigned int old, unsigned int new)
*/

/* read the text we want to modify */
- if (probe_kernel_read(&replaced, (void *)ip, MCOUNT_INSN_SIZE))
+ if (copy_from_kernel_nofault(&replaced, (void *)ip, MCOUNT_INSN_SIZE))
return -EFAULT;

/* Make sure it is what we expect it to be */
@@ -128,7 +128,7 @@ __ftrace_make_nop(struct module *mod,
unsigned int op, pop;

/* read where this goes */
- if (probe_kernel_read(&op, (void *)ip, sizeof(int))) {
+ if (copy_from_kernel_nofault(&op, (void *)ip, sizeof(int))) {
pr_err("Fetching opcode failed.\n");
return -EFAULT;
}
@@ -162,7 +162,7 @@ __ftrace_make_nop(struct module *mod,
/* When using -mkernel_profile there is no load to jump over */
pop = PPC_INST_NOP;

- if (probe_kernel_read(&op, (void *)(ip - 4), 4)) {
+ if (copy_from_kernel_nofault(&op, (void *)(ip - 4), 4)) {
pr_err("Fetching instruction at %lx failed.\n", ip - 4);
return -EFAULT;
}
@@ -193,7 +193,7 @@ __ftrace_make_nop(struct module *mod,
* Check what is in the next instruction. We can see ld r2,40(r1), but
* on first pass after boot we will see mflr r0.
*/
- if (probe_kernel_read(&op, (void *)(ip+4), MCOUNT_INSN_SIZE)) {
+ if (copy_from_kernel_nofault(&op, (void *)(ip+4), MCOUNT_INSN_SIZE)) {
pr_err("Fetching op failed.\n");
return -EFAULT;
}
@@ -222,7 +222,7 @@ __ftrace_make_nop(struct module *mod,
unsigned long ip = rec->ip;
unsigned long tramp;

- if (probe_kernel_read(&op, (void *)ip, MCOUNT_INSN_SIZE))
+ if (copy_from_kernel_nofault(&op, (void *)ip, MCOUNT_INSN_SIZE))
return -EFAULT;

/* Make sure that that this is still a 24bit jump */
@@ -245,7 +245,7 @@ __ftrace_make_nop(struct module *mod,
pr_devel("ip:%lx jumps to %lx", ip, tramp);

/* Find where the trampoline jumps to */
- if (probe_kernel_read(jmp, (void *)tramp, sizeof(jmp))) {
+ if (copy_from_kernel_nofault(jmp, (void *)tramp, sizeof(jmp))) {
pr_err("Failed to read %lx\n", tramp);
return -EFAULT;
}
@@ -341,7 +341,7 @@ static int setup_mcount_compiler_tramp(unsigned long tramp)
return -1;

/* New trampoline -- read where this goes */
- if (probe_kernel_read(&op, (void *)tramp, sizeof(int))) {
+ if (copy_from_kernel_nofault(&op, (void *)tramp, sizeof(int))) {
pr_debug("Fetching opcode failed.\n");
return -1;
}
@@ -391,7 +391,7 @@ static int __ftrace_make_nop_kernel(struct dyn_ftrace *rec, unsigned long addr)
unsigned int op;

/* Read where this goes */
- if (probe_kernel_read(&op, (void *)ip, sizeof(int))) {
+ if (copy_from_kernel_nofault(&op, (void *)ip, sizeof(int))) {
pr_err("Fetching opcode failed.\n");
return -EFAULT;
}
@@ -516,7 +516,7 @@ __ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
struct module *mod = rec->arch.mod;

/* read where this goes */
- if (probe_kernel_read(op, ip, sizeof(op)))
+ if (copy_from_kernel_nofault(op, ip, sizeof(op)))
return -EFAULT;

if (!expected_nop_sequence(ip, op[0], op[1])) {
@@ -578,7 +578,7 @@ __ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
unsigned long ip = rec->ip;

/* read where this goes */
- if (probe_kernel_read(&op, (void *)ip, MCOUNT_INSN_SIZE))
+ if (copy_from_kernel_nofault(&op, (void *)ip, MCOUNT_INSN_SIZE))
return -EFAULT;

/* It should be pointing to a nop */
@@ -634,7 +634,7 @@ static int __ftrace_make_call_kernel(struct dyn_ftrace *rec, unsigned long addr)
}

/* Make sure we have a nop */
- if (probe_kernel_read(&op, ip, sizeof(op))) {
+ if (copy_from_kernel_nofault(&op, ip, sizeof(op))) {
pr_err("Unable to read ftrace location %p\n", ip);
return -EFAULT;
}
@@ -712,7 +712,7 @@ __ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
}

/* read where this goes */
- if (probe_kernel_read(&op, (void *)ip, sizeof(int))) {
+ if (copy_from_kernel_nofault(&op, (void *)ip, sizeof(int))) {
pr_err("Fetching opcode failed.\n");
return -EFAULT;
}
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 3dcfecf858f36..50bc9f0eb6be3 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -418,7 +418,8 @@ static __u64 power_pmu_bhrb_to(u64 addr)
__u64 target;

if (is_kernel_addr(addr)) {
- if (probe_kernel_read(&instr, (void *)addr, sizeof(instr)))
+ if (copy_from_kernel_nofault(&instr, (void *)addr,
+ sizeof(instr)))
return 0;

return branch_target(&instr);
diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
index ce69b34ff55d0..143a22ab8853e 100644
--- a/arch/riscv/kernel/ftrace.c
+++ b/arch/riscv/kernel/ftrace.c
@@ -25,7 +25,8 @@ static int ftrace_check_current_call(unsigned long hook_pos,
* Read the text we want to modify;
* return must be -EFAULT on read error
*/
- if (probe_kernel_read(replaced, (void *)hook_pos, MCOUNT_INSN_SIZE))
+ if (copy_from_kernel_nofault(replaced, (void *)hook_pos,
+ MCOUNT_INSN_SIZE))
return -EFAULT;

/*
diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c
index 8a4fc65ee0222..9a0ade9fa10b0 100644
--- a/arch/riscv/kernel/patch.c
+++ b/arch/riscv/kernel/patch.c
@@ -57,7 +57,7 @@ static int __kprobes riscv_insn_write(void *addr, const void *insn, size_t len)

waddr = patch_map(addr, FIX_TEXT_POKE0);

- ret = probe_kernel_write(waddr, insn, len);
+ ret = copy_to_kernel_nofault(waddr, insn, len);

patch_unmap(FIX_TEXT_POKE0);

@@ -71,7 +71,7 @@ static int __kprobes riscv_insn_write(void *addr, const void *insn, size_t len)
#else
static int __kprobes riscv_insn_write(void *addr, const void *insn, size_t len)
{
- return probe_kernel_write(addr, insn, len);
+ return copy_to_kernel_nofault(addr, insn, len);
}
#endif /* CONFIG_MMU */

diff --git a/arch/s390/kernel/ftrace.c b/arch/s390/kernel/ftrace.c
index 4cd9b1ada8340..1576f0eede771 100644
--- a/arch/s390/kernel/ftrace.c
+++ b/arch/s390/kernel/ftrace.c
@@ -99,7 +99,7 @@ int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
{
struct ftrace_insn orig, new, old;

- if (probe_kernel_read(&old, (void *) rec->ip, sizeof(old)))
+ if (copy_from_kernel_nofault(&old, (void *) rec->ip, sizeof(old)))
return -EFAULT;
if (addr == MCOUNT_ADDR) {
/* Initial code replacement */
@@ -121,7 +121,7 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
{
struct ftrace_insn orig, new, old;

- if (probe_kernel_read(&old, (void *) rec->ip, sizeof(old)))
+ if (copy_from_kernel_nofault(&old, (void *) rec->ip, sizeof(old)))
return -EFAULT;
/* Replace nop with an ftrace call. */
ftrace_generate_nop_insn(&orig);
diff --git a/arch/sh/kernel/ftrace.c b/arch/sh/kernel/ftrace.c
index 1b04270e5460e..0646c59618466 100644
--- a/arch/sh/kernel/ftrace.c
+++ b/arch/sh/kernel/ftrace.c
@@ -119,7 +119,7 @@ static void ftrace_mod_code(void)
* But if one were to fail, then they all should, and if one were
* to succeed, then they all should.
*/
- mod_code_status = probe_kernel_write(mod_code_ip, mod_code_newcode,
+ mod_code_status = copy_to_kernel_nofault(mod_code_ip, mod_code_newcode,
MCOUNT_INSN_SIZE);

/* if we fail, then kill any new writers */
@@ -203,7 +203,7 @@ static int ftrace_modify_code(unsigned long ip, unsigned char *old_code,
*/

/* read the text we want to modify */
- if (probe_kernel_read(replaced, (void *)ip, MCOUNT_INSN_SIZE))
+ if (copy_from_kernel_nofault(replaced, (void *)ip, MCOUNT_INSN_SIZE))
return -EFAULT;

/* Make sure it is what we expect it to be */
@@ -268,7 +268,7 @@ static int ftrace_mod(unsigned long ip, unsigned long old_addr,
{
unsigned char code[MCOUNT_INSN_SIZE];

- if (probe_kernel_read(code, (void *)ip, MCOUNT_INSN_SIZE))
+ if (copy_from_kernel_nofault(code, (void *)ip, MCOUNT_INSN_SIZE))
return -EFAULT;

if (old_addr != __raw_readl((unsigned long *)code))
diff --git a/arch/um/kernel/maccess.c b/arch/um/kernel/maccess.c
index e929c0966696c..8ccd56813f684 100644
--- a/arch/um/kernel/maccess.c
+++ b/arch/um/kernel/maccess.c
@@ -7,7 +7,7 @@
#include <linux/kernel.h>
#include <os.h>

-bool probe_kernel_read_allowed(const void *src, size_t size)
+bool copy_from_kernel_nofault_allowed(const void *src, size_t size)
{
void *psrc = (void *)rounddown((unsigned long)src, PAGE_SIZE);

diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index 6d6475fdd3278..62ac40751276f 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -278,7 +278,7 @@ static inline unsigned long *regs_get_kernel_stack_nth_addr(struct pt_regs *regs
}

/* To avoid include hell, we can't include uaccess.h */
-extern long probe_kernel_read(void *dst, const void *src, size_t size);
+extern long copy_from_kernel_nofault(void *dst, const void *src, size_t size);

/**
* regs_get_kernel_stack_nth() - get Nth entry of the stack
@@ -298,7 +298,7 @@ static inline unsigned long regs_get_kernel_stack_nth(struct pt_regs *regs,

addr = regs_get_kernel_stack_nth_addr(regs, n);
if (addr) {
- ret = probe_kernel_read(&val, addr, sizeof(val));
+ ret = copy_from_kernel_nofault(&val, addr, sizeof(val));
if (!ret)
return val;
}
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index ae64ec7f752f4..f8dd06ccd8d2e 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -106,7 +106,7 @@ void show_opcodes(struct pt_regs *regs, const char *loglvl)
bad_ip = user_mode(regs) &&
__chk_range_not_ok(prologue, OPCODE_BUFSIZE, TASK_SIZE_MAX);

- if (bad_ip || probe_kernel_read(opcodes, (u8 *)prologue,
+ if (bad_ip || copy_from_kernel_nofault(opcodes, (u8 *)prologue,
OPCODE_BUFSIZE)) {
printk("%sCode: Bad RIP value.\n", loglvl);
} else {
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index b0e641793be4f..3a27647d00560 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -86,7 +86,7 @@ static int ftrace_verify_code(unsigned long ip, const char *old_code)
* sure what we read is what we expected it to be before modifying it.
*/
/* read the text we want to modify */
- if (probe_kernel_read(cur_code, (void *)ip, MCOUNT_INSN_SIZE)) {
+ if (copy_from_kernel_nofault(cur_code, (void *)ip, MCOUNT_INSN_SIZE)) {
WARN_ON(1);
return -EFAULT;
}
@@ -354,7 +354,7 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)
npages = DIV_ROUND_UP(*tramp_size, PAGE_SIZE);

/* Copy ftrace_caller onto the trampoline memory */
- ret = probe_kernel_read(trampoline, (void *)start_offset, size);
+ ret = copy_from_kernel_nofault(trampoline, (void *)start_offset, size);
if (WARN_ON(ret < 0))
goto fail;

@@ -362,7 +362,7 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)

/* The trampoline ends with ret(q) */
retq = (unsigned long)ftrace_stub;
- ret = probe_kernel_read(ip, (void *)retq, RET_SIZE);
+ ret = copy_from_kernel_nofault(ip, (void *)retq, RET_SIZE);
if (WARN_ON(ret < 0))
goto fail;

@@ -498,7 +498,7 @@ static void *addr_from_call(void *ptr)
union text_poke_insn call;
int ret;

- ret = probe_kernel_read(&call, ptr, CALL_INSN_SIZE);
+ ret = copy_from_kernel_nofault(&call, ptr, CALL_INSN_SIZE);
if (WARN_ON_ONCE(ret < 0))
return NULL;

diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index c44fe7d8d9a4e..68acd30c6b878 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -732,11 +732,11 @@ int kgdb_arch_set_breakpoint(struct kgdb_bkpt *bpt)
int err;

bpt->type = BP_BREAKPOINT;
- err = probe_kernel_read(bpt->saved_instr, (char *)bpt->bpt_addr,
+ err = copy_from_kernel_nofault(bpt->saved_instr, (char *)bpt->bpt_addr,
BREAK_INSTR_SIZE);
if (err)
return err;
- err = probe_kernel_write((char *)bpt->bpt_addr,
+ err = copy_to_kernel_nofault((char *)bpt->bpt_addr,
arch_kgdb_ops.gdb_bpt_instr, BREAK_INSTR_SIZE);
if (!err)
return err;
@@ -768,7 +768,7 @@ int kgdb_arch_remove_breakpoint(struct kgdb_bkpt *bpt)
return 0;

knl_write:
- return probe_kernel_write((char *)bpt->bpt_addr,
+ return copy_to_kernel_nofault((char *)bpt->bpt_addr,
(char *)bpt->saved_instr, BREAK_INSTR_SIZE);
}

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 4d7022a740ab0..ecffda528f86f 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -243,7 +243,7 @@ __recover_probed_insn(kprobe_opcode_t *buf, unsigned long addr)
* Fortunately, we know that the original code is the ideal 5-byte
* long NOP.
*/
- if (probe_kernel_read(buf, (void *)addr,
+ if (copy_from_kernel_nofault(buf, (void *)addr,
MAX_INSN_SIZE * sizeof(kprobe_opcode_t)))
return 0UL;

@@ -346,7 +346,8 @@ int __copy_instruction(u8 *dest, u8 *src, u8 *real, struct insn *insn)
return 0;

/* This can access kernel text if given address is not recovered */
- if (probe_kernel_read(dest, (void *)recovered_insn, MAX_INSN_SIZE))
+ if (copy_from_kernel_nofault(dest, (void *)recovered_insn,
+ MAX_INSN_SIZE))
return 0;

kernel_insn_init(insn, dest, MAX_INSN_SIZE);
diff --git a/arch/x86/kernel/kprobes/opt.c b/arch/x86/kernel/kprobes/opt.c
index ea13f68882849..85696911093f4 100644
--- a/arch/x86/kernel/kprobes/opt.c
+++ b/arch/x86/kernel/kprobes/opt.c
@@ -56,7 +56,7 @@ unsigned long __recover_optprobed_insn(kprobe_opcode_t *buf, unsigned long addr)
* overwritten by jump destination address. In this case, original
* bytes must be recovered from op->optinsn.copied_insn buffer.
*/
- if (probe_kernel_read(buf, (void *)addr,
+ if (copy_from_kernel_nofault(buf, (void *)addr,
MAX_INSN_SIZE * sizeof(kprobe_opcode_t)))
return 0UL;

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index d54cffdc7cac2..96809f6aad67d 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -483,7 +483,8 @@ static enum kernel_gp_hint get_kernel_gp_address(struct pt_regs *regs,
u8 insn_buf[MAX_INSN_SIZE];
struct insn insn;

- if (probe_kernel_read(insn_buf, (void *)regs->ip, MAX_INSN_SIZE))
+ if (copy_from_kernel_nofault(insn_buf, (void *)regs->ip,
+ MAX_INSN_SIZE))
return GP_NO_HINT;

kernel_insn_init(&insn, insn_buf, MAX_INSN_SIZE);
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index a51df516b87bf..994e207abdf64 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -590,7 +590,7 @@ static void show_ldttss(const struct desc_ptr *gdt, const char *name, u16 index)
return;
}

- if (probe_kernel_read(&desc, (void *)(gdt->address + offset),
+ if (copy_from_kernel_nofault(&desc, (void *)(gdt->address + offset),
sizeof(struct ldttss_desc))) {
pr_alert("%s: 0x%hx -- GDT entry is not readable\n",
name, index);
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 4222a010057a9..4253c95968932 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -758,7 +758,7 @@ static void __init test_wp_bit(void)

__set_fixmap(FIX_WP_TEST, __pa_symbol(empty_zero_page), PAGE_KERNEL_RO);

- if (probe_kernel_write((char *)fix_to_virt(FIX_WP_TEST), &z, 1)) {
+ if (copy_to_kernel_nofault((char *)fix_to_virt(FIX_WP_TEST), &z, 1)) {
clear_fixmap(FIX_WP_TEST);
printk(KERN_CONT "Ok.\n");
return;
diff --git a/arch/x86/mm/maccess.c b/arch/x86/mm/maccess.c
index a5ed03ac9b10f..70730904c2c1c 100644
--- a/arch/x86/mm/maccess.c
+++ b/arch/x86/mm/maccess.c
@@ -9,7 +9,7 @@ static __always_inline u64 canonical_address(u64 vaddr, u8 vaddr_bits)
return ((s64)vaddr << (64 - vaddr_bits)) >> (64 - vaddr_bits);
}

-bool probe_kernel_read_allowed(const void *unsafe_src, size_t size)
+bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size)
{
unsigned long vaddr = (unsigned long)unsafe_src;

@@ -22,7 +22,7 @@ bool probe_kernel_read_allowed(const void *unsafe_src, size_t size)
canonical_address(vaddr, boot_cpu_data.x86_virt_bits) == vaddr;
}
#else
-bool probe_kernel_read_allowed(const void *unsafe_src, size_t size)
+bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size)
{
return (unsigned long)vaddr >= TASK_SIZE_MAX;
}
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 507f4fb88fa7f..0a76fd298d896 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -387,7 +387,7 @@ static void set_aliased_prot(void *v, pgprot_t prot)

preempt_disable();

- probe_kernel_read(&dummy, v, 1);
+ copy_from_kernel_nofault(&dummy, v, 1);

if (HYPERVISOR_update_va_mapping((unsigned long)v, pte, 0))
BUG();
diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index 43dd0891ca1ed..12774655aff45 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -167,7 +167,7 @@ static ssize_t read_mem(struct file *file, char __user *buf,
if (!ptr)
goto failed;

- probe = probe_kernel_read(bounce, ptr, sz);
+ probe = copy_from_kernel_nofault(bounce, ptr, sz);
unxlate_dev_mem_ptr(p, ptr);
if (probe)
goto failed;
diff --git a/drivers/dio/dio.c b/drivers/dio/dio.c
index c9aa15fb86a9a..193b40e7aec03 100644
--- a/drivers/dio/dio.c
+++ b/drivers/dio/dio.c
@@ -135,7 +135,8 @@ int __init dio_find(int deviceid)
else
va = ioremap(pa, PAGE_SIZE);

- if (probe_kernel_read(&i, (unsigned char *)va + DIO_IDOFF, 1)) {
+ if (copy_from_kernel_nofault(&i,
+ (unsigned char *)va + DIO_IDOFF, 1)) {
if (scode >= DIOII_SCBASE)
iounmap(va);
continue; /* no board present at that select code */
@@ -208,7 +209,8 @@ static int __init dio_init(void)
else
va = ioremap(pa, PAGE_SIZE);

- if (probe_kernel_read(&i, (unsigned char *)va + DIO_IDOFF, 1)) {
+ if (copy_from_kernel_nofault(&i,
+ (unsigned char *)va + DIO_IDOFF, 1)) {
if (scode >= DIOII_SCBASE)
iounmap(va);
continue; /* no board present at that select code */
diff --git a/drivers/input/serio/hp_sdc.c b/drivers/input/serio/hp_sdc.c
index 654252361653d..13eacf6ab4310 100644
--- a/drivers/input/serio/hp_sdc.c
+++ b/drivers/input/serio/hp_sdc.c
@@ -1021,7 +1021,7 @@ static int __init hp_sdc_register(void)
hp_sdc.base_io = (unsigned long) 0xf0428000;
hp_sdc.data_io = (unsigned long) hp_sdc.base_io + 1;
hp_sdc.status_io = (unsigned long) hp_sdc.base_io + 3;
- if (!probe_kernel_read(&i, (unsigned char *)hp_sdc.data_io, 1))
+ if (!copy_from_kernel_nofault(&i, (unsigned char *)hp_sdc.data_io, 1))
hp_sdc.dev = (void *)1;
hp_sdc.dev_err = hp_sdc_init();
#endif
diff --git a/drivers/misc/kgdbts.c b/drivers/misc/kgdbts.c
index bccd341e9ae16..d5d2af4d10e66 100644
--- a/drivers/misc/kgdbts.c
+++ b/drivers/misc/kgdbts.c
@@ -828,7 +828,7 @@ static void run_plant_and_detach_test(int is_early)
char before[BREAK_INSTR_SIZE];
char after[BREAK_INSTR_SIZE];

- probe_kernel_read(before, (char *)kgdbts_break_test,
+ copy_from_kernel_nofault(before, (char *)kgdbts_break_test,
BREAK_INSTR_SIZE);
init_simple_test();
ts.tst = plant_and_detach_test;
@@ -836,8 +836,8 @@ static void run_plant_and_detach_test(int is_early)
/* Activate test with initial breakpoint */
if (!is_early)
kgdb_breakpoint();
- probe_kernel_read(after, (char *)kgdbts_break_test,
- BREAK_INSTR_SIZE);
+ copy_from_kernel_nofault(after, (char *)kgdbts_break_test,
+ BREAK_INSTR_SIZE);
if (memcmp(before, after, BREAK_INSTR_SIZE)) {
printk(KERN_CRIT "kgdbts: ERROR kgdb corrupted memory\n");
panic("kgdb memory corruption");
diff --git a/drivers/video/fbdev/hpfb.c b/drivers/video/fbdev/hpfb.c
index f02be0db335e9..8d418abdd7678 100644
--- a/drivers/video/fbdev/hpfb.c
+++ b/drivers/video/fbdev/hpfb.c
@@ -402,7 +402,7 @@ int __init hpfb_init(void)
if (err)
return err;

- err = probe_kernel_read(&i, (unsigned char *)INTFBVADDR + DIO_IDOFF, 1);
+ err = copy_from_kernel_nofault(&i, (unsigned char *)INTFBVADDR + DIO_IDOFF, 1);

if (!err && (i == DIO_ID_FBUFFER) && topcat_sid_ok(sid = DIO_SECID(INTFBVADDR))) {
if (!request_mem_region(INTFBPADDR, DIO_DEVSIZE, "Internal Topcat"))
diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index 8ba492d44e689..e502414b35564 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -512,7 +512,8 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
* Using bounce buffer to bypass the
* hardened user copy kernel text checks.
*/
- if (probe_kernel_read(buf, (void *) start, tsz)) {
+ if (copy_from_kernel_nofault(buf, (void *)start,
+ tsz)) {
if (clear_user(buffer, tsz)) {
ret = -EFAULT;
goto out;
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 58e9f3dc1cf13..329e8479798de 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -301,13 +301,14 @@ copy_struct_from_user(void *dst, size_t ksize, const void __user *src,
return 0;
}

-bool probe_kernel_read_allowed(const void *unsafe_src, size_t size);
+bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size);

-extern long probe_kernel_read(void *dst, const void *src, size_t size);
-extern long probe_user_read(void *dst, const void __user *src, size_t size);
+long copy_from_kernel_nofault(void *dst, const void *src, size_t size);
+long notrace copy_to_kernel_nofault(void *dst, const void *src, size_t size);

-extern long notrace probe_kernel_write(void *dst, const void *src, size_t size);
-extern long notrace probe_user_write(void __user *dst, const void *src, size_t size);
+extern long probe_user_read(void *dst, const void __user *src, size_t size);
+extern long notrace probe_user_write(void __user *dst, const void *src,
+ size_t size);

long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr,
long count);
@@ -324,7 +325,7 @@ long strnlen_user_nofault(const void __user *unsafe_addr, long count);
* Returns 0 on success, or -EFAULT.
*/
#define probe_kernel_address(addr, retval) \
- probe_kernel_read(&retval, addr, sizeof(retval))
+ copy_from_kernel_nofault(&retval, addr, sizeof(retval))

#ifndef user_access_begin
#define user_access_begin(ptr,len) access_ok(ptr, len)
diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index 2b7c9b67931d6..b5c2492e0b010 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -171,18 +171,18 @@ int __weak kgdb_arch_set_breakpoint(struct kgdb_bkpt *bpt)
{
int err;

- err = probe_kernel_read(bpt->saved_instr, (char *)bpt->bpt_addr,
+ err = copy_from_kernel_nofault(bpt->saved_instr, (char *)bpt->bpt_addr,
BREAK_INSTR_SIZE);
if (err)
return err;
- err = probe_kernel_write((char *)bpt->bpt_addr,
+ err = copy_to_kernel_nofault((char *)bpt->bpt_addr,
arch_kgdb_ops.gdb_bpt_instr, BREAK_INSTR_SIZE);
return err;
}

int __weak kgdb_arch_remove_breakpoint(struct kgdb_bkpt *bpt)
{
- return probe_kernel_write((char *)bpt->bpt_addr,
+ return copy_to_kernel_nofault((char *)bpt->bpt_addr,
(char *)bpt->saved_instr, BREAK_INSTR_SIZE);
}

diff --git a/kernel/debug/gdbstub.c b/kernel/debug/gdbstub.c
index 4b280fc7dd675..61774aec46b4c 100644
--- a/kernel/debug/gdbstub.c
+++ b/kernel/debug/gdbstub.c
@@ -247,7 +247,7 @@ char *kgdb_mem2hex(char *mem, char *buf, int count)
*/
tmp = buf + count;

- err = probe_kernel_read(tmp, mem, count);
+ err = copy_from_kernel_nofault(tmp, mem, count);
if (err)
return NULL;
while (count > 0) {
@@ -283,7 +283,7 @@ int kgdb_hex2mem(char *buf, char *mem, int count)
*tmp_raw |= hex_to_bin(*tmp_hex--) << 4;
}

- return probe_kernel_write(mem, tmp_raw, count);
+ return copy_to_kernel_nofault(mem, tmp_raw, count);
}

/*
@@ -335,7 +335,7 @@ static int kgdb_ebin2mem(char *buf, char *mem, int count)
size++;
}

- return probe_kernel_write(mem, c, size);
+ return copy_to_kernel_nofault(mem, c, size);
}

#if DBG_MAX_REG_NUM > 0
diff --git a/kernel/debug/kdb/kdb_main.c b/kernel/debug/kdb/kdb_main.c
index 515379cbf2092..31858c3839ca5 100644
--- a/kernel/debug/kdb/kdb_main.c
+++ b/kernel/debug/kdb/kdb_main.c
@@ -2325,7 +2325,8 @@ void kdb_ps1(const struct task_struct *p)
int cpu;
unsigned long tmp;

- if (!p || probe_kernel_read(&tmp, (char *)p, sizeof(unsigned long)))
+ if (!p ||
+ copy_from_kernel_nofault(&tmp, (char *)p, sizeof(unsigned long)))
return;

cpu = kdb_process_cpu(p);
diff --git a/kernel/debug/kdb/kdb_support.c b/kernel/debug/kdb/kdb_support.c
index b8e6306e7e133..004c5b6c87f89 100644
--- a/kernel/debug/kdb/kdb_support.c
+++ b/kernel/debug/kdb/kdb_support.c
@@ -325,7 +325,7 @@ char *kdb_strdup(const char *str, gfp_t type)
*/
int kdb_getarea_size(void *res, unsigned long addr, size_t size)
{
- int ret = probe_kernel_read((char *)res, (char *)addr, size);
+ int ret = copy_from_kernel_nofault((char *)res, (char *)addr, size);
if (ret) {
if (!KDB_STATE(SUPPRESS)) {
kdb_printf("kdb_getarea: Bad address 0x%lx\n", addr);
@@ -350,7 +350,7 @@ int kdb_getarea_size(void *res, unsigned long addr, size_t size)
*/
int kdb_putarea_size(unsigned long addr, void *res, size_t size)
{
- int ret = probe_kernel_read((char *)addr, (char *)res, size);
+ int ret = copy_from_kernel_nofault((char *)addr, (char *)res, size);
if (ret) {
if (!KDB_STATE(SUPPRESS)) {
kdb_printf("kdb_putarea: Bad address 0x%lx\n", addr);
@@ -624,7 +624,8 @@ char kdb_task_state_char (const struct task_struct *p)
char state;
unsigned long tmp;

- if (!p || probe_kernel_read(&tmp, (char *)p, sizeof(unsigned long)))
+ if (!p ||
+ copy_from_kernel_nofault(&tmp, (char *)p, sizeof(unsigned long)))
return 'E';

cpu = kdb_process_cpu(p);
diff --git a/kernel/kthread.c b/kernel/kthread.c
index bfbfa481be3a5..e19ebe61232ca 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -179,7 +179,7 @@ void *kthread_probe_data(struct task_struct *task)
struct kthread *kthread = to_kthread(task);
void *data = NULL;

- probe_kernel_read(&data, &kthread->data, sizeof(data));
+ copy_from_kernel_nofault(&data, &kthread->data, sizeof(data));
return data;
}

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index d9781c894c38b..59fec6e226db7 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -196,7 +196,7 @@ bpf_probe_read_kernel_common(void *dst, u32 size, const void *unsafe_ptr)

if (unlikely(ret < 0))
goto fail;
- ret = probe_kernel_read(dst, unsafe_ptr, size);
+ ret = copy_from_kernel_nofault(dst, unsafe_ptr, size);
if (unlikely(ret < 0))
goto fail;
return ret;
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index b1f21d558e454..430e64c4117ee 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1222,7 +1222,7 @@ fetch_store_strlen(unsigned long addr)
#endif

do {
- ret = probe_kernel_read(&c, (u8 *)addr + len, 1);
+ ret = copy_from_kernel_nofault(&c, (u8 *)addr + len, 1);
len++;
} while (c && ret == 0 && len < MAX_STRING_SIZE);

@@ -1300,7 +1300,7 @@ probe_mem_read(void *dest, void *src, size_t size)
if ((unsigned long)src < TASK_SIZE)
return probe_mem_read_user(dest, src, size);
#endif
- return probe_kernel_read(dest, src, size);
+ return copy_from_kernel_nofault(dest, src, size);
}

/* Note that we don't verify it, since the code does not come from user space */
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 891ccad5f2716..a1345ffa3a0ba 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4627,11 +4627,11 @@ void print_worker_info(const char *log_lvl, struct task_struct *task)
* Carefully copy the associated workqueue's workfn, name and desc.
* Keep the original last '\0' in case the original is garbage.
*/
- probe_kernel_read(&fn, &worker->current_func, sizeof(fn));
- probe_kernel_read(&pwq, &worker->current_pwq, sizeof(pwq));
- probe_kernel_read(&wq, &pwq->wq, sizeof(wq));
- probe_kernel_read(name, wq->name, sizeof(name) - 1);
- probe_kernel_read(desc, worker->desc, sizeof(desc) - 1);
+ copy_from_kernel_nofault(&fn, &worker->current_func, sizeof(fn));
+ copy_from_kernel_nofault(&pwq, &worker->current_pwq, sizeof(pwq));
+ copy_from_kernel_nofault(&wq, &pwq->wq, sizeof(wq));
+ copy_from_kernel_nofault(name, wq->name, sizeof(name) - 1);
+ copy_from_kernel_nofault(desc, worker->desc, sizeof(desc) - 1);

if (fn || name[0] || desc[0]) {
printk("%sWorkqueue: %s %ps", log_lvl, name, fn);
diff --git a/mm/maccess.c b/mm/maccess.c
index e19d19cd5382c..43ed419f38401 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -6,14 +6,15 @@
#include <linux/mm.h>
#include <linux/uaccess.h>

-bool __weak probe_kernel_read_allowed(const void *unsafe_src, size_t size)
+bool __weak copy_from_kernel_nofault_allowed(const void *unsafe_src,
+ size_t size)
{
return true;
}

#ifdef HAVE_GET_KERNEL_NOFAULT

-#define probe_kernel_read_loop(dst, src, len, type, err_label) \
+#define copy_from_kernel_nofault_loop(dst, src, len, type, err_label) \
while (len >= sizeof(type)) { \
__get_kernel_nofault(dst, src, type, err_label); \
dst += sizeof(type); \
@@ -21,25 +22,25 @@ bool __weak probe_kernel_read_allowed(const void *unsafe_src, size_t size)
len -= sizeof(type); \
}

-long probe_kernel_read(void *dst, const void *src, size_t size)
+long copy_from_kernel_nofault(void *dst, const void *src, size_t size)
{
- if (!probe_kernel_read_allowed(src, size))
+ if (!copy_from_kernel_nofault_allowed(src, size))
return -EFAULT;

pagefault_disable();
- probe_kernel_read_loop(dst, src, size, u64, Efault);
- probe_kernel_read_loop(dst, src, size, u32, Efault);
- probe_kernel_read_loop(dst, src, size, u16, Efault);
- probe_kernel_read_loop(dst, src, size, u8, Efault);
+ copy_from_kernel_nofault_loop(dst, src, size, u64, Efault);
+ copy_from_kernel_nofault_loop(dst, src, size, u32, Efault);
+ copy_from_kernel_nofault_loop(dst, src, size, u16, Efault);
+ copy_from_kernel_nofault_loop(dst, src, size, u8, Efault);
pagefault_enable();
return 0;
Efault:
pagefault_enable();
return -EFAULT;
}
-EXPORT_SYMBOL_GPL(probe_kernel_read);
+EXPORT_SYMBOL_GPL(copy_from_kernel_nofault);

-#define probe_kernel_write_loop(dst, src, len, type, err_label) \
+#define copy_to_kernel_nofault_loop(dst, src, len, type, err_label) \
while (len >= sizeof(type)) { \
__put_kernel_nofault(dst, src, type, err_label); \
dst += sizeof(type); \
@@ -47,13 +48,13 @@ EXPORT_SYMBOL_GPL(probe_kernel_read);
len -= sizeof(type); \
}

-long probe_kernel_write(void *dst, const void *src, size_t size)
+long copy_to_kernel_nofault(void *dst, const void *src, size_t size)
{
pagefault_disable();
- probe_kernel_write_loop(dst, src, size, u64, Efault);
- probe_kernel_write_loop(dst, src, size, u32, Efault);
- probe_kernel_write_loop(dst, src, size, u16, Efault);
- probe_kernel_write_loop(dst, src, size, u8, Efault);
+ copy_to_kernel_nofault_loop(dst, src, size, u64, Efault);
+ copy_to_kernel_nofault_loop(dst, src, size, u32, Efault);
+ copy_to_kernel_nofault_loop(dst, src, size, u16, Efault);
+ copy_to_kernel_nofault_loop(dst, src, size, u8, Efault);
pagefault_enable();
return 0;
Efault:
@@ -67,7 +68,7 @@ long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr, long count)

if (unlikely(count <= 0))
return 0;
- if (!probe_kernel_read_allowed(unsafe_addr, count))
+ if (!copy_from_kernel_nofault_allowed(unsafe_addr, count))
return -EFAULT;

pagefault_disable();
@@ -87,7 +88,7 @@ long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr, long count)
}
#else /* HAVE_GET_KERNEL_NOFAULT */
/**
- * probe_kernel_read(): safely attempt to read from kernel-space
+ * copy_from_kernel_nofault(): safely attempt to read from kernel-space
* @dst: pointer to the buffer that shall take the data
* @src: address to read from
* @size: size of the data chunk
@@ -97,15 +98,15 @@ long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr, long count)
*
* We ensure that the copy_from_user is executed in atomic context so that
* do_page_fault() doesn't attempt to take mmap_sem. This makes
- * probe_kernel_read() suitable for use within regions where the caller
+ * copy_from_kernel_nofault() suitable for use within regions where the caller
* already holds mmap_sem, or other locks which nest inside mmap_sem.
*/
-long probe_kernel_read(void *dst, const void *src, size_t size)
+long copy_from_kernel_nofault(void *dst, const void *src, size_t size)
{
long ret;
mm_segment_t old_fs = get_fs();

- if (!probe_kernel_read_allowed(src, size))
+ if (!copy_from_kernel_nofault_allowed(src, size))
return -EFAULT;

set_fs(KERNEL_DS);
@@ -119,10 +120,10 @@ long probe_kernel_read(void *dst, const void *src, size_t size)
return -EFAULT;
return 0;
}
-EXPORT_SYMBOL_GPL(probe_kernel_read);
+EXPORT_SYMBOL_GPL(copy_from_kernel_nofault);

/**
- * probe_kernel_write(): safely attempt to write to a location
+ * copy_to_kernel_nofault(): safely attempt to write to a location
* @dst: address to write to
* @src: pointer to the data that shall be written
* @size: size of the data chunk
@@ -130,7 +131,7 @@ EXPORT_SYMBOL_GPL(probe_kernel_read);
* Safely write to address @dst from the buffer at @src. If a kernel fault
* happens, handle that and return -EFAULT.
*/
-long probe_kernel_write(void *dst, const void *src, size_t size)
+long copy_to_kernel_nofault(void *dst, const void *src, size_t size)
{
long ret;
mm_segment_t old_fs = get_fs();
@@ -172,7 +173,7 @@ long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr, long count)

if (unlikely(count <= 0))
return 0;
- if (!probe_kernel_read_allowed(unsafe_addr, count))
+ if (!copy_from_kernel_nofault_allowed(unsafe_addr, count))
return -EFAULT;

set_fs(KERNEL_DS);
diff --git a/mm/rodata_test.c b/mm/rodata_test.c
index 5e313fa93276d..2a99df7beeb35 100644
--- a/mm/rodata_test.c
+++ b/mm/rodata_test.c
@@ -25,7 +25,7 @@ void rodata_test(void)
}

/* test 2: write to the variable; this should fault */
- if (!probe_kernel_write((void *)&rodata_test_data,
+ if (!copy_to_kernel_nofault((void *)&rodata_test_data,
(void *)&zero, sizeof(zero))) {
pr_err("test data was not read only\n");
return;
diff --git a/mm/slub.c b/mm/slub.c
index b762450fc9f07..4af02f7af7118 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -292,7 +292,7 @@ static inline void *get_freepointer_safe(struct kmem_cache *s, void *object)
return get_freepointer(s, object);

freepointer_addr = (unsigned long)object + s->offset;
- probe_kernel_read(&p, (void **)freepointer_addr, sizeof(p));
+ copy_from_kernel_nofault(&p, (void **)freepointer_addr, sizeof(p));
return freelist_ptr(s, p, freepointer_addr);
}

--
2.26.2

2020-05-21 15:29:32

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 16/23] maccess: always use strict semantics for probe_kernel_read

Except for historical confusion in the kprobes/uprobes and bpf tracers,
which has been fixed now, there is no good reason to ever allow user
memory accesses from probe_kernel_read. Switch probe_kernel_read to only
read from kernel memory.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/parisc/lib/memcpy.c | 2 +-
arch/um/kernel/maccess.c | 2 +-
arch/x86/mm/maccess.c | 9 ++-------
include/linux/uaccess.h | 4 +---
kernel/trace/bpf_trace.c | 2 +-
kernel/trace/trace_kprobe.c | 4 ++--
mm/maccess.c | 40 ++++++-------------------------------
7 files changed, 14 insertions(+), 49 deletions(-)

diff --git a/arch/parisc/lib/memcpy.c b/arch/parisc/lib/memcpy.c
index 5b75c35d1da0d..94a9fe2702c2f 100644
--- a/arch/parisc/lib/memcpy.c
+++ b/arch/parisc/lib/memcpy.c
@@ -57,7 +57,7 @@ void * memcpy(void * dst,const void *src, size_t count)
EXPORT_SYMBOL(raw_copy_in_user);
EXPORT_SYMBOL(memcpy);

-bool probe_kernel_read_allowed(const void *unsafe_src, size_t size, bool strict)
+bool probe_kernel_read_allowed(const void *unsafe_src, size_t size)
{
if ((unsigned long)unsafe_src < PAGE_SIZE)
return false;
diff --git a/arch/um/kernel/maccess.c b/arch/um/kernel/maccess.c
index ad2c538ce497c..e929c0966696c 100644
--- a/arch/um/kernel/maccess.c
+++ b/arch/um/kernel/maccess.c
@@ -7,7 +7,7 @@
#include <linux/kernel.h>
#include <os.h>

-bool probe_kernel_read_allowed(const void *src, size_t size, bool strict)
+bool probe_kernel_read_allowed(const void *src, size_t size)
{
void *psrc = (void *)rounddown((unsigned long)src, PAGE_SIZE);

diff --git a/arch/x86/mm/maccess.c b/arch/x86/mm/maccess.c
index a96a56ff16109..a5ed03ac9b10f 100644
--- a/arch/x86/mm/maccess.c
+++ b/arch/x86/mm/maccess.c
@@ -9,13 +9,10 @@ static __always_inline u64 canonical_address(u64 vaddr, u8 vaddr_bits)
return ((s64)vaddr << (64 - vaddr_bits)) >> (64 - vaddr_bits);
}

-bool probe_kernel_read_allowed(const void *unsafe_src, size_t size, bool strict)
+bool probe_kernel_read_allowed(const void *unsafe_src, size_t size)
{
unsigned long vaddr = (unsigned long)unsafe_src;

- if (!strict)
- return true;
-
/*
* Range covering the highest possible canonical userspace address
* as well as non-canonical address range. For the canonical range
@@ -25,10 +22,8 @@ bool probe_kernel_read_allowed(const void *unsafe_src, size_t size, bool strict)
canonical_address(vaddr, boot_cpu_data.x86_virt_bits) == vaddr;
}
#else
-bool probe_kernel_read_allowed(const void *unsafe_src, size_t size, bool strict)
+bool probe_kernel_read_allowed(const void *unsafe_src, size_t size)
{
- if (!strict)
- return true;
return (unsigned long)vaddr >= TASK_SIZE_MAX;
}
#endif
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index d7d98ff345b3d..58e9f3dc1cf13 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -301,11 +301,9 @@ copy_struct_from_user(void *dst, size_t ksize, const void __user *src,
return 0;
}

-bool probe_kernel_read_allowed(const void *unsafe_src, size_t size,
- bool strict);
+bool probe_kernel_read_allowed(const void *unsafe_src, size_t size);

extern long probe_kernel_read(void *dst, const void *src, size_t size);
-extern long probe_kernel_read_strict(void *dst, const void *src, size_t size);
extern long probe_user_read(void *dst, const void __user *src, size_t size);

extern long notrace probe_kernel_write(void *dst, const void *src, size_t size);
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 43566cd2a8180..d9781c894c38b 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -196,7 +196,7 @@ bpf_probe_read_kernel_common(void *dst, u32 size, const void *unsafe_ptr)

if (unlikely(ret < 0))
goto fail;
- ret = probe_kernel_read_strict(dst, unsafe_ptr, size);
+ ret = probe_kernel_read(dst, unsafe_ptr, size);
if (unlikely(ret < 0))
goto fail;
return ret;
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 4aeaef53ba970..b1f21d558e454 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1222,7 +1222,7 @@ fetch_store_strlen(unsigned long addr)
#endif

do {
- ret = probe_kernel_read_strict(&c, (u8 *)addr + len, 1);
+ ret = probe_kernel_read(&c, (u8 *)addr + len, 1);
len++;
} while (c && ret == 0 && len < MAX_STRING_SIZE);

@@ -1300,7 +1300,7 @@ probe_mem_read(void *dest, void *src, size_t size)
if ((unsigned long)src < TASK_SIZE)
return probe_mem_read_user(dest, src, size);
#endif
- return probe_kernel_read_strict(dest, src, size);
+ return probe_kernel_read(dest, src, size);
}

/* Note that we don't verify it, since the code does not come from user space */
diff --git a/mm/maccess.c b/mm/maccess.c
index df82fde34307f..81a85c1e71165 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -6,36 +6,13 @@
#include <linux/mm.h>
#include <linux/uaccess.h>

-static long __probe_kernel_read(void *dst, const void *src, size_t size,
- bool strict);
-
-bool __weak probe_kernel_read_allowed(const void *unsafe_src, size_t size,
- bool strict)
+bool __weak probe_kernel_read_allowed(const void *unsafe_src, size_t size)
{
return true;
}

/**
- * probe_kernel_read(): safely attempt to read from any location
- * @dst: pointer to the buffer that shall take the data
- * @src: address to read from
- * @size: size of the data chunk
- *
- * Same as probe_kernel_read_strict() except that for architectures with
- * not fully separated user and kernel address spaces this function also works
- * for user address tanges.
- *
- * DO NOT USE THIS FUNCTION - it is broken on architectures with entirely
- * separate kernel and user address spaces, and also a bad idea otherwise.
- */
-long probe_kernel_read(void *dst, const void *src, size_t size)
-{
- return __probe_kernel_read(dst, src, size, false);
-}
-EXPORT_SYMBOL_GPL(probe_kernel_read);
-
-/**
- * probe_kernel_read_strict(): safely attempt to read from kernel-space
+ * probe_kernel_read(): safely attempt to read from kernel-space
* @dst: pointer to the buffer that shall take the data
* @src: address to read from
* @size: size of the data chunk
@@ -48,18 +25,12 @@ EXPORT_SYMBOL_GPL(probe_kernel_read);
* probe_kernel_read() suitable for use within regions where the caller
* already holds mmap_sem, or other locks which nest inside mmap_sem.
*/
-long probe_kernel_read_strict(void *dst, const void *src, size_t size)
-{
- return __probe_kernel_read(dst, src, size, true);
-}
-
-static long __probe_kernel_read(void *dst, const void *src, size_t size,
- bool strict)
+long probe_kernel_read(void *dst, const void *src, size_t size)
{
long ret;
mm_segment_t old_fs = get_fs();

- if (!probe_kernel_read_allowed(src, size, strict))
+ if (!probe_kernel_read_allowed(src, size))
return -EFAULT;

set_fs(KERNEL_DS);
@@ -73,6 +44,7 @@ static long __probe_kernel_read(void *dst, const void *src, size_t size,
return -EFAULT;
return 0;
}
+EXPORT_SYMBOL_GPL(probe_kernel_read);

/**
* probe_user_read(): safely attempt to read from a user-space location
@@ -180,7 +152,7 @@ long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr, long count)

if (unlikely(count <= 0))
return 0;
- if (!probe_kernel_read_allowed(unsafe_addr, count, true))
+ if (!probe_kernel_read_allowed(unsafe_addr, count))
return -EFAULT;

set_fs(KERNEL_DS);
--
2.26.2

2020-05-21 15:31:52

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 01/23] maccess: unexport probe_kernel_write and probe_user_write

These two functions are not used by any modular code.

Signed-off-by: Christoph Hellwig <[email protected]>
---
mm/maccess.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/mm/maccess.c b/mm/maccess.c
index 3ca8d97e50106..cf21e604f78cb 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -121,7 +121,6 @@ long __probe_kernel_write(void *dst, const void *src, size_t size)

return ret;
}
-EXPORT_SYMBOL_GPL(probe_kernel_write);

/**
* probe_user_write(): safely attempt to write to a user-space location
@@ -148,7 +147,6 @@ long __probe_user_write(void __user *dst, const void *src, size_t size)

return ret;
}
-EXPORT_SYMBOL_GPL(probe_user_write);

/**
* strncpy_from_unsafe: - Copy a NUL terminated string from unsafe address.
--
2.26.2

2020-05-21 18:07:03

by Linus Torvalds

[permalink] [raw]
Subject: Re: clean up and streamline probe_kernel_* and friends v4

On Thu, May 21, 2020 at 8:23 AM Christoph Hellwig <[email protected]> wrote:
>
> this series start cleaning up the safe kernel and user memory probing
> helpers in mm/maccess.c, and then allows architectures to implement
> the kernel probing without overriding the address space limit and
> temporarily allowing access to user memory. It then switches x86
> over to this new mechanism by reusing the unsafe_* uaccess logic.

I could not see anything to object to in this version. So Ack from me,
but obviously I'm hoping others will try to read it through as well.

Linus

2020-05-21 22:12:32

by Andrii Nakryiko

[permalink] [raw]
Subject: Re: [PATCH 11/23] bpf: factor out a bpf_trace_copy_string helper

On Thu, May 21, 2020 at 8:25 AM Christoph Hellwig <[email protected]> wrote:
>
> Split out a helper to do the fault free access to the string pointer
> to get it out of a crazy indentation level.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---

LGTM.

Acked-by: Andrii Nakryiko <[email protected]>

> kernel/trace/bpf_trace.c | 42 +++++++++++++++++++++++-----------------
> 1 file changed, 24 insertions(+), 18 deletions(-)
>

[...]

2020-05-21 22:12:37

by Andrii Nakryiko

[permalink] [raw]
Subject: Re: [PATCH 12/23] bpf: handle the compat string in bpf_trace_copy_string better

On Thu, May 21, 2020 at 8:24 AM Christoph Hellwig <[email protected]> wrote:
>
> User the proper helper for kernel or userspace addresses based on
> TASK_SIZE instead of the dangerous strncpy_from_unsafe function.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---

Acked-by: Andrii Nakryiko <[email protected]>

> kernel/trace/bpf_trace.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>

[...]

2020-05-21 22:13:13

by Andrii Nakryiko

[permalink] [raw]
Subject: Re: [PATCH 13/23] bpf: rework the compat kernel probe handling

On Thu, May 21, 2020 at 8:26 AM Christoph Hellwig <[email protected]> wrote:
>
> Instead of using the dangerous probe_kernel_read and strncpy_from_unsafe
> helpers, rework the compat probes to check if an address is a kernel or
> userspace one, and then use the low-level kernel or user probe helper
> shared by the proper kernel and user probe helpers. This slightly
> changes behavior as the compat probe on a user address doesn't check
> the lockdown flags, just as the pure user probes do.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---

Looks good, thanks.

Acked-by: Andrii Nakryiko <[email protected]>

> kernel/trace/bpf_trace.c | 109 ++++++++++++++++++++++++---------------
> 1 file changed, 67 insertions(+), 42 deletions(-)
>

[...]

2020-05-22 00:09:00

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH 14/23] tracing/kprobes: handle mixed kernel/userspace probes better

On Thu, 21 May 2020 17:22:52 +0200
Christoph Hellwig <[email protected]> wrote:

> Instead of using the dangerous probe_kernel_read and strncpy_from_unsafe
> helpers, rework probes to try a user probe based on the address if the
> architecture has a common address space for kernel and userspace.
>

This looks good to me.

Acked-by: Masami Hiramatsu <[email protected]>

Thank you!

> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> kernel/trace/trace_kprobe.c | 72 ++++++++++++++++++++++---------------
> 1 file changed, 43 insertions(+), 29 deletions(-)
>
> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> index 4325f9e7fadaa..4aeaef53ba970 100644
> --- a/kernel/trace/trace_kprobe.c
> +++ b/kernel/trace/trace_kprobe.c
> @@ -1200,6 +1200,15 @@ static const struct file_operations kprobe_profile_ops = {
>
> /* Kprobe specific fetch functions */
>
> +/* Return the length of string -- including null terminal byte */
> +static nokprobe_inline int
> +fetch_store_strlen_user(unsigned long addr)
> +{
> + const void __user *uaddr = (__force const void __user *)addr;
> +
> + return strnlen_user_nofault(uaddr, MAX_STRING_SIZE);
> +}
> +
> /* Return the length of string -- including null terminal byte */
> static nokprobe_inline int
> fetch_store_strlen(unsigned long addr)
> @@ -1207,30 +1216,27 @@ fetch_store_strlen(unsigned long addr)
> int ret, len = 0;
> u8 c;
>
> +#ifdef CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
> + if (addr < TASK_SIZE)
> + return fetch_store_strlen_user(addr);
> +#endif
> +
> do {
> - ret = probe_kernel_read(&c, (u8 *)addr + len, 1);
> + ret = probe_kernel_read_strict(&c, (u8 *)addr + len, 1);
> len++;
> } while (c && ret == 0 && len < MAX_STRING_SIZE);
>
> return (ret < 0) ? ret : len;
> }
>
> -/* Return the length of string -- including null terminal byte */
> -static nokprobe_inline int
> -fetch_store_strlen_user(unsigned long addr)
> -{
> - const void __user *uaddr = (__force const void __user *)addr;
> -
> - return strnlen_user_nofault(uaddr, MAX_STRING_SIZE);
> -}
> -
> /*
> - * Fetch a null-terminated string. Caller MUST set *(u32 *)buf with max
> - * length and relative data location.
> + * Fetch a null-terminated string from user. Caller MUST set *(u32 *)buf
> + * with max length and relative data location.
> */
> static nokprobe_inline int
> -fetch_store_string(unsigned long addr, void *dest, void *base)
> +fetch_store_string_user(unsigned long addr, void *dest, void *base)
> {
> + const void __user *uaddr = (__force const void __user *)addr;
> int maxlen = get_loc_len(*(u32 *)dest);
> void *__dest;
> long ret;
> @@ -1240,11 +1246,7 @@ fetch_store_string(unsigned long addr, void *dest, void *base)
>
> __dest = get_loc_data(dest, base);
>
> - /*
> - * Try to get string again, since the string can be changed while
> - * probing.
> - */
> - ret = strncpy_from_unsafe(__dest, (void *)addr, maxlen);
> + ret = strncpy_from_user_nofault(__dest, uaddr, maxlen);
> if (ret >= 0)
> *(u32 *)dest = make_data_loc(ret, __dest - base);
>
> @@ -1252,35 +1254,37 @@ fetch_store_string(unsigned long addr, void *dest, void *base)
> }
>
> /*
> - * Fetch a null-terminated string from user. Caller MUST set *(u32 *)buf
> - * with max length and relative data location.
> + * Fetch a null-terminated string. Caller MUST set *(u32 *)buf with max
> + * length and relative data location.
> */
> static nokprobe_inline int
> -fetch_store_string_user(unsigned long addr, void *dest, void *base)
> +fetch_store_string(unsigned long addr, void *dest, void *base)
> {
> - const void __user *uaddr = (__force const void __user *)addr;
> int maxlen = get_loc_len(*(u32 *)dest);
> void *__dest;
> long ret;
>
> +#ifdef CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
> + if ((unsigned long)addr < TASK_SIZE)
> + return fetch_store_string_user(addr, dest, base);
> +#endif
> +
> if (unlikely(!maxlen))
> return -ENOMEM;
>
> __dest = get_loc_data(dest, base);
>
> - ret = strncpy_from_user_nofault(__dest, uaddr, maxlen);
> + /*
> + * Try to get string again, since the string can be changed while
> + * probing.
> + */
> + ret = strncpy_from_user_nofault(__dest, (void *)addr, maxlen);
> if (ret >= 0)
> *(u32 *)dest = make_data_loc(ret, __dest - base);
>
> return ret;
> }
>
> -static nokprobe_inline int
> -probe_mem_read(void *dest, void *src, size_t size)
> -{
> - return probe_kernel_read(dest, src, size);
> -}
> -
> static nokprobe_inline int
> probe_mem_read_user(void *dest, void *src, size_t size)
> {
> @@ -1289,6 +1293,16 @@ probe_mem_read_user(void *dest, void *src, size_t size)
> return probe_user_read(dest, uaddr, size);
> }
>
> +static nokprobe_inline int
> +probe_mem_read(void *dest, void *src, size_t size)
> +{
> +#ifdef CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
> + if ((unsigned long)src < TASK_SIZE)
> + return probe_mem_read_user(dest, src, size);
> +#endif
> + return probe_kernel_read_strict(dest, src, size);
> +}
> +
> /* Note that we don't verify it, since the code does not come from user space */
> static int
> process_fetch_insn(struct fetch_insn *code, struct pt_regs *regs, void *dest,
> --
> 2.26.2
>


--
Masami Hiramatsu <[email protected]>

2020-05-22 00:25:47

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: clean up and streamline probe_kernel_* and friends v4

On Thu, 21 May 2020 17:22:38 +0200
Christoph Hellwig <[email protected]> wrote:

> Hi all,
>
> this series start cleaning up the safe kernel and user memory probing
> helpers in mm/maccess.c, and then allows architectures to implement
> the kernel probing without overriding the address space limit and
> temporarily allowing access to user memory. It then switches x86
> over to this new mechanism by reusing the unsafe_* uaccess logic.
>
> This version also switches to the saner copy_{from,to}_kernel_nofault
> naming suggested by Linus.
>
> I kept the x86 helpers as-is without calling unsage_{get,put}_user as
> that avoids a number of hard to trace casts, and it will still work
> with the asm-goto based version easily.
>
> Changes since v3:
> - cleanup how bpf and trace_kprobe perform the TASK_SIZE checks
> - remove the unused dst argument to probe_kernel_read_allowed
> - document the -ERANGE return value

This series looks good to me.

Reviewed-by: Masami Hiramatsu <[email protected]>

for this series.

Thank you!

>
> Changes since v2:
> - rebased on 5.7-rc6 with the bpf trace format string changes
> - rename arch_kernel_read to __get_kernel_nofault and arch_kernel_write
> to __put_kernel_nofault
> - clean up the tracers to only allowd "mixed" reads when the kernel
> has non-overlapping address spaces


--
Masami Hiramatsu <[email protected]>

2020-05-25 22:52:00

by Andrew Morton

[permalink] [raw]
Subject: Re: clean up and streamline probe_kernel_* and friends v4

On Thu, 21 May 2020 17:22:38 +0200 Christoph Hellwig <[email protected]> wrote:

> this series start cleaning up the safe kernel and user memory probing
> helpers in mm/maccess.c, and then allows architectures to implement
> the kernel probing without overriding the address space limit and
> temporarily allowing access to user memory. It then switches x86
> over to this new mechanism by reusing the unsafe_* uaccess logic.
>
> This version also switches to the saner copy_{from,to}_kernel_nofault
> naming suggested by Linus.
>
> I kept the x86 helpers as-is without calling unsage_{get,put}_user as
> that avoids a number of hard to trace casts, and it will still work
> with the asm-goto based version easily.

hm. Applying linux-next to this series generates a lot of rejects against
powerpc:

-rw-rw-r-- 1 akpm akpm 493 May 25 15:06 arch/powerpc/kernel/kgdb.c.rej
-rw-rw-r-- 1 akpm akpm 6461 May 25 15:06 arch/powerpc/kernel/trace/ftrace.c.rej
-rw-rw-r-- 1 akpm akpm 447 May 25 15:06 arch/powerpc/mm/fault.c.rej
-rw-rw-r-- 1 akpm akpm 623 May 25 15:06 arch/powerpc/perf/core-book3s.c.rej
-rw-rw-r-- 1 akpm akpm 1408 May 25 15:06 arch/riscv/kernel/patch.c.rej

the arch/powerpc/kernel/trace/ftrace.c ones aren't very trivial.

It's -rc7. Perhaps we should park all this until 5.8-rc1?

2020-05-26 06:17:37

by Christoph Hellwig

[permalink] [raw]
Subject: Re: clean up and streamline probe_kernel_* and friends v4

On Mon, May 25, 2020 at 03:19:12PM -0700, Andrew Morton wrote:
> hm. Applying linux-next to this series generates a lot of rejects against
> powerpc:
>
> -rw-rw-r-- 1 akpm akpm 493 May 25 15:06 arch/powerpc/kernel/kgdb.c.rej
> -rw-rw-r-- 1 akpm akpm 6461 May 25 15:06 arch/powerpc/kernel/trace/ftrace.c.rej
> -rw-rw-r-- 1 akpm akpm 447 May 25 15:06 arch/powerpc/mm/fault.c.rej
> -rw-rw-r-- 1 akpm akpm 623 May 25 15:06 arch/powerpc/perf/core-book3s.c.rej
> -rw-rw-r-- 1 akpm akpm 1408 May 25 15:06 arch/riscv/kernel/patch.c.rej
>
> the arch/powerpc/kernel/trace/ftrace.c ones aren't very trivial.
>
> It's -rc7. Perhaps we should park all this until 5.8-rc1?

As this is a pre-condition for the set_fs removal I'd really like to
get the actual changes in. All these conflicts seem to be about the
last three cleanup patches just doing renaming, so can we just skip
those three for now? Then we can do the rename right after 5.8-rc1
when we have the least chances for conflicts.

2020-05-28 00:38:45

by Andrew Morton

[permalink] [raw]
Subject: Re: clean up and streamline probe_kernel_* and friends v4

On Tue, 26 May 2020 08:13:09 +0200 Christoph Hellwig <[email protected]> wrote:

> On Mon, May 25, 2020 at 03:19:12PM -0700, Andrew Morton wrote:
> > hm. Applying linux-next to this series generates a lot of rejects against
> > powerpc:
> >
> > -rw-rw-r-- 1 akpm akpm 493 May 25 15:06 arch/powerpc/kernel/kgdb.c.rej
> > -rw-rw-r-- 1 akpm akpm 6461 May 25 15:06 arch/powerpc/kernel/trace/ftrace.c.rej
> > -rw-rw-r-- 1 akpm akpm 447 May 25 15:06 arch/powerpc/mm/fault.c.rej
> > -rw-rw-r-- 1 akpm akpm 623 May 25 15:06 arch/powerpc/perf/core-book3s.c.rej
> > -rw-rw-r-- 1 akpm akpm 1408 May 25 15:06 arch/riscv/kernel/patch.c.rej
> >
> > the arch/powerpc/kernel/trace/ftrace.c ones aren't very trivial.
> >
> > It's -rc7. Perhaps we should park all this until 5.8-rc1?
>
> As this is a pre-condition for the set_fs removal I'd really like to
> get the actual changes in. All these conflicts seem to be about the
> last three cleanup patches just doing renaming, so can we just skip
> those three for now? Then we can do the rename right after 5.8-rc1
> when we have the least chances for conflicts.

That seems to have worked. "[PATCH 23/23] maccess: return -ERANGE when
copy_from_kernel_nofault_allowed fails" needed a bit of massaging to both
the patch and to the patch title.

2020-05-28 00:58:09

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 10/23] maccess: unify the probe kernel arch hooks

On Thu, 21 May 2020 17:22:48 +0200 Christoph Hellwig <[email protected]> wrote:

> Currently architectures have to override every routine that probes
> kernel memory, which includes a pure read and strcpy, both in strict
> and not strict variants. Just provide a single arch hooks instead to
> make sure all architectures cover all the cases.

Fix a buildo.

--- a/arch/x86/mm/maccess.c~maccess-unify-the-probe-kernel-arch-hooks-fix
+++ a/arch/x86/mm/maccess.c
@@ -29,6 +29,6 @@ bool probe_kernel_read_allowed(const voi
{
if (!strict)
return true;
- return (unsigned long)vaddr >= TASK_SIZE_MAX;
+ return (unsigned long)unsafe_src >= TASK_SIZE_MAX;
}
#endif
_

2020-05-28 02:08:40

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 12/23] bpf: handle the compat string in bpf_trace_copy_string better

On Thu, 21 May 2020 17:22:50 +0200 Christoph Hellwig <[email protected]> wrote:

> User the proper helper for kernel or userspace addresses based on
> TASK_SIZE instead of the dangerous strncpy_from_unsafe function.
>
> ...
>
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -331,8 +331,11 @@ static void bpf_trace_copy_string(char *buf, void *unsafe_ptr, char fmt_ptype,
> switch (fmt_ptype) {
> case 's':
> #ifdef CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
> - strncpy_from_unsafe(buf, unsafe_ptr, bufsz);
> - break;
> + if ((unsigned long)unsafe_ptr < TASK_SIZE) {
> + strncpy_from_user_nofault(buf, user_ptr, bufsz);
> + break;
> + }
> + fallthrough;
> #endif
> case 'k':
> strncpy_from_kernel_nofault(buf, unsafe_ptr, bufsz);

Another user of strncpy_from_unsafe() has popped up in linux-next's
bpf. I did the below, but didn't try very hard - it's probably wrong
if CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE=n?

Anyway, please take a look at all the bpf_trace.c changes in
linux-next.


From: Andrew Morton <[email protected]>
Subject: bpf:bpf_seq_printf(): handle potentially unsafe format string better

User the proper helper for kernel or userspace addresses based on
TASK_SIZE instead of the dangerous strncpy_from_unsafe function.

Cc: Christoph Hellwig <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---

kernel/trace/bpf_trace.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)

--- a/kernel/trace/bpf_trace.c~xxx
+++ a/kernel/trace/bpf_trace.c
@@ -588,15 +588,22 @@ BPF_CALL_5(bpf_seq_printf, struct seq_fi
}

if (fmt[i] == 's') {
+ void *unsafe_ptr;
+
/* try our best to copy */
if (memcpy_cnt >= MAX_SEQ_PRINTF_MAX_MEMCPY) {
err = -E2BIG;
goto out;
}

- err = strncpy_from_unsafe(bufs->buf[memcpy_cnt],
- (void *) (long) args[fmt_cnt],
- MAX_SEQ_PRINTF_STR_LEN);
+ unsafe_ptr = (void *)(long)args[fmt_cnt];
+ if ((unsigned long)unsafe_ptr < TASK_SIZE) {
+ err = strncpy_from_user_nofault(
+ bufs->buf[memcpy_cnt], unsafe_ptr,
+ MAX_SEQ_PRINTF_STR_LEN);
+ } else {
+ err = -EFAULT;
+ }
if (err < 0)
bufs->buf[memcpy_cnt][0] = '\0';
params[fmt_cnt] = (u64)(long)bufs->buf[memcpy_cnt];
_

2020-05-28 02:29:36

by Yonghong Song

[permalink] [raw]
Subject: Re: [PATCH 12/23] bpf: handle the compat string in bpf_trace_copy_string better



On 5/27/20 7:04 PM, Andrew Morton wrote:
> On Thu, 21 May 2020 17:22:50 +0200 Christoph Hellwig <[email protected]> wrote:
>
>> User the proper helper for kernel or userspace addresses based on
>> TASK_SIZE instead of the dangerous strncpy_from_unsafe function.
>>
>> ...
>>
>> --- a/kernel/trace/bpf_trace.c
>> +++ b/kernel/trace/bpf_trace.c
>> @@ -331,8 +331,11 @@ static void bpf_trace_copy_string(char *buf, void *unsafe_ptr, char fmt_ptype,
>> switch (fmt_ptype) {
>> case 's':
>> #ifdef CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
>> - strncpy_from_unsafe(buf, unsafe_ptr, bufsz);
>> - break;
>> + if ((unsigned long)unsafe_ptr < TASK_SIZE) {
>> + strncpy_from_user_nofault(buf, user_ptr, bufsz);
>> + break;
>> + }
>> + fallthrough;
>> #endif
>> case 'k':
>> strncpy_from_kernel_nofault(buf, unsafe_ptr, bufsz);
>
> Another user of strncpy_from_unsafe() has popped up in linux-next's
> bpf. I did the below, but didn't try very hard - it's probably wrong
> if CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE=n?
>
> Anyway, please take a look at all the bpf_trace.c changes in
> linux-next.
>
>
> From: Andrew Morton <[email protected]>
> Subject: bpf:bpf_seq_printf(): handle potentially unsafe format string better
>
> User the proper helper for kernel or userspace addresses based on
> TASK_SIZE instead of the dangerous strncpy_from_unsafe function.
>
> Cc: Christoph Hellwig <[email protected]>
> Cc: Alexei Starovoitov <[email protected]>
> Cc: Daniel Borkmann <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Masami Hiramatsu <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Signed-off-by: Andrew Morton <[email protected]>
> ---
>
> kernel/trace/bpf_trace.c | 13 ++++++++++---
> 1 file changed, 10 insertions(+), 3 deletions(-)
>
> --- a/kernel/trace/bpf_trace.c~xxx
> +++ a/kernel/trace/bpf_trace.c
> @@ -588,15 +588,22 @@ BPF_CALL_5(bpf_seq_printf, struct seq_fi
> }
>
> if (fmt[i] == 's') {
> + void *unsafe_ptr;
> +
> /* try our best to copy */
> if (memcpy_cnt >= MAX_SEQ_PRINTF_MAX_MEMCPY) {
> err = -E2BIG;
> goto out;
> }
>
> - err = strncpy_from_unsafe(bufs->buf[memcpy_cnt],
> - (void *) (long) args[fmt_cnt],
> - MAX_SEQ_PRINTF_STR_LEN);
> + unsafe_ptr = (void *)(long)args[fmt_cnt];
> + if ((unsigned long)unsafe_ptr < TASK_SIZE) {
> + err = strncpy_from_user_nofault(
> + bufs->buf[memcpy_cnt], unsafe_ptr,
> + MAX_SEQ_PRINTF_STR_LEN);
> + } else {
> + err = -EFAULT;
> + }

This probably not right.
The pointer stored at args[fmt_cnt] is a kernel pointer,
but it could be an invalid address and we do not want to fault.
Not sure whether it exists or not, we should use
strncpy_from_kernel_nofault()?

> if (err < 0)
> bufs->buf[memcpy_cnt][0] = '\0';
> params[fmt_cnt] = (u64)(long)bufs->buf[memcpy_cnt];
> _
>

2020-05-28 04:43:56

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 12/23] bpf: handle the compat string in bpf_trace_copy_string better

On Wed, May 27, 2020 at 07:26:30PM -0700, Yonghong Song wrote:
>> --- a/kernel/trace/bpf_trace.c~xxx
>> +++ a/kernel/trace/bpf_trace.c
>> @@ -588,15 +588,22 @@ BPF_CALL_5(bpf_seq_printf, struct seq_fi
>> }
>> if (fmt[i] == 's') {
>> + void *unsafe_ptr;
>> +
>> /* try our best to copy */
>> if (memcpy_cnt >= MAX_SEQ_PRINTF_MAX_MEMCPY) {
>> err = -E2BIG;
>> goto out;
>> }
>> - err = strncpy_from_unsafe(bufs->buf[memcpy_cnt],
>> - (void *) (long) args[fmt_cnt],
>> - MAX_SEQ_PRINTF_STR_LEN);
>> + unsafe_ptr = (void *)(long)args[fmt_cnt];
>> + if ((unsigned long)unsafe_ptr < TASK_SIZE) {
>> + err = strncpy_from_user_nofault(
>> + bufs->buf[memcpy_cnt], unsafe_ptr,
>> + MAX_SEQ_PRINTF_STR_LEN);
>> + } else {
>> + err = -EFAULT;
>> + }
>
> This probably not right.
> The pointer stored at args[fmt_cnt] is a kernel pointer,
> but it could be an invalid address and we do not want to fault.
> Not sure whether it exists or not, we should use
> strncpy_from_kernel_nofault()?

If you know it is a kernel pointer with this series it should be
strncpy_from_kernel_nofault. But even before the series it should have
been strncpy_from_unsafe_strict.

2020-05-28 17:12:48

by Yonghong Song

[permalink] [raw]
Subject: Re: [PATCH 12/23] bpf: handle the compat string in bpf_trace_copy_string better



On 5/27/20 9:39 PM, Christoph Hellwig wrote:
> On Wed, May 27, 2020 at 07:26:30PM -0700, Yonghong Song wrote:
>>> --- a/kernel/trace/bpf_trace.c~xxx
>>> +++ a/kernel/trace/bpf_trace.c
>>> @@ -588,15 +588,22 @@ BPF_CALL_5(bpf_seq_printf, struct seq_fi
>>> }
>>> if (fmt[i] == 's') {
>>> + void *unsafe_ptr;
>>> +
>>> /* try our best to copy */
>>> if (memcpy_cnt >= MAX_SEQ_PRINTF_MAX_MEMCPY) {
>>> err = -E2BIG;
>>> goto out;
>>> }
>>> - err = strncpy_from_unsafe(bufs->buf[memcpy_cnt],
>>> - (void *) (long) args[fmt_cnt],
>>> - MAX_SEQ_PRINTF_STR_LEN);
>>> + unsafe_ptr = (void *)(long)args[fmt_cnt];
>>> + if ((unsigned long)unsafe_ptr < TASK_SIZE) {
>>> + err = strncpy_from_user_nofault(
>>> + bufs->buf[memcpy_cnt], unsafe_ptr,
>>> + MAX_SEQ_PRINTF_STR_LEN);
>>> + } else {
>>> + err = -EFAULT;
>>> + }
>>
>> This probably not right.
>> The pointer stored at args[fmt_cnt] is a kernel pointer,
>> but it could be an invalid address and we do not want to fault.
>> Not sure whether it exists or not, we should use
>> strncpy_from_kernel_nofault()?
>
> If you know it is a kernel pointer with this series it should be
> strncpy_from_kernel_nofault. But even before the series it should have
> been strncpy_from_unsafe_strict.

The use of strncpy_from_unsafe() mimics old bpf_trace_printk()
implementation which just changed to _strict version:
https://lkml.org/lkml/2020/5/18/1309

Agreed that we should change to strncpy_from_unsafe_strict().
I can submit a patch for this.

Thanks!