This series adds tests for the usercopy functions, which have found a
few issues on most architectures I've tested on. The last two patches
are fixes for arm64 (which pass all the tests, and are at least as good
performance-wise as the existing implementation).
Arch maintainers: if you are Cc'd on patches 1 or 2, I believe that your
architecture has an issue (which should be called out explicitly in the
commit message). I've only been able to test on some architectures, so
other architectures may be affected.
Al, Linus: could you double-check that I have the requirements right?
I'm going by Al's prior message at:
https://lore.kernel.org/all/[email protected]/
Which mentions the commentary in include/linux/uaccess.h:
* If raw_copy_{to,from}_user(to, from, size) returns N, size - N bytes starting
* at to must become equal to the bytes fetched from the corresponding area
* starting at from. All data past to + size - N must be left unmodified.
*
* If copying succeeds, the return value must be 0. If some data cannot be
* fetched, it is permitted to copy less than had been fetched; the only
* hard requirement is that not storing anything at all (i.e. returning size)
* should happen only when nothing could be copied. In other words, you don't
* have to squeeze as much as possible - it is allowed, but not necessary.
I've made two additional assumptions:
1) That a usercopy in a non-atomic context shouldn't have a partial
failure if it's possible for it to succeed completely. i.e. callers
don't need to call it in a loop if they only care about it entirely
succeeding or failing.
Code all over the place seems to rely upon that (e.g. signal code
using copy_to_user() to place signal frames), so I assume this must
be the case or we have bigger issues.
2) That clear_user() has the same requirements, given it has the same
rough shape as raw_copy_{to,from}_user(), and I believe it's used in
the same way in the iov_iter code (but I could be mistaken).
Since v1 [1]:
* Trivial rebase to v6.3-rc3
* Simplify the test harness
* Improve comments
* Improve commit messages
* Expand Cc list
[1] https://lore.kernel.org/lkml/[email protected]/
Thanks,
Mark.
Mark Rutland (4):
lib: test copy_{to,from}_user()
lib: test clear_user()
arm64: fix __raw_copy_to_user semantics
arm64: fix clear_user() semantics
arch/arm64/lib/clear_user.S | 195 ++++++++++---
arch/arm64/lib/copy_to_user.S | 203 +++++++++++---
lib/Kconfig.debug | 9 +
lib/Makefile | 1 +
lib/usercopy_kunit.c | 506 ++++++++++++++++++++++++++++++++++
5 files changed, 834 insertions(+), 80 deletions(-)
create mode 100644 lib/usercopy_kunit.c
--
2.30.2
Historically, implementations of raw_copy_{to,from}_user() haven't
correctly handled the requirements laid forth in <linux/uaccess.h>, e.g.
as Al spotted on arm64:
https://lore.kernel.org/all/[email protected]/
Similarly, we've had other issues where incorrect fault handling logic
lead to memory corruption, e.g.
https://lore.kernel.org/linux-arm-kernel/[email protected]/
These issues are easy to introduce and hard to spot as we don't have any
systematic tests for the faulting paths.
This patch adds KUnit tests for raw_copy_{to,from}_user() which
systematically test the faulting paths, allowing us to detect and avoid
problems such as those above. The test checks copy sizes of 0 to 128
bytes at every possible alignment relative to leading/trailing page
boundaries to exhaustively check faulting and success paths.
I've given this a spin on a few architectures using the KUnit QEMU
harness, and it looks like most get *something* wrong. From those
initial runs:
* arm64's copy_to_user() under-reports the number of bytes copied in
some cases, e.g.
| post-destination bytes modified (dst_page[4082]=0x1, offset=4081, size=16, ret=15)
* arm's copy_to_user() under-reports the number of bytes copied in some
cases, and both copy_to_user() and copy_from_user() don't guarantee
that at least a single byte is copied when a partial copy is possible,
e.g.
| post-destination bytes modified (dst_page[4068]=0x1, offset=4067, size=33, ret=32)
| too few bytes consumed (offset=4068, size=32, ret=32)
* i386's copy_from_user does not guarantee that at least a single byte
is copied when a partial copit is possible, e.g.
| too few bytes consumed (offset=4093, size=8, ret=8)
* riscv's copy_to_user() and copy_from_user() don't guarantee that at
least a single byte is copied when a partial copy is possible, e.g.
| too few bytes consumed (offset=4095, size=2, ret=2)
* s390 passes all tests
* sparc's copy_from_user() over-reports the number of bbytes copied in
some caes, e.g.
| copied bytes incorrect (dst_page[0+0]=0x0, src_page[4095+0]=0xff, offset=4095, size=2, ret=1
* x86_64 passes all tests
Signed-off-by: Mark Rutland <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Russell King <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
---
lib/Kconfig.debug | 9 +
lib/Makefile | 1 +
lib/usercopy_kunit.c | 431 +++++++++++++++++++++++++++++++++++++++++++
3 files changed, 441 insertions(+)
create mode 100644 lib/usercopy_kunit.c
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c8b379e2e9adc..bd737db4ef06a 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2647,6 +2647,15 @@ config SIPHASH_KUNIT_TEST
This is intended to help people writing architecture-specific
optimized versions. If unsure, say N.
+config USERCOPY_KUNIT_TEST
+ bool "KUnit tests for usercopy functions" if !KUNIT_ALL_TESTS
+ depends on KUNIT=y
+ default KUNIT_ALL_TESTS
+ help
+ Tests for faulting behaviour of copy_{to,from}_user().
+
+ If unsure, say N.
+
config TEST_UDELAY
tristate "udelay test driver"
help
diff --git a/lib/Makefile b/lib/Makefile
index baf2821f7a00f..37fb438e6c433 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -391,6 +391,7 @@ CFLAGS_fortify_kunit.o += $(DISABLE_STRUCTLEAK_PLUGIN)
obj-$(CONFIG_FORTIFY_KUNIT_TEST) += fortify_kunit.o
obj-$(CONFIG_STRSCPY_KUNIT_TEST) += strscpy_kunit.o
obj-$(CONFIG_SIPHASH_KUNIT_TEST) += siphash_kunit.o
+obj-$(CONFIG_USERCOPY_KUNIT_TEST) += usercopy_kunit.o
obj-$(CONFIG_GENERIC_LIB_DEVMEM_IS_ALLOWED) += devmem_is_allowed.o
diff --git a/lib/usercopy_kunit.c b/lib/usercopy_kunit.c
new file mode 100644
index 0000000000000..45983952cc079
--- /dev/null
+++ b/lib/usercopy_kunit.c
@@ -0,0 +1,431 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Test cases for usercopy faulting behaviour
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <kunit/test.h>
+
+#include <linux/highmem.h>
+#include <linux/kthread.h>
+#include <linux/mm.h>
+#include <linux/sched/mm.h>
+#include <linux/uaccess.h>
+#include <linux/vmalloc.h>
+
+/*
+ * Arbitrarily chosen user address for the test page.
+ */
+#define UBUF_ADDR_BASE SZ_2M
+
+struct usercopy_env {
+ struct mm_struct *mm;
+ void *kbuf;
+ struct page *ubuf_page;
+ void *ubuf;
+};
+
+struct usercopy_params {
+ long offset;
+ unsigned long size;
+};
+
+static void *usercopy_env_alloc(void)
+{
+ struct usercopy_env *env = kzalloc(sizeof(*env), GFP_KERNEL);
+ if (!env)
+ return NULL;
+
+ env->kbuf = vmalloc(PAGE_SIZE);
+ if (!env->kbuf)
+ goto out_free_env;
+
+ return env;
+
+out_free_env:
+ kfree(env);
+ return NULL;
+}
+
+static void usercopy_env_free(struct usercopy_env *env)
+{
+ vfree(env->kbuf);
+ kfree(env);
+}
+
+static void *usercopy_mm_alloc(struct usercopy_env *env)
+{
+ struct mm_struct *mm;
+ struct vm_area_struct *vma;
+ mm = mm_alloc();
+ if (!mm)
+ return NULL;
+
+ if (mmap_write_lock_killable(mm))
+ goto out_free;
+
+ vma = vm_area_alloc(mm);
+ if (!vma)
+ goto out_unlock;
+
+ vma_set_anonymous(vma);
+ vma->vm_start = UBUF_ADDR_BASE;
+ vma->vm_end = UBUF_ADDR_BASE + PAGE_SIZE;
+ vm_flags_init(vma, VM_READ | VM_MAYREAD | VM_WRITE | VM_MAYWRITE);
+ vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
+
+ if (insert_vm_struct(mm, vma))
+ goto out_free_vma;
+
+ mmap_write_unlock(mm);
+ return mm;
+
+out_free_vma:
+ vm_area_free(vma);
+out_unlock:
+ mmap_write_unlock(mm);
+out_free:
+ mmput(mm);
+ return NULL;
+}
+
+static void usercopy_mm_free(struct mm_struct *mm)
+{
+ mmput(mm);
+}
+
+static struct page *usercopy_ubuf_pin(struct usercopy_env *env)
+{
+ struct page *p = NULL;
+
+ kthread_use_mm(env->mm);
+ pin_user_pages_unlocked(UBUF_ADDR_BASE, 1, &p, FOLL_LONGTERM);
+ kthread_unuse_mm(env->mm);
+
+ return p;
+}
+
+static void usercopy_ubuf_unpin(struct usercopy_env *env)
+{
+ unpin_user_page(env->ubuf_page);
+}
+
+static int usercopy_test_init(struct kunit *test)
+{
+ struct usercopy_env *env;
+
+ env = usercopy_env_alloc();
+ if (!env)
+ return -ENOMEM;
+
+ env->mm = usercopy_mm_alloc(env);
+ if (!env->mm)
+ goto out_free_env;
+
+ env->ubuf_page = usercopy_ubuf_pin(env);
+ if (!env->ubuf_page)
+ goto out_free_mm;
+
+ env->ubuf = kmap(env->ubuf_page);
+ if (!env->ubuf)
+ goto out_unpin_ubuf;
+
+ test->priv = env;
+
+ return 0;
+
+out_unpin_ubuf:
+ usercopy_ubuf_unpin(env);
+out_free_mm:
+ usercopy_mm_free(env->mm);
+out_free_env:
+ usercopy_env_free(env);
+ return -ENOMEM;
+}
+
+static void usercopy_test_exit(struct kunit *test)
+{
+ struct usercopy_env *env = test->priv;
+
+ kunmap(env->ubuf);
+
+ usercopy_ubuf_unpin(env);
+ usercopy_mm_free(env->mm);
+ usercopy_env_free(env);
+}
+
+static char buf_pattern(int offset)
+{
+ return offset & 0xff;
+}
+
+static void buf_init_pattern(char *buf)
+{
+ for (int i = 0; i < PAGE_SIZE; i++)
+ buf[i] = buf_pattern(i);
+}
+
+static void buf_init_zero(char *buf)
+{
+ memset(buf, 0, PAGE_SIZE);
+}
+
+static void assert_size_valid(struct kunit *test,
+ const struct usercopy_params *params,
+ unsigned long ret)
+{
+ const unsigned long size = params->size;
+ const unsigned long offset = params->offset;
+
+ if (ret > size) {
+ KUNIT_ASSERT_FAILURE(test,
+ "return value is impossibly large (offset=%ld, size=%lu, ret=%lu)",
+ offset, size, ret);
+ }
+
+ /*
+ * When the user buffer starts within a faulting page, no bytes can be
+ * copied, so ret must equal size.
+ */
+ if (offset < 0 || offset >= PAGE_SIZE) {
+ if (ret == size)
+ return;
+
+ KUNIT_ASSERT_FAILURE(test,
+ "impossible copy did not fail (offset=%ld, size=%lu, ret=%lu)",
+ offset, size, ret);
+ }
+
+ /*
+ * When the user buffer fits entirely within a non-faulting page, all
+ * bytes must be copied, so ret must equal 0.
+ */
+ if (offset + size <= PAGE_SIZE) {
+ if (ret == 0)
+ return;
+
+ KUNIT_ASSERT_FAILURE(test,
+ "completely possible copy failed (offset=%ld, size=%lu, ret=%lu)",
+ offset, size, ret);
+ }
+
+ /*
+ * The buffer starts in a non-faulting page, but continues into a
+ * faulting page. At least one byte must be copied, and at most all the
+ * non-faulting bytes may be copied.
+ */
+ if (ret == size) {
+ KUNIT_ASSERT_FAILURE(test,
+ "too few bytes consumed (offset=%ld, size=%lu, ret=%lu)",
+ offset, size, ret);
+ }
+
+ if (ret < (offset + size) - PAGE_SIZE) {
+ KUNIT_ASSERT_FAILURE(test,
+ "too many bytes consumed (offset=%ld, size=%lu, ret=%lu)",
+ offset, size, ret);
+ }
+}
+
+static void assert_src_valid(struct kunit *test,
+ const struct usercopy_params *params,
+ const char *src, long src_offset,
+ unsigned long ret)
+{
+ const unsigned long size = params->size;
+ const unsigned long offset = params->offset;
+
+ /*
+ * A usercopy MUST NOT modify the source buffer.
+ */
+ for (int i = 0; i < PAGE_SIZE; i++) {
+ char val = src[i];
+
+ if (val == buf_pattern(i))
+ continue;
+
+ KUNIT_ASSERT_FAILURE(test,
+ "source bytes modified (src[%d]=0x%x, offset=%ld, size=%lu, ret=%lu)",
+ i, offset, size, ret);
+ }
+}
+
+static void assert_dst_valid(struct kunit *test,
+ const struct usercopy_params *params,
+ const char *dst, long dst_offset,
+ unsigned long ret)
+{
+ const unsigned long size = params->size;
+ const unsigned long offset = params->offset;
+
+ /*
+ * A usercopy MUST NOT modify any bytes before the destination buffer.
+ */
+ for (int i = 0; i < dst_offset; i++) {
+ char val = dst[i];
+
+ if (val == 0)
+ continue;
+
+ KUNIT_ASSERT_FAILURE(test,
+ "pre-destination bytes modified (dst_page[%d]=0x%x, offset=%ld, size=%lu, ret=%lu)",
+ i, val, offset, size, ret);
+ }
+
+ /*
+ * A usercopy MUST NOT modify any bytes after the end of the destination
+ * buffer.
+ */
+ for (int i = dst_offset + size - ret; i < PAGE_SIZE; i++) {
+ char val = dst[i];
+
+ if (val == 0)
+ continue;
+
+ KUNIT_ASSERT_FAILURE(test,
+ "post-destination bytes modified (dst_page[%d]=0x%x, offset=%ld, size=%lu, ret=%lu)",
+ i, val, offset, size, ret);
+ }
+}
+
+static void assert_copy_valid(struct kunit *test,
+ const struct usercopy_params *params,
+ const char *dst, long dst_offset,
+ const char *src, long src_offset,
+ unsigned long ret)
+{
+ const unsigned long size = params->size;
+ const unsigned long offset = params->offset;
+
+ /*
+ * Have we actually copied the bytes we expected to?
+ */
+ for (int i = 0; i < params->size - ret; i++) {
+ char dst_val = dst[dst_offset + i];
+ char src_val = src[src_offset + i];
+
+ if (dst_val == src_val)
+ continue;
+
+ KUNIT_ASSERT_FAILURE(test,
+ "copied bytes incorrect (dst_page[%ld+%d]=0x%x, src_page[%ld+%d]=0x%x, offset=%ld, size=%lu, ret=%lu",
+ dst_offset, i, dst_val,
+ src_offset, i, src_val,
+ offset, size, ret);
+ }
+}
+
+static unsigned long do_copy_to_user(const struct usercopy_env *env,
+ const struct usercopy_params *params)
+{
+ void __user *uptr = (void __user *)UBUF_ADDR_BASE + params->offset;
+ void *kptr = env->kbuf;
+ unsigned long ret;
+
+ kthread_use_mm(env->mm);
+ ret = raw_copy_to_user(uptr, kptr, params->size);
+ kthread_unuse_mm(env->mm);
+
+ return ret;
+}
+
+static unsigned long do_copy_from_user(const struct usercopy_env *env,
+ const struct usercopy_params *params)
+{
+ void __user *uptr = (void __user *)UBUF_ADDR_BASE + params->offset;
+ void *kptr = env->kbuf;
+ unsigned long ret;
+
+ kthread_use_mm(env->mm);
+ ret = raw_copy_from_user(kptr, uptr, params->size);
+ kthread_unuse_mm(env->mm);
+
+ return ret;
+}
+
+/*
+ * Generate the size and offset combinations to test.
+ *
+ * Usercopies may have different paths for small/medium/large copies, but
+ * typically max out at 64 byte chunks. We test sizes 0 to 128 bytes to check
+ * all combinations of leading/trailing chunks and bulk copies.
+ *
+ * For each size, we test every offset relative to a leading and trailing page
+ * boundary (i.e. [size, 0] and [PAGE_SIZE - size, PAGE_SIZE]) to check every
+ * possible faulting boundary.
+ */
+#define for_each_size_offset(size, offset) \
+ for (unsigned long size = 0; size <= 128; size++) \
+ for (long offset = -size; \
+ offset <= (long)PAGE_SIZE; \
+ offset = (offset ? offset + 1: (PAGE_SIZE - size)))
+
+static void test_copy_to_user(struct kunit *test)
+{
+ const struct usercopy_env *env = test->priv;
+
+ for_each_size_offset(size, offset) {
+ const struct usercopy_params params = {
+ .size = size,
+ .offset = offset,
+ };
+ unsigned long ret;
+
+ buf_init_zero(env->ubuf);
+ buf_init_pattern(env->kbuf);
+
+ ret = do_copy_to_user(env, ¶ms);
+
+ assert_size_valid(test, ¶ms, ret);
+ assert_src_valid(test, ¶ms, env->kbuf, 0, ret);
+ assert_dst_valid(test, ¶ms, env->ubuf, params.offset, ret);
+ assert_copy_valid(test, ¶ms,
+ env->ubuf, params.offset,
+ env->kbuf, 0,
+ ret);
+ }
+}
+
+static void test_copy_from_user(struct kunit *test)
+{
+ const struct usercopy_env *env = test->priv;
+
+ for_each_size_offset(size, offset) {
+ const struct usercopy_params params = {
+ .size = size,
+ .offset = offset,
+ };
+ unsigned long ret;
+
+ buf_init_zero(env->kbuf);
+ buf_init_pattern(env->ubuf);
+
+ ret = do_copy_from_user(env, ¶ms);
+
+ assert_size_valid(test, ¶ms, ret);
+ assert_src_valid(test, ¶ms, env->ubuf, params.offset, ret);
+ assert_dst_valid(test, ¶ms, env->kbuf, 0, ret);
+ assert_copy_valid(test, ¶ms,
+ env->kbuf, 0,
+ env->ubuf, params.offset,
+ ret);
+ }
+}
+
+static struct kunit_case usercopy_cases[] = {
+ KUNIT_CASE(test_copy_to_user),
+ KUNIT_CASE(test_copy_from_user),
+ { /* sentinel */ }
+};
+
+static struct kunit_suite usercopy_suite = {
+ .name = "usercopy",
+ .init = usercopy_test_init,
+ .exit = usercopy_test_exit,
+ .test_cases = usercopy_cases,
+};
+kunit_test_suites(&usercopy_suite);
+
+MODULE_AUTHOR("Mark Rutland <[email protected]>");
+MODULE_LICENSE("GPL v2");
--
2.30.2
The clear_user() function follows the same conventions as
copy_{to,from}_user(), and presumably has identical requirements on the
return value. Test it in the same way.
I've given this a spin on a few architectures using the KUnit QEMU
harness, and it looks like most get *something* wrong, or I've
misunderstood and clear_user() doesn't have the same requirements as
copy_{to,from}_user()). From those initial runs:
* arm, arm64, i386, riscv, x86_64 don't ensure that at least 1 byte is
zeroed when a partial zeroing is possible, e.g.
| too few bytes consumed (offset=4095, size=2, ret=2)
| too few bytes consumed (offset=4093, size=4, ret=4)
| too few bytes consumed (offset=4089, size=8, ret=8)
* s390 reports that some bytes have been zeroed even when they haven't,
e.g.
| zeroed bytes incorrect (dst_page[4031+64]=0xca, offset=4031, size=66, ret=1
* sparc passses all tests
Signed-off-by: Mark Rutland <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Russell King <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
---
lib/usercopy_kunit.c | 89 ++++++++++++++++++++++++++++++++++++++++----
1 file changed, 82 insertions(+), 7 deletions(-)
diff --git a/lib/usercopy_kunit.c b/lib/usercopy_kunit.c
index 45983952cc079..1ec0d5bbc179a 100644
--- a/lib/usercopy_kunit.c
+++ b/lib/usercopy_kunit.c
@@ -155,6 +155,11 @@ static void usercopy_test_exit(struct kunit *test)
usercopy_env_free(env);
}
+static char buf_zero(int offset)
+{
+ return 0;
+}
+
static char buf_pattern(int offset)
{
return offset & 0xff;
@@ -230,6 +235,7 @@ static void assert_size_valid(struct kunit *test,
static void assert_src_valid(struct kunit *test,
const struct usercopy_params *params,
+ char (*buf_expected)(int),
const char *src, long src_offset,
unsigned long ret)
{
@@ -240,9 +246,10 @@ static void assert_src_valid(struct kunit *test,
* A usercopy MUST NOT modify the source buffer.
*/
for (int i = 0; i < PAGE_SIZE; i++) {
+ char expected = buf_expected(i);
char val = src[i];
- if (val == buf_pattern(i))
+ if (val == expected)
continue;
KUNIT_ASSERT_FAILURE(test,
@@ -253,6 +260,7 @@ static void assert_src_valid(struct kunit *test,
static void assert_dst_valid(struct kunit *test,
const struct usercopy_params *params,
+ char (*buf_expected)(int),
const char *dst, long dst_offset,
unsigned long ret)
{
@@ -263,9 +271,10 @@ static void assert_dst_valid(struct kunit *test,
* A usercopy MUST NOT modify any bytes before the destination buffer.
*/
for (int i = 0; i < dst_offset; i++) {
+ char expected = buf_expected(i);
char val = dst[i];
- if (val == 0)
+ if (val == expected)
continue;
KUNIT_ASSERT_FAILURE(test,
@@ -278,9 +287,10 @@ static void assert_dst_valid(struct kunit *test,
* buffer.
*/
for (int i = dst_offset + size - ret; i < PAGE_SIZE; i++) {
+ char expected = buf_expected(i);
char val = dst[i];
- if (val == 0)
+ if (val == expected)
continue;
KUNIT_ASSERT_FAILURE(test,
@@ -316,6 +326,29 @@ static void assert_copy_valid(struct kunit *test,
}
}
+static void assert_clear_valid(struct kunit *test,
+ const struct usercopy_params *params,
+ const char *dst, long dst_offset,
+ unsigned long ret)
+{
+ const unsigned long size = params->size;
+ const unsigned long offset = params->offset;
+
+ /*
+ * Have we actually zeroed the bytes we expected to?
+ */
+ for (int i = 0; i < params->size - ret; i++) {
+ char dst_val = dst[dst_offset + i];
+
+ if (dst_val == 0)
+ continue;
+
+ KUNIT_ASSERT_FAILURE(test,
+ "zeroed bytes incorrect (dst_page[%ld+%d]=0x%x, offset=%ld, size=%lu, ret=%lu",
+ dst_offset, i, dst_val,
+ offset, size, ret);
+ }
+}
static unsigned long do_copy_to_user(const struct usercopy_env *env,
const struct usercopy_params *params)
{
@@ -344,6 +377,19 @@ static unsigned long do_copy_from_user(const struct usercopy_env *env,
return ret;
}
+static unsigned long do_clear_user(const struct usercopy_env *env,
+ const struct usercopy_params *params)
+{
+ void __user *uptr = (void __user *)UBUF_ADDR_BASE + params->offset;
+ unsigned long ret;
+
+ kthread_use_mm(env->mm);
+ ret = clear_user(uptr, params->size);
+ kthread_unuse_mm(env->mm);
+
+ return ret;
+}
+
/*
* Generate the size and offset combinations to test.
*
@@ -378,8 +424,10 @@ static void test_copy_to_user(struct kunit *test)
ret = do_copy_to_user(env, ¶ms);
assert_size_valid(test, ¶ms, ret);
- assert_src_valid(test, ¶ms, env->kbuf, 0, ret);
- assert_dst_valid(test, ¶ms, env->ubuf, params.offset, ret);
+ assert_src_valid(test, ¶ms, buf_pattern,
+ env->kbuf, 0, ret);
+ assert_dst_valid(test, ¶ms, buf_zero,
+ env->ubuf, params.offset, ret);
assert_copy_valid(test, ¶ms,
env->ubuf, params.offset,
env->kbuf, 0,
@@ -404,8 +452,10 @@ static void test_copy_from_user(struct kunit *test)
ret = do_copy_from_user(env, ¶ms);
assert_size_valid(test, ¶ms, ret);
- assert_src_valid(test, ¶ms, env->ubuf, params.offset, ret);
- assert_dst_valid(test, ¶ms, env->kbuf, 0, ret);
+ assert_src_valid(test, ¶ms, buf_pattern,
+ env->ubuf, params.offset, ret);
+ assert_dst_valid(test, ¶ms, buf_zero,
+ env->kbuf, 0, ret);
assert_copy_valid(test, ¶ms,
env->kbuf, 0,
env->ubuf, params.offset,
@@ -413,9 +463,34 @@ static void test_copy_from_user(struct kunit *test)
}
}
+static void test_clear_user(struct kunit *test)
+{
+ const struct usercopy_env *env = test->priv;
+
+ for_each_size_offset(size, offset) {
+ const struct usercopy_params params = {
+ .size = size,
+ .offset = offset,
+ };
+ unsigned long ret;
+
+ buf_init_pattern(env->ubuf);
+
+ ret = do_clear_user(env, ¶ms);
+
+ assert_size_valid(test, ¶ms, ret);
+ assert_dst_valid(test, ¶ms, buf_pattern,
+ env->ubuf, params.offset, ret);
+ assert_clear_valid(test, ¶ms,
+ env->ubuf, params.offset,
+ ret);
+ }
+}
+
static struct kunit_case usercopy_cases[] = {
KUNIT_CASE(test_copy_to_user),
KUNIT_CASE(test_copy_from_user),
+ KUNIT_CASE(test_clear_user),
{ /* sentinel */ }
};
--
2.30.2
For some combinations of sizes and alignments __{arch,raw}_copy_to_user
will copy some bytes between (to + size - N) and (to + size), but will
never modify bytes past (to + size).
This violates the documentation in <linux/uaccess.h>, which states:
> If raw_copy_{to,from}_user(to, from, size) returns N, size - N bytes
> starting at to must become equal to the bytes fetched from the
> corresponding area starting at from. All data past to + size - N must
> be left unmodified.
This can be demonstrated through testing, e.g.
| # test_copy_to_user: EXPECTATION FAILED at lib/usercopy_kunit.c:287
| post-destination bytes modified (dst_page[4082]=0x1, offset=4081, size=16, ret=15)
| [FAILED] 16 byte copy
This happens because the __arch_copy_to_user() can make unaligned stores
to the userspace buffer, and the ARM architecture permits (but does not
require) that such unaligned stores write some bytes before raising a
fault (per ARM DDI 0487I.a Section B2.2.1 and Section B2.7.1). The
extable fixup handlers in __arch_copy_to_user() assume that any faulting
store has failed entirely, and so under-report the number of bytes
copied when an unaligned store writes some bytes before faulting.
The only architecturally guaranteed way to avoid this is to only use
aligned stores to write to user memory. This patch rewrites
__arch_copy_to_user() to only access the user buffer with aligned
stores, such that the bytes written can always be determined reliably.
For correctness, I've tested this exhaustively for sizes 0 to 128
against every possible alignment relative to a leading and trailing page
boundary. I've also boot tested and run a few kernel builds with the new
implementations.
For performance, I've benchmarked this on a variety of CPU
implementations, and across the board this appears at least as good as
(or marginally better than) the current implementation of
copy_to_user(). Timing a kernel build indicates the same, though the
difference is very close to noise.
We do not have a similar bug in __{arch,raw}_copy_from_user(), as faults
taken on loads from user memory do not have side-effects. We do have
similar issues in __arch_clear_user(), which will be addresssed in a
subsequent patch.
Signed-off-by: Mark Rutland <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm64/lib/copy_to_user.S | 202 +++++++++++++++++++++++++++-------
1 file changed, 161 insertions(+), 41 deletions(-)
diff --git a/arch/arm64/lib/copy_to_user.S b/arch/arm64/lib/copy_to_user.S
index 8022317726085..fa603487e8571 100644
--- a/arch/arm64/lib/copy_to_user.S
+++ b/arch/arm64/lib/copy_to_user.S
@@ -9,6 +9,14 @@
#include <asm/assembler.h>
#include <asm/cache.h>
+#define USER_OFF(off, x...) USER(fixup_offset_##off, x)
+
+#define FIXUP_OFFSET(n) \
+fixup_offset_##n: \
+ sub x0, x3, x0; \
+ sub x0, x0, n; \
+ ret
+
/*
* Copy to user space from a kernel buffer (alignment handled by the hardware)
*
@@ -18,56 +26,168 @@
* x2 - n
* Returns:
* x0 - bytes not copied
+ *
+ * Unlike a memcpy(), we need to handle faults on user addresses, and we need
+ * to precisely report the number of bytes (not) copied. We must only use
+ * aligned single-copy-atomic stores to write to user memory, as stores which
+ * are not single-copy-atomic (e.g. unaligned stores, STP, ASMID stores) can be
+ * split into separate byte accesses (per ARM DDI 0487I.a, Section B2.2.1) and
+ * some arbitatrary set of those byte accesses might occur prior to a fault
+ * being raised (per per ARM DDI 0487I.a, Section B2.7.1).
+ *
+ * We use STTR to write to user memory, which has 1/2/4/8 byte forms, and the
+ * user address ('to') might have arbitrary alignment, so we must handle
+ * misalignment up to 8 bytes.
*/
- .macro ldrb1 reg, ptr, val
- ldrb \reg, [\ptr], \val
- .endm
+SYM_FUNC_START(__arch_copy_to_user)
+ /*
+ * The end address. Fixup handlers will use this to calculate
+ * the number of bytes copied.
+ */
+ add x3, x0, x2
- .macro strb1 reg, ptr, val
- user_ldst 9998f, sttrb, \reg, \ptr, \val
- .endm
+ /*
+ * Tracing of a kernel build indicates that for the vast
+ * majority of calls to copy_to_user(), 'to' is aligned to 8
+ * bytes. When this is the case, we want to skip to the bulk
+ * copy as soon as possible.
+ */
+ ands x4, x0, 0x7
+ b.eq body
- .macro ldrh1 reg, ptr, val
- ldrh \reg, [\ptr], \val
- .endm
+ /*
+ * For small unaligned copies, it's not worth trying to be
+ * optimal.
+ */
+ cmp x2, #8
+ b.lo bytewise_loop
- .macro strh1 reg, ptr, val
- user_ldst 9997f, sttrh, \reg, \ptr, \val
- .endm
+ /*
+ * Calculate the distance to the next 8-byte boundary.
+ */
+ mov x5, #8
+ sub x4, x5, x4
- .macro ldr1 reg, ptr, val
- ldr \reg, [\ptr], \val
- .endm
+SYM_INNER_LABEL(head_realign_1b, SYM_L_LOCAL)
+ tbz x4, #0, head_realign_2b
- .macro str1 reg, ptr, val
- user_ldst 9997f, sttr, \reg, \ptr, \val
- .endm
+ ldrb w8, [x1], #1
+USER_OFF(0, sttrb w8, [x0])
+ add x0, x0, #1
- .macro ldp1 reg1, reg2, ptr, val
- ldp \reg1, \reg2, [\ptr], \val
- .endm
+SYM_INNER_LABEL(head_realign_2b, SYM_L_LOCAL)
+ tbz x4, #1, head_realign_4b
- .macro stp1 reg1, reg2, ptr, val
- user_stp 9997f, \reg1, \reg2, \ptr, \val
- .endm
+ ldrh w8, [x1], #2
+USER_OFF(0, sttrh w8, [x0])
+ add x0, x0, #2
-end .req x5
-srcin .req x15
-SYM_FUNC_START(__arch_copy_to_user)
- add end, x0, x2
- mov srcin, x1
-#include "copy_template.S"
- mov x0, #0
- ret
+SYM_INNER_LABEL(head_realign_4b, SYM_L_LOCAL)
+ tbz x4, #2, head_realigned
- // Exception fixups
-9997: cmp dst, dstin
- b.ne 9998f
- // Before being absolutely sure we couldn't copy anything, try harder
- ldrb tmp1w, [srcin]
-USER(9998f, sttrb tmp1w, [dst])
- add dst, dst, #1
-9998: sub x0, end, dst // bytes not copied
- ret
+ ldr w8, [x1], #4
+USER_OFF(0, sttr w8, [x0])
+ add x0, x0, #4
+
+SYM_INNER_LABEL(head_realigned, SYM_L_LOCAL)
+ /*
+ * Any 1-7 byte misalignment has now been fixed; subtract this
+ * misalignment from the remaining size.
+ */
+ sub x2, x2, x4
+
+SYM_INNER_LABEL(body, SYM_L_LOCAL)
+ cmp x2, #64
+ b.lo tail_32b
+
+SYM_INNER_LABEL(body_64b_loop, SYM_L_LOCAL)
+ ldp x8, x9, [x1, #0]
+ ldp x10, x11, [x1, #16]
+ ldp x12, x13, [x1, #32]
+ ldp x14, x15, [x1, #48]
+USER_OFF(0, sttr x8, [x0, #0])
+USER_OFF(8, sttr x9, [x0, #8])
+USER_OFF(16, sttr x10, [x0, #16])
+USER_OFF(24, sttr x11, [x0, #24])
+USER_OFF(32, sttr x12, [x0, #32])
+USER_OFF(40, sttr x13, [x0, #40])
+USER_OFF(48, sttr x14, [x0, #48])
+USER_OFF(56, sttr x15, [x0, #56])
+ add x0, x0, #64
+ add x1, x1, #64
+ sub x2, x2, #64
+
+ cmp x2, #64
+ b.hs body_64b_loop
+
+SYM_INNER_LABEL(tail_32b, SYM_L_LOCAL)
+ tbz x2, #5, tail_16b
+
+ ldp x8, x9, [x1, #0]
+ ldp x10, x11, [x1, #16]
+USER_OFF(0, sttr x8, [x0, #0])
+USER_OFF(8, sttr x9, [x0, #8])
+USER_OFF(16, sttr x10, [x0, #16])
+USER_OFF(24, sttr x11, [x0, #24])
+ add x0, x0, #32
+ add x1, x1, #32
+
+SYM_INNER_LABEL(tail_16b, SYM_L_LOCAL)
+ tbz x2, #4, tail_8b
+
+ ldp x8, x9, [x1], #16
+USER_OFF(0, sttr x8, [x0, #0])
+USER_OFF(8, sttr x9, [x0, #8])
+ add x0, x0, #16
+
+SYM_INNER_LABEL(tail_8b, SYM_L_LOCAL)
+ tbz x2, #3, tail_4b
+
+ ldr x8, [x1], #8
+USER_OFF(0, sttr x8, [x0])
+ add x0, x0, #8
+
+SYM_INNER_LABEL(tail_4b, SYM_L_LOCAL)
+ tbz x2, #2, tail_2b
+
+ ldr w8, [x1], #4
+USER_OFF(0, sttr w8, [x0])
+ add x0, x0, #4
+
+SYM_INNER_LABEL(tail_2b, SYM_L_LOCAL)
+ tbz x2, #1, tail_1b
+
+ ldrh w8, [x1], #2
+USER_OFF(0, sttrh w8, [x0])
+ add x0, x0, #2
+
+SYM_INNER_LABEL(tail_1b, SYM_L_LOCAL)
+ tbz x2, #0, done
+
+ ldrb w8, [x1]
+USER_OFF(0, sttrb w8, [x0])
+
+SYM_INNER_LABEL(done, SYM_L_LOCAL)
+ mov x0, xzr
+ ret
+
+SYM_INNER_LABEL(bytewise_loop, SYM_L_LOCAL)
+ cbz x2, done
+
+ ldrb w8, [x1], #1
+USER_OFF(0, sttrb w8, [x0])
+ add x0, x0, #1
+ sub x2, x2, #1
+
+ b bytewise_loop
+
+FIXUP_OFFSET(0)
+FIXUP_OFFSET(8)
+FIXUP_OFFSET(16)
+FIXUP_OFFSET(24)
+FIXUP_OFFSET(32)
+FIXUP_OFFSET(40)
+FIXUP_OFFSET(48)
+FIXUP_OFFSET(56)
SYM_FUNC_END(__arch_copy_to_user)
EXPORT_SYMBOL(__arch_copy_to_user)
--
2.30.2
Generally, clear_user() is supposed to behave the same way as
raw_copy_{to,from}_user(), which per <linux/uaccess.h> requires:
> If raw_copy_{to,from}_user(to, from, size) returns N, size - N bytes
> starting at to must become equal to the bytes fetched from the
> corresponding area starting at from. All data past to + size - N must
> be left unmodified.
>
> If copying succeeds, the return value must be 0. If some data cannot
> be fetched, it is permitted to copy less than had been fetched; the
> only hard requirement is that not storing anything at all (i.e.
> returning size) should happen only when nothing could be copied. In
> other words, you don't have to squeeze as much as possible - it is
> allowed, but not necessary.
Unfortunately arm64's implementation of clear_user() may fail in cases
where some bytes could be written, which can be demonstrated through
testing, e.g.
| # test_clear_user: ASSERTION FAILED at lib/usercopy_kunit.c:221
| too few bytes consumed (offset=4095, size=2, ret=2)
| [FAILED] 2 byte(s)
As with __{arch,raw}_copy_to_user() there are also a number of potential
other problems where the use of misaligned accesses could result in
under-reporting the number of bytes zeroed.
This patch fixes these issues by replacing the current implementation of
__arch_clear_user() with one derived from the new __arch_copy_to_user().
This only uses aligned stores to write to user memory, and reliably
reports the number of bytes zeroed.
For correctness, I've tested this exhaustively for sizes 0 to 128
against every possible alignment relative to a leading and trailing page
boundary. I've also boot tested and run a few kernel builds with the new
implementation.
For performenace, I've benchmarked reads form /dev/zero and timed kernel
builds before and after this patch, and this seems to be at least as
good (or marginally better than) the current implementation, though the
difference is very close to noise.
Signed-off-by: Mark Rutland <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm64/lib/clear_user.S | 195 +++++++++++++++++++++++++++-------
arch/arm64/lib/copy_to_user.S | 1 -
2 files changed, 157 insertions(+), 39 deletions(-)
diff --git a/arch/arm64/lib/clear_user.S b/arch/arm64/lib/clear_user.S
index a5a5f5b97b175..f422c05d84351 100644
--- a/arch/arm64/lib/clear_user.S
+++ b/arch/arm64/lib/clear_user.S
@@ -4,52 +4,171 @@
*/
#include <linux/linkage.h>
+
#include <asm/asm-uaccess.h>
+#include <asm/assembler.h>
+
+#define USER_OFF(off, x...) USER(fixup_offset_##off, x)
- .text
+#define FIXUP_OFFSET(n) \
+fixup_offset_##n: \
+ sub x0, x3, x0; \
+ sub x0, x0, n; \
+ ret
-/* Prototype: int __arch_clear_user(void *addr, size_t sz)
- * Purpose : clear some user memory
- * Params : addr - user memory address to clear
- * : sz - number of bytes to clear
- * Returns : number of bytes NOT cleared
+/*
+ * Write zeroes to a user space buffer
+ *
+ * Parameters:
+ * x0 - to
+ * x1 - n
+ * Returns:
+ * x0 - bytes not copied
*
- * Alignment fixed up by hardware.
+ * Unlike a memset(to, 0, n), we need to handle faults on user addresses, and
+ * we need to precisely report the number of bytes (not) written. We must only
+ * use aligned single-copy-atomic stores to write to user memory, as stores
+ * which are not single-copy-atomic (e.g. unaligned stores, STP, ASMID stores)
+ * can be split into separate byte accesses (per ARM DDI 0487I.a, Section
+ * B2.2.1) and some arbitatrary set of those byte accesses might occur prior to
+ * a fault being raised (per per ARM DDI 0487I.a, Section B2.7.1).
+ *
+ * We use STTR to write to user memory, which has 1/2/4/8 byte forms, and the
+ * user address ('to') might have arbitrary alignment, so we must handle
+ * misalignment up to 8 bytes.
*/
-
- .p2align 4
- // Alignment is for the loop, but since the prologue (including BTI)
- // is also 16 bytes we can keep any padding outside the function
SYM_FUNC_START(__arch_clear_user)
- add x2, x0, x1
- subs x1, x1, #8
- b.mi 2f
-1:
-USER(9f, sttr xzr, [x0])
- add x0, x0, #8
- subs x1, x1, #8
- b.hi 1b
-USER(9f, sttr xzr, [x2, #-8])
- mov x0, #0
- ret
+ /*
+ * The end address. Fixup handlers will use this to calculate
+ * the number of bytes cleared.
+ */
+ add x3, x0, x1
-2: tbz x1, #2, 3f
-USER(9f, sttr wzr, [x0])
-USER(8f, sttr wzr, [x2, #-4])
- mov x0, #0
- ret
+ /*
+ * Tracing of a kernel build indicates that for the vast
+ * majority of calls to copy_to_user(), 'to' is aligned to 8
+ * bytes. When this is the case, we want to skip to the bulk
+ * copy as soon as possible.
+ */
+ ands x4, x0, 0x7
+ b.eq body
-3: tbz x1, #1, 4f
-USER(9f, sttrh wzr, [x0])
-4: tbz x1, #0, 5f
-USER(7f, sttrb wzr, [x2, #-1])
-5: mov x0, #0
- ret
+ /*
+ * For small unaligned copies, it's not worth trying to be
+ * optimal.
+ */
+ cmp x1, #8
+ b.lo bytewise_loop
- // Exception fixups
-7: sub x0, x2, #5 // Adjust for faulting on the final byte...
-8: add x0, x0, #4 // ...or the second word of the 4-7 byte case
-9: sub x0, x2, x0
- ret
+ /*
+ * Calculate the distance to the next 8-byte boundary.
+ */
+ mov x5, #8
+ sub x4, x5, x4
+
+SYM_INNER_LABEL(head_realign_1b, SYM_L_LOCAL)
+ tbz x4, #0, head_realign_2b
+
+USER_OFF(0, sttrb wzr, [x0])
+ add x0, x0, #1
+
+SYM_INNER_LABEL(head_realign_2b, SYM_L_LOCAL)
+ tbz x4, #1, head_realign_4b
+
+USER_OFF(0, sttrh wzr, [x0])
+ add x0, x0, #2
+
+SYM_INNER_LABEL(head_realign_4b, SYM_L_LOCAL)
+ tbz x4, #2, head_realigned
+
+USER_OFF(0, sttr wzr, [x0])
+ add x0, x0, #4
+
+SYM_INNER_LABEL(head_realigned, SYM_L_LOCAL)
+ /*
+ * Any 1-7 byte misalignment has now been fixed; subtract this
+ * misalignment from the remaining size.
+ */
+ sub x1, x1, x4
+
+SYM_INNER_LABEL(body, SYM_L_LOCAL)
+ cmp x1, #64
+ b.lo tail_32b
+
+SYM_INNER_LABEL(body_64b_loop, SYM_L_LOCAL)
+USER_OFF(0, sttr xzr, [x0, #0])
+USER_OFF(8, sttr xzr, [x0, #8])
+USER_OFF(16, sttr xzr, [x0, #16])
+USER_OFF(24, sttr xzr, [x0, #24])
+USER_OFF(32, sttr xzr, [x0, #32])
+USER_OFF(40, sttr xzr, [x0, #40])
+USER_OFF(48, sttr xzr, [x0, #48])
+USER_OFF(56, sttr xzr, [x0, #56])
+ add x0, x0, #64
+ sub x1, x1, #64
+
+ cmp x1, #64
+ b.hs body_64b_loop
+
+SYM_INNER_LABEL(tail_32b, SYM_L_LOCAL)
+ tbz x1, #5, tail_16b
+
+USER_OFF(0, sttr xzr, [x0, #0])
+USER_OFF(8, sttr xzr, [x0, #8])
+USER_OFF(16, sttr xzr, [x0, #16])
+USER_OFF(24, sttr xzr, [x0, #24])
+ add x0, x0, #32
+
+SYM_INNER_LABEL(tail_16b, SYM_L_LOCAL)
+ tbz x1, #4, tail_8b
+
+USER_OFF(0, sttr xzr, [x0, #0])
+USER_OFF(8, sttr xzr, [x0, #8])
+ add x0, x0, #16
+
+SYM_INNER_LABEL(tail_8b, SYM_L_LOCAL)
+ tbz x1, #3, tail_4b
+
+USER_OFF(0, sttr xzr, [x0])
+ add x0, x0, #8
+
+SYM_INNER_LABEL(tail_4b, SYM_L_LOCAL)
+ tbz x1, #2, tail_2b
+
+USER_OFF(0, sttr wzr, [x0])
+ add x0, x0, #4
+
+SYM_INNER_LABEL(tail_2b, SYM_L_LOCAL)
+ tbz x1, #1, tail_1b
+
+USER_OFF(0, sttrh wzr, [x0])
+ add x0, x0, #2
+
+SYM_INNER_LABEL(tail_1b, SYM_L_LOCAL)
+ tbz x1, #0, done
+
+USER_OFF(0, sttrb wzr, [x0])
+
+SYM_INNER_LABEL(done, SYM_L_LOCAL)
+ mov x0, xzr
+ ret
+
+SYM_INNER_LABEL(bytewise_loop, SYM_L_LOCAL)
+ cbz x1, done
+
+USER_OFF(0, sttrb wzr, [x0])
+ add x0, x0, #1
+ sub x1, x1, #1
+
+ b bytewise_loop
+
+FIXUP_OFFSET(0)
+FIXUP_OFFSET(8)
+FIXUP_OFFSET(16)
+FIXUP_OFFSET(24)
+FIXUP_OFFSET(32)
+FIXUP_OFFSET(40)
+FIXUP_OFFSET(48)
+FIXUP_OFFSET(56)
SYM_FUNC_END(__arch_clear_user)
EXPORT_SYMBOL(__arch_clear_user)
diff --git a/arch/arm64/lib/copy_to_user.S b/arch/arm64/lib/copy_to_user.S
index fa603487e8571..9eab9ec6c71e9 100644
--- a/arch/arm64/lib/copy_to_user.S
+++ b/arch/arm64/lib/copy_to_user.S
@@ -7,7 +7,6 @@
#include <asm/asm-uaccess.h>
#include <asm/assembler.h>
-#include <asm/cache.h>
#define USER_OFF(off, x...) USER(fixup_offset_##off, x)
--
2.30.2
On Tue, Mar 21, 2023 at 12:25:11PM +0000, Mark Rutland wrote:
> +static void assert_size_valid(struct kunit *test,
> + const struct usercopy_params *params,
> + unsigned long ret)
> +{
> + const unsigned long size = params->size;
> + const unsigned long offset = params->offset;
I think you should drop the 'unsigned' for 'offset', it better matches
the usercopy_params structure and the 'offset < 0' test.
> +
> + if (ret > size) {
> + KUNIT_ASSERT_FAILURE(test,
> + "return value is impossibly large (offset=%ld, size=%lu, ret=%lu)",
> + offset, size, ret);
> + }
> +
> + /*
> + * When the user buffer starts within a faulting page, no bytes can be
> + * copied, so ret must equal size.
> + */
> + if (offset < 0 || offset >= PAGE_SIZE) {
> + if (ret == size)
> + return;
> +
> + KUNIT_ASSERT_FAILURE(test,
> + "impossible copy did not fail (offset=%ld, size=%lu, ret=%lu)",
> + offset, size, ret);
> + }
> +
> + /*
> + * When the user buffer fits entirely within a non-faulting page, all
> + * bytes must be copied, so ret must equal 0.
> + */
> + if (offset + size <= PAGE_SIZE) {
> + if (ret == 0)
> + return;
> +
> + KUNIT_ASSERT_FAILURE(test,
> + "completely possible copy failed (offset=%ld, size=%lu, ret=%lu)",
> + offset, size, ret);
> + }
> +
> + /*
> + * The buffer starts in a non-faulting page, but continues into a
> + * faulting page. At least one byte must be copied, and at most all the
> + * non-faulting bytes may be copied.
> + */
> + if (ret == size) {
> + KUNIT_ASSERT_FAILURE(test,
> + "too few bytes consumed (offset=%ld, size=%lu, ret=%lu)",
> + offset, size, ret);
> + }
> +
> + if (ret < (offset + size) - PAGE_SIZE) {
> + KUNIT_ASSERT_FAILURE(test,
> + "too many bytes consumed (offset=%ld, size=%lu, ret=%lu)",
> + offset, size, ret);
> + }
> +}
Nitpick: slightly less indentation probably if we write the above as:
KUNIT_ASSERT_TRUE_MSG(test,
ret < (offset + size) - PAGE_SIZE,
"too many bytes consumed (offset=%ld, size=%lu, ret=%lu)",
offset, size, ret);
Not sure this works for the early return cases above.
> +static void assert_src_valid(struct kunit *test,
> + const struct usercopy_params *params,
> + const char *src, long src_offset,
> + unsigned long ret)
> +{
> + const unsigned long size = params->size;
> + const unsigned long offset = params->offset;
The unsigned offset here doesn't matter much but I'd drop it as well.
> + /*
> + * A usercopy MUST NOT modify the source buffer.
> + */
> + for (int i = 0; i < PAGE_SIZE; i++) {
> + char val = src[i];
> +
> + if (val == buf_pattern(i))
> + continue;
> +
> + KUNIT_ASSERT_FAILURE(test,
> + "source bytes modified (src[%d]=0x%x, offset=%ld, size=%lu, ret=%lu)",
> + i, offset, size, ret);
> + }
> +}
> +
> +static void assert_dst_valid(struct kunit *test,
> + const struct usercopy_params *params,
> + const char *dst, long dst_offset,
> + unsigned long ret)
> +{
> + const unsigned long size = params->size;
> + const unsigned long offset = params->offset;
> +
> + /*
> + * A usercopy MUST NOT modify any bytes before the destination buffer.
> + */
> + for (int i = 0; i < dst_offset; i++) {
> + char val = dst[i];
> +
> + if (val == 0)
> + continue;
> +
> + KUNIT_ASSERT_FAILURE(test,
> + "pre-destination bytes modified (dst_page[%d]=0x%x, offset=%ld, size=%lu, ret=%lu)",
> + i, val, offset, size, ret);
> + }
> +
> + /*
> + * A usercopy MUST NOT modify any bytes after the end of the destination
> + * buffer.
> + */
Without looking at the arm64 fixes in this series, I think such test can
fail for us currently given the right offset.
> + for (int i = dst_offset + size - ret; i < PAGE_SIZE; i++) {
> + char val = dst[i];
> +
> + if (val == 0)
> + continue;
> +
> + KUNIT_ASSERT_FAILURE(test,
> + "post-destination bytes modified (dst_page[%d]=0x%x, offset=%ld, size=%lu, ret=%lu)",
> + i, val, offset, size, ret);
> + }
> +}
[...]
> +/*
> + * Generate the size and offset combinations to test.
> + *
> + * Usercopies may have different paths for small/medium/large copies, but
> + * typically max out at 64 byte chunks. We test sizes 0 to 128 bytes to check
> + * all combinations of leading/trailing chunks and bulk copies.
> + *
> + * For each size, we test every offset relative to a leading and trailing page
> + * boundary (i.e. [size, 0] and [PAGE_SIZE - size, PAGE_SIZE]) to check every
> + * possible faulting boundary.
> + */
> +#define for_each_size_offset(size, offset) \
> + for (unsigned long size = 0; size <= 128; size++) \
> + for (long offset = -size; \
> + offset <= (long)PAGE_SIZE; \
> + offset = (offset ? offset + 1: (PAGE_SIZE - size)))
> +
> +static void test_copy_to_user(struct kunit *test)
> +{
> + const struct usercopy_env *env = test->priv;
> +
> + for_each_size_offset(size, offset) {
> + const struct usercopy_params params = {
> + .size = size,
> + .offset = offset,
> + };
> + unsigned long ret;
> +
> + buf_init_zero(env->ubuf);
> + buf_init_pattern(env->kbuf);
> +
> + ret = do_copy_to_user(env, ¶ms);
> +
> + assert_size_valid(test, ¶ms, ret);
> + assert_src_valid(test, ¶ms, env->kbuf, 0, ret);
> + assert_dst_valid(test, ¶ms, env->ubuf, params.offset, ret);
> + assert_copy_valid(test, ¶ms,
> + env->ubuf, params.offset,
> + env->kbuf, 0,
> + ret);
> + }
> +}
IIUC, in such tests you only vary the destination offset. Our copy
routines in general try to align the source and leave the destination
unaligned for performance. It would be interesting to add some variation
on the source offset as well to spot potential issues with that part of
the memcpy routines.
--
Catalin
On Tue, Mar 21, 2023 at 12:25:12PM +0000, Mark Rutland wrote:
> +static void assert_clear_valid(struct kunit *test,
> + const struct usercopy_params *params,
> + const char *dst, long dst_offset,
> + unsigned long ret)
> +{
> + const unsigned long size = params->size;
> + const unsigned long offset = params->offset;
Nit: long offset.
Otherwise,
Reviewed-by: Catalin Marinas <[email protected]>
On Tue, Mar 21, 2023 at 5:25 AM Mark Rutland <[email protected]> wrote:
>
> For some combinations of sizes and alignments __{arch,raw}_copy_to_user
> will copy some bytes between (to + size - N) and (to + size), but will
> never modify bytes past (to + size).
>
> This violates the documentation in <linux/uaccess.h>, which states:
>
> > If raw_copy_{to,from}_user(to, from, size) returns N, size - N bytes
> > starting at to must become equal to the bytes fetched from the
> > corresponding area starting at from. All data past to + size - N must
> > be left unmodified.
Hmm.
I'm not 100% sure we couldn't just relax the documentation.
After all, the "exception happens in the middle of a copy" is a
special case, and generally results in -EFAULT rather than any
indication of "this is how much data we filled in for user space".
Now, some operations do *try* to generally give partial results
(notably "read()") even in the presence of page faults in the middle,
but I'm not entirely convinced we need to bend over backwards over
this.
Put another way: we have lots of situations where we fill in partial
user butffers and then return EFAULT, so "copy_to_user()" has at no
point been "atomic" wrt the return value.
So I in no way hate this patch, and I do think it's a good "QoI fix",
but if this ends up being painful for some architecture, I get the
feeling that we could easily just relax the implementation instead.
We have accepted the whole "return value is not byte-exact" thing
forever, simply because we have never required user copies to be done
as byte-at-a-time copies.
Now, it is undoubtedly true that the buffer we fill in with a user copy must
(a) be filled AT LEAST as much as we reported it to be filled in (ie
user space can expect that there's no uninitialized data in any range
we claimed to fill)
(b) obviously never filled past the buffer size that was given
But if we take an exception in the middle, and write a partial aligned
word, and don't report that as written (which is what you are fixing),
I really feel that is a "QoI" thing, not a correctness thing.
I don't think this arm implementation thing has ever hurt anything, in
other words.
That said, at some point that quality-of-implementation thing makes
validation so much easier that maybe it's worth doing just for that
reason, which is why I think "if it's not too painful, go right ahead"
is fine.
Linus
On Tue, Mar 21, 2023 at 5:25 AM Mark Rutland <[email protected]> wrote:
>
> * arm64's copy_to_user() under-reports the number of bytes copied in
> some cases, e.g.
So I think this is the ok case.
> * arm's copy_to_user() under-reports the number of bytes copied in some
> cases, and both copy_to_user() and copy_from_user() don't guarantee
> that at least a single byte is copied when a partial copy is possible,
Again, this is ok historically.
> * i386's copy_from_user does not guarantee that at least a single byte
> is copied when a partial copit is possible, e.g.
>
> | too few bytes consumed (offset=4093, size=8, ret=8)
And here's the real example of "we've always done this optimization".
The exact details have differed, but the i386 case is the really
really traditional one: it does word-at-a-time copies, and does *not*
try to fall back to byte-wise copies. Never has.
> * riscv's copy_to_user() and copy_from_user() don't guarantee that at
> least a single byte is copied when a partial copy is possible, e.g.
>
> | too few bytes consumed (offset=4095, size=2, ret=2)
Yup. This is all the same "we've never forced byte-at-a-time copies"
> * s390 passes all tests
>
> * sparc's copy_from_user() over-reports the number of bbytes copied in
> some caes, e.g.
So this case I think this is wrong, and an outright bug. That can
cause people to think that uninitialized data is initialized, and leak
sensitive information.
> * x86_64 passes all tests
I suspect your testing is flawed due to being too limited, and x86-64
having multiple different copying routines.
Yes, at some point we made everything be quite careful with
"handle_tail" etc, but we end up still having things that fail early,
and fail hard.
At a minimum, at least unsafe_copy_to_user() will fault and not do the
"fill to the very last byte" case. Of course, that doesn't return a
partial length (it only has a "fail" case), but it's an example of
this whole thing where we haven't really been byte-exact when doing
copies.
So again, I get the feeling that these rules may make sense from a
validation standpoint, but I'm not 100% sure we should generally have
to be this careful.
Linus
On Tue, Mar 21, 2023 at 11:04:26AM -0700, Linus Torvalds wrote:
> On Tue, Mar 21, 2023 at 5:25 AM Mark Rutland <[email protected]> wrote:
> >
> > * arm64's copy_to_user() under-reports the number of bytes copied in
> > some cases, e.g.
>
> So I think this is the ok case.
>
> > * arm's copy_to_user() under-reports the number of bytes copied in some
> > cases, and both copy_to_user() and copy_from_user() don't guarantee
> > that at least a single byte is copied when a partial copy is possible,
>
> Again, this is ok historically.
>
> > * i386's copy_from_user does not guarantee that at least a single byte
> > is copied when a partial copit is possible, e.g.
> >
> > | too few bytes consumed (offset=4093, size=8, ret=8)
>
> And here's the real example of "we've always done this optimization".
> The exact details have differed, but the i386 case is the really
> really traditional one: it does word-at-a-time copies, and does *not*
> try to fall back to byte-wise copies. Never has.
Sure; I understand that. The reason for pointing this out is that Al was very
specific that implementations *must* guarantee this back in:
https://lore.kernel.org/all/[email protected]/
... and that this could be done by having the fixup handler try to copy a byte.
I had assumed that *something* depended upon that, but I don't know what that
something actually is.
I'm not wedded to the semantic either way; if that's not required I can drop it
from the tests.
> > * s390 passes all tests
> >
> > * sparc's copy_from_user() over-reports the number of bbytes copied in
> > some caes, e.g.
>
> So this case I think this is wrong, and an outright bug. That can
> cause people to think that uninitialized data is initialized, and leak
> sensitive information.
Agreed.
> > * x86_64 passes all tests
>
> I suspect your testing is flawed due to being too limited, and x86-64
> having multiple different copying routines.
Sorry; I should've called that out explicitly. I'm aware I'm not testing all
the variants (I'd be happy to); I just wanted to check that I wasn't going off
into the weeds with the semantics first.
I probably should've sent this as an RFC...
> Yes, at some point we made everything be quite careful with
> "handle_tail" etc, but we end up still having things that fail early,
> and fail hard.
>
> At a minimum, at least unsafe_copy_to_user() will fault and not do the
> "fill to the very last byte" case. Of course, that doesn't return a
> partial length (it only has a "fail" case), but it's an example of
> this whole thing where we haven't really been byte-exact when doing
> copies.
Sure; that does seem to be different structurally too, so it'd need to be
plumbed into the harness differently.
I'll note that's more like {get,put}_user() which similarly just have a fail
case (and a put_user() could do a parital write then fault).
> So again, I get the feeling that these rules may make sense from a
> validation standpoint, but I'm not 100% sure we should generally have
> to be this careful.
I'm more than happy to relax the tests (and the docs); I just need to know
where the boundary is between what we must guarantee and what's a nice-to-have.
Thanks,
Mark.
On Tue, Mar 21, 2023 at 05:09:48PM +0000, Catalin Marinas wrote:
> On Tue, Mar 21, 2023 at 12:25:11PM +0000, Mark Rutland wrote:
> > +static void assert_size_valid(struct kunit *test,
> > + const struct usercopy_params *params,
> > + unsigned long ret)
> > +{
> > + const unsigned long size = params->size;
> > + const unsigned long offset = params->offset;
>
> I think you should drop the 'unsigned' for 'offset', it better matches
> the usercopy_params structure and the 'offset < 0' test.
Agreed. I'll go fix all offset values to use long.
[...]
> > + if (ret < (offset + size) - PAGE_SIZE) {
> > + KUNIT_ASSERT_FAILURE(test,
> > + "too many bytes consumed (offset=%ld, size=%lu, ret=%lu)",
> > + offset, size, ret);
> > + }
> > +}
>
> Nitpick: slightly less indentation probably if we write the above as:
>
> KUNIT_ASSERT_TRUE_MSG(test,
> ret < (offset + size) - PAGE_SIZE,
> "too many bytes consumed (offset=%ld, size=%lu, ret=%lu)",
> offset, size, ret);
>
> Not sure this works for the early return cases above.
I had originally used KUNIT_ASSERT_*_MSG(), but found those produced a lot of
unhelpful output; lemme go check what KUNIT_ASSERT_TRUE_MSG() produces.
[...]
> > + /*
> > + * A usercopy MUST NOT modify any bytes before the destination buffer.
> > + */
> > + for (int i = 0; i < dst_offset; i++) {
> > + char val = dst[i];
> > +
> > + if (val == 0)
> > + continue;
> > +
> > + KUNIT_ASSERT_FAILURE(test,
> > + "pre-destination bytes modified (dst_page[%d]=0x%x, offset=%ld, size=%lu, ret=%lu)",
> > + i, val, offset, size, ret);
> > + }
> > +
> > + /*
> > + * A usercopy MUST NOT modify any bytes after the end of the destination
> > + * buffer.
> > + */
>
> Without looking at the arm64 fixes in this series, I think such test can
> fail for us currently given the right offset.
Yes, it can.
The test matches the documened semantic, so in that sense it's correctly
detecting that arm64 doesn't match the documentation.
Whether the documentation is right is clearly the key issue here. :)
> > +/*
> > + * Generate the size and offset combinations to test.
> > + *
> > + * Usercopies may have different paths for small/medium/large copies, but
> > + * typically max out at 64 byte chunks. We test sizes 0 to 128 bytes to check
> > + * all combinations of leading/trailing chunks and bulk copies.
> > + *
> > + * For each size, we test every offset relative to a leading and trailing page
> > + * boundary (i.e. [size, 0] and [PAGE_SIZE - size, PAGE_SIZE]) to check every
> > + * possible faulting boundary.
> > + */
> > +#define for_each_size_offset(size, offset) \
> > + for (unsigned long size = 0; size <= 128; size++) \
> > + for (long offset = -size; \
> > + offset <= (long)PAGE_SIZE; \
> > + offset = (offset ? offset + 1: (PAGE_SIZE - size)))
> > +
> > +static void test_copy_to_user(struct kunit *test)
> > +{
> > + const struct usercopy_env *env = test->priv;
> > +
> > + for_each_size_offset(size, offset) {
> > + const struct usercopy_params params = {
> > + .size = size,
> > + .offset = offset,
> > + };
> > + unsigned long ret;
> > +
> > + buf_init_zero(env->ubuf);
> > + buf_init_pattern(env->kbuf);
> > +
> > + ret = do_copy_to_user(env, ¶ms);
> > +
> > + assert_size_valid(test, ¶ms, ret);
> > + assert_src_valid(test, ¶ms, env->kbuf, 0, ret);
> > + assert_dst_valid(test, ¶ms, env->ubuf, params.offset, ret);
> > + assert_copy_valid(test, ¶ms,
> > + env->ubuf, params.offset,
> > + env->kbuf, 0,
> > + ret);
> > + }
> > +}
>
> IIUC, in such tests you only vary the destination offset. Our copy
> routines in general try to align the source and leave the destination
> unaligned for performance. It would be interesting to add some variation
> on the source offset as well to spot potential issues with that part of
> the memcpy routines.
I have that on my TODO list; I had intended to drop that into the
usercopy_params. The only problem is that the cross product of size,
src_offset, and dst_offset gets quite large.
Thanks,
Mark.
On Tue, Mar 21, 2023 at 10:50:33AM -0700, Linus Torvalds wrote:
> On Tue, Mar 21, 2023 at 5:25 AM Mark Rutland <[email protected]> wrote:
> >
> > For some combinations of sizes and alignments __{arch,raw}_copy_to_user
> > will copy some bytes between (to + size - N) and (to + size), but will
> > never modify bytes past (to + size).
> >
> > This violates the documentation in <linux/uaccess.h>, which states:
> >
> > > If raw_copy_{to,from}_user(to, from, size) returns N, size - N bytes
> > > starting at to must become equal to the bytes fetched from the
> > > corresponding area starting at from. All data past to + size - N must
> > > be left unmodified.
>
> Hmm.
>
> I'm not 100% sure we couldn't just relax the documentation.
Ok.
> After all, the "exception happens in the middle of a copy" is a
> special case, and generally results in -EFAULT rather than any
> indication of "this is how much data we filled in for user space".
>
> Now, some operations do *try* to generally give partial results
> (notably "read()") even in the presence of page faults in the middle,
> but I'm not entirely convinced we need to bend over backwards over
> this.
If you think we should relax the documented semantic, I can go do that. If we
actually need the documented semantic in some cases, then something will need
to change.
All I'm really after is "what should the semantic be?" since there's clearly a
disconnect between the documentation and the code. I'm happy to go update
either.
> Put another way: we have lots of situations where we fill in partial
> user butffers and then return EFAULT, so "copy_to_user()" has at no
> point been "atomic" wrt the return value.
>
> So I in no way hate this patch, and I do think it's a good "QoI fix",
> but if this ends up being painful for some architecture, I get the
> feeling that we could easily just relax the implementation instead.
>
> We have accepted the whole "return value is not byte-exact" thing
> forever, simply because we have never required user copies to be done
> as byte-at-a-time copies.
>
> Now, it is undoubtedly true that the buffer we fill in with a user copy must
>
> (a) be filled AT LEAST as much as we reported it to be filled in (ie
> user space can expect that there's no uninitialized data in any range
> we claimed to fill)
>
> (b) obviously never filled past the buffer size that was given
I agree those are both necessary.
> But if we take an exception in the middle, and write a partial aligned
> word, and don't report that as written (which is what you are fixing),
> I really feel that is a "QoI" thing, not a correctness thing.
>
> I don't think this arm implementation thing has ever hurt anything, in
> other words.
>
> That said, at some point that quality-of-implementation thing makes
> validation so much easier that maybe it's worth doing just for that
> reason, which is why I think "if it's not too painful, go right ahead"
> is fine.
Fair enough.
Thanks,
Mark.
On Tue, Mar 21, 2023 at 12:25:13PM +0000, Mark Rutland wrote:
> For some combinations of sizes and alignments __{arch,raw}_copy_to_user
> will copy some bytes between (to + size - N) and (to + size), but will
> never modify bytes past (to + size).
>
> This violates the documentation in <linux/uaccess.h>, which states:
>
> > If raw_copy_{to,from}_user(to, from, size) returns N, size - N bytes
> > starting at to must become equal to the bytes fetched from the
> > corresponding area starting at from. All data past to + size - N must
> > be left unmodified.
>
> This can be demonstrated through testing, e.g.
>
> | # test_copy_to_user: EXPECTATION FAILED at lib/usercopy_kunit.c:287
> | post-destination bytes modified (dst_page[4082]=0x1, offset=4081, size=16, ret=15)
> | [FAILED] 16 byte copy
>
> This happens because the __arch_copy_to_user() can make unaligned stores
> to the userspace buffer, and the ARM architecture permits (but does not
> require) that such unaligned stores write some bytes before raising a
> fault (per ARM DDI 0487I.a Section B2.2.1 and Section B2.7.1). The
> extable fixup handlers in __arch_copy_to_user() assume that any faulting
> store has failed entirely, and so under-report the number of bytes
> copied when an unaligned store writes some bytes before faulting.
I find the Arm ARM hard to parse (no surprise here). Do you happen to
know what the behavior is for the new CPY instructions? I'd very much
like to use those for uaccess as well eventually but if they have the
same imp def behaviour, I'd rather relax the documentation and continue
to live with the current behaviour.
> The only architecturally guaranteed way to avoid this is to only use
> aligned stores to write to user memory. This patch rewrites
> __arch_copy_to_user() to only access the user buffer with aligned
> stores, such that the bytes written can always be determined reliably.
Can we not fall back to byte-at-a-time? There's still a potential race
if the page becomes read-only for example. Well, probably not worth it
if we decide to go this route.
Where we may notice some small performance degradation is copy_to_user()
where the reads from the source end up unaligned due to the destination
buffer alignment. I doubt that's a common case though and most CPUs can
probably cope just well with this.
--
Catalin
On Wed, Mar 22, 2023 at 02:48:26PM +0000, Catalin Marinas wrote:
> On Tue, Mar 21, 2023 at 12:25:13PM +0000, Mark Rutland wrote:
> > For some combinations of sizes and alignments __{arch,raw}_copy_to_user
> > will copy some bytes between (to + size - N) and (to + size), but will
> > never modify bytes past (to + size).
> >
> > This violates the documentation in <linux/uaccess.h>, which states:
> >
> > > If raw_copy_{to,from}_user(to, from, size) returns N, size - N bytes
> > > starting at to must become equal to the bytes fetched from the
> > > corresponding area starting at from. All data past to + size - N must
> > > be left unmodified.
> >
> > This can be demonstrated through testing, e.g.
> >
> > | # test_copy_to_user: EXPECTATION FAILED at lib/usercopy_kunit.c:287
> > | post-destination bytes modified (dst_page[4082]=0x1, offset=4081, size=16, ret=15)
> > | [FAILED] 16 byte copy
> >
> > This happens because the __arch_copy_to_user() can make unaligned stores
> > to the userspace buffer, and the ARM architecture permits (but does not
> > require) that such unaligned stores write some bytes before raising a
> > fault (per ARM DDI 0487I.a Section B2.2.1 and Section B2.7.1). The
> > extable fixup handlers in __arch_copy_to_user() assume that any faulting
> > store has failed entirely, and so under-report the number of bytes
> > copied when an unaligned store writes some bytes before faulting.
>
> I find the Arm ARM hard to parse (no surprise here). Do you happen to
> know what the behavior is for the new CPY instructions? I'd very much
> like to use those for uaccess as well eventually but if they have the
> same imp def behaviour, I'd rather relax the documentation and continue
> to live with the current behaviour.
My understanding is that those have to be broken up into a set of smaller
accesses, and so the same applies, with the architectural window for clobbering
being the *entire* range from 'to' to 'to + size'
That said, the description of the forward-only instructions (CPYF*) suggests
those might have to be exact.
In ARM DDI 0487I.a, section C6.2.80 "CPYFPWT, CPYFMWT, CPYFEWT", it says:
> The memory copy performed by these instructions is in the forward direction
> only, so the instructions are suitable for a memory copy only where there is
> no overlap between the source and destination locations, or where the source
> address is greater than the destination address.
That *might* mean in practice that CPYF* instructions can't clobber later bytes
in dst in case they'll later be consumed as src bytes, but it doesn't strictly
say that (and maybe the core would figure out how much potential overlap there
is and tune the copy chunk size).
We could chase our architects on that.
> > The only architecturally guaranteed way to avoid this is to only use
> > aligned stores to write to user memory. This patch rewrites
> > __arch_copy_to_user() to only access the user buffer with aligned
> > stores, such that the bytes written can always be determined reliably.
>
> Can we not fall back to byte-at-a-time? There's still a potential race
> if the page becomes read-only for example. Well, probably not worth it
> if we decide to go this route.
I was trying to avoid that shape of race.
I also believe that if we have a misaligned store straddling two pages, and the
first page is faulting, it the store can do a partial write to the 2nd page,
which I suspected is not what we want (though maybe that's beningn, if we're
going to say that clobbering anywhere within the dst buffer is fine).
> Where we may notice some small performance degradation is copy_to_user()
> where the reads from the source end up unaligned due to the destination
> buffer alignment. I doubt that's a common case though and most CPUs can
> probably cope just well with this.
FWIW, I instrumented a kernel build workload, and the significant majority of
the time (IIRC by ~100x), both src and dst happened to be aligned to at least 8
bytes, so even if that were slow its the rare case. I guess that's because any
structures we copy are likely to contain a long or a pointer, and a bunch of
other copies are going to be bulk page copies into buffers which the allocator
has aligned to at least 8 bytes.
Thanks,
Mark.
On Wed, Mar 22, 2023 at 8:30 AM Mark Rutland <[email protected]> wrote:
>
> I also believe that if we have a misaligned store straddling two pages, and the
> first page is faulting, it the store can do a partial write to the 2nd page,
> which I suspected is not what we want (though maybe that's beningn, if we're
> going to say that clobbering anywhere within the dst buffer is fine).
So I don't think that clobbering anywhere in the write buffer is fine
in general:, since user space may well depend on the whole "kernel
wrote exactly this range" (ie people who implement things like
circular buffers in user space that are filled by the kernel with a
read() system call).
But that's about partial read() results in general, and things like
interruptible reads (or just partial results from a pipe) in
particular.
At the same time EFAULT really is pretty special. We've *tried* to
make it restartable by user space, but honestly, I'm not entirely sure
that was ever a good idea, and I'm not even sure anybody really
depends on it.
For that case, I don't think we necessarily should care too deeply.
HOWEVER.
There is one very very special case: we may not care about
"restartable system calls" from user space, and say "we might as well
just always return EFAULT for any partial result".
That is, after all, what we already do for many cases (ie structure
copies, "put_user()" and friends). And I suspect it's what a lot of
other systems do.
No, the one special case is for the _kernel_ use of restartable user
copies, and the "__copy_from_user_inatomic()" case in particular.
They aren't hugely common, but they are required for some cases
(notably filesystems that hold locks and cannot take user page faults
due to deadlock scenarios), and *those* need very special care.
They still don't need to be byte-exact, but they do need to be able to
restart and most notably they need to be able to make forward progress
(together with a separate "fault_in_user_readable/writable()").
We've had situations where that didn't happen, and then you get actual
kernel hangs when some filesystem just does an endless loop with page
faults.
And by "make forward progress", it's actually fine to write too much
to user space, and return a short return value, and when restarting,
do some of the writes to user space *again*. It just has to be
restartable with the same data, and it can't keep taking a fault
thanks to some bad interaction with the "fault-in" case.
Of course, that does end up using the exact same user copy code, just
under "pagefault_disable()" (which is often implicitly called through
something like "kmap_atomic()" or similar).
So I don't think we need to be byte-exact, but we do need to keep that
restart case in mind. But again, the restart can re-do parts of the
copy.
Linus
From: Mark Rutland
> Sent: 22 March 2023 14:05
....
> > IIUC, in such tests you only vary the destination offset. Our copy
> > routines in general try to align the source and leave the destination
> > unaligned for performance. It would be interesting to add some variation
> > on the source offset as well to spot potential issues with that part of
> > the memcpy routines.
>
> I have that on my TODO list; I had intended to drop that into the
> usercopy_params. The only problem is that the cross product of size,
> src_offset, and dst_offset gets quite large.
I thought that is was better to align the writes and do misaligned reads.
Although maybe copy_to/from_user() would be best aligning the user address
(to avoid page faults part way through a misaligned access).
OTOH, on x86, is it even worth bothering at all.
I have measured a performance drop for misaligned reads, but it
was less than 1 clock per cache line in a test that was doing
2 misaligned reads in at least some of the clock cycles.
I think the memory read path can do two AVX reads each clock.
So doing two misaligned 64bit reads isn't stressing it.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
On Thu, Mar 23, 2023 at 10:16:12PM +0000, David Laight wrote:
> From: Mark Rutland
> > Sent: 22 March 2023 14:05
> ....
> > > IIUC, in such tests you only vary the destination offset. Our copy
> > > routines in general try to align the source and leave the destination
> > > unaligned for performance. It would be interesting to add some variation
> > > on the source offset as well to spot potential issues with that part of
> > > the memcpy routines.
> >
> > I have that on my TODO list; I had intended to drop that into the
> > usercopy_params. The only problem is that the cross product of size,
> > src_offset, and dst_offset gets quite large.
>
> I thought that is was better to align the writes and do misaligned reads.
We inherited the memcpy/memset routines from the optimised cortex
strings library (fine-tuned by the toolchain people for various Arm
microarchitectures). For some CPUs with less aggressive prefetching it's
probably marginally faster to align the reads instead of writes (as
multiple unaligned writes are usually combined in the write buffer
somewhere).
Also, IIRC for some small copies (less than 16 bytes), our routines
don't bother with any alignment at all.
> Although maybe copy_to/from_user() would be best aligning the user address
> (to avoid page faults part way through a misaligned access).
In theory only copy_to_user() needs the write aligned if we want strict
guarantees of what was written. For copy_from_user() we can work around
by falling back to a byte read.
> OTOH, on x86, is it even worth bothering at all.
> I have measured a performance drop for misaligned reads, but it
> was less than 1 clock per cache line in a test that was doing
> 2 misaligned reads in at least some of the clock cycles.
> I think the memory read path can do two AVX reads each clock.
> So doing two misaligned 64bit reads isn't stressing it.
I think that's what Mark found as well in his testing, though I'm sure
one can build a very specific benchmark that shows a small degradation.
--
Catalin
On Tue, Mar 21, 2023 at 12:25:10PM +0000, Mark Rutland wrote:
> This series adds tests for the usercopy functions, which have found a
> few issues on most architectures I've tested on. The last two patches
> are fixes for arm64 (which pass all the tests, and are at least as good
> performance-wise as the existing implementation).
> [...]
> lib/usercopy_kunit.c | 506 ++++++++++++++++++++++++++++++++++
You didn't like lib/test_user_copy.c ? :)
There was a prior attempt to move it to KUnit:
https://lore.kernel.org/lkml/[email protected]/
--
Kees Cook