=== Overview
arm64 has a feature called Top Byte Ignore, which allows to embed pointer
tags into the top byte of each pointer. Userspace programs (such as
HWASan, a memory debugging tool [1]) might use this feature and pass
tagged user pointers to the kernel through syscalls or other interfaces.
Right now the kernel is already able to handle user faults with tagged
pointers, due to these patches:
1. 81cddd65 ("arm64: traps: fix userspace cache maintenance emulation on a
tagged pointer")
2. 7dcd9dd8 ("arm64: hw_breakpoint: fix watchpoint matching for tagged
pointers")
3. 276e9327 ("arm64: entry: improve data abort handling of tagged
pointers")
This patchset extends tagged pointer support to syscall arguments.
As per the proposed ABI change [3], tagged pointers are only allowed to be
passed to syscalls when they point to memory ranges obtained by anonymous
mmap() or brk().
For non-memory syscalls this is done by untaging user pointers when the
kernel performs pointer checking to find out whether the pointer comes
from userspace (most notably in access_ok). The untagging is done only
when the pointer is being checked, the tag is preserved as the pointer
makes its way through the kernel and stays tagged when the kernel
dereferences the pointer when perfoming user memory accesses.
Memory syscalls (mmap, mprotect, etc.) don't do user memory accesses but
rather deal with memory ranges, and untagged pointers are better suited to
describe memory ranges internally. Thus for memory syscalls we untag
pointers completely when they enter the kernel.
=== Other approaches
One of the alternative approaches to untagging that was considered is to
completely strip the pointer tag as the pointer enters the kernel with
some kind of a syscall wrapper, but that won't work with the countless
number of different ioctl calls. With this approach we would need a custom
wrapper for each ioctl variation, which doesn't seem practical.
An alternative approach to untagging pointers in memory syscalls prologues
is to inspead allow tagged pointers to be passed to find_vma() (and other
vma related functions) and untag them there. Unfortunately, a lot of
find_vma() callers then compare or subtract the returned vma start and end
fields against the pointer that was being searched. Thus this approach
would still require changing all find_vma() callers.
=== Testing
The following testing approaches has been taken to find potential issues
with user pointer untagging:
1. Static testing (with sparse [2] and separately with a custom static
analyzer based on Clang) to track casts of __user pointers to integer
types to find places where untagging needs to be done.
2. Static testing with grep to find parts of the kernel that call
find_vma() (and other similar functions) or directly compare against
vm_start/vm_end fields of vma.
3. Static testing with grep to find parts of the kernel that compare
user pointers with TASK_SIZE or other similar consts and macros.
4. Dynamic testing: adding BUG_ON(has_tag(addr)) to find_vma() and running
a modified syzkaller version that passes tagged pointers to the kernel.
Based on the results of the testing the requried patches have been added
to the patchset.
=== Notes
This patchset is meant to be merged together with "arm64 relaxed ABI" [3].
This patchset is a prerequisite for ARM's memory tagging hardware feature
support [4].
This patchset has been merged into the Pixel 2 kernel tree and is now
being used to enable testing of Pixel 2 phones with HWASan.
Thanks!
[1] http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html
[2] https://github.com/lucvoo/sparse-dev/commit/5f960cb10f56ec2017c128ef9d16060e0145f292
[3] https://lkml.org/lkml/2018/12/10/402
[4] https://community.arm.com/processors/b/blog/posts/arm-a-profile-architecture-2018-developments-armv85a
Changes in v11:
- Added "uprobes, arm64: untag user pointers in find_active_uprobe" patch.
- Added "bpf, arm64: untag user pointers in stack_map_get_build_id_offset"
patch.
- Fixed "tracing, arm64: untag user pointers in seq_print_user_ip" to
correctly perform subtration with a tagged addr.
- Moved untagged_addr() from SYSCALL_DEFINE3(mprotect) and
SYSCALL_DEFINE4(pkey_mprotect) to do_mprotect_pkey().
- Moved untagged_addr() definition for other arches from
include/linux/memory.h to include/linux/mm.h.
- Changed untagging in strn*_user() to perform userspace accesses through
tagged pointers.
- Updated the documentation to mention that passing tagged pointers to
memory syscalls is allowed.
- Updated the test to use malloc'ed memory instead of stack memory.
Changes in v10:
- Added "mm, arm64: untag user pointers passed to memory syscalls" back.
- New patch "fs, arm64: untag user pointers in fs/userfaultfd.c".
- New patch "net, arm64: untag user pointers in tcp_zerocopy_receive".
- New patch "kernel, arm64: untag user pointers in prctl_set_mm*".
- New patch "tracing, arm64: untag user pointers in seq_print_user_ip".
Changes in v9:
- Rebased onto 4.20-rc6.
- Used u64 instead of __u64 in type casts in the untagged_addr macro for
arm64.
- Added braces around (addr) in the untagged_addr macro for other arches.
Changes in v8:
- Rebased onto 65102238 (4.20-rc1).
- Added a note to the cover letter on why syscall wrappers/shims that untag
user pointers won't work.
- Added a note to the cover letter that this patchset has been merged into
the Pixel 2 kernel tree.
- Documentation fixes, in particular added a list of syscalls that don't
support tagged user pointers.
Changes in v7:
- Rebased onto 17b57b18 (4.19-rc6).
- Dropped the "arm64: untag user address in __do_user_fault" patch, since
the existing patches already handle user faults properly.
- Dropped the "usb, arm64: untag user addresses in devio" patch, since the
passed pointer must come from a vma and therefore be untagged.
- Dropped the "arm64: annotate user pointers casts detected by sparse"
patch (see the discussion to the replies of the v6 of this patchset).
- Added more context to the cover letter.
- Updated Documentation/arm64/tagged-pointers.txt.
Changes in v6:
- Added annotations for user pointer casts found by sparse.
- Rebased onto 050cdc6c (4.19-rc1+).
Changes in v5:
- Added 3 new patches that add untagging to places found with static
analysis.
- Rebased onto 44c929e1 (4.18-rc8).
Changes in v4:
- Added a selftest for checking that passing tagged pointers to the
kernel succeeds.
- Rebased onto 81e97f013 (4.18-rc1+).
Changes in v3:
- Rebased onto e5c51f30 (4.17-rc6+).
- Added linux-arch@ to the list of recipients.
Changes in v2:
- Rebased onto 2d618bdf (4.17-rc3+).
- Removed excessive untagging in gup.c.
- Removed untagging pointers returned from __uaccess_mask_ptr.
Changes in v1:
- Rebased onto 4.17-rc1.
Changes in RFC v2:
- Added "#ifndef untagged_addr..." fallback in linux/uaccess.h instead of
defining it for each arch individually.
- Updated Documentation/arm64/tagged-pointers.txt.
- Dropped "mm, arm64: untag user addresses in memory syscalls".
- Rebased onto 3eb2ce82 (4.16-rc7).
Andrey Konovalov (14):
uaccess: add untagged_addr definition for other arches
arm64: untag user pointers in access_ok and __uaccess_mask_ptr
lib, arm64: untag user pointers in strn*_user
mm, arm64: untag user pointers passed to memory syscalls
mm, arm64: untag user pointers in mm/gup.c
fs, arm64: untag user pointers in copy_mount_options
fs, arm64: untag user pointers in fs/userfaultfd.c
net, arm64: untag user pointers in tcp_zerocopy_receive
kernel, arm64: untag user pointers in prctl_set_mm*
tracing, arm64: untag user pointers in seq_print_user_ip
uprobes, arm64: untag user pointers in find_active_uprobe
bpf, arm64: untag user pointers in stack_map_get_build_id_offset
arm64: update Documentation/arm64/tagged-pointers.txt
selftests, arm64: add a selftest for passing tagged pointers to kernel
Documentation/arm64/tagged-pointers.txt | 18 +++++++---------
arch/arm64/include/asm/uaccess.h | 10 +++++----
fs/namespace.c | 2 +-
fs/userfaultfd.c | 5 +++++
include/linux/mm.h | 4 ++++
ipc/shm.c | 2 ++
kernel/bpf/stackmap.c | 6 ++++--
kernel/events/uprobes.c | 2 ++
kernel/sys.c | 14 +++++++++++++
kernel/trace/trace_output.c | 5 +++--
lib/strncpy_from_user.c | 3 ++-
lib/strnlen_user.c | 3 ++-
mm/gup.c | 4 ++++
mm/madvise.c | 2 ++
mm/mempolicy.c | 5 +++++
mm/migrate.c | 1 +
mm/mincore.c | 2 ++
mm/mlock.c | 5 +++++
mm/mmap.c | 7 +++++++
mm/mprotect.c | 1 +
mm/mremap.c | 2 ++
mm/msync.c | 2 ++
net/ipv4/tcp.c | 2 ++
tools/testing/selftests/arm64/.gitignore | 1 +
tools/testing/selftests/arm64/Makefile | 11 ++++++++++
.../testing/selftests/arm64/run_tags_test.sh | 12 +++++++++++
tools/testing/selftests/arm64/tags_test.c | 21 +++++++++++++++++++
27 files changed, 131 insertions(+), 21 deletions(-)
create mode 100644 tools/testing/selftests/arm64/.gitignore
create mode 100644 tools/testing/selftests/arm64/Makefile
create mode 100755 tools/testing/selftests/arm64/run_tags_test.sh
create mode 100644 tools/testing/selftests/arm64/tags_test.c
--
2.21.0.360.g471c308f928-goog
To allow arm64 syscalls to accept tagged pointers from userspace, we must
untag them when they are passed to the kernel. Since untagging is done in
generic parts of the kernel, the untagged_addr macro needs to be defined
for all architectures.
Define it as a noop for architectures other than arm64.
Acked-by: Catalin Marinas <[email protected]>
Signed-off-by: Andrey Konovalov <[email protected]>
---
include/linux/mm.h | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 76769749b5a5..4d674518d392 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -99,6 +99,10 @@ extern int mmap_rnd_compat_bits __read_mostly;
#include <asm/pgtable.h>
#include <asm/processor.h>
+#ifndef untagged_addr
+#define untagged_addr(addr) (addr)
+#endif
+
#ifndef __pa_symbol
#define __pa_symbol(x) __pa(RELOC_HIDE((unsigned long)(x), 0))
#endif
--
2.21.0.360.g471c308f928-goog
This patch is a part of a series that extends arm64 kernel ABI to allow to
pass tagged user pointers (with the top byte set to something else other
than 0x00) as syscall arguments.
strncpy_from_user and strnlen_user accept user addresses as arguments, and
do not go through the same path as copy_from_user and others, so here we
need to handle the case of tagged user addresses separately.
Untag user pointers passed to these functions.
Note, that this patch only temporarily untags the pointers to perform
validity checks, but then uses them as is to perform user memory accesses.
Signed-off-by: Andrey Konovalov <[email protected]>
---
lib/strncpy_from_user.c | 3 ++-
lib/strnlen_user.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/lib/strncpy_from_user.c b/lib/strncpy_from_user.c
index 58eacd41526c..6209bb9507c7 100644
--- a/lib/strncpy_from_user.c
+++ b/lib/strncpy_from_user.c
@@ -6,6 +6,7 @@
#include <linux/uaccess.h>
#include <linux/kernel.h>
#include <linux/errno.h>
+#include <linux/mm.h>
#include <asm/byteorder.h>
#include <asm/word-at-a-time.h>
@@ -107,7 +108,7 @@ long strncpy_from_user(char *dst, const char __user *src, long count)
return 0;
max_addr = user_addr_max();
- src_addr = (unsigned long)src;
+ src_addr = (unsigned long)untagged_addr(src);
if (likely(src_addr < max_addr)) {
unsigned long max = max_addr - src_addr;
long retval;
diff --git a/lib/strnlen_user.c b/lib/strnlen_user.c
index 1c1a1b0e38a5..8ca3d2ac32ec 100644
--- a/lib/strnlen_user.c
+++ b/lib/strnlen_user.c
@@ -2,6 +2,7 @@
#include <linux/kernel.h>
#include <linux/export.h>
#include <linux/uaccess.h>
+#include <linux/mm.h>
#include <asm/word-at-a-time.h>
@@ -109,7 +110,7 @@ long strnlen_user(const char __user *str, long count)
return 0;
max_addr = user_addr_max();
- src_addr = (unsigned long)str;
+ src_addr = (unsigned long)untagged_addr(str);
if (likely(src_addr < max_addr)) {
unsigned long max = max_addr - src_addr;
long retval;
--
2.21.0.360.g471c308f928-goog
This patch is a part of a series that extends arm64 kernel ABI to allow to
pass tagged user pointers (with the top byte set to something else other
than 0x00) as syscall arguments.
mm/gup.c provides a kernel interface that accepts user addresses and
manipulates user pages directly (for example get_user_pages, that is used
by the futex syscall). Since a user can provided tagged addresses, we need
to handle this case.
Add untagging to gup.c functions that use user addresses for vma lookups.
Signed-off-by: Andrey Konovalov <[email protected]>
---
mm/gup.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/mm/gup.c b/mm/gup.c
index f84e22685aaa..3192741e0b3a 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -686,6 +686,8 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
if (!nr_pages)
return 0;
+ start = untagged_addr(start);
+
VM_BUG_ON(!!pages != !!(gup_flags & FOLL_GET));
/*
@@ -848,6 +850,8 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
struct vm_area_struct *vma;
vm_fault_t ret, major = 0;
+ address = untagged_addr(address);
+
if (unlocked)
fault_flags |= FAULT_FLAG_ALLOW_RETRY;
--
2.21.0.360.g471c308f928-goog
This patch is a part of a series that extends arm64 kernel ABI to allow to
pass tagged user pointers (with the top byte set to something else other
than 0x00) as syscall arguments.
tcp_zerocopy_receive() uses provided user pointers for vma lookups, which
can only by done with untagged pointers.
Untag user pointers in this function.
Signed-off-by: Andrey Konovalov <[email protected]>
---
net/ipv4/tcp.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 6baa6dc1b13b..89db3b4fc753 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1758,6 +1758,8 @@ static int tcp_zerocopy_receive(struct sock *sk,
int inq;
int ret;
+ address = untagged_addr(address);
+
if (address & (PAGE_SIZE - 1) || address != zc->address)
return -EINVAL;
--
2.21.0.360.g471c308f928-goog
This patch is a part of a series that extends arm64 kernel ABI to allow to
pass tagged user pointers (with the top byte set to something else other
than 0x00) as syscall arguments.
Document the ABI changes in Documentation/arm64/tagged-pointers.txt.
Signed-off-by: Andrey Konovalov <[email protected]>
---
Documentation/arm64/tagged-pointers.txt | 18 ++++++++----------
1 file changed, 8 insertions(+), 10 deletions(-)
diff --git a/Documentation/arm64/tagged-pointers.txt b/Documentation/arm64/tagged-pointers.txt
index a25a99e82bb1..07fdddeacad0 100644
--- a/Documentation/arm64/tagged-pointers.txt
+++ b/Documentation/arm64/tagged-pointers.txt
@@ -17,13 +17,15 @@ this byte for application use.
Passing tagged addresses to the kernel
--------------------------------------
-All interpretation of userspace memory addresses by the kernel assumes
-an address tag of 0x00.
+The kernel supports tags in pointer arguments (including pointers in
+structures) of syscalls, however such pointers must point to memory ranges
+obtained by anonymous mmap() or brk().
-This includes, but is not limited to, addresses found in:
+The kernel supports tags in user fault addresses. However the fault_address
+field in the sigcontext struct will contain an untagged address.
- - pointer arguments to system calls, including pointers in structures
- passed to system calls,
+All other interpretations of userspace memory addresses by the kernel
+assume an address tag of 0x00, in particular:
- the stack pointer (sp), e.g. when interpreting it to deliver a
signal,
@@ -33,11 +35,7 @@ This includes, but is not limited to, addresses found in:
Using non-zero address tags in any of these locations may result in an
error code being returned, a (fatal) signal being raised, or other modes
-of failure.
-
-For these reasons, passing non-zero address tags to the kernel via
-system calls is forbidden, and using a non-zero address tag for sp is
-strongly discouraged.
+of failure. Using a non-zero address tag for sp is strongly discouraged.
Programs maintaining a frame pointer and frame records that use non-zero
address tags may suffer impaired or inaccurate debug and profiling
--
2.21.0.360.g471c308f928-goog
This patch is a part of a series that extends arm64 kernel ABI to allow to
pass tagged user pointers (with the top byte set to something else other
than 0x00) as syscall arguments.
This patch adds a simple test, that calls the uname syscall with a
tagged user pointer as an argument. Without the kernel accepting tagged
user pointers the test fails with EFAULT.
Signed-off-by: Andrey Konovalov <[email protected]>
---
tools/testing/selftests/arm64/.gitignore | 1 +
tools/testing/selftests/arm64/Makefile | 11 ++++++++++
.../testing/selftests/arm64/run_tags_test.sh | 12 +++++++++++
tools/testing/selftests/arm64/tags_test.c | 21 +++++++++++++++++++
4 files changed, 45 insertions(+)
create mode 100644 tools/testing/selftests/arm64/.gitignore
create mode 100644 tools/testing/selftests/arm64/Makefile
create mode 100755 tools/testing/selftests/arm64/run_tags_test.sh
create mode 100644 tools/testing/selftests/arm64/tags_test.c
diff --git a/tools/testing/selftests/arm64/.gitignore b/tools/testing/selftests/arm64/.gitignore
new file mode 100644
index 000000000000..e8fae8d61ed6
--- /dev/null
+++ b/tools/testing/selftests/arm64/.gitignore
@@ -0,0 +1 @@
+tags_test
diff --git a/tools/testing/selftests/arm64/Makefile b/tools/testing/selftests/arm64/Makefile
new file mode 100644
index 000000000000..a61b2e743e99
--- /dev/null
+++ b/tools/testing/selftests/arm64/Makefile
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: GPL-2.0
+
+# ARCH can be overridden by the user for cross compiling
+ARCH ?= $(shell uname -m 2>/dev/null || echo not)
+
+ifneq (,$(filter $(ARCH),aarch64 arm64))
+TEST_GEN_PROGS := tags_test
+TEST_PROGS := run_tags_test.sh
+endif
+
+include ../lib.mk
diff --git a/tools/testing/selftests/arm64/run_tags_test.sh b/tools/testing/selftests/arm64/run_tags_test.sh
new file mode 100755
index 000000000000..745f11379930
--- /dev/null
+++ b/tools/testing/selftests/arm64/run_tags_test.sh
@@ -0,0 +1,12 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+
+echo "--------------------"
+echo "running tags test"
+echo "--------------------"
+./tags_test
+if [ $? -ne 0 ]; then
+ echo "[FAIL]"
+else
+ echo "[PASS]"
+fi
diff --git a/tools/testing/selftests/arm64/tags_test.c b/tools/testing/selftests/arm64/tags_test.c
new file mode 100644
index 000000000000..2bd1830a7ebe
--- /dev/null
+++ b/tools/testing/selftests/arm64/tags_test.c
@@ -0,0 +1,21 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <sys/utsname.h>
+
+#define SHIFT_TAG(tag) ((uint64_t)(tag) << 56)
+#define SET_TAG(ptr, tag) (((uint64_t)(ptr) & ~SHIFT_TAG(0xff)) | \
+ SHIFT_TAG(tag))
+
+int main(void)
+{
+ struct utsname *ptr = (struct utsname *)malloc(sizeof(*ptr));
+ void *tagged_ptr = (void *)SET_TAG(ptr, 0x42);
+ int err = uname(tagged_ptr);
+
+ free(ptr);
+ return err;
+}
--
2.21.0.360.g471c308f928-goog
This patch is a part of a series that extends arm64 kernel ABI to allow to
pass tagged user pointers (with the top byte set to something else other
than 0x00) as syscall arguments.
find_active_uprobe() uses provided user pointer (obtained via
instruction_pointer(regs)) for vma lookups, which can only by done with
untagged pointers.
Untag the user pointer in this function.
Signed-off-by: Andrey Konovalov <[email protected]>
---
kernel/events/uprobes.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index c5cde87329c7..d3a2716a813a 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1992,6 +1992,8 @@ static struct uprobe *find_active_uprobe(unsigned long bp_vaddr, int *is_swbp)
struct uprobe *uprobe = NULL;
struct vm_area_struct *vma;
+ bp_vaddr = untagged_addr(bp_vaddr);
+
down_read(&mm->mmap_sem);
vma = find_vma(mm, bp_vaddr);
if (vma && vma->vm_start <= bp_vaddr) {
--
2.21.0.360.g471c308f928-goog
This patch is a part of a series that extends arm64 kernel ABI to allow to
pass tagged user pointers (with the top byte set to something else other
than 0x00) as syscall arguments.
stack_map_get_build_id_offset() uses provided user pointers for vma
lookups, which can only by done with untagged pointers.
Untag the user pointer in this function for doing the lookup and
calculating the offset, but save as is into the bpf_stack_build_id
struct.
Signed-off-by: Andrey Konovalov <[email protected]>
---
kernel/bpf/stackmap.c | 6 ++++--
p | 45 -------------------------------------------
2 files changed, 4 insertions(+), 47 deletions(-)
delete mode 100644 p
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 950ab2f28922..bb89341d3faf 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -320,7 +320,9 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
}
for (i = 0; i < trace_nr; i++) {
- vma = find_vma(current->mm, ips[i]);
+ u64 untagged_ip = untagged_addr(ips[i]);
+
+ vma = find_vma(current->mm, untagged_ip);
if (!vma || stack_map_get_build_id(vma, id_offs[i].build_id)) {
/* per entry fall back to ips */
id_offs[i].status = BPF_STACK_BUILD_ID_IP;
@@ -328,7 +330,7 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
memset(id_offs[i].build_id, 0, BPF_BUILD_ID_SIZE);
continue;
}
- id_offs[i].offset = (vma->vm_pgoff << PAGE_SHIFT) + ips[i]
+ id_offs[i].offset = (vma->vm_pgoff << PAGE_SHIFT) + untagged_ip
- vma->vm_start;
id_offs[i].status = BPF_STACK_BUILD_ID_VALID;
}
diff --git a/p b/p
deleted file mode 100644
index 9d6fa5386e55..000000000000
--- a/p
+++ /dev/null
@@ -1,45 +0,0 @@
-commit 1fa6fadf644859e8a6a8ecce258444b49be8c7ee
-Author: Andrey Konovalov <[email protected]>
-Date: Mon Mar 4 17:20:32 2019 +0100
-
- kasan: fix coccinelle warnings in kasan_p*_table
-
- kasan_p4d_table, kasan_pmd_table and kasan_pud_table are declared as
- returning bool, but return 0 instead of false, which produces a coccinelle
- warning. Fix it.
-
- Fixes: 0207df4fa1a8 ("kernel/memremap, kasan: make ZONE_DEVICE with work with KASAN")
- Reported-by: kbuild test robot <[email protected]>
- Signed-off-by: Andrey Konovalov <[email protected]>
-
-diff --git a/mm/kasan/init.c b/mm/kasan/init.c
-index 45a1b5e38e1e..fcaa1ca03175 100644
---- a/mm/kasan/init.c
-+++ b/mm/kasan/init.c
-@@ -42,7 +42,7 @@ static inline bool kasan_p4d_table(pgd_t pgd)
- #else
- static inline bool kasan_p4d_table(pgd_t pgd)
- {
-- return 0;
-+ return false;
- }
- #endif
- #if CONFIG_PGTABLE_LEVELS > 3
-@@ -54,7 +54,7 @@ static inline bool kasan_pud_table(p4d_t p4d)
- #else
- static inline bool kasan_pud_table(p4d_t p4d)
- {
-- return 0;
-+ return false;
- }
- #endif
- #if CONFIG_PGTABLE_LEVELS > 2
-@@ -66,7 +66,7 @@ static inline bool kasan_pmd_table(pud_t pud)
- #else
- static inline bool kasan_pmd_table(pud_t pud)
- {
-- return 0;
-+ return false;
- }
- #endif
- pte_t kasan_early_shadow_pte[PTRS_PER_PTE] __page_aligned_bss;
--
2.21.0.360.g471c308f928-goog
This patch is a part of a series that extends arm64 kernel ABI to allow to
pass tagged user pointers (with the top byte set to something else other
than 0x00) as syscall arguments.
seq_print_user_ip() uses provided user pointers for vma lookups, which
can only by done with untagged pointers.
Untag user pointers in this function.
Signed-off-by: Andrey Konovalov <[email protected]>
---
kernel/trace/trace_output.c | 5 +++--
p | 45 +++++++++++++++++++++++++++++++++++++
2 files changed, 48 insertions(+), 2 deletions(-)
create mode 100644 p
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index 54373d93e251..6376bee93c84 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -370,6 +370,7 @@ static int seq_print_user_ip(struct trace_seq *s, struct mm_struct *mm,
{
struct file *file = NULL;
unsigned long vmstart = 0;
+ unsigned long untagged_ip = untagged_addr(ip);
int ret = 1;
if (s->full)
@@ -379,7 +380,7 @@ static int seq_print_user_ip(struct trace_seq *s, struct mm_struct *mm,
const struct vm_area_struct *vma;
down_read(&mm->mmap_sem);
- vma = find_vma(mm, ip);
+ vma = find_vma(mm, untagged_ip);
if (vma) {
file = vma->vm_file;
vmstart = vma->vm_start;
@@ -388,7 +389,7 @@ static int seq_print_user_ip(struct trace_seq *s, struct mm_struct *mm,
ret = trace_seq_path(s, &file->f_path);
if (ret)
trace_seq_printf(s, "[+0x%lx]",
- ip - vmstart);
+ untagged_ip - vmstart);
}
up_read(&mm->mmap_sem);
}
diff --git a/p b/p
new file mode 100644
index 000000000000..9d6fa5386e55
--- /dev/null
+++ b/p
@@ -0,0 +1,45 @@
+commit 1fa6fadf644859e8a6a8ecce258444b49be8c7ee
+Author: Andrey Konovalov <[email protected]>
+Date: Mon Mar 4 17:20:32 2019 +0100
+
+ kasan: fix coccinelle warnings in kasan_p*_table
+
+ kasan_p4d_table, kasan_pmd_table and kasan_pud_table are declared as
+ returning bool, but return 0 instead of false, which produces a coccinelle
+ warning. Fix it.
+
+ Fixes: 0207df4fa1a8 ("kernel/memremap, kasan: make ZONE_DEVICE with work with KASAN")
+ Reported-by: kbuild test robot <[email protected]>
+ Signed-off-by: Andrey Konovalov <[email protected]>
+
+diff --git a/mm/kasan/init.c b/mm/kasan/init.c
+index 45a1b5e38e1e..fcaa1ca03175 100644
+--- a/mm/kasan/init.c
++++ b/mm/kasan/init.c
+@@ -42,7 +42,7 @@ static inline bool kasan_p4d_table(pgd_t pgd)
+ #else
+ static inline bool kasan_p4d_table(pgd_t pgd)
+ {
+- return 0;
++ return false;
+ }
+ #endif
+ #if CONFIG_PGTABLE_LEVELS > 3
+@@ -54,7 +54,7 @@ static inline bool kasan_pud_table(p4d_t p4d)
+ #else
+ static inline bool kasan_pud_table(p4d_t p4d)
+ {
+- return 0;
++ return false;
+ }
+ #endif
+ #if CONFIG_PGTABLE_LEVELS > 2
+@@ -66,7 +66,7 @@ static inline bool kasan_pmd_table(pud_t pud)
+ #else
+ static inline bool kasan_pmd_table(pud_t pud)
+ {
+- return 0;
++ return false;
+ }
+ #endif
+ pte_t kasan_early_shadow_pte[PTRS_PER_PTE] __page_aligned_bss;
--
2.21.0.360.g471c308f928-goog
This patch is a part of a series that extends arm64 kernel ABI to allow to
pass tagged user pointers (with the top byte set to something else other
than 0x00) as syscall arguments.
In copy_mount_options a user address is being subtracted from TASK_SIZE.
If the address is lower than TASK_SIZE, the size is calculated to not
allow the exact_copy_from_user() call to cross TASK_SIZE boundary.
However if the address is tagged, then the size will be calculated
incorrectly.
Untag the address before subtracting.
Signed-off-by: Andrey Konovalov <[email protected]>
---
fs/namespace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/namespace.c b/fs/namespace.c
index c9cab307fa77..c27e5713bf04 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2825,7 +2825,7 @@ void *copy_mount_options(const void __user * data)
* the remainder of the page.
*/
/* copy_from_user cannot cross TASK_SIZE ! */
- size = TASK_SIZE - (unsigned long)data;
+ size = TASK_SIZE - (unsigned long)untagged_addr(data);
if (size > PAGE_SIZE)
size = PAGE_SIZE;
--
2.21.0.360.g471c308f928-goog
This patch is a part of a series that extends arm64 kernel ABI to allow to
pass tagged user pointers (with the top byte set to something else other
than 0x00) as syscall arguments.
copy_from_user (and a few other similar functions) are used to copy data
from user memory into the kernel memory or vice versa. Since a user can
provided a tagged pointer to one of the syscalls that use copy_from_user,
we need to correctly handle such pointers.
Do this by untagging user pointers in access_ok and in __uaccess_mask_ptr,
before performing access validity checks.
Note, that this patch only temporarily untags the pointers to perform the
checks, but then passes them as is into the kernel internals.
Reviewed-by: Catalin Marinas <[email protected]>
Signed-off-by: Andrey Konovalov <[email protected]>
---
arch/arm64/include/asm/uaccess.h | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index e5d5f31c6d36..9164ecb5feca 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -94,7 +94,7 @@ static inline unsigned long __range_ok(const void __user *addr, unsigned long si
return ret;
}
-#define access_ok(addr, size) __range_ok(addr, size)
+#define access_ok(addr, size) __range_ok(untagged_addr(addr), size)
#define user_addr_max get_fs
#define _ASM_EXTABLE(from, to) \
@@ -226,7 +226,8 @@ static inline void uaccess_enable_not_uao(void)
/*
* Sanitise a uaccess pointer such that it becomes NULL if above the
- * current addr_limit.
+ * current addr_limit. In case the pointer is tagged (has the top byte set),
+ * untag the pointer before checking.
*/
#define uaccess_mask_ptr(ptr) (__typeof__(ptr))__uaccess_mask_ptr(ptr)
static inline void __user *__uaccess_mask_ptr(const void __user *ptr)
@@ -234,10 +235,11 @@ static inline void __user *__uaccess_mask_ptr(const void __user *ptr)
void __user *safe_ptr;
asm volatile(
- " bics xzr, %1, %2\n"
+ " bics xzr, %3, %2\n"
" csel %0, %1, xzr, eq\n"
: "=&r" (safe_ptr)
- : "r" (ptr), "r" (current_thread_info()->addr_limit)
+ : "r" (ptr), "r" (current_thread_info()->addr_limit),
+ "r" (untagged_addr(ptr))
: "cc");
csdb();
--
2.21.0.360.g471c308f928-goog
This patch is a part of a series that extends arm64 kernel ABI to allow to
pass tagged user pointers (with the top byte set to something else other
than 0x00) as syscall arguments.
This patch allows tagged pointers to be passed to the following memory
syscalls: madvise, mbind, get_mempolicy, mincore, mlock, mlock2, brk,
mmap_pgoff, old_mmap, munmap, remap_file_pages, mprotect, pkey_mprotect,
mremap, msync and shmdt.
This is done by untagging pointers passed to these syscalls in the
prologues of their handlers.
Signed-off-by: Andrey Konovalov <[email protected]>
---
ipc/shm.c | 2 ++
mm/madvise.c | 2 ++
mm/mempolicy.c | 5 +++++
mm/migrate.c | 1 +
mm/mincore.c | 2 ++
mm/mlock.c | 5 +++++
mm/mmap.c | 7 +++++++
mm/mprotect.c | 1 +
mm/mremap.c | 2 ++
mm/msync.c | 2 ++
10 files changed, 29 insertions(+)
diff --git a/ipc/shm.c b/ipc/shm.c
index ce1ca9f7c6e9..7af8951e6c41 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -1593,6 +1593,7 @@ SYSCALL_DEFINE3(shmat, int, shmid, char __user *, shmaddr, int, shmflg)
unsigned long ret;
long err;
+ shmaddr = untagged_addr(shmaddr);
err = do_shmat(shmid, shmaddr, shmflg, &ret, SHMLBA);
if (err)
return err;
@@ -1732,6 +1733,7 @@ long ksys_shmdt(char __user *shmaddr)
SYSCALL_DEFINE1(shmdt, char __user *, shmaddr)
{
+ shmaddr = untagged_addr(shmaddr);
return ksys_shmdt(shmaddr);
}
diff --git a/mm/madvise.c b/mm/madvise.c
index 21a7881a2db4..64e6d34a7f9b 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -809,6 +809,8 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
size_t len;
struct blk_plug plug;
+ start = untagged_addr(start);
+
if (!madvise_behavior_valid(behavior))
return error;
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index af171ccb56a2..31691737c59c 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1334,6 +1334,7 @@ static long kernel_mbind(unsigned long start, unsigned long len,
int err;
unsigned short mode_flags;
+ start = untagged_addr(start);
mode_flags = mode & MPOL_MODE_FLAGS;
mode &= ~MPOL_MODE_FLAGS;
if (mode >= MPOL_MAX)
@@ -1491,6 +1492,8 @@ static int kernel_get_mempolicy(int __user *policy,
int uninitialized_var(pval);
nodemask_t nodes;
+ addr = untagged_addr(addr);
+
if (nmask != NULL && maxnode < nr_node_ids)
return -EINVAL;
@@ -1576,6 +1579,8 @@ COMPAT_SYSCALL_DEFINE6(mbind, compat_ulong_t, start, compat_ulong_t, len,
unsigned long nr_bits, alloc_size;
nodemask_t bm;
+ start = untagged_addr(start);
+
nr_bits = min_t(unsigned long, maxnode-1, MAX_NUMNODES);
alloc_size = ALIGN(nr_bits, BITS_PER_LONG) / 8;
diff --git a/mm/migrate.c b/mm/migrate.c
index ac6f4939bb59..ecc6dcdefb1f 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1612,6 +1612,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
if (get_user(node, nodes + i))
goto out_flush;
addr = (unsigned long)p;
+ addr = untagged_addr(addr);
err = -ENODEV;
if (node < 0 || node >= MAX_NUMNODES)
diff --git a/mm/mincore.c b/mm/mincore.c
index 218099b5ed31..c4a3f4484b6b 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -228,6 +228,8 @@ SYSCALL_DEFINE3(mincore, unsigned long, start, size_t, len,
unsigned long pages;
unsigned char *tmp;
+ start = untagged_addr(start);
+
/* Check the start address: needs to be page-aligned.. */
if (start & ~PAGE_MASK)
return -EINVAL;
diff --git a/mm/mlock.c b/mm/mlock.c
index 080f3b36415b..6934ec92bf39 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -715,6 +715,7 @@ static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t fla
SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
{
+ start = untagged_addr(start);
return do_mlock(start, len, VM_LOCKED);
}
@@ -722,6 +723,8 @@ SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t, len, int, flags)
{
vm_flags_t vm_flags = VM_LOCKED;
+ start = untagged_addr(start);
+
if (flags & ~MLOCK_ONFAULT)
return -EINVAL;
@@ -735,6 +738,8 @@ SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
{
int ret;
+ start = untagged_addr(start);
+
len = PAGE_ALIGN(len + (offset_in_page(start)));
start &= PAGE_MASK;
diff --git a/mm/mmap.c b/mm/mmap.c
index 41eb48d9b527..512c679c7f33 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -199,6 +199,8 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
bool downgraded = false;
LIST_HEAD(uf);
+ brk = untagged_addr(brk);
+
if (down_write_killable(&mm->mmap_sem))
return -EINTR;
@@ -1571,6 +1573,8 @@ unsigned long ksys_mmap_pgoff(unsigned long addr, unsigned long len,
struct file *file = NULL;
unsigned long retval;
+ addr = untagged_addr(addr);
+
if (!(flags & MAP_ANONYMOUS)) {
audit_mmap_fd(fd, flags);
file = fget(fd);
@@ -2867,6 +2871,7 @@ EXPORT_SYMBOL(vm_munmap);
SYSCALL_DEFINE2(munmap, unsigned long, addr, size_t, len)
{
+ addr = untagged_addr(addr);
profile_munmap(addr);
return __vm_munmap(addr, len, true);
}
@@ -2885,6 +2890,8 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
unsigned long ret = -EINVAL;
struct file *file;
+ start = untagged_addr(start);
+
pr_warn_once("%s (%d) uses deprecated remap_file_pages() syscall. See Documentation/vm/remap_file_pages.rst.\n",
current->comm, current->pid);
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 028c724dcb1a..3c2b11629f89 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -468,6 +468,7 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
if (grows == (PROT_GROWSDOWN|PROT_GROWSUP)) /* can't be both */
return -EINVAL;
+ start = untagged_addr(start);
if (start & ~PAGE_MASK)
return -EINVAL;
if (!len)
diff --git a/mm/mremap.c b/mm/mremap.c
index e3edef6b7a12..6422aeee65bb 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -605,6 +605,8 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
LIST_HEAD(uf_unmap_early);
LIST_HEAD(uf_unmap);
+ addr = untagged_addr(addr);
+
if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE))
return ret;
diff --git a/mm/msync.c b/mm/msync.c
index ef30a429623a..c3bd3e75f687 100644
--- a/mm/msync.c
+++ b/mm/msync.c
@@ -37,6 +37,8 @@ SYSCALL_DEFINE3(msync, unsigned long, start, size_t, len, int, flags)
int unmapped_error = 0;
int error = -EINVAL;
+ start = untagged_addr(start);
+
if (flags & ~(MS_ASYNC | MS_INVALIDATE | MS_SYNC))
goto out;
if (offset_in_page(start))
--
2.21.0.360.g471c308f928-goog
This patch is a part of a series that extends arm64 kernel ABI to allow to
pass tagged user pointers (with the top byte set to something else other
than 0x00) as syscall arguments.
userfaultfd_register() and userfaultfd_unregister() use provided user
pointers for vma lookups, which can only by done with untagged pointers.
Untag user pointers in these functions.
Signed-off-by: Andrey Konovalov <[email protected]>
---
fs/userfaultfd.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 89800fc7dc9d..a3b70e0d9756 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1320,6 +1320,9 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
goto out;
}
+ uffdio_register.range.start =
+ untagged_addr(uffdio_register.range.start);
+
ret = validate_range(mm, uffdio_register.range.start,
uffdio_register.range.len);
if (ret)
@@ -1507,6 +1510,8 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
if (copy_from_user(&uffdio_unregister, buf, sizeof(uffdio_unregister)))
goto out;
+ uffdio_unregister.start = untagged_addr(uffdio_unregister.start);
+
ret = validate_range(mm, uffdio_unregister.start,
uffdio_unregister.len);
if (ret)
--
2.21.0.360.g471c308f928-goog
This patch is a part of a series that extends arm64 kernel ABI to allow to
pass tagged user pointers (with the top byte set to something else other
than 0x00) as syscall arguments.
prctl_set_mm() and prctl_set_mm_map() use provided user pointers for vma
lookups, which can only by done with untagged pointers.
Untag user pointers in these functions.
Signed-off-by: Andrey Konovalov <[email protected]>
---
kernel/sys.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/kernel/sys.c b/kernel/sys.c
index 12df0e5434b8..8e56d87cc6db 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1993,6 +1993,18 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
if (copy_from_user(&prctl_map, addr, sizeof(prctl_map)))
return -EFAULT;
+ prctl_map->start_code = untagged_addr(prctl_map.start_code);
+ prctl_map->end_code = untagged_addr(prctl_map.end_code);
+ prctl_map->start_data = untagged_addr(prctl_map.start_data);
+ prctl_map->end_data = untagged_addr(prctl_map.end_data);
+ prctl_map->start_brk = untagged_addr(prctl_map.start_brk);
+ prctl_map->brk = untagged_addr(prctl_map.brk);
+ prctl_map->start_stack = untagged_addr(prctl_map.start_stack);
+ prctl_map->arg_start = untagged_addr(prctl_map.arg_start);
+ prctl_map->arg_end = untagged_addr(prctl_map.arg_end);
+ prctl_map->env_start = untagged_addr(prctl_map.env_start);
+ prctl_map->env_end = untagged_addr(prctl_map.env_end);
+
error = validate_prctl_map(&prctl_map);
if (error)
return error;
@@ -2106,6 +2118,8 @@ static int prctl_set_mm(int opt, unsigned long addr,
opt != PR_SET_MM_MAP_SIZE)))
return -EINVAL;
+ addr = untagged_addr(addr);
+
#ifdef CONFIG_CHECKPOINT_RESTORE
if (opt == PR_SET_MM_MAP || opt == PR_SET_MM_MAP_SIZE)
return prctl_set_mm_map(opt, (const void __user *)addr, arg4);
--
2.21.0.360.g471c308f928-goog
On 03/15/2019 12:51 PM, Andrey Konovalov wrote:
> This patch is a part of a series that extends arm64 kernel ABI to allow to
> pass tagged user pointers (with the top byte set to something else other
> than 0x00) as syscall arguments.
>
> tcp_zerocopy_receive() uses provided user pointers for vma lookups, which
> can only by done with untagged pointers.
>
> Untag user pointers in this function.
>
> Signed-off-by: Andrey Konovalov <[email protected]>
> ---
> net/ipv4/tcp.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 6baa6dc1b13b..89db3b4fc753 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -1758,6 +1758,8 @@ static int tcp_zerocopy_receive(struct sock *sk,
> int inq;
> int ret;
>
> + address = untagged_addr(address);
> +
> if (address & (PAGE_SIZE - 1) || address != zc->address)
The second test will fail, if the top bits are changed in address but not in zc->address
> return -EINVAL;
>
>
On Fri, 15 Mar 2019 20:51:34 +0100
Andrey Konovalov <[email protected]> wrote:
> This patch is a part of a series that extends arm64 kernel ABI to allow to
> pass tagged user pointers (with the top byte set to something else other
> than 0x00) as syscall arguments.
>
> seq_print_user_ip() uses provided user pointers for vma lookups, which
> can only by done with untagged pointers.
>
> Untag user pointers in this function.
>
> Signed-off-by: Andrey Konovalov <[email protected]>
> ---
> kernel/trace/trace_output.c | 5 +++--
> p | 45 +++++++++++++++++++++++++++++++++++++
> 2 files changed, 48 insertions(+), 2 deletions(-)
> create mode 100644 p
>
> diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
> index 54373d93e251..6376bee93c84 100644
> --- a/kernel/trace/trace_output.c
> +++ b/kernel/trace/trace_output.c
> @@ -370,6 +370,7 @@ static int seq_print_user_ip(struct trace_seq *s, struct mm_struct *mm,
> {
> struct file *file = NULL;
> unsigned long vmstart = 0;
> + unsigned long untagged_ip = untagged_addr(ip);
> int ret = 1;
>
> if (s->full)
> @@ -379,7 +380,7 @@ static int seq_print_user_ip(struct trace_seq *s, struct mm_struct *mm,
> const struct vm_area_struct *vma;
>
> down_read(&mm->mmap_sem);
> - vma = find_vma(mm, ip);
> + vma = find_vma(mm, untagged_ip);
> if (vma) {
> file = vma->vm_file;
> vmstart = vma->vm_start;
> @@ -388,7 +389,7 @@ static int seq_print_user_ip(struct trace_seq *s, struct mm_struct *mm,
> ret = trace_seq_path(s, &file->f_path);
> if (ret)
> trace_seq_printf(s, "[+0x%lx]",
> - ip - vmstart);
> + untagged_ip - vmstart);
> }
> up_read(&mm->mmap_sem);
> }
> diff --git a/p b/p
> new file mode 100644
> index 000000000000..9d6fa5386e55
> --- /dev/null
> +++ b/p
> @@ -0,0 +1,45 @@
> +commit 1fa6fadf644859e8a6a8ecce258444b49be8c7ee
> +Author: Andrey Konovalov <[email protected]>
> +Date: Mon Mar 4 17:20:32 2019 +0100
> +
> + kasan: fix coccinelle warnings in kasan_p*_table
> +
> + kasan_p4d_table, kasan_pmd_table and kasan_pud_table are declared as
> + returning bool, but return 0 instead of false, which produces a coccinelle
> + warning. Fix it.
> +
> + Fixes: 0207df4fa1a8 ("kernel/memremap, kasan: make ZONE_DEVICE with work with KASAN")
> + Reported-by: kbuild test robot <[email protected]>
> + Signed-off-by: Andrey Konovalov <[email protected]>
Did you mean to append this commit to this patch?
-- Steve
> +
> +diff --git a/mm/kasan/init.c b/mm/kasan/init.c
> +index 45a1b5e38e1e..fcaa1ca03175 100644
> +--- a/mm/kasan/init.c
> ++++ b/mm/kasan/init.c
> +@@ -42,7 +42,7 @@ static inline bool kasan_p4d_table(pgd_t pgd)
> + #else
> + static inline bool kasan_p4d_table(pgd_t pgd)
> + {
> +- return 0;
> ++ return false;
> + }
> + #endif
> + #if CONFIG_PGTABLE_LEVELS > 3
> +@@ -54,7 +54,7 @@ static inline bool kasan_pud_table(p4d_t p4d)
> + #else
> + static inline bool kasan_pud_table(p4d_t p4d)
> + {
> +- return 0;
> ++ return false;
> + }
> + #endif
> + #if CONFIG_PGTABLE_LEVELS > 2
> +@@ -66,7 +66,7 @@ static inline bool kasan_pmd_table(pud_t pud)
> + #else
> + static inline bool kasan_pmd_table(pud_t pud)
> + {
> +- return 0;
> ++ return false;
> + }
> + #endif
> + pte_t kasan_early_shadow_pte[PTRS_PER_PTE] __page_aligned_bss;
On 15/03/2019 19:51, Andrey Konovalov wrote:
> This patch is a part of a series that extends arm64 kernel ABI to allow to
> pass tagged user pointers (with the top byte set to something else other
> than 0x00) as syscall arguments.
>
> strncpy_from_user and strnlen_user accept user addresses as arguments, and
> do not go through the same path as copy_from_user and others, so here we
> need to handle the case of tagged user addresses separately.
>
> Untag user pointers passed to these functions.
>
> Note, that this patch only temporarily untags the pointers to perform
> validity checks, but then uses them as is to perform user memory accesses.
Thank you for this new version, looks good to me.
To give a bit of context to the readers, I asked Andrey to make this change, because
it makes a difference with hardware memory tagging. Indeed, in that situation, it is
always preferable to access the memory using the user-provided tag, so that tag
checking can take place; if there is a mismatch, a tag fault will occur (which is
handled in a way similar to a page fault). It is also preferable not to assume that
an untagged user pointer (tag 0x0) bypasses tag checks.
Kevin
>
> Signed-off-by: Andrey Konovalov <[email protected]>
> ---
> lib/strncpy_from_user.c | 3 ++-
> lib/strnlen_user.c | 3 ++-
> 2 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/lib/strncpy_from_user.c b/lib/strncpy_from_user.c
> index 58eacd41526c..6209bb9507c7 100644
> --- a/lib/strncpy_from_user.c
> +++ b/lib/strncpy_from_user.c
> @@ -6,6 +6,7 @@
> #include <linux/uaccess.h>
> #include <linux/kernel.h>
> #include <linux/errno.h>
> +#include <linux/mm.h>
>
> #include <asm/byteorder.h>
> #include <asm/word-at-a-time.h>
> @@ -107,7 +108,7 @@ long strncpy_from_user(char *dst, const char __user *src, long count)
> return 0;
>
> max_addr = user_addr_max();
> - src_addr = (unsigned long)src;
> + src_addr = (unsigned long)untagged_addr(src);
> if (likely(src_addr < max_addr)) {
> unsigned long max = max_addr - src_addr;
> long retval;
> diff --git a/lib/strnlen_user.c b/lib/strnlen_user.c
> index 1c1a1b0e38a5..8ca3d2ac32ec 100644
> --- a/lib/strnlen_user.c
> +++ b/lib/strnlen_user.c
> @@ -2,6 +2,7 @@
> #include <linux/kernel.h>
> #include <linux/export.h>
> #include <linux/uaccess.h>
> +#include <linux/mm.h>
>
> #include <asm/word-at-a-time.h>
>
> @@ -109,7 +110,7 @@ long strnlen_user(const char __user *str, long count)
> return 0;
>
> max_addr = user_addr_max();
> - src_addr = (unsigned long)str;
> + src_addr = (unsigned long)untagged_addr(str);
> if (likely(src_addr < max_addr)) {
> unsigned long max = max_addr - src_addr;
> long retval;
On 15/03/2019 19:51, Andrey Konovalov wrote:
> This patch is a part of a series that extends arm64 kernel ABI to allow to
> pass tagged user pointers (with the top byte set to something else other
> than 0x00) as syscall arguments.
>
> prctl_set_mm() and prctl_set_mm_map() use provided user pointers for vma
> lookups, which can only by done with untagged pointers.
>
> Untag user pointers in these functions.
>
> Signed-off-by: Andrey Konovalov <[email protected]>
> ---
> kernel/sys.c | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 12df0e5434b8..8e56d87cc6db 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -1993,6 +1993,18 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
> if (copy_from_user(&prctl_map, addr, sizeof(prctl_map)))
> return -EFAULT;
>
> + prctl_map->start_code = untagged_addr(prctl_map.start_code);
> + prctl_map->end_code = untagged_addr(prctl_map.end_code);
> + prctl_map->start_data = untagged_addr(prctl_map.start_data);
> + prctl_map->end_data = untagged_addr(prctl_map.end_data);
> + prctl_map->start_brk = untagged_addr(prctl_map.start_brk);
> + prctl_map->brk = untagged_addr(prctl_map.brk);
> + prctl_map->start_stack = untagged_addr(prctl_map.start_stack);
> + prctl_map->arg_start = untagged_addr(prctl_map.arg_start);
> + prctl_map->arg_end = untagged_addr(prctl_map.arg_end);
> + prctl_map->env_start = untagged_addr(prctl_map.env_start);
> + prctl_map->env_end = untagged_addr(prctl_map.env_end);
As the buildbot suggests, those -> should be . instead :) You might want to check
your local build with CONFIG_CHECKPOINT_RESTORE=y.
> +
> error = validate_prctl_map(&prctl_map);
> if (error)
> return error;
> @@ -2106,6 +2118,8 @@ static int prctl_set_mm(int opt, unsigned long addr,
> opt != PR_SET_MM_MAP_SIZE)))
> return -EINVAL;
>
> + addr = untagged_addr(addr);
This is a bit too coarse, addr is indeed used for find_vma() later on, but it is also
used to access memory, by prctl_set_mm_mmap() and prctl_set_auxv().
Kevin
> +
> #ifdef CONFIG_CHECKPOINT_RESTORE
> if (opt == PR_SET_MM_MAP || opt == PR_SET_MM_MAP_SIZE)
> return prctl_set_mm_map(opt, (const void __user *)addr, arg4);
On Fri, Mar 15, 2019 at 9:14 PM Steven Rostedt <[email protected]> wrote:
>
> On Fri, 15 Mar 2019 20:51:34 +0100
> Andrey Konovalov <[email protected]> wrote:
>
> > This patch is a part of a series that extends arm64 kernel ABI to allow to
> > pass tagged user pointers (with the top byte set to something else other
> > than 0x00) as syscall arguments.
> >
> > seq_print_user_ip() uses provided user pointers for vma lookups, which
> > can only by done with untagged pointers.
> >
> > Untag user pointers in this function.
> >
> > Signed-off-by: Andrey Konovalov <[email protected]>
> > ---
> > kernel/trace/trace_output.c | 5 +++--
> > p | 45 +++++++++++++++++++++++++++++++++++++
> > 2 files changed, 48 insertions(+), 2 deletions(-)
> > create mode 100644 p
> >
> > diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
> > index 54373d93e251..6376bee93c84 100644
> > --- a/kernel/trace/trace_output.c
> > +++ b/kernel/trace/trace_output.c
> > @@ -370,6 +370,7 @@ static int seq_print_user_ip(struct trace_seq *s, struct mm_struct *mm,
> > {
> > struct file *file = NULL;
> > unsigned long vmstart = 0;
> > + unsigned long untagged_ip = untagged_addr(ip);
> > int ret = 1;
> >
> > if (s->full)
> > @@ -379,7 +380,7 @@ static int seq_print_user_ip(struct trace_seq *s, struct mm_struct *mm,
> > const struct vm_area_struct *vma;
> >
> > down_read(&mm->mmap_sem);
> > - vma = find_vma(mm, ip);
> > + vma = find_vma(mm, untagged_ip);
> > if (vma) {
> > file = vma->vm_file;
> > vmstart = vma->vm_start;
> > @@ -388,7 +389,7 @@ static int seq_print_user_ip(struct trace_seq *s, struct mm_struct *mm,
> > ret = trace_seq_path(s, &file->f_path);
> > if (ret)
> > trace_seq_printf(s, "[+0x%lx]",
> > - ip - vmstart);
> > + untagged_ip - vmstart);
> > }
> > up_read(&mm->mmap_sem);
> > }
> > diff --git a/p b/p
> > new file mode 100644
> > index 000000000000..9d6fa5386e55
> > --- /dev/null
> > +++ b/p
> > @@ -0,0 +1,45 @@
> > +commit 1fa6fadf644859e8a6a8ecce258444b49be8c7ee
> > +Author: Andrey Konovalov <[email protected]>
> > +Date: Mon Mar 4 17:20:32 2019 +0100
> > +
> > + kasan: fix coccinelle warnings in kasan_p*_table
> > +
> > + kasan_p4d_table, kasan_pmd_table and kasan_pud_table are declared as
> > + returning bool, but return 0 instead of false, which produces a coccinelle
> > + warning. Fix it.
> > +
> > + Fixes: 0207df4fa1a8 ("kernel/memremap, kasan: make ZONE_DEVICE with work with KASAN")
> > + Reported-by: kbuild test robot <[email protected]>
> > + Signed-off-by: Andrey Konovalov <[email protected]>
>
> Did you mean to append this commit to this patch?
No, did it by mistake. Will remove in v12, thanks for noticing!
>
> -- Steve
>
> > +
> > +diff --git a/mm/kasan/init.c b/mm/kasan/init.c
> > +index 45a1b5e38e1e..fcaa1ca03175 100644
> > +--- a/mm/kasan/init.c
> > ++++ b/mm/kasan/init.c
> > +@@ -42,7 +42,7 @@ static inline bool kasan_p4d_table(pgd_t pgd)
> > + #else
> > + static inline bool kasan_p4d_table(pgd_t pgd)
> > + {
> > +- return 0;
> > ++ return false;
> > + }
> > + #endif
> > + #if CONFIG_PGTABLE_LEVELS > 3
> > +@@ -54,7 +54,7 @@ static inline bool kasan_pud_table(p4d_t p4d)
> > + #else
> > + static inline bool kasan_pud_table(p4d_t p4d)
> > + {
> > +- return 0;
> > ++ return false;
> > + }
> > + #endif
> > + #if CONFIG_PGTABLE_LEVELS > 2
> > +@@ -66,7 +66,7 @@ static inline bool kasan_pmd_table(pud_t pud)
> > + #else
> > + static inline bool kasan_pmd_table(pud_t pud)
> > + {
> > +- return 0;
> > ++ return false;
> > + }
> > + #endif
> > + pte_t kasan_early_shadow_pte[PTRS_PER_PTE] __page_aligned_bss;
>
On Fri, Mar 15, 2019 at 9:03 PM Eric Dumazet <[email protected]> wrote:
>
>
>
> On 03/15/2019 12:51 PM, Andrey Konovalov wrote:
> > This patch is a part of a series that extends arm64 kernel ABI to allow to
> > pass tagged user pointers (with the top byte set to something else other
> > than 0x00) as syscall arguments.
> >
> > tcp_zerocopy_receive() uses provided user pointers for vma lookups, which
> > can only by done with untagged pointers.
> >
> > Untag user pointers in this function.
> >
> > Signed-off-by: Andrey Konovalov <[email protected]>
> > ---
> > net/ipv4/tcp.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index 6baa6dc1b13b..89db3b4fc753 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -1758,6 +1758,8 @@ static int tcp_zerocopy_receive(struct sock *sk,
> > int inq;
> > int ret;
> >
> > + address = untagged_addr(address);
> > +
> > if (address & (PAGE_SIZE - 1) || address != zc->address)
>
> The second test will fail, if the top bits are changed in address but not in zc->address
Will fix in v12, thanks Eric!
>
> > return -EINVAL;
> >
> >
>
On Mon, Mar 18, 2019 at 2:14 PM Andrey Konovalov <[email protected]> wrote:
>
> On Fri, Mar 15, 2019 at 9:03 PM Eric Dumazet <[email protected]> wrote:
> >
> >
> >
> > On 03/15/2019 12:51 PM, Andrey Konovalov wrote:
> > > This patch is a part of a series that extends arm64 kernel ABI to allow to
> > > pass tagged user pointers (with the top byte set to something else other
> > > than 0x00) as syscall arguments.
> > >
> > > tcp_zerocopy_receive() uses provided user pointers for vma lookups, which
> > > can only by done with untagged pointers.
> > >
> > > Untag user pointers in this function.
> > >
> > > Signed-off-by: Andrey Konovalov <[email protected]>
> > > ---
> > > net/ipv4/tcp.c | 2 ++
> > > 1 file changed, 2 insertions(+)
> > >
> > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > > index 6baa6dc1b13b..89db3b4fc753 100644
> > > --- a/net/ipv4/tcp.c
> > > +++ b/net/ipv4/tcp.c
> > > @@ -1758,6 +1758,8 @@ static int tcp_zerocopy_receive(struct sock *sk,
> > > int inq;
> > > int ret;
> > >
> > > + address = untagged_addr(address);
> > > +
> > > if (address & (PAGE_SIZE - 1) || address != zc->address)
> >
> > The second test will fail, if the top bits are changed in address but not in zc->address
>
> Will fix in v12, thanks Eric!
Looking at the code, what's the point of this address != zc->address
check? Should I just remove it?
>
> >
> > > return -EINVAL;
> > >
> > >
> >
On 15/03/2019 19:51, Andrey Konovalov wrote:
> This patch is a part of a series that extends arm64 kernel ABI to allow to
> pass tagged user pointers (with the top byte set to something else other
> than 0x00) as syscall arguments.
>
> Document the ABI changes in Documentation/arm64/tagged-pointers.txt.
>
> Signed-off-by: Andrey Konovalov <[email protected]>
> ---
> Documentation/arm64/tagged-pointers.txt | 18 ++++++++----------
> 1 file changed, 8 insertions(+), 10 deletions(-)
>
> diff --git a/Documentation/arm64/tagged-pointers.txt b/Documentation/arm64/tagged-pointers.txt
> index a25a99e82bb1..07fdddeacad0 100644
> --- a/Documentation/arm64/tagged-pointers.txt
> +++ b/Documentation/arm64/tagged-pointers.txt
> @@ -17,13 +17,15 @@ this byte for application use.
> Passing tagged addresses to the kernel
> --------------------------------------
>
> -All interpretation of userspace memory addresses by the kernel assumes
> -an address tag of 0x00.
> +The kernel supports tags in pointer arguments (including pointers in
> +structures) of syscalls, however such pointers must point to memory ranges
> +obtained by anonymous mmap() or brk().
>
> -This includes, but is not limited to, addresses found in:
> +The kernel supports tags in user fault addresses. However the fault_address
> +field in the sigcontext struct will contain an untagged address.
>
> - - pointer arguments to system calls, including pointers in structures
> - passed to system calls,
> +All other interpretations of userspace memory addresses by the kernel
> +assume an address tag of 0x00, in particular:
>
> - the stack pointer (sp), e.g. when interpreting it to deliver a
> signal,
> @@ -33,11 +35,7 @@ This includes, but is not limited to, addresses found in:
>
> Using non-zero address tags in any of these locations may result in an
> error code being returned, a (fatal) signal being raised, or other modes
> -of failure.
> -
> -For these reasons, passing non-zero address tags to the kernel via
> -system calls is forbidden, and using a non-zero address tag for sp is
> -strongly discouraged.
> +of failure. Using a non-zero address tag for sp is strongly discouraged.
I don't understand why we should keep such a limitation. For MTE, tagging SP is
something we are definitely considering. This does bother userspace software in some
rare cases, but I'm not sure in what way it bothers the kernel.
Kevin
>
> Programs maintaining a frame pointer and frame records that use non-zero
> address tags may suffer impaired or inaccurate debug and profiling
On Mon, Mar 18, 2019 at 6:17 AM Andrey Konovalov <[email protected]> wrote:
>
> Looking at the code, what's the point of this address != zc->address
> check? Should I just remove it?
No you must not remove it.
The test detects if a u64 ->unsigned long conversion might have truncated bits.
Quite surprisingly some people still use 32bit kernels.
The ABI is 64bit only, because we did not want to have yet another compat layer.
struct tcp_zerocopy_receive {
__u64 address; /* in: address of mapping */
__u32 length; /* in/out: number of bytes to map/mapped */
__u32 recv_skip_hint; /* out: amount of bytes to skip */
};
On arm64 the TCR_EL1.TBI0 bit has been always enabled in the Linux
kernel hence the userspace (EL0) is allowed to set a non-zero value
in the top byte but the resulting pointers are not allowed at the
user-kernel syscall ABI boundary.
This patchset proposes a relaxation of the ABI and a mechanism to
advertise it to the userspace via an AT_FLAGS.
The rationale behind the choice of AT_FLAGS is that the Unix System V
ABI defines AT_FLAGS as "flags", leaving some degree of freedom in
interpretation.
There are two previous attempts of using AT_FLAGS in the Linux Kernel
for different reasons: the first was more generic and was used to expose
the support for the GNU STACK NX feature [1] and the second was done for
the MIPS architecture and was used to expose the support of "MIPS ABI
Extension for IEEE Std 754 Non-Compliant Interlinking" [2].
Both the changes are currently _not_ merged in mainline.
The only architecture that reserves some of the bits in AT_FLAGS is
currently MIPS, which introduced the concept of platform specific ABI
(psABI) reserving the top-byte [3].
When ARM64_AT_FLAGS_SYSCALL_TBI is set the kernel is advertising
to the userspace that a relaxed ABI is supported hence this type
of pointers are now allowed to be passed to the syscalls when they are
in memory ranges obtained by anonymous mmap() or brk().
The userspace _must_ verify that the flag is set before passing tagged
pointers to the syscalls allowed by this relaxation.
More in general, exposing the ARM64_AT_FLAGS_SYSCALL_TBI flag and mandating
to the software to check that the feature is present, before using the
associated functionality, it provides a degree of control on the decision
of disabling such a feature in future without consequently breaking the
userspace.
The change required a modification of the elf common code, because in Linux
the AT_FLAGS are currently set to zero by default by the kernel.
The newly added flag has been verified on arm64 using the code below.
#include <stdio.h>
#include <stdbool.h>
#include <sys/auxv.h>
#define ARM64_AT_FLAGS_SYSCALL_TBI (1 << 0)
bool arm64_syscall_tbi_is_present(void)
{
unsigned long at_flags = getauxval(AT_FLAGS);
if (at_flags & ARM64_AT_FLAGS_SYSCALL_TBI)
return true;
return false;
}
void main()
{
if (arm64_syscall_tbi_is_present())
printf("ARM64_AT_FLAGS_SYSCALL_TBI is present\n");
}
This patchset should be merged together with [4].
[1] https://patchwork.ozlabs.org/patch/579578/
[2] https://lore.kernel.org/patchwork/cover/618280/
[3] ftp://http://www.linux-mips.org/pub/linux/mips/doc/ABI/psABI_mips3.0.pdf
[4] https://patchwork.kernel.org/cover/10674351/
ABI References:
---------------
Sco SysV ABI: http://www.sco.com/developers/gabi/2003-12-17/contents.html
PowerPC AUXV: http://openpowerfoundation.org/wp-content/uploads/resources/leabi/content/dbdoclet.50655242_98651.html
AMD64 ABI: https://www.cs.tufts.edu/comp/40-2012f/readings/amd64-abi.pdf
x86 ABI: https://www.uclibc.org/docs/psABI-i386.pdf
MIPS ABI: ftp://http://www.linux-mips.org/pub/linux/mips/doc/ABI/psABI_mips3.0.pdf
ARM ABI: http://infocenter.arm.com/help/topic/com.arm.doc.ihi0044f/IHI0044F_aaelf.pdf
SPARC ABI: http://math-atlas.sourceforge.net/devel/assembly/abi_sysV_sparc.pdf
CC: Alexander Viro <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Branislav Rankov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chintan Pandya <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Dave Martin <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Evgeniy Stepanov <[email protected]>
Cc: Graeme Barnes <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jacob Bramley <[email protected]>
Cc: Kate Stewart <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Kevin Brodsky <[email protected]>
Cc: "Kirill A . Shutemov" <[email protected]>
Cc: Kostya Serebryany <[email protected]>
Cc: Lee Smith <[email protected]>
Cc: Luc Van Oostenryck <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ramana Radhakrishnan <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Ruben Ayrapetyan <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Szabolcs Nagy <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Vincenzo Frascino <[email protected]>
Changes:
--------
v2:
- Rebased on 5.1-rc1
- Addressed review comments
- Modified tagged-pointers.txt to be compliant with the
new ABI relaxation
Vincenzo Frascino (4):
elf: Make AT_FLAGS arch configurable
arm64: Define Documentation/arm64/elf_at_flags.txt
arm64: Relax Documentation/arm64/tagged-pointers.txt
arm64: elf: Advertise relaxed ABI
Documentation/arm64/elf_at_flags.txt | 133 ++++++++++++++++++++++++
Documentation/arm64/tagged-pointers.txt | 23 ++--
arch/arm64/include/asm/atflags.h | 7 ++
arch/arm64/include/asm/elf.h | 5 +
arch/arm64/include/uapi/asm/atflags.h | 8 ++
fs/binfmt_elf.c | 6 +-
fs/binfmt_elf_fdpic.c | 6 +-
fs/compat_binfmt_elf.c | 5 +
8 files changed, 184 insertions(+), 9 deletions(-)
create mode 100644 Documentation/arm64/elf_at_flags.txt
create mode 100644 arch/arm64/include/asm/atflags.h
create mode 100644 arch/arm64/include/uapi/asm/atflags.h
--
2.21.0
Currently, the AT_FLAGS in the elf auxiliary vector are set to 0
by default by the kernel.
Some architectures might need to expose to the userspace a non-zero
value to advertise some platform specific ABI functionalities.
Make AT_FLAGS configurable by the architectures that require it.
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
CC: Andrey Konovalov <[email protected]>
CC: Alexander Viro <[email protected]>
Signed-off-by: Vincenzo Frascino <[email protected]>
---
fs/binfmt_elf.c | 6 +++++-
fs/binfmt_elf_fdpic.c | 6 +++++-
fs/compat_binfmt_elf.c | 5 +++++
3 files changed, 15 insertions(+), 2 deletions(-)
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 7d09d125f148..f699a9ef5112 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -84,6 +84,10 @@ static int elf_core_dump(struct coredump_params *cprm);
#define ELF_CORE_EFLAGS 0
#endif
+#ifndef ELF_AT_FLAGS
+#define ELF_AT_FLAGS 0
+#endif
+
#define ELF_PAGESTART(_v) ((_v) & ~(unsigned long)(ELF_MIN_ALIGN-1))
#define ELF_PAGEOFFSET(_v) ((_v) & (ELF_MIN_ALIGN-1))
#define ELF_PAGEALIGN(_v) (((_v) + ELF_MIN_ALIGN - 1) & ~(ELF_MIN_ALIGN - 1))
@@ -249,7 +253,7 @@ create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec,
NEW_AUX_ENT(AT_PHENT, sizeof(struct elf_phdr));
NEW_AUX_ENT(AT_PHNUM, exec->e_phnum);
NEW_AUX_ENT(AT_BASE, interp_load_addr);
- NEW_AUX_ENT(AT_FLAGS, 0);
+ NEW_AUX_ENT(AT_FLAGS, ELF_AT_FLAGS);
NEW_AUX_ENT(AT_ENTRY, exec->e_entry);
NEW_AUX_ENT(AT_UID, from_kuid_munged(cred->user_ns, cred->uid));
NEW_AUX_ENT(AT_EUID, from_kuid_munged(cred->user_ns, cred->euid));
diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
index b53bb3729ac1..cf1e680a6b88 100644
--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -82,6 +82,10 @@ static int elf_fdpic_map_file_by_direct_mmap(struct elf_fdpic_params *,
static int elf_fdpic_core_dump(struct coredump_params *cprm);
#endif
+#ifndef ELF_AT_FLAGS
+#define ELF_AT_FLAGS 0
+#endif
+
static struct linux_binfmt elf_fdpic_format = {
.module = THIS_MODULE,
.load_binary = load_elf_fdpic_binary,
@@ -651,7 +655,7 @@ static int create_elf_fdpic_tables(struct linux_binprm *bprm,
NEW_AUX_ENT(AT_PHENT, sizeof(struct elf_phdr));
NEW_AUX_ENT(AT_PHNUM, exec_params->hdr.e_phnum);
NEW_AUX_ENT(AT_BASE, interp_params->elfhdr_addr);
- NEW_AUX_ENT(AT_FLAGS, 0);
+ NEW_AUX_ENT(AT_FLAGS, ELF_AT_FLAGS);
NEW_AUX_ENT(AT_ENTRY, exec_params->entry_addr);
NEW_AUX_ENT(AT_UID, (elf_addr_t) from_kuid_munged(cred->user_ns, cred->uid));
NEW_AUX_ENT(AT_EUID, (elf_addr_t) from_kuid_munged(cred->user_ns, cred->euid));
diff --git a/fs/compat_binfmt_elf.c b/fs/compat_binfmt_elf.c
index 15f6e96b3bd9..a21cf99701ae 100644
--- a/fs/compat_binfmt_elf.c
+++ b/fs/compat_binfmt_elf.c
@@ -79,6 +79,11 @@
#define ELF_HWCAP2 COMPAT_ELF_HWCAP2
#endif
+#ifdef COMPAT_ELF_AT_FLAGS
+#undef ELF_AT_FLAGS
+#define ELF_AT_FLAGS COMPAT_ELF_AT_FLAGS
+#endif
+
#ifdef COMPAT_ARCH_DLINFO
#undef ARCH_DLINFO
#define ARCH_DLINFO COMPAT_ARCH_DLINFO
--
2.21.0
On arm64 the TCR_EL1.TBI0 bit has been always enabled hence
the userspace (EL0) is allowed to set a non-zero value in the
top byte but the resulting pointers are not allowed at the
user-kernel syscall ABI boundary.
With the relaxed ABI proposed through this document, it is now possible
to pass tagged pointers to the syscalls, when these pointers are in
memory ranges obtained by an anonymous (MAP_ANONYMOUS) mmap() or brk().
This change in the ABI requires a mechanism to inform the userspace
that such an option is available.
Specify and document the way in which AT_FLAGS can be used to advertise
this feature to the userspace.
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
CC: Andrey Konovalov <[email protected]>
Signed-off-by: Vincenzo Frascino <[email protected]>
Squash with "arm64: Define Documentation/arm64/elf_at_flags.txt"
---
Documentation/arm64/elf_at_flags.txt | 133 +++++++++++++++++++++++++++
1 file changed, 133 insertions(+)
create mode 100644 Documentation/arm64/elf_at_flags.txt
diff --git a/Documentation/arm64/elf_at_flags.txt b/Documentation/arm64/elf_at_flags.txt
new file mode 100644
index 000000000000..9b3494207c14
--- /dev/null
+++ b/Documentation/arm64/elf_at_flags.txt
@@ -0,0 +1,133 @@
+ARM64 ELF AT_FLAGS
+==================
+
+This document describes the usage and semantics of AT_FLAGS on arm64.
+
+1. Introduction
+---------------
+
+AT_FLAGS is part of the Auxiliary Vector, contains the flags and it
+is set to zero by the kernel on arm64 unless one or more of the
+features detailed in paragraph 2 are present.
+
+The auxiliary vector can be accessed by the userspace using the
+getauxval() API provided by the C library.
+getauxval() returns an unsigned long and when a flag is present in
+the AT_FLAGS, the corresponding bit in the returned value is set to 1.
+
+The AT_FLAGS with a "defined semantics" on arm64 are exposed to the
+userspace via user API (uapi/asm/atflags.h).
+The AT_FLAGS bits with "undefined semantics" are set to zero by default.
+This means that the AT_FLAGS bits to which this document does not assign
+an explicit meaning are to be intended reserved for future use.
+The kernel will populate all such bits with zero until meanings are
+assigned to them. If and when meanings are assigned, it is guaranteed
+that they will not impact the functional operation of existing userspace
+software. Userspace software should ignore any AT_FLAGS bit whose meaning
+is not defined when the software is written.
+
+The userspace software can test for features by acquiring the AT_FLAGS
+entry of the auxiliary vector, and testing whether a relevant flag
+is set.
+
+Example of a userspace test function:
+
+bool feature_x_is_present(void)
+{
+ unsigned long at_flags = getauxval(AT_FLAGS);
+ if (at_flags & FEATURE_X)
+ return true;
+
+ return false;
+}
+
+Where the software relies on a feature advertised by AT_FLAGS, it
+must check that the feature is present before attempting to
+use it.
+
+2. Features exposed via AT_FLAGS
+--------------------------------
+
+bit[0]: ARM64_AT_FLAGS_SYSCALL_TBI
+
+ On arm64 the TCR_EL1.TBI0 bit has been always enabled on the arm64
+ kernel, hence the userspace (EL0) is allowed to set a non-zero value
+ in the top byte but the resulting pointers are not allowed at the
+ user-kernel syscall ABI boundary.
+ When bit[0] is set to 1 the kernel is advertising to the userspace
+ that a relaxed ABI is supported hence this type of pointers are now
+ allowed to be passed to the syscalls, when these pointers are in
+ memory ranges privately owned by a process and obtained by the
+ process in accordance with the definition of "valid tagged pointer"
+ in paragraph 3.
+ In these cases the tag is preserved as the pointer goes through the
+ kernel. Only when the kernel needs to check if a pointer is coming
+ from userspace an untag operation is required.
+
+3. ARM64_AT_FLAGS_SYSCALL_TBI
+-----------------------------
+
+From the kernel syscall interface prospective, we define, for the purposes
+of this document, a "valid tagged pointer" as a pointer that either it has
+a zero value set in the top byte or it has a non-zero value, it is in memory
+ranges privately owned by a userspace process and it is obtained in one of
+the following ways:
+ - mmap() done by the process itself, where either:
+ * flags = MAP_PRIVATE | MAP_ANONYMOUS
+ * flags = MAP_PRIVATE and the file descriptor refers to a regular
+ file or "/dev/zero"
+ - a mapping below sbrk(0) done by the process itself
+ - any memory mapped by the kernel in the process's address space during
+ creation and following the restrictions presented above (i.e. data, bss,
+ stack).
+
+When the ARM64_AT_FLAGS_SYSCALL_TBI flag is set by the kernel, the following
+behaviours are guaranteed by the ABI:
+
+ - Every current or newly introduced syscall can accept any valid tagged
+ pointers.
+
+ - If a non valid tagged pointer is passed to a syscall then the behaviour
+ is undefined.
+
+ - Every valid tagged pointer is expected to work as an untagged one.
+
+ - The kernel preserves any valid tagged pointers and returns them to the
+ userspace unchanged in all the cases except the ones documented in the
+ "Preserving tags" paragraph of tagged-pointers.txt.
+
+A definition of the meaning of tagged pointers on arm64 can be found in:
+Documentation/arm64/tagged-pointers.txt.
+
+Example of correct usage (pseudo-code) for a userspace application:
+
+bool arm64_syscall_tbi_is_present(void)
+{
+ unsigned long at_flags = getauxval(AT_FLAGS);
+ if (at_flags & ARM64_AT_FLAGS_SYSCALL_TBI)
+ return true;
+
+ return false;
+}
+
+void main(void)
+{
+ char *addr = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
+ MAP_ANONYMOUS, -1, 0);
+
+ int fd = open("test.txt", O_WRONLY);
+
+ /* Check if the relaxed ABI is supported */
+ if (arm64_syscall_tbi_is_present()) {
+ /* Add a tag to the pointer */
+ addr = tag_pointer(addr);
+ }
+
+ strcpy("Hello World\n", addr);
+
+ /* Write to a file */
+ write(fd, addr, sizeof(addr));
+
+ close(fd);
+}
+
--
2.21.0
On arm64 the TCR_EL1.TBI0 bit has been always enabled hence
the userspace (EL0) is allowed to set a non-zero value in the
top byte but the resulting pointers are not allowed at the
user-kernel syscall ABI boundary.
With the relaxed ABI proposed in this set, it is now possible to pass
tagged pointers to the syscalls, when these pointers are in memory
ranges obtained by an anonymous (MAP_ANONYMOUS) mmap() or sbrk().
Relax the requirements described in tagged-pointers.txt to be compliant
with the behaviours guaranteed by the ABI deriving from the introduction
of the ARM64_AT_FLAGS_SYSCALL_TBI flag.
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
CC: Andrey Konovalov <[email protected]>
Signed-off-by: Vincenzo Frascino <[email protected]>
---
Documentation/arm64/tagged-pointers.txt | 23 ++++++++++++++++-------
1 file changed, 16 insertions(+), 7 deletions(-)
diff --git a/Documentation/arm64/tagged-pointers.txt b/Documentation/arm64/tagged-pointers.txt
index a25a99e82bb1..df27188b9433 100644
--- a/Documentation/arm64/tagged-pointers.txt
+++ b/Documentation/arm64/tagged-pointers.txt
@@ -18,7 +18,8 @@ Passing tagged addresses to the kernel
--------------------------------------
All interpretation of userspace memory addresses by the kernel assumes
-an address tag of 0x00.
+an address tag of 0x00, unless the ARM64_AT_FLAGS_SYSCALL_TBI flag is
+set by the kernel.
This includes, but is not limited to, addresses found in:
@@ -31,18 +32,23 @@ This includes, but is not limited to, addresses found in:
- the frame pointer (x29) and frame records, e.g. when interpreting
them to generate a backtrace or call graph.
-Using non-zero address tags in any of these locations may result in an
-error code being returned, a (fatal) signal being raised, or other modes
-of failure.
+Using non-zero address tags in any of these locations when the
+ARM64_AT_FLAGS_SYSCALL_TBI flag is not set by the kernel, may result in
+an error code being returned, a (fatal) signal being raised, or other
+modes of failure.
-For these reasons, passing non-zero address tags to the kernel via
-system calls is forbidden, and using a non-zero address tag for sp is
-strongly discouraged.
+For these reasons, when the flag is not set, passing non-zero address
+tags to the kernel via system calls is forbidden, and using a non-zero
+address tag for sp is strongly discouraged.
Programs maintaining a frame pointer and frame records that use non-zero
address tags may suffer impaired or inaccurate debug and profiling
visibility.
+A definition of the meaning of ARM64_AT_FLAGS_SYSCALL_TBI and of the
+guarantees that the ABI provides when the flag is set by the kernel can
+be found in: Documentation/arm64/elf_at_flags.txt.
+
Preserving tags
---------------
@@ -57,6 +63,9 @@ be preserved.
The architecture prevents the use of a tagged PC, so the upper byte will
be set to a sign-extension of bit 55 on exception return.
+This behaviours are preserved even when the ARM64_AT_FLAGS_SYSCALL_TBI flag
+is set by the kernel.
+
Other considerations
--------------------
--
2.21.0
On arm64 the TCR_EL1.TBI0 bit has been always enabled hence
the userspace (EL0) is allowed to set a non-zero value in the top
byte but the resulting pointers are not allowed at the user-kernel
syscall ABI boundary.
Set ARM64_AT_FLAGS_SYSCALL_TBI (bit[0]) in the AT_FLAGS to advertise
the relaxation of the ABI to the userspace.
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
CC: Andrey Konovalov <[email protected]>
Signed-off-by: Vincenzo Frascino <[email protected]>
---
arch/arm64/include/asm/atflags.h | 7 +++++++
arch/arm64/include/asm/elf.h | 5 +++++
arch/arm64/include/uapi/asm/atflags.h | 8 ++++++++
3 files changed, 20 insertions(+)
create mode 100644 arch/arm64/include/asm/atflags.h
create mode 100644 arch/arm64/include/uapi/asm/atflags.h
diff --git a/arch/arm64/include/asm/atflags.h b/arch/arm64/include/asm/atflags.h
new file mode 100644
index 000000000000..b20093d61bf2
--- /dev/null
+++ b/arch/arm64/include/asm/atflags.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_ATFLAGS_H
+#define __ASM_ATFLAGS_H
+
+#include <uapi/asm/atflags.h>
+
+#endif
diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h
index 6adc1a90e7e6..73d5184a4dd9 100644
--- a/arch/arm64/include/asm/elf.h
+++ b/arch/arm64/include/asm/elf.h
@@ -16,6 +16,7 @@
#ifndef __ASM_ELF_H
#define __ASM_ELF_H
+#include <asm/atflags.h>
#include <asm/hwcap.h>
/*
@@ -167,6 +168,10 @@ do { \
NEW_AUX_ENT(AT_IGNORE, 0); \
} while (0)
+/* Platform specific AT_FLAGS */
+#define ELF_AT_FLAGS ARM64_AT_FLAGS_SYSCALL_TBI
+#define COMPAT_ELF_AT_FLAGS 0
+
#define ARCH_HAS_SETUP_ADDITIONAL_PAGES
struct linux_binprm;
extern int arch_setup_additional_pages(struct linux_binprm *bprm,
diff --git a/arch/arm64/include/uapi/asm/atflags.h b/arch/arm64/include/uapi/asm/atflags.h
new file mode 100644
index 000000000000..1cf25692ffd6
--- /dev/null
+++ b/arch/arm64/include/uapi/asm/atflags.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __UAPI_ASM_ATFLAGS_H
+#define __UAPI_ASM_ATFLAGS_H
+
+/* Platform specific AT_FLAGS */
+#define ARM64_AT_FLAGS_SYSCALL_TBI (1 << 0)
+
+#endif
--
2.21.0
On Mon, Mar 18, 2019 at 3:45 PM Eric Dumazet <[email protected]> wrote:
>
> On Mon, Mar 18, 2019 at 6:17 AM Andrey Konovalov <[email protected]> wrote:
> >
>
> > Looking at the code, what's the point of this address != zc->address
> > check? Should I just remove it?
>
> No you must not remove it.
>
> The test detects if a u64 ->unsigned long conversion might have truncated bits.
>
> Quite surprisingly some people still use 32bit kernels.
>
> The ABI is 64bit only, because we did not want to have yet another compat layer.
>
> struct tcp_zerocopy_receive {
> __u64 address; /* in: address of mapping */
> __u32 length; /* in/out: number of bytes to map/mapped */
> __u32 recv_skip_hint; /* out: amount of bytes to skip */
> };
Ah, got it, thanks! I'll add a comment here then, otherwise this looks
confusing.
On Sat, Mar 16, 2019 at 8:32 PM kbuild test robot <[email protected]> wrote:
>
> Hi Andrey,
>
> Thank you for the patch! Yet something to improve:
>
> [auto build test ERROR on linus/master]
> [also build test ERROR on v5.0 next-20190306]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
>
> url: https://github.com/0day-ci/linux/commits/Andrey-Konovalov/uaccess-add-untagged_addr-definition-for-other-arches/20190317-015913
> config: x86_64-randconfig-x012-201911 (attached as .config)
> compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=x86_64
>
> All errors (new ones prefixed by >>):
>
> kernel/sys.c: In function 'prctl_set_mm_map':
> >> kernel/sys.c:1996:11: error: invalid type argument of '->' (have 'struct prctl_mm_map')
> prctl_map->start_code = untagged_addr(prctl_map.start_code);
> ^~
> kernel/sys.c:1997:11: error: invalid type argument of '->' (have 'struct prctl_mm_map')
> prctl_map->end_code = untagged_addr(prctl_map.end_code);
> ^~
> kernel/sys.c:1998:11: error: invalid type argument of '->' (have 'struct prctl_mm_map')
> prctl_map->start_data = untagged_addr(prctl_map.start_data);
> ^~
> kernel/sys.c:1999:11: error: invalid type argument of '->' (have 'struct prctl_mm_map')
> prctl_map->end_data = untagged_addr(prctl_map.end_data);
> ^~
> kernel/sys.c:2000:11: error: invalid type argument of '->' (have 'struct prctl_mm_map')
> prctl_map->start_brk = untagged_addr(prctl_map.start_brk);
> ^~
> kernel/sys.c:2001:11: error: invalid type argument of '->' (have 'struct prctl_mm_map')
> prctl_map->brk = untagged_addr(prctl_map.brk);
> ^~
> kernel/sys.c:2002:11: error: invalid type argument of '->' (have 'struct prctl_mm_map')
> prctl_map->start_stack = untagged_addr(prctl_map.start_stack);
> ^~
> kernel/sys.c:2003:11: error: invalid type argument of '->' (have 'struct prctl_mm_map')
> prctl_map->arg_start = untagged_addr(prctl_map.arg_start);
> ^~
> kernel/sys.c:2004:11: error: invalid type argument of '->' (have 'struct prctl_mm_map')
> prctl_map->arg_end = untagged_addr(prctl_map.arg_end);
> ^~
> kernel/sys.c:2005:11: error: invalid type argument of '->' (have 'struct prctl_mm_map')
> prctl_map->env_start = untagged_addr(prctl_map.env_start);
> ^~
> kernel/sys.c:2006:11: error: invalid type argument of '->' (have 'struct prctl_mm_map')
> prctl_map->env_end = untagged_addr(prctl_map.env_end);
> ^~
>
> vim +1996 kernel/sys.c
Right, I didn't have the related config options enabled when I did the
testing...
>
> 1974
> 1975 #ifdef CONFIG_CHECKPOINT_RESTORE
> 1976 static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data_size)
> 1977 {
> 1978 struct prctl_mm_map prctl_map = { .exe_fd = (u32)-1, };
> 1979 unsigned long user_auxv[AT_VECTOR_SIZE];
> 1980 struct mm_struct *mm = current->mm;
> 1981 int error;
> 1982
> 1983 BUILD_BUG_ON(sizeof(user_auxv) != sizeof(mm->saved_auxv));
> 1984 BUILD_BUG_ON(sizeof(struct prctl_mm_map) > 256);
> 1985
> 1986 if (opt == PR_SET_MM_MAP_SIZE)
> 1987 return put_user((unsigned int)sizeof(prctl_map),
> 1988 (unsigned int __user *)addr);
> 1989
> 1990 if (data_size != sizeof(prctl_map))
> 1991 return -EINVAL;
> 1992
> 1993 if (copy_from_user(&prctl_map, addr, sizeof(prctl_map)))
> 1994 return -EFAULT;
> 1995
> > 1996 prctl_map->start_code = untagged_addr(prctl_map.start_code);
> 1997 prctl_map->end_code = untagged_addr(prctl_map.end_code);
> 1998 prctl_map->start_data = untagged_addr(prctl_map.start_data);
> 1999 prctl_map->end_data = untagged_addr(prctl_map.end_data);
> 2000 prctl_map->start_brk = untagged_addr(prctl_map.start_brk);
> 2001 prctl_map->brk = untagged_addr(prctl_map.brk);
> 2002 prctl_map->start_stack = untagged_addr(prctl_map.start_stack);
> 2003 prctl_map->arg_start = untagged_addr(prctl_map.arg_start);
> 2004 prctl_map->arg_end = untagged_addr(prctl_map.arg_end);
> 2005 prctl_map->env_start = untagged_addr(prctl_map.env_start);
> 2006 prctl_map->env_end = untagged_addr(prctl_map.env_end);
> 2007
> 2008 error = validate_prctl_map(&prctl_map);
> 2009 if (error)
> 2010 return error;
> 2011
> 2012 if (prctl_map.auxv_size) {
> 2013 memset(user_auxv, 0, sizeof(user_auxv));
> 2014 if (copy_from_user(user_auxv,
> 2015 (const void __user *)prctl_map.auxv,
> 2016 prctl_map.auxv_size))
> 2017 return -EFAULT;
> 2018
> 2019 /* Last entry must be AT_NULL as specification requires */
> 2020 user_auxv[AT_VECTOR_SIZE - 2] = AT_NULL;
> 2021 user_auxv[AT_VECTOR_SIZE - 1] = AT_NULL;
> 2022 }
> 2023
> 2024 if (prctl_map.exe_fd != (u32)-1) {
> 2025 error = prctl_set_mm_exe_file(mm, prctl_map.exe_fd);
> 2026 if (error)
> 2027 return error;
> 2028 }
> 2029
> 2030 /*
> 2031 * arg_lock protects concurent updates but we still need mmap_sem for
> 2032 * read to exclude races with sys_brk.
> 2033 */
> 2034 down_read(&mm->mmap_sem);
> 2035
> 2036 /*
> 2037 * We don't validate if these members are pointing to
> 2038 * real present VMAs because application may have correspond
> 2039 * VMAs already unmapped and kernel uses these members for statistics
> 2040 * output in procfs mostly, except
> 2041 *
> 2042 * - @start_brk/@brk which are used in do_brk but kernel lookups
> 2043 * for VMAs when updating these memvers so anything wrong written
> 2044 * here cause kernel to swear at userspace program but won't lead
> 2045 * to any problem in kernel itself
> 2046 */
> 2047
> 2048 spin_lock(&mm->arg_lock);
> 2049 mm->start_code = prctl_map.start_code;
> 2050 mm->end_code = prctl_map.end_code;
> 2051 mm->start_data = prctl_map.start_data;
> 2052 mm->end_data = prctl_map.end_data;
> 2053 mm->start_brk = prctl_map.start_brk;
> 2054 mm->brk = prctl_map.brk;
> 2055 mm->start_stack = prctl_map.start_stack;
> 2056 mm->arg_start = prctl_map.arg_start;
> 2057 mm->arg_end = prctl_map.arg_end;
> 2058 mm->env_start = prctl_map.env_start;
> 2059 mm->env_end = prctl_map.env_end;
> 2060 spin_unlock(&mm->arg_lock);
> 2061
> 2062 /*
> 2063 * Note this update of @saved_auxv is lockless thus
> 2064 * if someone reads this member in procfs while we're
> 2065 * updating -- it may get partly updated results. It's
> 2066 * known and acceptable trade off: we leave it as is to
> 2067 * not introduce additional locks here making the kernel
> 2068 * more complex.
> 2069 */
> 2070 if (prctl_map.auxv_size)
> 2071 memcpy(mm->saved_auxv, user_auxv, sizeof(user_auxv));
> 2072
> 2073 up_read(&mm->mmap_sem);
> 2074 return 0;
> 2075 }
> 2076 #endif /* CONFIG_CHECKPOINT_RESTORE */
> 2077
>
> ---
> 0-DAY kernel test infrastructure Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all Intel Corporation
On Mon, Mar 18, 2019 at 12:47 PM Kevin Brodsky <[email protected]> wrote:
>
> On 15/03/2019 19:51, Andrey Konovalov wrote:
> > This patch is a part of a series that extends arm64 kernel ABI to allow to
> > pass tagged user pointers (with the top byte set to something else other
> > than 0x00) as syscall arguments.
> >
> > prctl_set_mm() and prctl_set_mm_map() use provided user pointers for vma
> > lookups, which can only by done with untagged pointers.
> >
> > Untag user pointers in these functions.
> >
> > Signed-off-by: Andrey Konovalov <[email protected]>
> > ---
> > kernel/sys.c | 14 ++++++++++++++
> > 1 file changed, 14 insertions(+)
> >
> > diff --git a/kernel/sys.c b/kernel/sys.c
> > index 12df0e5434b8..8e56d87cc6db 100644
> > --- a/kernel/sys.c
> > +++ b/kernel/sys.c
> > @@ -1993,6 +1993,18 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
> > if (copy_from_user(&prctl_map, addr, sizeof(prctl_map)))
> > return -EFAULT;
> >
> > + prctl_map->start_code = untagged_addr(prctl_map.start_code);
> > + prctl_map->end_code = untagged_addr(prctl_map.end_code);
> > + prctl_map->start_data = untagged_addr(prctl_map.start_data);
> > + prctl_map->end_data = untagged_addr(prctl_map.end_data);
> > + prctl_map->start_brk = untagged_addr(prctl_map.start_brk);
> > + prctl_map->brk = untagged_addr(prctl_map.brk);
> > + prctl_map->start_stack = untagged_addr(prctl_map.start_stack);
> > + prctl_map->arg_start = untagged_addr(prctl_map.arg_start);
> > + prctl_map->arg_end = untagged_addr(prctl_map.arg_end);
> > + prctl_map->env_start = untagged_addr(prctl_map.env_start);
> > + prctl_map->env_end = untagged_addr(prctl_map.env_end);
>
> As the buildbot suggests, those -> should be . instead :) You might want to check
> your local build with CONFIG_CHECKPOINT_RESTORE=y.
Oops :)
>
> > +
> > error = validate_prctl_map(&prctl_map);
> > if (error)
> > return error;
> > @@ -2106,6 +2118,8 @@ static int prctl_set_mm(int opt, unsigned long addr,
> > opt != PR_SET_MM_MAP_SIZE)))
> > return -EINVAL;
> >
> > + addr = untagged_addr(addr);
>
> This is a bit too coarse, addr is indeed used for find_vma() later on, but it is also
> used to access memory, by prctl_set_mm_mmap() and prctl_set_auxv().
Yes, I wrote this patch before our Friday discussion and forgot about
it. I'll fix it in v12, thanks!
>
> Kevin
>
> > +
> > #ifdef CONFIG_CHECKPOINT_RESTORE
> > if (opt == PR_SET_MM_MAP || opt == PR_SET_MM_MAP_SIZE)
> > return prctl_set_mm_map(opt, (const void __user *)addr, arg4);
>
On Mon, Mar 18, 2019 at 2:26 PM Kevin Brodsky <[email protected]> wrote:
>
> On 15/03/2019 19:51, Andrey Konovalov wrote:
> > This patch is a part of a series that extends arm64 kernel ABI to allow to
> > pass tagged user pointers (with the top byte set to something else other
> > than 0x00) as syscall arguments.
> >
> > Document the ABI changes in Documentation/arm64/tagged-pointers.txt.
> >
> > Signed-off-by: Andrey Konovalov <[email protected]>
> > ---
> > Documentation/arm64/tagged-pointers.txt | 18 ++++++++----------
> > 1 file changed, 8 insertions(+), 10 deletions(-)
> >
> > diff --git a/Documentation/arm64/tagged-pointers.txt b/Documentation/arm64/tagged-pointers.txt
> > index a25a99e82bb1..07fdddeacad0 100644
> > --- a/Documentation/arm64/tagged-pointers.txt
> > +++ b/Documentation/arm64/tagged-pointers.txt
> > @@ -17,13 +17,15 @@ this byte for application use.
> > Passing tagged addresses to the kernel
> > --------------------------------------
> >
> > -All interpretation of userspace memory addresses by the kernel assumes
> > -an address tag of 0x00.
> > +The kernel supports tags in pointer arguments (including pointers in
> > +structures) of syscalls, however such pointers must point to memory ranges
> > +obtained by anonymous mmap() or brk().
> >
> > -This includes, but is not limited to, addresses found in:
> > +The kernel supports tags in user fault addresses. However the fault_address
> > +field in the sigcontext struct will contain an untagged address.
> >
> > - - pointer arguments to system calls, including pointers in structures
> > - passed to system calls,
> > +All other interpretations of userspace memory addresses by the kernel
> > +assume an address tag of 0x00, in particular:
> >
> > - the stack pointer (sp), e.g. when interpreting it to deliver a
> > signal,
> > @@ -33,11 +35,7 @@ This includes, but is not limited to, addresses found in:
> >
> > Using non-zero address tags in any of these locations may result in an
> > error code being returned, a (fatal) signal being raised, or other modes
> > -of failure.
> > -
> > -For these reasons, passing non-zero address tags to the kernel via
> > -system calls is forbidden, and using a non-zero address tag for sp is
> > -strongly discouraged.
> > +of failure. Using a non-zero address tag for sp is strongly discouraged.
>
> I don't understand why we should keep such a limitation. For MTE, tagging SP is
> something we are definitely considering. This does bother userspace software in some
> rare cases, but I'm not sure in what way it bothers the kernel.
I don't mind allowing tagged sp as well, but it seems that it's
another ABI relaxation that needs to be handled separately. I'm not
sure if we want to include that into this patchset, which is supposed
to allow tagged pointers to be passed to syscalls.
>
> Kevin
>
> >
> > Programs maintaining a frame pointer and frame records that use non-zero
> > address tags may suffer impaired or inaccurate debug and profiling
>
Hi Vincenzo,
On Mon, Mar 18, 2019 at 10:06 PM Vincenzo Frascino
<[email protected]> wrote:
>
> On arm64 the TCR_EL1.TBI0 bit has been always enabled hence
> the userspace (EL0) is allowed to set a non-zero value in the
> top byte but the resulting pointers are not allowed at the
> user-kernel syscall ABI boundary.
>
> With the relaxed ABI proposed through this document, it is now possible
> to pass tagged pointers to the syscalls, when these pointers are in
> memory ranges obtained by an anonymous (MAP_ANONYMOUS) mmap() or brk().
>
> This change in the ABI requires a mechanism to inform the userspace
> that such an option is available.
>
> Specify and document the way in which AT_FLAGS can be used to advertise
> this feature to the userspace.
>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> CC: Andrey Konovalov <[email protected]>
> Signed-off-by: Vincenzo Frascino <[email protected]>
>
> Squash with "arm64: Define Documentation/arm64/elf_at_flags.txt"
> ---
> Documentation/arm64/elf_at_flags.txt | 133 +++++++++++++++++++++++++++
> 1 file changed, 133 insertions(+)
> create mode 100644 Documentation/arm64/elf_at_flags.txt
>
> diff --git a/Documentation/arm64/elf_at_flags.txt b/Documentation/arm64/elf_at_flags.txt
> new file mode 100644
> index 000000000000..9b3494207c14
> --- /dev/null
> +++ b/Documentation/arm64/elf_at_flags.txt
> @@ -0,0 +1,133 @@
> +ARM64 ELF AT_FLAGS
> +==================
> +
> +This document describes the usage and semantics of AT_FLAGS on arm64.
> +
> +1. Introduction
> +---------------
> +
> +AT_FLAGS is part of the Auxiliary Vector, contains the flags and it
> +is set to zero by the kernel on arm64 unless one or more of the
> +features detailed in paragraph 2 are present.
> +
> +The auxiliary vector can be accessed by the userspace using the
> +getauxval() API provided by the C library.
> +getauxval() returns an unsigned long and when a flag is present in
> +the AT_FLAGS, the corresponding bit in the returned value is set to 1.
> +
> +The AT_FLAGS with a "defined semantics" on arm64 are exposed to the
> +userspace via user API (uapi/asm/atflags.h).
> +The AT_FLAGS bits with "undefined semantics" are set to zero by default.
> +This means that the AT_FLAGS bits to which this document does not assign
> +an explicit meaning are to be intended reserved for future use.
> +The kernel will populate all such bits with zero until meanings are
> +assigned to them. If and when meanings are assigned, it is guaranteed
> +that they will not impact the functional operation of existing userspace
> +software. Userspace software should ignore any AT_FLAGS bit whose meaning
> +is not defined when the software is written.
> +
> +The userspace software can test for features by acquiring the AT_FLAGS
> +entry of the auxiliary vector, and testing whether a relevant flag
> +is set.
> +
> +Example of a userspace test function:
> +
> +bool feature_x_is_present(void)
> +{
> + unsigned long at_flags = getauxval(AT_FLAGS);
> + if (at_flags & FEATURE_X)
> + return true;
> +
> + return false;
> +}
> +
> +Where the software relies on a feature advertised by AT_FLAGS, it
> +must check that the feature is present before attempting to
> +use it.
> +
> +2. Features exposed via AT_FLAGS
> +--------------------------------
> +
> +bit[0]: ARM64_AT_FLAGS_SYSCALL_TBI
> +
> + On arm64 the TCR_EL1.TBI0 bit has been always enabled on the arm64
> + kernel, hence the userspace (EL0) is allowed to set a non-zero value
> + in the top byte but the resulting pointers are not allowed at the
> + user-kernel syscall ABI boundary.
> + When bit[0] is set to 1 the kernel is advertising to the userspace
> + that a relaxed ABI is supported hence this type of pointers are now
> + allowed to be passed to the syscalls, when these pointers are in
> + memory ranges privately owned by a process and obtained by the
> + process in accordance with the definition of "valid tagged pointer"
> + in paragraph 3.
> + In these cases the tag is preserved as the pointer goes through the
> + kernel. Only when the kernel needs to check if a pointer is coming
> + from userspace an untag operation is required.
> +
> +3. ARM64_AT_FLAGS_SYSCALL_TBI
> +-----------------------------
> +
> +From the kernel syscall interface prospective, we define, for the purposes
> +of this document, a "valid tagged pointer" as a pointer that either it has
> +a zero value set in the top byte or it has a non-zero value, it is in memory
> +ranges privately owned by a userspace process and it is obtained in one of
> +the following ways:
> + - mmap() done by the process itself, where either:
> + * flags = MAP_PRIVATE | MAP_ANONYMOUS
> + * flags = MAP_PRIVATE and the file descriptor refers to a regular
> + file or "/dev/zero"
> + - a mapping below sbrk(0) done by the process itself
> + - any memory mapped by the kernel in the process's address space during
> + creation and following the restrictions presented above (i.e. data, bss,
> + stack).
> +
> +When the ARM64_AT_FLAGS_SYSCALL_TBI flag is set by the kernel, the following
> +behaviours are guaranteed by the ABI:
> +
> + - Every current or newly introduced syscall can accept any valid tagged
> + pointers.
> +
> + - If a non valid tagged pointer is passed to a syscall then the behaviour
> + is undefined.
> +
> + - Every valid tagged pointer is expected to work as an untagged one.
> +
> + - The kernel preserves any valid tagged pointers and returns them to the
> + userspace unchanged in all the cases except the ones documented in the
> + "Preserving tags" paragraph of tagged-pointers.txt.
> +
> +A definition of the meaning of tagged pointers on arm64 can be found in:
> +Documentation/arm64/tagged-pointers.txt.
> +
> +Example of correct usage (pseudo-code) for a userspace application:
> +
> +bool arm64_syscall_tbi_is_present(void)
> +{
> + unsigned long at_flags = getauxval(AT_FLAGS);
> + if (at_flags & ARM64_AT_FLAGS_SYSCALL_TBI)
> + return true;
> +
> + return false;
> +}
> +
> +void main(void)
> +{
> + char *addr = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
> + MAP_ANONYMOUS, -1, 0);
> +
> + int fd = open("test.txt", O_WRONLY);
> +
> + /* Check if the relaxed ABI is supported */
> + if (arm64_syscall_tbi_is_present()) {
> + /* Add a tag to the pointer */
> + addr = tag_pointer(addr);
> + }
> +
> + strcpy("Hello World\n", addr);
Nit: s/strcpy("Hello World\n", addr)/strcpy(addr, "Hello World\n")
Thanks,
Amit D
> +
> + /* Write to a file */
> + write(fd, addr, sizeof(addr));
> +
> + close(fd);
> +}
> +
> --
> 2.21.0
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
On Fri, Mar 22, 2019 at 11:52:37AM +0530, Amit Daniel Kachhap wrote:
> On Mon, Mar 18, 2019 at 10:06 PM Vincenzo Frascino
> <[email protected]> wrote:
> > +Example of correct usage (pseudo-code) for a userspace application:
> > +
> > +bool arm64_syscall_tbi_is_present(void)
> > +{
> > + unsigned long at_flags = getauxval(AT_FLAGS);
> > + if (at_flags & ARM64_AT_FLAGS_SYSCALL_TBI)
> > + return true;
> > +
> > + return false;
> > +}
> > +
> > +void main(void)
> > +{
> > + char *addr = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
> > + MAP_ANONYMOUS, -1, 0);
> > +
> > + int fd = open("test.txt", O_WRONLY);
> > +
> > + /* Check if the relaxed ABI is supported */
> > + if (arm64_syscall_tbi_is_present()) {
> > + /* Add a tag to the pointer */
> > + addr = tag_pointer(addr);
> > + }
> > +
> > + strcpy("Hello World\n", addr);
>
> Nit: s/strcpy("Hello World\n", addr)/strcpy(addr, "Hello World\n")
Not exactly a nit ;).
> > +
> > + /* Write to a file */
> > + write(fd, addr, sizeof(addr));
I presume this was supposed to write "Hello World\n" to a file but
sizeof(addr) is 1.
Since we already support tagged pointers in user space (as long as they
are not passed into the kernel), the above example could tag the pointer
unconditionally and only clear it before write() if
!arm64_syscall_tbi_is_present().
--
Catalin
On 18/03/2019 16:35, Vincenzo Frascino wrote:
> On arm64 the TCR_EL1.TBI0 bit has been always enabled hence
> the userspace (EL0) is allowed to set a non-zero value in the
> top byte but the resulting pointers are not allowed at the
> user-kernel syscall ABI boundary.
>
> With the relaxed ABI proposed through this document, it is now possible
> to pass tagged pointers to the syscalls, when these pointers are in
> memory ranges obtained by an anonymous (MAP_ANONYMOUS) mmap() or brk().
>
> This change in the ABI requires a mechanism to inform the userspace
> that such an option is available.
>
> Specify and document the way in which AT_FLAGS can be used to advertise
> this feature to the userspace.
>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> CC: Andrey Konovalov <[email protected]>
> Signed-off-by: Vincenzo Frascino <[email protected]>
>
> Squash with "arm64: Define Documentation/arm64/elf_at_flags.txt"
> ---
> Documentation/arm64/elf_at_flags.txt | 133 +++++++++++++++++++++++++++
> 1 file changed, 133 insertions(+)
> create mode 100644 Documentation/arm64/elf_at_flags.txt
>
> diff --git a/Documentation/arm64/elf_at_flags.txt b/Documentation/arm64/elf_at_flags.txt
> new file mode 100644
> index 000000000000..9b3494207c14
> --- /dev/null
> +++ b/Documentation/arm64/elf_at_flags.txt
> @@ -0,0 +1,133 @@
> +ARM64 ELF AT_FLAGS
> +==================
> +
> +This document describes the usage and semantics of AT_FLAGS on arm64.
> +
> +1. Introduction
> +---------------
> +
> +AT_FLAGS is part of the Auxiliary Vector, contains the flags and it
> +is set to zero by the kernel on arm64 unless one or more of the
> +features detailed in paragraph 2 are present.
> +
> +The auxiliary vector can be accessed by the userspace using the
> +getauxval() API provided by the C library.
> +getauxval() returns an unsigned long and when a flag is present in
> +the AT_FLAGS, the corresponding bit in the returned value is set to 1.
> +
> +The AT_FLAGS with a "defined semantics" on arm64 are exposed to the
> +userspace via user API (uapi/asm/atflags.h).
> +The AT_FLAGS bits with "undefined semantics" are set to zero by default.
> +This means that the AT_FLAGS bits to which this document does not assign
> +an explicit meaning are to be intended reserved for future use.
> +The kernel will populate all such bits with zero until meanings are
> +assigned to them. If and when meanings are assigned, it is guaranteed
> +that they will not impact the functional operation of existing userspace
> +software. Userspace software should ignore any AT_FLAGS bit whose meaning
> +is not defined when the software is written.
> +
> +The userspace software can test for features by acquiring the AT_FLAGS
> +entry of the auxiliary vector, and testing whether a relevant flag
> +is set.
> +
> +Example of a userspace test function:
> +
> +bool feature_x_is_present(void)
> +{
> + unsigned long at_flags = getauxval(AT_FLAGS);
> + if (at_flags & FEATURE_X)
> + return true;
> +
> + return false;
> +}
> +
> +Where the software relies on a feature advertised by AT_FLAGS, it
> +must check that the feature is present before attempting to
> +use it.
> +
> +2. Features exposed via AT_FLAGS
> +--------------------------------
> +
> +bit[0]: ARM64_AT_FLAGS_SYSCALL_TBI
> +
> + On arm64 the TCR_EL1.TBI0 bit has been always enabled on the arm64
> + kernel, hence the userspace (EL0) is allowed to set a non-zero value
> + in the top byte but the resulting pointers are not allowed at the
> + user-kernel syscall ABI boundary.
> + When bit[0] is set to 1 the kernel is advertising to the userspace
> + that a relaxed ABI is supported hence this type of pointers are now
> + allowed to be passed to the syscalls, when these pointers are in
> + memory ranges privately owned by a process and obtained by the
> + process in accordance with the definition of "valid tagged pointer"
> + in paragraph 3.
> + In these cases the tag is preserved as the pointer goes through the
> + kernel. Only when the kernel needs to check if a pointer is coming
> + from userspace an untag operation is required.
I would leave this last sentence out, because:
1. It is an implementation detail that doesn't impact this user ABI.
2. It is not entirely accurate: untagging the pointer may be needed for various kinds
of address lookup (like finding the corresponding VMA), at which point the kernel
usually already knows it is a userspace pointer.
> +
> +3. ARM64_AT_FLAGS_SYSCALL_TBI
> +-----------------------------
> +
> +From the kernel syscall interface prospective, we define, for the purposes
> +of this document, a "valid tagged pointer" as a pointer that either it has
> +a zero value set in the top byte or it has a non-zero value, it is in memory
> +ranges privately owned by a userspace process and it is obtained in one of
> +the following ways:
> + - mmap() done by the process itself, where either:
> + * flags = MAP_PRIVATE | MAP_ANONYMOUS
> + * flags = MAP_PRIVATE and the file descriptor refers to a regular
> + file or "/dev/zero"
> + - a mapping below sbrk(0) done by the process itself
I don't think that's very clear, this doesn't say how the mapping is obtained. Maybe
"a mapping obtained by the process using brk() or sbrk()"?
> + - any memory mapped by the kernel in the process's address space during
> + creation and following the restrictions presented above (i.e. data, bss,
> + stack).
With the rules above, the code section is included as well. Replacing "i.e." with
"e.g." would avoid having to list every single section (which is probably not a good
idea anyway).
Kevin
> +
> +When the ARM64_AT_FLAGS_SYSCALL_TBI flag is set by the kernel, the following
> +behaviours are guaranteed by the ABI:
> +
> + - Every current or newly introduced syscall can accept any valid tagged
> + pointers.
> +
> + - If a non valid tagged pointer is passed to a syscall then the behaviour
> + is undefined.
> +
> + - Every valid tagged pointer is expected to work as an untagged one.
> +
> + - The kernel preserves any valid tagged pointers and returns them to the
> + userspace unchanged in all the cases except the ones documented in the
> + "Preserving tags" paragraph of tagged-pointers.txt.
> +
> +A definition of the meaning of tagged pointers on arm64 can be found in:
> +Documentation/arm64/tagged-pointers.txt.
> +
> +Example of correct usage (pseudo-code) for a userspace application:
> +
> +bool arm64_syscall_tbi_is_present(void)
> +{
> + unsigned long at_flags = getauxval(AT_FLAGS);
> + if (at_flags & ARM64_AT_FLAGS_SYSCALL_TBI)
> + return true;
> +
> + return false;
> +}
> +
> +void main(void)
> +{
> + char *addr = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
> + MAP_ANONYMOUS, -1, 0);
> +
> + int fd = open("test.txt", O_WRONLY);
> +
> + /* Check if the relaxed ABI is supported */
> + if (arm64_syscall_tbi_is_present()) {
> + /* Add a tag to the pointer */
> + addr = tag_pointer(addr);
> + }
> +
> + strcpy("Hello World\n", addr);
> +
> + /* Write to a file */
> + write(fd, addr, sizeof(addr));
> +
> + close(fd);
> +}
> +
On Fri, Mar 22, 2019 at 03:52:49PM +0000, Kevin Brodsky wrote:
> On 18/03/2019 16:35, Vincenzo Frascino wrote:
> > +2. Features exposed via AT_FLAGS
> > +--------------------------------
> > +
> > +bit[0]: ARM64_AT_FLAGS_SYSCALL_TBI
> > +
> > + On arm64 the TCR_EL1.TBI0 bit has been always enabled on the arm64
> > + kernel, hence the userspace (EL0) is allowed to set a non-zero value
> > + in the top byte but the resulting pointers are not allowed at the
> > + user-kernel syscall ABI boundary.
> > + When bit[0] is set to 1 the kernel is advertising to the userspace
> > + that a relaxed ABI is supported hence this type of pointers are now
> > + allowed to be passed to the syscalls, when these pointers are in
> > + memory ranges privately owned by a process and obtained by the
> > + process in accordance with the definition of "valid tagged pointer"
> > + in paragraph 3.
> > + In these cases the tag is preserved as the pointer goes through the
> > + kernel. Only when the kernel needs to check if a pointer is coming
> > + from userspace an untag operation is required.
>
> I would leave this last sentence out, because:
> 1. It is an implementation detail that doesn't impact this user ABI.
> 2. It is not entirely accurate: untagging the pointer may be needed for
> various kinds of address lookup (like finding the corresponding VMA), at
> which point the kernel usually already knows it is a userspace pointer.
I fully agree, the above paragraph should not be part of the user ABI
document.
> > +3. ARM64_AT_FLAGS_SYSCALL_TBI
> > +-----------------------------
> > +
> > +From the kernel syscall interface prospective, we define, for the purposes
> > +of this document, a "valid tagged pointer" as a pointer that either it has
> > +a zero value set in the top byte or it has a non-zero value, it is in memory
> > +ranges privately owned by a userspace process and it is obtained in one of
> > +the following ways:
> > + - mmap() done by the process itself, where either:
> > + * flags = MAP_PRIVATE | MAP_ANONYMOUS
> > + * flags = MAP_PRIVATE and the file descriptor refers to a regular
> > + file or "/dev/zero"
> > + - a mapping below sbrk(0) done by the process itself
>
> I don't think that's very clear, this doesn't say how the mapping is
> obtained. Maybe "a mapping obtained by the process using brk() or sbrk()"?
I think what we mean here is anything in the "[heap]" section as per
/proc/*/maps (in the kernel this would be start_brk to brk).
> > + - any memory mapped by the kernel in the process's address space during
> > + creation and following the restrictions presented above (i.e. data, bss,
> > + stack).
>
> With the rules above, the code section is included as well. Replacing "i.e."
> with "e.g." would avoid having to list every single section (which is
> probably not a good idea anyway).
We could mention [stack] explicitly as that's documented in the
Documentation/filesystems/proc.txt and it's likely considered ABI
already.
The code section is MAP_PRIVATE, and can be done by the dynamic loader
(user process), so it falls under the mmap() rules listed above. I guess
we could simply drop "done by the process itself" here and allow
MAP_PRIVATE|MAP_ANONYMOUS or MAP_PRIVATE of regular file. This would
cover the [heap] and [stack] and we won't have to debate the brk() case
at all.
We probably mention somewhere (or we should in the tagged pointers doc)
that we don't support tagged PC.
--
Catalin
On 03/04/2019 17:50, Catalin Marinas wrote:
> On Fri, Mar 22, 2019 at 03:52:49PM +0000, Kevin Brodsky wrote:
>> On 18/03/2019 16:35, Vincenzo Frascino wrote:
>>> +2. Features exposed via AT_FLAGS
>>> +--------------------------------
>>> +
>>> +bit[0]: ARM64_AT_FLAGS_SYSCALL_TBI
>>> +
>>> + On arm64 the TCR_EL1.TBI0 bit has been always enabled on the arm64
>>> + kernel, hence the userspace (EL0) is allowed to set a non-zero value
>>> + in the top byte but the resulting pointers are not allowed at the
>>> + user-kernel syscall ABI boundary.
>>> + When bit[0] is set to 1 the kernel is advertising to the userspace
>>> + that a relaxed ABI is supported hence this type of pointers are now
>>> + allowed to be passed to the syscalls, when these pointers are in
>>> + memory ranges privately owned by a process and obtained by the
>>> + process in accordance with the definition of "valid tagged pointer"
>>> + in paragraph 3.
>>> + In these cases the tag is preserved as the pointer goes through the
>>> + kernel. Only when the kernel needs to check if a pointer is coming
>>> + from userspace an untag operation is required.
>> I would leave this last sentence out, because:
>> 1. It is an implementation detail that doesn't impact this user ABI.
>> 2. It is not entirely accurate: untagging the pointer may be needed for
>> various kinds of address lookup (like finding the corresponding VMA), at
>> which point the kernel usually already knows it is a userspace pointer.
> I fully agree, the above paragraph should not be part of the user ABI
> document.
>
>>> +3. ARM64_AT_FLAGS_SYSCALL_TBI
>>> +-----------------------------
>>> +
>>> +From the kernel syscall interface prospective, we define, for the purposes
>>> +of this document, a "valid tagged pointer" as a pointer that either it has
>>> +a zero value set in the top byte or it has a non-zero value, it is in memory
>>> +ranges privately owned by a userspace process and it is obtained in one of
>>> +the following ways:
>>> + - mmap() done by the process itself, where either:
>>> + * flags = MAP_PRIVATE | MAP_ANONYMOUS
>>> + * flags = MAP_PRIVATE and the file descriptor refers to a regular
>>> + file or "/dev/zero"
>>> + - a mapping below sbrk(0) done by the process itself
>> I don't think that's very clear, this doesn't say how the mapping is
>> obtained. Maybe "a mapping obtained by the process using brk() or sbrk()"?
> I think what we mean here is anything in the "[heap]" section as per
> /proc/*/maps (in the kernel this would be start_brk to brk).
>
>>> + - any memory mapped by the kernel in the process's address space during
>>> + creation and following the restrictions presented above (i.e. data, bss,
>>> + stack).
>> With the rules above, the code section is included as well. Replacing "i.e."
>> with "e.g." would avoid having to list every single section (which is
>> probably not a good idea anyway).
> We could mention [stack] explicitly as that's documented in the
> Documentation/filesystems/proc.txt and it's likely considered ABI
> already.
>
> The code section is MAP_PRIVATE, and can be done by the dynamic loader
> (user process), so it falls under the mmap() rules listed above. I guess
> we could simply drop "done by the process itself" here and allow
> MAP_PRIVATE|MAP_ANONYMOUS or MAP_PRIVATE of regular file. This would
> cover the [heap] and [stack] and we won't have to debate the brk() case
> at all.
That's probably the best option. I initially used this wording because I was worried
that there could be cases where the kernel allocates "magic" memory for userspace
that is MAP_PRIVATE|MAP_ANONYMOUS, but in fact it's probably not the case (presumably
such mapping should always be done via install_special_mapping(), which is definitely
not MAP_PRIVATE).
> We probably mention somewhere (or we should in the tagged pointers doc)
> that we don't support tagged PC.
I think that Documentation/arm64/tagged-pointers.txt already makes it reasonably
clear (anyway, with the architecture not supporting it, you can't expect much from
the kernel).
Kevin