From: Arnd Bergmann <[email protected]>
Hi Christoph, Russell,
This is the rebased version of my ARM set_fs patches on top of
v5.14-rc2. There were a couple of minor conflicts that I resolved,
but otherwise this is still identical to the version I tested earlier.
The one notable change this time is that I rename thread_info->syscall
to thread_info->abi_syscall, to clarify that this field now contains
both the ABI bits (0x900000 for OABI, 0x000000 for EABI) and the
syscall number in kernels that support both ABIs, so every access to
it has to mask out the bits it needs. As Russell mentioned before,
having a 'syscall' field that does not just contain the syscall number
is confusing and error-prone, so I hope this change is good enough.
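To illustrate the encoding, here is a minimal sketch of how a consumer
decodes the combined field (using the __NR_SYSCALL_MASK value that a
later patch in this series introduces):
~~~
/* Sketch: decoding thread_info->abi_syscall on a kernel with
 * CONFIG_OABI_COMPAT; 0x900000 is __NR_OABI_SYSCALL_BASE. */
unsigned int raw = task_thread_info(task)->abi_syscall;
unsigned int nr = raw & __NR_SYSCALL_MASK;	/* plain syscall number */
bool is_oabi = (raw & __NR_OABI_SYSCALL_BASE) != 0;	/* ABI bits */
~~~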
I have tested the oabi-compat changes with the LTP tests for the three
modified syscalls on an Armv7 kernel and a Debian 5 OABI user space.
I also tested syscall_get_nr() in all combinations of OABI/EABI kernel
and user space, and fixed the bugs I found after Russell pointed out
those issues in the early versions.
Linus Walleij did additional testing of v4 on Footbridge with OABI
user space.
With this and the m68k changes getting merged, we are very close
to eliminating set_fs completely.
Russell, you can pull these from
https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git arm-setfs_v5
or I can add them to the ARM patch tracker if you prefer.
Arnd
v4: https://lore.kernel.org/linux-arm-kernel/[email protected]/
v3: https://lore.kernel.org/linux-arm-kernel/[email protected]/
v2: https://lore.kernel.org/linux-arm-kernel/[email protected]/
v1: https://lore.kernel.org/linux-arm-kernel/[email protected]/
Arnd Bergmann (9):
mm/maccess: fix unaligned copy_{from,to}_kernel_nofault
ARM: traps: use get_kernel_nofault instead of set_fs()
ARM: oabi-compat: add epoll_pwait handler
ARM: syscall: always store thread_info->abi_syscall
ARM: oabi-compat: rework epoll_wait/epoll_pwait emulation
ARM: oabi-compat: rework sys_semtimedop emulation
ARM: oabi-compat: rework fcntl64() emulation
ARM: uaccess: add __{get,put}_kernel_nofault
ARM: uaccess: remove set_fs() implementation
ARM: oabi-compat: fix oabi epoll sparse warning
arch/arm/Kconfig | 1 -
arch/arm/include/asm/ptrace.h | 1 -
arch/arm/include/asm/syscall.h | 16 ++-
arch/arm/include/asm/thread_info.h | 6 +-
arch/arm/include/asm/uaccess-asm.h | 6 -
arch/arm/include/asm/uaccess.h | 169 ++++++++++++-----------
arch/arm/include/uapi/asm/unistd.h | 1 +
arch/arm/kernel/asm-offsets.c | 3 +-
arch/arm/kernel/entry-common.S | 20 +--
arch/arm/kernel/process.c | 7 +-
arch/arm/kernel/ptrace.c | 14 +-
arch/arm/kernel/signal.c | 8 --
arch/arm/kernel/sys_oabi-compat.c | 214 +++++++++++++++++------------
arch/arm/kernel/traps.c | 47 +++----
arch/arm/lib/copy_from_user.S | 3 +-
arch/arm/lib/copy_to_user.S | 3 +-
arch/arm/tools/syscall.tbl | 2 +-
fs/eventpoll.c | 5 +-
include/linux/eventpoll.h | 18 +++
include/linux/syscalls.h | 3 +
ipc/sem.c | 84 ++++++-----
mm/maccess.c | 28 +++-
22 files changed, 361 insertions(+), 298 deletions(-)
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Alexander Viro <[email protected]>
Cc: Linus Walleij <[email protected]>
Cc: Arnd Bergmann <[email protected]>
--
2.29.2
From: Arnd Bergmann <[email protected]>
This is one of the last users of get_fs(), and this is fairly easy to
change, since the infrastructure for it is already there.
The replacement here is essentially a copy of the existing fcntl64()
syscall entry function.
Signed-off-by: Arnd Bergmann <[email protected]>
---
arch/arm/kernel/sys_oabi-compat.c | 93 ++++++++++++++++++++-----------
1 file changed, 60 insertions(+), 33 deletions(-)
diff --git a/arch/arm/kernel/sys_oabi-compat.c b/arch/arm/kernel/sys_oabi-compat.c
index 5ea365c35ca5..223ee46b6e75 100644
--- a/arch/arm/kernel/sys_oabi-compat.c
+++ b/arch/arm/kernel/sys_oabi-compat.c
@@ -194,56 +194,83 @@ struct oabi_flock64 {
pid_t l_pid;
} __attribute__ ((packed,aligned(4)));
-static long do_locks(unsigned int fd, unsigned int cmd,
- unsigned long arg)
+static int get_oabi_flock(struct flock64 *kernel, struct oabi_flock64 __user *arg)
{
- struct flock64 kernel;
struct oabi_flock64 user;
- mm_segment_t fs;
- long ret;
if (copy_from_user(&user, (struct oabi_flock64 __user *)arg,
sizeof(user)))
return -EFAULT;
- kernel.l_type = user.l_type;
- kernel.l_whence = user.l_whence;
- kernel.l_start = user.l_start;
- kernel.l_len = user.l_len;
- kernel.l_pid = user.l_pid;
-
- fs = get_fs();
- set_fs(KERNEL_DS);
- ret = sys_fcntl64(fd, cmd, (unsigned long)&kernel);
- set_fs(fs);
-
- if (!ret && (cmd == F_GETLK64 || cmd == F_OFD_GETLK)) {
- user.l_type = kernel.l_type;
- user.l_whence = kernel.l_whence;
- user.l_start = kernel.l_start;
- user.l_len = kernel.l_len;
- user.l_pid = kernel.l_pid;
- if (copy_to_user((struct oabi_flock64 __user *)arg,
- &user, sizeof(user)))
- ret = -EFAULT;
- }
- return ret;
+
+ kernel->l_type = user.l_type;
+ kernel->l_whence = user.l_whence;
+ kernel->l_start = user.l_start;
+ kernel->l_len = user.l_len;
+ kernel->l_pid = user.l_pid;
+
+ return 0;
+}
+
+static int put_oabi_flock(struct flock64 *kernel, struct oabi_flock64 __user *arg)
+{
+ struct oabi_flock64 user;
+
+ user.l_type = kernel->l_type;
+ user.l_whence = kernel->l_whence;
+ user.l_start = kernel->l_start;
+ user.l_len = kernel->l_len;
+ user.l_pid = kernel->l_pid;
+
+ if (copy_to_user((struct oabi_flock64 __user *)arg,
+ &user, sizeof(user)))
+ return -EFAULT;
+
+ return 0;
}
asmlinkage long sys_oabi_fcntl64(unsigned int fd, unsigned int cmd,
unsigned long arg)
{
+ void __user *argp = (void __user *)arg;
+ struct fd f = fdget_raw(fd);
+ struct flock64 flock;
+ long err = -EBADF;
+
+ if (!f.file)
+ goto out;
+
switch (cmd) {
- case F_OFD_GETLK:
- case F_OFD_SETLK:
- case F_OFD_SETLKW:
case F_GETLK64:
+ case F_OFD_GETLK:
+ err = security_file_fcntl(f.file, cmd, arg);
+ if (err)
+ break;
+ err = get_oabi_flock(&flock, argp);
+ if (err)
+ break;
+ err = fcntl_getlk64(f.file, cmd, &flock);
+ if (!err)
+ err = put_oabi_flock(&flock, argp);
+ break;
case F_SETLK64:
case F_SETLKW64:
- return do_locks(fd, cmd, arg);
-
+ case F_OFD_SETLK:
+ case F_OFD_SETLKW:
+ err = security_file_fcntl(f.file, cmd, arg);
+ if (err)
+ break;
+ err = get_oabi_flock(&flock, argp);
+ if (err)
+ break;
+ err = fcntl_setlk64(fd, f.file, cmd, &flock);
+ break;
default:
- return sys_fcntl64(fd, cmd, arg);
+ err = sys_fcntl64(fd, cmd, arg);
+ break;
}
+ fdput(f);
+out:
+ return err;
}
struct oabi_epoll_event {
--
2.29.2
From: Arnd Bergmann <[email protected]>
On machines such as ARMv5 that trap unaligned accesses, these two
functions (copy_{from,to}_kernel_nofault()) can be slow when each
access needs to be emulated, or they might not work at all.
Change them so that each loop is only used when both the src
and dst pointers are naturally aligned.
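For reference, the per-size copy loop in mm/maccess.c is roughly the
helper below, so skipping a wider loop simply lets the remaining bytes
fall through to the narrower ones:
~~~
/* Roughly the existing helper in mm/maccess.c: consume the buffer in
 * sizeof(type) chunks, jumping to err_label on a faulting access. */
#define copy_from_kernel_nofault_loop(dst, src, len, type, err_label)	\
	while (len >= sizeof(type)) {					\
		__get_kernel_nofault(dst, src, type, err_label);	\
		dst += sizeof(type);					\
		src += sizeof(type);					\
		len -= sizeof(type);					\
	}
~~~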
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Linus Walleij <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
---
mm/maccess.c | 28 ++++++++++++++++++++++------
1 file changed, 22 insertions(+), 6 deletions(-)
diff --git a/mm/maccess.c b/mm/maccess.c
index 3bd70405f2d8..d3f1a1f0b1c1 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -24,13 +24,21 @@ bool __weak copy_from_kernel_nofault_allowed(const void *unsafe_src,
long copy_from_kernel_nofault(void *dst, const void *src, size_t size)
{
+ unsigned long align = 0;
+
+ if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS))
+ align = (unsigned long)dst | (unsigned long)src;
+
if (!copy_from_kernel_nofault_allowed(src, size))
return -ERANGE;
pagefault_disable();
- copy_from_kernel_nofault_loop(dst, src, size, u64, Efault);
- copy_from_kernel_nofault_loop(dst, src, size, u32, Efault);
- copy_from_kernel_nofault_loop(dst, src, size, u16, Efault);
+ if (!(align & 7))
+ copy_from_kernel_nofault_loop(dst, src, size, u64, Efault);
+ if (!(align & 3))
+ copy_from_kernel_nofault_loop(dst, src, size, u32, Efault);
+ if (!(align & 1))
+ copy_from_kernel_nofault_loop(dst, src, size, u16, Efault);
copy_from_kernel_nofault_loop(dst, src, size, u8, Efault);
pagefault_enable();
return 0;
@@ -50,10 +58,18 @@ EXPORT_SYMBOL_GPL(copy_from_kernel_nofault);
long copy_to_kernel_nofault(void *dst, const void *src, size_t size)
{
+ unsigned long align = 0;
+
+ if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS))
+ align = (unsigned long)dst | (unsigned long)src;
+
pagefault_disable();
- copy_to_kernel_nofault_loop(dst, src, size, u64, Efault);
- copy_to_kernel_nofault_loop(dst, src, size, u32, Efault);
- copy_to_kernel_nofault_loop(dst, src, size, u16, Efault);
+ if (!(align & 7))
+ copy_to_kernel_nofault_loop(dst, src, size, u64, Efault);
+ if (!(align & 3))
+ copy_to_kernel_nofault_loop(dst, src, size, u32, Efault);
+ if (!(align & 1))
+ copy_to_kernel_nofault_loop(dst, src, size, u16, Efault);
copy_to_kernel_nofault_loop(dst, src, size, u8, Efault);
pagefault_enable();
return 0;
--
2.29.2
From: Arnd Bergmann <[email protected]>
ARM uses set_fs() and __get_user() to allow the stack dumping code to
access possibly invalid pointers carefully. These can be changed to the
simpler get_kernel_nofault(), allowing the eventual removal of set_fs().
dump_instr() will print either kernel or user space pointers,
depending on how it was called. For dump_mem(), I assume we are only
interested in kernel pointers, and the only time that this is called
with user_mode(regs)==true is when the regs themselves are unreliable
as a result of the condition that caused the trap.
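The property relied on here, mirroring the dump_mem() hunk below, is
that get_kernel_nofault() reports failure as a nonzero return value
instead of oopsing on an unreadable kernel address (a usage sketch,
not part of the patch):
~~~
unsigned long val;

/* returns 0 on success, nonzero if the address cannot be read */
if (!get_kernel_nofault(val, (unsigned long *)p))
	sprintf(str + i * 9, " %08lx", val);
else
	sprintf(str + i * 9, " ????????");
~~~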
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Linus Walleij <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
---
arch/arm/kernel/traps.c | 47 ++++++++++++++---------------------------
1 file changed, 16 insertions(+), 31 deletions(-)
diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index 64308e3a5d0c..10dd3ef1f398 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -122,17 +122,8 @@ static void dump_mem(const char *lvl, const char *str, unsigned long bottom,
unsigned long top)
{
unsigned long first;
- mm_segment_t fs;
int i;
- /*
- * We need to switch to kernel mode so that we can use __get_user
- * to safely read from kernel space. Note that we now dump the
- * code first, just in case the backtrace kills us.
- */
- fs = get_fs();
- set_fs(KERNEL_DS);
-
printk("%s%s(0x%08lx to 0x%08lx)\n", lvl, str, bottom, top);
for (first = bottom & ~31; first < top; first += 32) {
@@ -145,7 +136,7 @@ static void dump_mem(const char *lvl, const char *str, unsigned long bottom,
for (p = first, i = 0; i < 8 && p < top; i++, p += 4) {
if (p >= bottom && p < top) {
unsigned long val;
- if (__get_user(val, (unsigned long *)p) == 0)
+ if (!get_kernel_nofault(val, (unsigned long *)p))
sprintf(str + i * 9, " %08lx", val);
else
sprintf(str + i * 9, " ????????");
@@ -153,11 +144,9 @@ static void dump_mem(const char *lvl, const char *str, unsigned long bottom,
}
printk("%s%04lx:%s\n", lvl, first & 0xffff, str);
}
-
- set_fs(fs);
}
-static void __dump_instr(const char *lvl, struct pt_regs *regs)
+static void dump_instr(const char *lvl, struct pt_regs *regs)
{
unsigned long addr = instruction_pointer(regs);
const int thumb = thumb_mode(regs);
@@ -173,10 +162,20 @@ static void __dump_instr(const char *lvl, struct pt_regs *regs)
for (i = -4; i < 1 + !!thumb; i++) {
unsigned int val, bad;
- if (thumb)
- bad = get_user(val, &((u16 *)addr)[i]);
- else
- bad = get_user(val, &((u32 *)addr)[i]);
+ if (!user_mode(regs)) {
+ if (thumb) {
+ u16 val16;
+ bad = get_kernel_nofault(val16, &((u16 *)addr)[i]);
+ val = val16;
+ } else {
+ bad = get_kernel_nofault(val, &((u32 *)addr)[i]);
+ }
+ } else {
+ if (thumb)
+ bad = get_user(val, &((u16 *)addr)[i]);
+ else
+ bad = get_user(val, &((u32 *)addr)[i]);
+ }
if (!bad)
p += sprintf(p, i == 0 ? "(%0*x) " : "%0*x ",
@@ -189,20 +188,6 @@ static void __dump_instr(const char *lvl, struct pt_regs *regs)
printk("%sCode: %s\n", lvl, str);
}
-static void dump_instr(const char *lvl, struct pt_regs *regs)
-{
- mm_segment_t fs;
-
- if (!user_mode(regs)) {
- fs = get_fs();
- set_fs(KERNEL_DS);
- __dump_instr(lvl, regs);
- set_fs(fs);
- } else {
- __dump_instr(lvl, regs);
- }
-}
-
#ifdef CONFIG_ARM_UNWIND
static inline void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
const char *loglvl)
--
2.29.2
From: Arnd Bergmann <[email protected]>
The epoll_wait() syscall has a special version for OABI compat
mode to convert the arguments to the EABI structure layout
of the kernel. However, the later epoll_pwait() syscall was
added in arch/arm in linux-2.6.32 without this conversion.
Use the same kind of handler for both.
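The underlying problem is a structure layout difference, sketched
below with sizes for 32-bit ARM: the EABI kernel naturally aligns the
64-bit member to 8 bytes, while the OABI definition packs it to 4, so
each event has to be converted field by field:
~~~
/* EABI kernel layout: 4 bytes of padding after 'events',
 * sizeof == 16, alignment == 8. */
struct epoll_event {
	__poll_t events;
	__u64 data;
};

/* OABI user space layout: no padding,
 * sizeof == 12, alignment == 4. */
struct oabi_epoll_event {
	__u32 events;
	__u64 data;
} __attribute__ ((packed,aligned(4)));
~~~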
Fixes: 369842658a36 ("ARM: 5677/1: ARM support for TIF_RESTORE_SIGMASK/pselect6/ppoll/epoll_pwait")
Cc: [email protected]
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
---
arch/arm/kernel/sys_oabi-compat.c | 38 ++++++++++++++++++++++++++++---
arch/arm/tools/syscall.tbl | 2 +-
2 files changed, 36 insertions(+), 4 deletions(-)
diff --git a/arch/arm/kernel/sys_oabi-compat.c b/arch/arm/kernel/sys_oabi-compat.c
index 075a2e0ed2c1..443203fafb6b 100644
--- a/arch/arm/kernel/sys_oabi-compat.c
+++ b/arch/arm/kernel/sys_oabi-compat.c
@@ -265,9 +265,8 @@ asmlinkage long sys_oabi_epoll_ctl(int epfd, int op, int fd,
return do_epoll_ctl(epfd, op, fd, &kernel, false);
}
-asmlinkage long sys_oabi_epoll_wait(int epfd,
- struct oabi_epoll_event __user *events,
- int maxevents, int timeout)
+static long do_oabi_epoll_wait(int epfd, struct oabi_epoll_event __user *events,
+ int maxevents, int timeout)
{
struct epoll_event *kbuf;
struct oabi_epoll_event e;
@@ -314,6 +313,39 @@ asmlinkage long sys_oabi_epoll_wait(int epfd,
}
-#endif
+SYSCALL_DEFINE4(oabi_epoll_wait, int, epfd,
+ struct oabi_epoll_event __user *, events,
+ int, maxevents, int, timeout)
+{
+ return do_oabi_epoll_wait(epfd, events, maxevents, timeout);
+}
+
+/*
+ * Implement the event wait interface for the eventpoll file. It is the kernel
+ * part of the user space epoll_pwait(2).
+ */
+SYSCALL_DEFINE6(oabi_epoll_pwait, int, epfd,
+ struct oabi_epoll_event __user *, events, int, maxevents,
+ int, timeout, const sigset_t __user *, sigmask,
+ size_t, sigsetsize)
+{
+ int error;
+
+ /*
+ * If the caller wants a certain signal mask to be set during the wait,
+ * we apply it here.
+ */
+ error = set_user_sigmask(sigmask, sigsetsize);
+ if (error)
+ return error;
+
+ error = do_oabi_epoll_wait(epfd, events, maxevents, timeout);
+ restore_saved_sigmask_unless(error == -EINTR);
+
+ return error;
+}
+#endif
+
struct oabi_sembuf {
unsigned short sem_num;
short sem_op;
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index c5df1179fc5d..11d0b960b2c2 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -360,7 +360,7 @@
343 common vmsplice sys_vmsplice
344 common move_pages sys_move_pages
345 common getcpu sys_getcpu
-346 common epoll_pwait sys_epoll_pwait
+346 common epoll_pwait sys_epoll_pwait sys_oabi_epoll_pwait
347 common kexec_load sys_kexec_load
348 common utimensat sys_utimensat_time32
349 common signalfd sys_signalfd
--
2.29.2
From: Arnd Bergmann <[email protected]>
sys_oabi_semtimedop() is one of the last users of set_fs() on Arm. To
remove this one, expose the internal code of the actual implementation
that operates on a kernel pointer, and call it directly after copying
the user space data into kernel memory.
There should be no measurable impact on the normal execution of this
function, and it makes the overly long function a little shorter, which
may help readability.
While reworking the oabi version, make it behave a little more like
the native one, using kvmalloc_array() and restructuring the code flow
in a similar way.
The naming of __do_semtimedop() is not very good; I hope someone can
come up with a better name.
One regression was spotted by kernel test robot <[email protected]>
and fixed before the first mailing list submission.
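As an illustration of the resulting interface, a hypothetical caller
that performs a single semaphore operation from kernel context only
needs the newly exposed function (kernel_semop_one() is made up for
this sketch; __do_semtimedop() and its signature are from this patch):
~~~
/* Hypothetical helper, illustration only: one semaphore operation on
 * a kernel-space sembuf, no set_fs() or copy_from_user() involved. */
static long kernel_semop_one(int semid, unsigned short num, short op)
{
	struct sembuf sop = {
		.sem_num = num,
		.sem_op  = op,
		.sem_flg = 0,
	};

	/* NULL timeout: block indefinitely; use the caller's namespace */
	return __do_semtimedop(semid, &sop, 1, NULL,
			       current->nsproxy->ipc_ns);
}
~~~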
Signed-off-by: Arnd Bergmann <[email protected]>
---
arch/arm/kernel/sys_oabi-compat.c | 60 ++++++++++++++++------
include/linux/syscalls.h | 3 ++
ipc/sem.c | 84 +++++++++++++++++++------------
3 files changed, 99 insertions(+), 48 deletions(-)
diff --git a/arch/arm/kernel/sys_oabi-compat.c b/arch/arm/kernel/sys_oabi-compat.c
index 1f6a433200f1..5ea365c35ca5 100644
--- a/arch/arm/kernel/sys_oabi-compat.c
+++ b/arch/arm/kernel/sys_oabi-compat.c
@@ -80,6 +80,7 @@
#include <linux/socket.h>
#include <linux/net.h>
#include <linux/ipc.h>
+#include <linux/ipc_namespace.h>
#include <linux/uaccess.h>
#include <linux/slab.h>
@@ -302,46 +303,52 @@ struct oabi_sembuf {
unsigned short __pad;
};
+#define sc_semopm sem_ctls[2]
+
+#ifdef CONFIG_SYSVIPC
asmlinkage long sys_oabi_semtimedop(int semid,
struct oabi_sembuf __user *tsops,
unsigned nsops,
const struct old_timespec32 __user *timeout)
{
+ struct ipc_namespace *ns;
struct sembuf *sops;
- struct old_timespec32 local_timeout;
long err;
int i;
+ ns = current->nsproxy->ipc_ns;
+ if (nsops > ns->sc_semopm)
+ return -E2BIG;
if (nsops < 1 || nsops > SEMOPM)
return -EINVAL;
- if (!access_ok(tsops, sizeof(*tsops) * nsops))
- return -EFAULT;
- sops = kmalloc_array(nsops, sizeof(*sops), GFP_KERNEL);
+ sops = kvmalloc_array(nsops, sizeof(*sops), GFP_KERNEL);
if (!sops)
return -ENOMEM;
err = 0;
for (i = 0; i < nsops; i++) {
struct oabi_sembuf osb;
- err |= __copy_from_user(&osb, tsops, sizeof(osb));
+ err |= copy_from_user(&osb, tsops, sizeof(osb));
sops[i].sem_num = osb.sem_num;
sops[i].sem_op = osb.sem_op;
sops[i].sem_flg = osb.sem_flg;
tsops++;
}
- if (timeout) {
- /* copy this as well before changing domain protection */
- err |= copy_from_user(&local_timeout, timeout, sizeof(*timeout));
- timeout = &local_timeout;
- }
if (err) {
err = -EFAULT;
- } else {
- mm_segment_t fs = get_fs();
- set_fs(KERNEL_DS);
- err = sys_semtimedop_time32(semid, sops, nsops, timeout);
- set_fs(fs);
+ goto out;
+ }
+
+ if (timeout) {
+ struct timespec64 ts;
+ err = get_old_timespec32(&ts, timeout);
+ if (err)
+ goto out;
+ err = __do_semtimedop(semid, sops, nsops, &ts, ns);
+ goto out;
}
- kfree(sops);
+ err = __do_semtimedop(semid, sops, nsops, NULL, ns);
+out:
+ kvfree(sops);
return err;
}
@@ -368,6 +375,27 @@ asmlinkage int sys_oabi_ipc(uint call, int first, int second, int third,
return sys_ipc(call, first, second, third, ptr, fifth);
}
}
+#else
+asmlinkage long sys_oabi_semtimedop(int semid,
+ struct oabi_sembuf __user *tsops,
+ unsigned nsops,
+ const struct old_timespec32 __user *timeout)
+{
+ return -ENOSYS;
+}
+
+asmlinkage long sys_oabi_semop(int semid, struct oabi_sembuf __user *tsops,
+ unsigned nsops)
+{
+ return -ENOSYS;
+}
+
+asmlinkage int sys_oabi_ipc(uint call, int first, int second, int third,
+ void __user *ptr, long fifth)
+{
+ return -ENOSYS;
+}
+#endif
asmlinkage long sys_oabi_bind(int fd, struct sockaddr __user *addr, int addrlen)
{
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 69c9a7010081..6c6fc3fd5b72 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -1373,6 +1373,9 @@ long ksys_old_shmctl(int shmid, int cmd, struct shmid_ds __user *buf);
long compat_ksys_semtimedop(int semid, struct sembuf __user *tsems,
unsigned int nsops,
const struct old_timespec32 __user *timeout);
+long __do_semtimedop(int semid, struct sembuf *tsems, unsigned int nsops,
+ const struct timespec64 *timeout,
+ struct ipc_namespace *ns);
int __sys_getsockopt(int fd, int level, int optname, char __user *optval,
int __user *optlen);
diff --git a/ipc/sem.c b/ipc/sem.c
index 971e75d28364..ae8d9104b0a0 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -1984,46 +1984,34 @@ static struct sem_undo *find_alloc_undo(struct ipc_namespace *ns, int semid)
return un;
}
-static long do_semtimedop(int semid, struct sembuf __user *tsops,
- unsigned nsops, const struct timespec64 *timeout)
+long __do_semtimedop(int semid, struct sembuf *sops,
+ unsigned nsops, const struct timespec64 *timeout,
+ struct ipc_namespace *ns)
{
int error = -EINVAL;
struct sem_array *sma;
- struct sembuf fast_sops[SEMOPM_FAST];
- struct sembuf *sops = fast_sops, *sop;
+ struct sembuf *sop;
struct sem_undo *un;
int max, locknum;
bool undos = false, alter = false, dupsop = false;
struct sem_queue queue;
unsigned long dup = 0, jiffies_left = 0;
- struct ipc_namespace *ns;
-
- ns = current->nsproxy->ipc_ns;
if (nsops < 1 || semid < 0)
return -EINVAL;
if (nsops > ns->sc_semopm)
return -E2BIG;
- if (nsops > SEMOPM_FAST) {
- sops = kvmalloc_array(nsops, sizeof(*sops), GFP_KERNEL);
- if (sops == NULL)
- return -ENOMEM;
- }
-
- if (copy_from_user(sops, tsops, nsops * sizeof(*tsops))) {
- error = -EFAULT;
- goto out_free;
- }
if (timeout) {
if (timeout->tv_sec < 0 || timeout->tv_nsec < 0 ||
timeout->tv_nsec >= 1000000000L) {
error = -EINVAL;
- goto out_free;
+ goto out;
}
jiffies_left = timespec64_to_jiffies(timeout);
}
+
max = 0;
for (sop = sops; sop < sops + nsops; sop++) {
unsigned long mask = 1ULL << ((sop->sem_num) % BITS_PER_LONG);
@@ -2052,7 +2040,7 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
un = find_alloc_undo(ns, semid);
if (IS_ERR(un)) {
error = PTR_ERR(un);
- goto out_free;
+ goto out;
}
} else {
un = NULL;
@@ -2063,25 +2051,25 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
if (IS_ERR(sma)) {
rcu_read_unlock();
error = PTR_ERR(sma);
- goto out_free;
+ goto out;
}
error = -EFBIG;
if (max >= sma->sem_nsems) {
rcu_read_unlock();
- goto out_free;
+ goto out;
}
error = -EACCES;
if (ipcperms(ns, &sma->sem_perm, alter ? S_IWUGO : S_IRUGO)) {
rcu_read_unlock();
- goto out_free;
+ goto out;
}
error = security_sem_semop(&sma->sem_perm, sops, nsops, alter);
if (error) {
rcu_read_unlock();
- goto out_free;
+ goto out;
}
error = -EIDRM;
@@ -2095,7 +2083,7 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
* entangled here and why it's RMID race safe on comments at sem_lock()
*/
if (!ipc_valid_object(&sma->sem_perm))
- goto out_unlock_free;
+ goto out_unlock;
/*
* semid identifiers are not unique - find_alloc_undo may have
* allocated an undo structure, it was invalidated by an RMID
@@ -2104,7 +2092,7 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
* "un" itself is guaranteed by rcu.
*/
if (un && un->semid == -1)
- goto out_unlock_free;
+ goto out_unlock;
queue.sops = sops;
queue.nsops = nsops;
@@ -2130,10 +2118,10 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
rcu_read_unlock();
wake_up_q(&wake_q);
- goto out_free;
+ goto out;
}
if (error < 0) /* non-blocking error path */
- goto out_unlock_free;
+ goto out_unlock;
/*
* We need to sleep on this operation, so we put the current
@@ -2198,14 +2186,14 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
if (error != -EINTR) {
/* see SEM_BARRIER_2 for purpose/pairing */
smp_acquire__after_ctrl_dep();
- goto out_free;
+ goto out;
}
rcu_read_lock();
locknum = sem_lock(sma, sops, nsops);
if (!ipc_valid_object(&sma->sem_perm))
- goto out_unlock_free;
+ goto out_unlock;
/*
* No necessity for any barrier: We are protect by sem_lock()
@@ -2217,7 +2205,7 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
* Leave without unlink_queue(), but with sem_unlock().
*/
if (error != -EINTR)
- goto out_unlock_free;
+ goto out_unlock;
/*
* If an interrupt occurred we have to clean up the queue.
@@ -2228,13 +2216,45 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
unlink_queue(sma, &queue);
-out_unlock_free:
+out_unlock:
sem_unlock(sma, locknum);
rcu_read_unlock();
+out:
+ return error;
+}
+
+static long do_semtimedop(int semid, struct sembuf __user *tsops,
+ unsigned nsops, const struct timespec64 *timeout)
+{
+ struct sembuf fast_sops[SEMOPM_FAST];
+ struct sembuf *sops = fast_sops;
+ struct ipc_namespace *ns;
+ int ret;
+
+ ns = current->nsproxy->ipc_ns;
+ if (nsops > ns->sc_semopm)
+ return -E2BIG;
+ if (nsops < 1)
+ return -EINVAL;
+
+ if (nsops > SEMOPM_FAST) {
+ sops = kvmalloc_array(nsops, sizeof(*sops), GFP_KERNEL);
+ if (sops == NULL)
+ return -ENOMEM;
+ }
+
+ if (copy_from_user(sops, tsops, nsops * sizeof(*tsops))) {
+ ret = -EFAULT;
+ goto out_free;
+ }
+
+ ret = __do_semtimedop(semid, sops, nsops, timeout, ns);
+
out_free:
if (sops != fast_sops)
kvfree(sops);
- return error;
+
+ return ret;
}
long ksys_semtimedop(int semid, struct sembuf __user *tsops,
--
2.29.2
From: Arnd Bergmann <[email protected]>
The system call number is used in a couple of places, in particular
ptrace, seccomp and /proc/<pid>/syscall.
The last one apparently never worked reliably on ARM for tasks that are
not currently getting traced.
Storing the syscall number in the normal entry path makes it work,
as well as allowing us to see if the current system call is for OABI
compat mode, which is the next thing I want to hook into.
Since the thread_info->syscall field is not just the number any more, it
is now renamed to abi_syscall. In kernels that enable both OABI and EABI,
the upper bits of this field encode 0x900000 (__NR_OABI_SYSCALL_BASE)
for OABI tasks, while normal EABI tasks do not set the upper bits. This
makes it possible to implement the in_oabi_syscall() helper later.
All other users of thread_info->syscall go through the syscall_get_nr()
helper, which in turn filters out the ABI bits.
Note that the ABI information is lost with PTRACE_SET_SYSCALL, so one
cannot set the internal number to a particular version, but this was
already the case. We could change it to let gdb encode the ABI type along
with the syscall in a CONFIG_OABI_COMPAT-enabled kernel, but that itself
would be a (backwards-compatible) ABI change, so I don't do it here.
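For illustration, the later helper can be as simple as this sketch,
following the encoding described above:
~~~
/* Sketch: with the ABI bits preserved in abi_syscall, checking for an
 * OABI syscall reduces to testing the __NR_OABI_SYSCALL_BASE bits. */
static inline bool in_oabi_syscall(void)
{
	return IS_ENABLED(CONFIG_OABI_COMPAT) &&
	       (current_thread_info()->abi_syscall & __NR_OABI_SYSCALL_BASE);
}
~~~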
Signed-off-by: Arnd Bergmann <[email protected]>
---
arch/arm/include/asm/syscall.h | 5 ++++-
arch/arm/include/asm/thread_info.h | 2 +-
arch/arm/include/uapi/asm/unistd.h | 1 +
arch/arm/kernel/asm-offsets.c | 1 +
arch/arm/kernel/entry-common.S | 8 ++++++--
arch/arm/kernel/ptrace.c | 14 ++++++++------
6 files changed, 21 insertions(+), 10 deletions(-)
diff --git a/arch/arm/include/asm/syscall.h b/arch/arm/include/asm/syscall.h
index fd02761ba06c..f055e846a5cc 100644
--- a/arch/arm/include/asm/syscall.h
+++ b/arch/arm/include/asm/syscall.h
@@ -22,7 +22,10 @@ extern const unsigned long sys_call_table[];
static inline int syscall_get_nr(struct task_struct *task,
struct pt_regs *regs)
{
- return task_thread_info(task)->syscall;
+ if (IS_ENABLED(CONFIG_AEABI) && !IS_ENABLED(CONFIG_OABI_COMPAT))
+ return task_thread_info(task)->abi_syscall;
+
+ return task_thread_info(task)->abi_syscall & __NR_SYSCALL_MASK;
}
static inline void syscall_rollback(struct task_struct *task,
diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
index 70d4cbc49ae1..17c56051747b 100644
--- a/arch/arm/include/asm/thread_info.h
+++ b/arch/arm/include/asm/thread_info.h
@@ -62,7 +62,7 @@ struct thread_info {
unsigned long stack_canary;
#endif
struct cpu_context_save cpu_context; /* cpu context */
- __u32 syscall; /* syscall number */
+ __u32 abi_syscall; /* ABI type and syscall nr */
__u8 used_cp[16]; /* thread used copro */
unsigned long tp_value[2]; /* TLS registers */
#ifdef CONFIG_CRUNCH
diff --git a/arch/arm/include/uapi/asm/unistd.h b/arch/arm/include/uapi/asm/unistd.h
index ae7749e15726..a1149911464c 100644
--- a/arch/arm/include/uapi/asm/unistd.h
+++ b/arch/arm/include/uapi/asm/unistd.h
@@ -15,6 +15,7 @@
#define _UAPI__ASM_ARM_UNISTD_H
#define __NR_OABI_SYSCALL_BASE 0x900000
+#define __NR_SYSCALL_MASK 0x0fffff
#if defined(__thumb__) || defined(__ARM_EABI__)
#define __NR_SYSCALL_BASE 0
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index 70993af22d80..a0945b898ca3 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -48,6 +48,7 @@ int main(void)
DEFINE(TI_CPU, offsetof(struct thread_info, cpu));
DEFINE(TI_CPU_DOMAIN, offsetof(struct thread_info, cpu_domain));
DEFINE(TI_CPU_SAVE, offsetof(struct thread_info, cpu_context));
+ DEFINE(TI_ABI_SYSCALL, offsetof(struct thread_info, abi_syscall));
DEFINE(TI_USED_CP, offsetof(struct thread_info, used_cp));
DEFINE(TI_TP_VALUE, offsetof(struct thread_info, tp_value));
DEFINE(TI_FPSTATE, offsetof(struct thread_info, fpstate));
diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
index 7f0b7aba1498..e837af90cd44 100644
--- a/arch/arm/kernel/entry-common.S
+++ b/arch/arm/kernel/entry-common.S
@@ -226,6 +226,7 @@ ENTRY(vector_swi)
/* saved_psr and saved_pc are now dead */
uaccess_disable tbl
+ get_thread_info tsk
adr tbl, sys_call_table @ load syscall table pointer
@@ -237,13 +238,17 @@ ENTRY(vector_swi)
* get the old ABI syscall table address.
*/
bics r10, r10, #0xff000000
+ strne r10, [tsk, #TI_ABI_SYSCALL]
+ streq scno, [tsk, #TI_ABI_SYSCALL]
eorne scno, r10, #__NR_OABI_SYSCALL_BASE
ldrne tbl, =sys_oabi_call_table
#elif !defined(CONFIG_AEABI)
bic scno, scno, #0xff000000 @ mask off SWI op-code
+ str scno, [tsk, #TI_ABI_SYSCALL]
eor scno, scno, #__NR_SYSCALL_BASE @ check OS number
+#else
+ str scno, [tsk, #TI_ABI_SYSCALL]
#endif
- get_thread_info tsk
/*
* Reload the registers that may have been corrupted on entry to
* the syscall assembly (by tracing or context tracking.)
@@ -288,7 +293,6 @@ ENDPROC(vector_swi)
* context switches, and waiting for our parent to respond.
*/
__sys_trace:
- mov r1, scno
add r0, sp, #S_OFF
bl syscall_trace_enter
mov scno, r0
diff --git a/arch/arm/kernel/ptrace.c b/arch/arm/kernel/ptrace.c
index 2771e682220b..d886ea8910cb 100644
--- a/arch/arm/kernel/ptrace.c
+++ b/arch/arm/kernel/ptrace.c
@@ -25,6 +25,7 @@
#include <linux/tracehook.h>
#include <linux/unistd.h>
+#include <asm/syscall.h>
#include <asm/traps.h>
#define CREATE_TRACE_POINTS
@@ -811,7 +812,8 @@ long arch_ptrace(struct task_struct *child, long request,
break;
case PTRACE_SET_SYSCALL:
- task_thread_info(child)->syscall = data;
+ task_thread_info(child)->abi_syscall = data &
+ __NR_SYSCALL_MASK;
ret = 0;
break;
@@ -880,14 +882,14 @@ static void tracehook_report_syscall(struct pt_regs *regs,
if (dir == PTRACE_SYSCALL_EXIT)
tracehook_report_syscall_exit(regs, 0);
else if (tracehook_report_syscall_entry(regs))
- current_thread_info()->syscall = -1;
+ current_thread_info()->abi_syscall = -1;
regs->ARM_ip = ip;
}
-asmlinkage int syscall_trace_enter(struct pt_regs *regs, int scno)
+asmlinkage int syscall_trace_enter(struct pt_regs *regs)
{
- current_thread_info()->syscall = scno;
+ int scno;
if (test_thread_flag(TIF_SYSCALL_TRACE))
tracehook_report_syscall(regs, PTRACE_SYSCALL_ENTER);
@@ -898,11 +900,11 @@ asmlinkage int syscall_trace_enter(struct pt_regs *regs, int scno)
return -1;
#else
/* XXX: remove this once OABI gets fixed */
- secure_computing_strict(current_thread_info()->syscall);
+ secure_computing_strict(syscall_get_nr(current, regs));
#endif
/* Tracer or seccomp may have changed syscall. */
- scno = current_thread_info()->syscall;
+ scno = syscall_get_nr(current, regs);
if (test_thread_flag(TIF_SYSCALL_TRACEPOINT))
trace_sys_enter(regs, scno);
--
2.29.2
From: Arnd Bergmann <[email protected]>
There are no remaining callers of set_fs(), so just remove it
along with all associated code that operates on
thread_info->addr_limit.
There are still further optimizations that can be done:
- In get_user(), the address check could be moved entirely
into the out of line code, rather than passing a constant
as an argument,
- I assume the DACR handling can be simplified as we now
only change it during user access when CONFIG_CPU_SW_DOMAIN_PAN
is set, but not during set_fs().
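With addr_limit gone, the user access checks compare against the
constant TASK_SIZE. In C terms, the __range_ok() assembly in the
uaccess.h hunk below computes roughly this (a sketch that substitutes
64-bit arithmetic for the 33-bit add-with-carry trick):
~~~
/* Sketch: what __range_ok(addr, size) checks, returning 0 on success
 * just like the inline assembly version. */
static inline unsigned long range_ok(unsigned long addr, unsigned long size)
{
	u64 end = (u64)addr + size;	/* cannot overflow in 64 bits */

	return (end <= TASK_SIZE) ? 0 : 1;	/* nonzero means failure */
}
~~~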
Signed-off-by: Arnd Bergmann <[email protected]>
---
arch/arm/Kconfig | 1 -
arch/arm/include/asm/ptrace.h | 1 -
arch/arm/include/asm/thread_info.h | 4 ---
arch/arm/include/asm/uaccess-asm.h | 6 ----
arch/arm/include/asm/uaccess.h | 46 +++---------------------------
arch/arm/kernel/asm-offsets.c | 2 --
arch/arm/kernel/entry-common.S | 12 --------
arch/arm/kernel/process.c | 7 +----
arch/arm/kernel/signal.c | 8 ------
arch/arm/lib/copy_from_user.S | 3 +-
arch/arm/lib/copy_to_user.S | 3 +-
11 files changed, 7 insertions(+), 86 deletions(-)
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 82f908fa5676..40a3a96b3d4e 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -126,7 +126,6 @@ config ARM
select PCI_SYSCALL if PCI
select PERF_USE_VMALLOC
select RTC_LIB
- select SET_FS
select SYS_SUPPORTS_APM_EMULATION
# Above selects are sorted alphabetically; please add new ones
# according to that. Thanks.
diff --git a/arch/arm/include/asm/ptrace.h b/arch/arm/include/asm/ptrace.h
index 91d6b7856be4..93051e2f402c 100644
--- a/arch/arm/include/asm/ptrace.h
+++ b/arch/arm/include/asm/ptrace.h
@@ -19,7 +19,6 @@ struct pt_regs {
struct svc_pt_regs {
struct pt_regs regs;
u32 dacr;
- u32 addr_limit;
};
#define to_svc_pt_regs(r) container_of(r, struct svc_pt_regs, regs)
diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
index 17c56051747b..d89931aed59f 100644
--- a/arch/arm/include/asm/thread_info.h
+++ b/arch/arm/include/asm/thread_info.h
@@ -31,8 +31,6 @@ struct task_struct;
#include <asm/types.h>
-typedef unsigned long mm_segment_t;
-
struct cpu_context_save {
__u32 r4;
__u32 r5;
@@ -54,7 +52,6 @@ struct cpu_context_save {
struct thread_info {
unsigned long flags; /* low level flags */
int preempt_count; /* 0 => preemptable, <0 => bug */
- mm_segment_t addr_limit; /* address limit */
struct task_struct *task; /* main task structure */
__u32 cpu; /* cpu */
__u32 cpu_domain; /* cpu domain */
@@ -80,7 +77,6 @@ struct thread_info {
.task = &tsk, \
.flags = 0, \
.preempt_count = INIT_PREEMPT_COUNT, \
- .addr_limit = KERNEL_DS, \
}
/*
diff --git a/arch/arm/include/asm/uaccess-asm.h b/arch/arm/include/asm/uaccess-asm.h
index e6eb7a2aaf1e..6451a433912c 100644
--- a/arch/arm/include/asm/uaccess-asm.h
+++ b/arch/arm/include/asm/uaccess-asm.h
@@ -84,12 +84,8 @@
* if \disable is set.
*/
.macro uaccess_entry, tsk, tmp0, tmp1, tmp2, disable
- ldr \tmp1, [\tsk, #TI_ADDR_LIMIT]
- ldr \tmp2, =TASK_SIZE
- str \tmp2, [\tsk, #TI_ADDR_LIMIT]
DACR( mrc p15, 0, \tmp0, c3, c0, 0)
DACR( str \tmp0, [sp, #SVC_DACR])
- str \tmp1, [sp, #SVC_ADDR_LIMIT]
.if \disable && IS_ENABLED(CONFIG_CPU_SW_DOMAIN_PAN)
/* kernel=client, user=no access */
mov \tmp2, #DACR_UACCESS_DISABLE
@@ -106,9 +102,7 @@
/* Restore the user access state previously saved by uaccess_entry */
.macro uaccess_exit, tsk, tmp0, tmp1
- ldr \tmp1, [sp, #SVC_ADDR_LIMIT]
DACR( ldr \tmp0, [sp, #SVC_DACR])
- str \tmp1, [\tsk, #TI_ADDR_LIMIT]
DACR( mcr p15, 0, \tmp0, c3, c0, 0)
.endm
diff --git a/arch/arm/include/asm/uaccess.h b/arch/arm/include/asm/uaccess.h
index 4f60638755c4..084d1c07c2d0 100644
--- a/arch/arm/include/asm/uaccess.h
+++ b/arch/arm/include/asm/uaccess.h
@@ -52,32 +52,8 @@ static __always_inline void uaccess_restore(unsigned int flags)
extern int __get_user_bad(void);
extern int __put_user_bad(void);
-/*
- * Note that this is actually 0x1,0000,0000
- */
-#define KERNEL_DS 0x00000000
-
#ifdef CONFIG_MMU
-#define USER_DS TASK_SIZE
-#define get_fs() (current_thread_info()->addr_limit)
-
-static inline void set_fs(mm_segment_t fs)
-{
- current_thread_info()->addr_limit = fs;
-
- /*
- * Prevent a mispredicted conditional call to set_fs from forwarding
- * the wrong address limit to access_ok under speculation.
- */
- dsb(nsh);
- isb();
-
- modify_domain(DOMAIN_KERNEL, fs ? DOMAIN_CLIENT : DOMAIN_MANAGER);
-}
-
-#define uaccess_kernel() (get_fs() == KERNEL_DS)
-
/*
* We use 33-bit arithmetic here. Success returns zero, failure returns
* addr_limit. We take advantage that addr_limit will be zero for KERNEL_DS,
@@ -89,7 +65,7 @@ static inline void set_fs(mm_segment_t fs)
__asm__(".syntax unified\n" \
"adds %1, %2, %3; sbcscc %1, %1, %0; movcc %0, #0" \
: "=&r" (flag), "=&r" (roksum) \
- : "r" (addr), "Ir" (size), "0" (current_thread_info()->addr_limit) \
+ : "r" (addr), "Ir" (size), "0" (TASK_SIZE) \
: "cc"); \
flag; })
@@ -120,7 +96,7 @@ static inline void __user *__uaccess_mask_range_ptr(const void __user *ptr,
" subshs %1, %1, %2\n"
" movlo %0, #0\n"
: "+r" (safe_ptr), "=&r" (tmp)
- : "r" (size), "r" (current_thread_info()->addr_limit)
+ : "r" (size), "r" (TASK_SIZE)
: "cc");
csdb();
@@ -194,7 +170,7 @@ extern int __get_user_64t_4(void *);
#define __get_user_check(x, p) \
({ \
- unsigned long __limit = current_thread_info()->addr_limit - 1; \
+ unsigned long __limit = TASK_SIZE - 1; \
register typeof(*(p)) __user *__p asm("r0") = (p); \
register __inttype(x) __r2 asm("r2"); \
register unsigned long __l asm("r1") = __limit; \
@@ -245,7 +221,7 @@ extern int __put_user_8(void *, unsigned long long);
#define __put_user_check(__pu_val, __ptr, __err, __s) \
({ \
- unsigned long __limit = current_thread_info()->addr_limit - 1; \
+ unsigned long __limit = TASK_SIZE - 1; \
register typeof(__pu_val) __r2 asm("r2") = __pu_val; \
register const void __user *__p asm("r0") = __ptr; \
register unsigned long __l asm("r1") = __limit; \
@@ -262,19 +238,8 @@ extern int __put_user_8(void *, unsigned long long);
#else /* CONFIG_MMU */
-/*
- * uClinux has only one addr space, so has simplified address limits.
- */
-#define USER_DS KERNEL_DS
-
-#define uaccess_kernel() (true)
#define __addr_ok(addr) ((void)(addr), 1)
#define __range_ok(addr, size) ((void)(addr), 0)
-#define get_fs() (KERNEL_DS)
-
-static inline void set_fs(mm_segment_t fs)
-{
-}
#define get_user(x, p) __get_user(x, p)
#define __put_user_check __put_user_nocheck
@@ -283,9 +248,6 @@ static inline void set_fs(mm_segment_t fs)
#define access_ok(addr, size) (__range_ok(addr, size) == 0)
-#define user_addr_max() \
- (uaccess_kernel() ? ~0UL : get_fs())
-
#ifdef CONFIG_CPU_SPECTRE
/*
* When mitigating Spectre variant 1, it is not worth fixing the non-
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index a0945b898ca3..c60fefabc868 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -43,7 +43,6 @@ int main(void)
BLANK();
DEFINE(TI_FLAGS, offsetof(struct thread_info, flags));
DEFINE(TI_PREEMPT, offsetof(struct thread_info, preempt_count));
- DEFINE(TI_ADDR_LIMIT, offsetof(struct thread_info, addr_limit));
DEFINE(TI_TASK, offsetof(struct thread_info, task));
DEFINE(TI_CPU, offsetof(struct thread_info, cpu));
DEFINE(TI_CPU_DOMAIN, offsetof(struct thread_info, cpu_domain));
@@ -92,7 +91,6 @@ int main(void)
DEFINE(S_OLD_R0, offsetof(struct pt_regs, ARM_ORIG_r0));
DEFINE(PT_REGS_SIZE, sizeof(struct pt_regs));
DEFINE(SVC_DACR, offsetof(struct svc_pt_regs, dacr));
- DEFINE(SVC_ADDR_LIMIT, offsetof(struct svc_pt_regs, addr_limit));
DEFINE(SVC_REGS_SIZE, sizeof(struct svc_pt_regs));
BLANK();
DEFINE(SIGFRAME_RC3_OFFSET, offsetof(struct sigframe, retcode[3]));
diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
index e837af90cd44..d9c99db50243 100644
--- a/arch/arm/kernel/entry-common.S
+++ b/arch/arm/kernel/entry-common.S
@@ -49,10 +49,6 @@ __ret_fast_syscall:
UNWIND(.fnstart )
UNWIND(.cantunwind )
disable_irq_notrace @ disable interrupts
- ldr r2, [tsk, #TI_ADDR_LIMIT]
- ldr r1, =TASK_SIZE
- cmp r2, r1
- blne addr_limit_check_failed
ldr r1, [tsk, #TI_FLAGS] @ re-check for syscall tracing
movs r1, r1, lsl #16
bne fast_work_pending
@@ -87,10 +83,6 @@ __ret_fast_syscall:
bl do_rseq_syscall
#endif
disable_irq_notrace @ disable interrupts
- ldr r2, [tsk, #TI_ADDR_LIMIT]
- ldr r1, =TASK_SIZE
- cmp r2, r1
- blne addr_limit_check_failed
ldr r1, [tsk, #TI_FLAGS] @ re-check for syscall tracing
movs r1, r1, lsl #16
beq no_work_pending
@@ -129,10 +121,6 @@ ret_slow_syscall:
#endif
disable_irq_notrace @ disable interrupts
ENTRY(ret_to_user_from_irq)
- ldr r2, [tsk, #TI_ADDR_LIMIT]
- ldr r1, =TASK_SIZE
- cmp r2, r1
- blne addr_limit_check_failed
ldr r1, [tsk, #TI_FLAGS]
movs r1, r1, lsl #16
bne slow_work_pending
diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index fc9e8b37eaa8..581542fbf5bf 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -108,7 +108,7 @@ void __show_regs(struct pt_regs *regs)
unsigned long flags;
char buf[64];
#ifndef CONFIG_CPU_V7M
- unsigned int domain, fs;
+ unsigned int domain;
#ifdef CONFIG_CPU_SW_DOMAIN_PAN
/*
* Get the domain register for the parent context. In user
@@ -117,14 +117,11 @@ void __show_regs(struct pt_regs *regs)
*/
if (user_mode(regs)) {
domain = DACR_UACCESS_ENABLE;
- fs = get_fs();
} else {
domain = to_svc_pt_regs(regs)->dacr;
- fs = to_svc_pt_regs(regs)->addr_limit;
}
#else
domain = get_domain();
- fs = get_fs();
#endif
#endif
@@ -160,8 +157,6 @@ void __show_regs(struct pt_regs *regs)
if ((domain & domain_mask(DOMAIN_USER)) ==
domain_val(DOMAIN_USER, DOMAIN_NOACCESS))
segment = "none";
- else if (fs == KERNEL_DS)
- segment = "kernel";
else
segment = "user";
diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c
index a3a38d0a4c85..3e7ddac086c2 100644
--- a/arch/arm/kernel/signal.c
+++ b/arch/arm/kernel/signal.c
@@ -711,14 +711,6 @@ struct page *get_signal_page(void)
return page;
}
-/* Defer to generic check */
-asmlinkage void addr_limit_check_failed(void)
-{
-#ifdef CONFIG_MMU
- addr_limit_user_check();
-#endif
-}
-
#ifdef CONFIG_DEBUG_RSEQ
asmlinkage void do_rseq_syscall(struct pt_regs *regs)
{
diff --git a/arch/arm/lib/copy_from_user.S b/arch/arm/lib/copy_from_user.S
index f8016e3db65d..480a20766137 100644
--- a/arch/arm/lib/copy_from_user.S
+++ b/arch/arm/lib/copy_from_user.S
@@ -109,8 +109,7 @@
ENTRY(arm_copy_from_user)
#ifdef CONFIG_CPU_SPECTRE
- get_thread_info r3
- ldr r3, [r3, #TI_ADDR_LIMIT]
+ ldr r3, =TASK_SIZE
uaccess_mask_range_ptr r1, r2, r3, ip
#endif
diff --git a/arch/arm/lib/copy_to_user.S b/arch/arm/lib/copy_to_user.S
index ebfe4cb3d912..842ea5ede485 100644
--- a/arch/arm/lib/copy_to_user.S
+++ b/arch/arm/lib/copy_to_user.S
@@ -109,8 +109,7 @@
ENTRY(__copy_to_user_std)
WEAK(arm_copy_to_user)
#ifdef CONFIG_CPU_SPECTRE
- get_thread_info r3
- ldr r3, [r3, #TI_ADDR_LIMIT]
+ ldr r3, =TASK_SIZE
uaccess_mask_range_ptr r0, r2, r3, ip
#endif
--
2.29.2
From: Arnd Bergmann <[email protected]>
As my patches change the oabi epoll definition, I received a report
from the kernel test robot about a pre-existing issue with a mismatched
__poll_t type.
The OABI code was correct when it was initially added in linux-2.6.16,
but a later (also correct) change to the generic __poll_t triggered a
type mismatch warning from sparse.
As __poll_t is always 32 bits wide and otherwise compatible, using it
instead of __u32 in the oabi_epoll_event definition is a valid
workaround.
Reported-by: kernel test robot <[email protected]>
Fixes: 8ced390c2b18 ("define __poll_t, annotate constants")
Fixes: ee219b946e4b ("uapi: turn __poll_t sparse checks on by default")
Fixes: 687ad0191488 ("[ARM] 3109/1: old ABI compat: syscall wrappers for ABI impedance matching")
Signed-off-by: Arnd Bergmann <[email protected]>
---
arch/arm/kernel/sys_oabi-compat.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/kernel/sys_oabi-compat.c b/arch/arm/kernel/sys_oabi-compat.c
index 223ee46b6e75..68112c172025 100644
--- a/arch/arm/kernel/sys_oabi-compat.c
+++ b/arch/arm/kernel/sys_oabi-compat.c
@@ -274,7 +274,7 @@ asmlinkage long sys_oabi_fcntl64(unsigned int fd, unsigned int cmd,
}
struct oabi_epoll_event {
- __u32 events;
+ __poll_t events;
__u64 data;
} __attribute__ ((packed,aligned(4)));
--
2.29.2
From: Arnd Bergmann <[email protected]>
These mimic the behavior of get_user and put_user, except
for domain switching, address limit checking and handling
of mismatched sizes, none of which are relevant here.
To work with pre-Armv6 kernels, this has to avoid TUSER()
inside of the new macros; the new approach passes the "t"
string along with the opcode, which is a bit uglier but
avoids duplicating more code.
As there is no __get_user_asm_dword(), I work around it
by copying 32 bits at a time, which is possible because
the output size is known.
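Typical usage of the resulting helpers looks like this sketch of the
caller side (not part of the patch); a u64 read ends up as two 32-bit
loads internally:
~~~
u64 val;

/* get_kernel_nofault() expands to __get_kernel_nofault() and
 * returns 0 on success, -EFAULT if the address is unreadable. */
if (get_kernel_nofault(val, (u64 *)addr))
	return -EFAULT;
~~~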
Signed-off-by: Arnd Bergmann <[email protected]>
---
arch/arm/include/asm/uaccess.h | 123 ++++++++++++++++++++++-----------
1 file changed, 83 insertions(+), 40 deletions(-)
diff --git a/arch/arm/include/asm/uaccess.h b/arch/arm/include/asm/uaccess.h
index a13d90206472..4f60638755c4 100644
--- a/arch/arm/include/asm/uaccess.h
+++ b/arch/arm/include/asm/uaccess.h
@@ -308,11 +308,11 @@ static inline void set_fs(mm_segment_t fs)
#define __get_user(x, ptr) \
({ \
long __gu_err = 0; \
- __get_user_err((x), (ptr), __gu_err); \
+ __get_user_err((x), (ptr), __gu_err, TUSER()); \
__gu_err; \
})
-#define __get_user_err(x, ptr, err) \
+#define __get_user_err(x, ptr, err, __t) \
do { \
unsigned long __gu_addr = (unsigned long)(ptr); \
unsigned long __gu_val; \
@@ -321,18 +321,19 @@ do { \
might_fault(); \
__ua_flags = uaccess_save_and_enable(); \
switch (sizeof(*(ptr))) { \
- case 1: __get_user_asm_byte(__gu_val, __gu_addr, err); break; \
- case 2: __get_user_asm_half(__gu_val, __gu_addr, err); break; \
- case 4: __get_user_asm_word(__gu_val, __gu_addr, err); break; \
+ case 1: __get_user_asm_byte(__gu_val, __gu_addr, err, __t); break; \
+ case 2: __get_user_asm_half(__gu_val, __gu_addr, err, __t); break; \
+ case 4: __get_user_asm_word(__gu_val, __gu_addr, err, __t); break; \
default: (__gu_val) = __get_user_bad(); \
} \
uaccess_restore(__ua_flags); \
(x) = (__typeof__(*(ptr)))__gu_val; \
} while (0)
+#endif
#define __get_user_asm(x, addr, err, instr) \
__asm__ __volatile__( \
- "1: " TUSER(instr) " %1, [%2], #0\n" \
+ "1: " instr " %1, [%2], #0\n" \
"2:\n" \
" .pushsection .text.fixup,\"ax\"\n" \
" .align 2\n" \
@@ -348,40 +349,38 @@ do { \
: "r" (addr), "i" (-EFAULT) \
: "cc")
-#define __get_user_asm_byte(x, addr, err) \
- __get_user_asm(x, addr, err, ldrb)
+#define __get_user_asm_byte(x, addr, err, __t) \
+ __get_user_asm(x, addr, err, "ldrb" __t)
#if __LINUX_ARM_ARCH__ >= 6
-#define __get_user_asm_half(x, addr, err) \
- __get_user_asm(x, addr, err, ldrh)
+#define __get_user_asm_half(x, addr, err, __t) \
+ __get_user_asm(x, addr, err, "ldrh" __t)
#else
#ifndef __ARMEB__
-#define __get_user_asm_half(x, __gu_addr, err) \
+#define __get_user_asm_half(x, __gu_addr, err, __t) \
({ \
unsigned long __b1, __b2; \
- __get_user_asm_byte(__b1, __gu_addr, err); \
- __get_user_asm_byte(__b2, __gu_addr + 1, err); \
+ __get_user_asm_byte(__b1, __gu_addr, err, __t); \
+ __get_user_asm_byte(__b2, __gu_addr + 1, err, __t); \
(x) = __b1 | (__b2 << 8); \
})
#else
-#define __get_user_asm_half(x, __gu_addr, err) \
+#define __get_user_asm_half(x, __gu_addr, err, __t) \
({ \
unsigned long __b1, __b2; \
- __get_user_asm_byte(__b1, __gu_addr, err); \
- __get_user_asm_byte(__b2, __gu_addr + 1, err); \
+ __get_user_asm_byte(__b1, __gu_addr, err, __t); \
+ __get_user_asm_byte(__b2, __gu_addr + 1, err, __t); \
(x) = (__b1 << 8) | __b2; \
})
#endif
#endif /* __LINUX_ARM_ARCH__ >= 6 */
-#define __get_user_asm_word(x, addr, err) \
- __get_user_asm(x, addr, err, ldr)
-#endif
-
+#define __get_user_asm_word(x, addr, err, __t) \
+ __get_user_asm(x, addr, err, "ldr" __t)
#define __put_user_switch(x, ptr, __err, __fn) \
do { \
@@ -425,7 +424,7 @@ do { \
#define __put_user_nocheck(x, __pu_ptr, __err, __size) \
do { \
unsigned long __pu_addr = (unsigned long)__pu_ptr; \
- __put_user_nocheck_##__size(x, __pu_addr, __err); \
+ __put_user_nocheck_##__size(x, __pu_addr, __err, TUSER());\
} while (0)
#define __put_user_nocheck_1 __put_user_asm_byte
@@ -433,9 +432,11 @@ do { \
#define __put_user_nocheck_4 __put_user_asm_word
#define __put_user_nocheck_8 __put_user_asm_dword
+#endif /* !CONFIG_CPU_SPECTRE */
+
#define __put_user_asm(x, __pu_addr, err, instr) \
__asm__ __volatile__( \
- "1: " TUSER(instr) " %1, [%2], #0\n" \
+ "1: " instr " %1, [%2], #0\n" \
"2:\n" \
" .pushsection .text.fixup,\"ax\"\n" \
" .align 2\n" \
@@ -450,36 +451,36 @@ do { \
: "r" (x), "r" (__pu_addr), "i" (-EFAULT) \
: "cc")
-#define __put_user_asm_byte(x, __pu_addr, err) \
- __put_user_asm(x, __pu_addr, err, strb)
+#define __put_user_asm_byte(x, __pu_addr, err, __t) \
+ __put_user_asm(x, __pu_addr, err, "strb" __t)
#if __LINUX_ARM_ARCH__ >= 6
-#define __put_user_asm_half(x, __pu_addr, err) \
- __put_user_asm(x, __pu_addr, err, strh)
+#define __put_user_asm_half(x, __pu_addr, err, __t) \
+ __put_user_asm(x, __pu_addr, err, "strh" __t)
#else
#ifndef __ARMEB__
-#define __put_user_asm_half(x, __pu_addr, err) \
+#define __put_user_asm_half(x, __pu_addr, err, __t) \
({ \
unsigned long __temp = (__force unsigned long)(x); \
- __put_user_asm_byte(__temp, __pu_addr, err); \
- __put_user_asm_byte(__temp >> 8, __pu_addr + 1, err); \
+ __put_user_asm_byte(__temp, __pu_addr, err, __t); \
+ __put_user_asm_byte(__temp >> 8, __pu_addr + 1, err, __t);\
})
#else
-#define __put_user_asm_half(x, __pu_addr, err) \
+#define __put_user_asm_half(x, __pu_addr, err, __t) \
({ \
unsigned long __temp = (__force unsigned long)(x); \
- __put_user_asm_byte(__temp >> 8, __pu_addr, err); \
- __put_user_asm_byte(__temp, __pu_addr + 1, err); \
+ __put_user_asm_byte(__temp >> 8, __pu_addr, err, __t); \
+ __put_user_asm_byte(__temp, __pu_addr + 1, err, __t); \
})
#endif
#endif /* __LINUX_ARM_ARCH__ >= 6 */
-#define __put_user_asm_word(x, __pu_addr, err) \
- __put_user_asm(x, __pu_addr, err, str)
+#define __put_user_asm_word(x, __pu_addr, err, __t) \
+ __put_user_asm(x, __pu_addr, err, "str" __t)
#ifndef __ARMEB__
#define __reg_oper0 "%R2"
@@ -489,12 +490,12 @@ do { \
#define __reg_oper1 "%R2"
#endif
-#define __put_user_asm_dword(x, __pu_addr, err) \
+#define __put_user_asm_dword(x, __pu_addr, err, __t) \
__asm__ __volatile__( \
- ARM( "1: " TUSER(str) " " __reg_oper1 ", [%1], #4\n" ) \
- ARM( "2: " TUSER(str) " " __reg_oper0 ", [%1]\n" ) \
- THUMB( "1: " TUSER(str) " " __reg_oper1 ", [%1]\n" ) \
- THUMB( "2: " TUSER(str) " " __reg_oper0 ", [%1, #4]\n" ) \
+ ARM( "1: str" __t " " __reg_oper1 ", [%1], #4\n" ) \
+ ARM( "2: str" __t " " __reg_oper0 ", [%1]\n" ) \
+ THUMB( "1: str" __t " " __reg_oper1 ", [%1]\n" ) \
+ THUMB( "2: str" __t " " __reg_oper0 ", [%1, #4]\n" ) \
"3:\n" \
" .pushsection .text.fixup,\"ax\"\n" \
" .align 2\n" \
@@ -510,7 +511,49 @@ do { \
: "r" (x), "i" (-EFAULT) \
: "cc")
-#endif /* !CONFIG_CPU_SPECTRE */
+#define HAVE_GET_KERNEL_NOFAULT
+
+#define __get_kernel_nofault(dst, src, type, err_label) \
+do { \
+ const type *__pk_ptr = (src); \
+ unsigned long __src = (unsigned long)(__pk_ptr); \
+ type __val; \
+ int __err = 0; \
+ switch (sizeof(type)) { \
+ case 1: __get_user_asm_byte(__val, __src, __err, ""); break; \
+ case 2: __get_user_asm_half(__val, __src, __err, ""); break; \
+ case 4: __get_user_asm_word(__val, __src, __err, ""); break; \
+ case 8: { \
+ u32 *__v32 = (u32*)&__val; \
+ __get_user_asm_word(__v32[0], __src, __err, ""); \
+ if (__err) \
+ break; \
+ __get_user_asm_word(__v32[1], __src+4, __err, ""); \
+ break; \
+ } \
+ default: __err = __get_user_bad(); break; \
+ } \
+ *(type *)(dst) = __val; \
+ if (__err) \
+ goto err_label; \
+} while (0)
+
+#define __put_kernel_nofault(dst, src, type, err_label) \
+do { \
+ const type *__pk_ptr = (dst); \
+ unsigned long __dst = (unsigned long)__pk_ptr; \
+ int __err = 0; \
+ type __val = *(type *)src; \
+ switch (sizeof(type)) { \
+ case 1: __put_user_asm_byte(__val, __dst, __err, ""); break; \
+ case 2: __put_user_asm_half(__val, __dst, __err, ""); break; \
+ case 4: __put_user_asm_word(__val, __dst, __err, ""); break; \
+ case 8: __put_user_asm_dword(__val, __dst, __err, ""); break; \
+ default: __err = __put_user_bad(); break; \
+ } \
+ if (__err) \
+ goto err_label; \
+} while (0)
#ifdef CONFIG_MMU
extern unsigned long __must_check
--
2.29.2
From my high-level perspective this looks great and I'd love to see it
merged:
Acked-by: Christoph Hellwig <[email protected]>
On Wed, Aug 11, 2021 at 8:39 AM Christoph Hellwig <[email protected]> wrote:
>
> From my high-level perspective this looks great and I'd love to see it
> merged:
>
> Acked-by: Christoph Hellwig <[email protected]>
Thank you for taking another look, and for the reminder. I've now added the
series to Russell's patch tracker.
Arnd
On Mon, Jul 26, 2021 at 04:11:39PM +0200, Arnd Bergmann wrote:
> From: Arnd Bergmann <[email protected]>
>
> These mimic the behavior of get_user and put_user, except
> for domain switching, address limit checking and handling
> of mismatched sizes, none of which are relevant here.
>
> To work with pre-Armv6 kernels, this has to avoid TUSER()
> inside of the new macros; the new approach passes the "t"
> string along with the opcode, which is a bit uglier but
> avoids duplicating more code.
>
> As there is no __get_user_asm_dword(), I work around it
> by copying 32 bits at a time, which is possible because
> the output size is known.
>
> Signed-off-by: Arnd Bergmann <[email protected]>
I've just been bisecting some regressions running the kgdbts tests on
arm and this patch came up.
It looks like once this patch is applied then copy_from_kernel_nofault()
starts faulting when it is called from kgdb. I've put an example stack
trace at the bottom of this mail and the most simplified reproduction
I currently have is:
~~~
make multi_v7_defconfig
../scripts/config --enable KGDB --enable KGDB_TESTS
make olddefconfig
make -j `nproc`
qemu-system-arm -M virt -m 1G -nographic \
-kernel arch/arm/boot/zImage -initrd rootfs.cpio.gz
# Boot and login
echo V1 > /sys/module/kgdbts/parameters/kgdbts
~~~
I suspect this will reproduce on any arm system with CONFIG_KGDB and
CONFIG_KGDB_TESTS enabled simply by running that last echo command...
but I have only tested on QEMU for now.
Daniel.
Stack trace:
~~~
# echo kgdbts=V1F1000 > /sys/module/kgdbts/parameters/kgdbts
[ 34.995507] KGDB: Registered I/O driver kgdbts
[ 35.038102] kgdbts:RUN plant and detach test
Entering kdb (current=0xd4264380, pid 134) on processor 0 due to Keyboard Entry
[0]kdb> [ 35.056005] kgdbts:RUN sw breakpoint test
[ 35.062309] kgdbts:RUN bad memory access test
[ 35.063619] 8<--- cut here ---
[ 35.064022] Unhandled fault: page domain fault (0x01b) at 0x00000000
[ 35.064212] pgd = (ptrval)
[ 35.064459] [00000000] *pgd=942dc835, *pte=00000000, *ppte=00000000
[ 35.065071] Internal error: : 1b [#1] SMP ARM
[ 35.065381] KGDB: re-enter exception: ALL breakpoints killed
[ 35.065850] ---[ end trace 909d8c43057666be ]---
[ 35.066088] 8<--- cut here ---
[ 35.066189] Unhandled fault: page domain fault (0x01b) at 0x00000000
[ 35.066332] pgd = (ptrval)
[ 35.066406] [00000000] *pgd=942dc835, *pte=00000000, *ppte=00000000
[ 35.066597] Internal error: : 1b [#2] SMP ARM
[ 35.066906] CPU: 0 PID: 134 Comm: sh Tainted: G D 5.14.0-rc1-00013-g2df4c9a741a0 #60
[ 35.067152] Hardware name: ARM-Versatile Express
[ 35.067432] [<c0311bdc>] (unwind_backtrace) from [<c030bdc0>] (show_stack+0x10/0x14)
[ 35.067880] [<c030bdc0>] (show_stack) from [<c114b9c8>] (dump_stack_lvl+0x58/0x70)
[ 35.068054] [<c114b9c8>] (dump_stack_lvl) from [<c0430cdc>] (kgdb_reenter_check+0x104/0x150)
[ 35.068213] [<c0430cdc>] (kgdb_reenter_check) from [<c0430dcc>] (kgdb_handle_exception+0xa4/0x114)
[ 35.068395] [<c0430dcc>] (kgdb_handle_exception) from [<c0311268>] (kgdb_notify+0x30/0x74)
[ 35.068563] [<c0311268>] (kgdb_notify) from [<c037422c>] (atomic_notifier_call_chain+0xac/0x194)
[ 35.068745] [<c037422c>] (atomic_notifier_call_chain) from [<c0374370>] (notify_die+0x5c/0xbc)
[ 35.068933] [<c0374370>] (notify_die) from [<c030bf04>] (die+0x140/0x544)
[ 35.069079] [<c030bf04>] (die) from [<c03164d4>] (do_DataAbort+0xb8/0xbc)
[ 35.069220] [<c03164d4>] (do_DataAbort) from [<c0300afc>] (__dabt_svc+0x5c/0xa0)
[ 35.069434] Exception stack(0xd4249c10 to 0xd4249c58)
[ 35.069616] 9c00: ???????? ???????? ???????? ????????
[ 35.069776] 9c20: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
[ 35.069943] 9c40: ???????? ???????? ???????? ???????? ???????? ????????
[ 35.070107] [<c0300afc>] (__dabt_svc) from [<c049c8c4>] (copy_from_kernel_nofault+0x114/0x13c)
[ 35.070291] [<c049c8c4>] (copy_from_kernel_nofault) from [<c0431688>] (kgdb_mem2hex+0x1c/0x88)
[ 35.070463] [<c0431688>] (kgdb_mem2hex) from [<c04322b0>] (gdb_serial_stub+0x8c4/0x1088)
[ 35.070640] [<c04322b0>] (gdb_serial_stub) from [<c04302e8>] (kgdb_cpu_enter+0x4f4/0x988)
[ 35.070796] [<c04302e8>] (kgdb_cpu_enter) from [<c0430e08>] (kgdb_handle_exception+0xe0/0x114)
[ 35.070982] [<c0430e08>] (kgdb_handle_exception) from [<c0311210>] (kgdb_compiled_brk_fn+0x24/0x2c)
[ 35.071166] [<c0311210>] (kgdb_compiled_brk_fn) from [<c030c40c>] (do_undefinstr+0x104/0x230)
[ 35.071342] [<c030c40c>] (do_undefinstr) from [<c0300c6c>] (__und_svc_finish+0x0/0x54)
[ 35.071502] Exception stack(0xd4249dc8 to 0xd4249e10)
[ 35.071614] 9dc0: ???????? ???????? ???????? ???????? ???????? ????????
[ 35.071778] 9de0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
[ 35.071944] 9e00: ???????? ???????? ???????? ????????
[ 35.072054] [<c0300c6c>] (__und_svc_finish) from [<c042fd20>] (kgdb_breakpoint+0x30/0x58)
[ 35.072211] [<c042fd20>] (kgdb_breakpoint) from [<c0b14b08>] (configure_kgdbts+0x228/0x68c)
[ 35.072395] [<c0b14b08>] (configure_kgdbts) from [<c036fdcc>] (param_attr_store+0x60/0xb8)
[ 35.072560] [<c036fdcc>] (param_attr_store) from [<c05bcf14>] (kernfs_fop_write_iter+0x110/0x1d4)
[ 35.072745] [<c05bcf14>] (kernfs_fop_write_iter) from [<c050f074>] (vfs_write+0x350/0x508)
[ 35.072920] [<c050f074>] (vfs_write) from [<c050f370>] (ksys_write+0x64/0xdc)
[ 35.073075] [<c050f370>] (ksys_write) from [<c03000c0>] (ret_fast_syscall+0x0/0x2c)
[ 35.073259] Exception stack(0xd4249fa8 to 0xd4249ff0)
[ 35.073372] 9fa0: ???????? ???????? ???????? ???????? ???????? ????????
[ 35.073527] 9fc0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
[ 35.073679] 9fe0: ???????? ???????? ???????? ????????
[ 35.073960] Kernel panic - not syncing: Recursive entry to debugger
[ 36.286118] SMP: failed to stop secondary CPUs
[ 36.286568] ---[ end Kernel panic - not syncing: Recursive entry to debugger ]---
~~~
On Wed, Jan 12, 2022 at 05:29:03PM +0000, Daniel Thompson wrote:
> On Mon, Jul 26, 2021 at 04:11:39PM +0200, Arnd Bergmann wrote:
> > From: Arnd Bergmann <[email protected]>
> >
> > These mimic the behavior of get_user and put_user, except
> > for domain switching, address limit checking and handling
> > of mismatched sizes, none of which are relevant here.
> >
> > To work with pre-Armv6 kernels, this has to avoid TUSER()
> > inside of the new macros, the new approach passes the "t"
> > string along with the opcode, which is a bit uglier but
> > avoids duplicating more code.
> >
> > As there is no __get_user_asm_dword(), I work around it
> > by copying 32 bit at a time, which is possible because
> > the output size is known.
> >
> > Signed-off-by: Arnd Bergmann <[email protected]>
>
> I've just been bisecting some regressions running the kgdbts tests on
> arm and this patch came up.
So the software PAN code is working :)
The kernel attempted to access an address that is in the userspace
domain (NULL pointer) and took an exception.
I suppose we should handle a domain fault more gracefully - what are
the required semantics if the kernel attempts a userspace access
using one of the _nofault() accessors?
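(For concreteness, the failing pattern in the trace above boils down to
something like this sketch:)
~~~
/*
 * kgdb probes an arbitrary address via the _nofault() accessor. With
 * software PAN on ARM, a userspace (here NULL) address raises a domain
 * fault instead of returning an error:
 */
int val;
long ret = copy_from_kernel_nofault(&val, NULL, sizeof(val));
/* wanted: ret < 0 (e.g. -EFAULT/-ERANGE); observed: a kernel oops */
~~~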
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
On Wed, Jan 12, 2022 at 06:08:17PM +0000, Russell King (Oracle) wrote:
> On Wed, Jan 12, 2022 at 05:29:03PM +0000, Daniel Thompson wrote:
> > On Mon, Jul 26, 2021 at 04:11:39PM +0200, Arnd Bergmann wrote:
> > > From: Arnd Bergmann <[email protected]>
> > >
> > > These mimic the behavior of get_user and put_user, except
> > > for domain switching, address limit checking and handling
> > > of mismatched sizes, none of which are relevant here.
> > >
> > > To work with pre-Armv6 kernels, this has to avoid TUSER()
> > > inside of the new macros, the new approach passes the "t"
> > > string along with the opcode, which is a bit uglier but
> > > avoids duplicating more code.
> > >
> > > As there is no __get_user_asm_dword(), I work around it
> > > by copying 32 bit at a time, which is possible because
> > > the output size is known.
> > >
> > > Signed-off-by: Arnd Bergmann <[email protected]>
> >
> > I've just been bisecting some regressions running the kgdbts tests on
> > arm and this patch came up.
>
> So the software PAN code is working :)
Interesting. I had noticed it was odd that kgdbts works just fine
when launched from the kernel command line. I guess that runs before
PAN is activated. Neat.
> The kernel attempted to access an address that is in the userspace
> domain (NULL pointer) and took an exception.
>
> I suppose we should handle a domain fault more gracefully - what are
> the required semantics if the kernel attempts a userspace access
> using one of the _nofault() accessors?
I think the best answer might well be that, if the arch provides
implementations of hooks such as copy_from_kernel_nofault_allowed()
then the kernel should never attempt a userspace access using the
_nofault() accessors. That means they can do whatever they like!
In other words something like the patch below looks like a promising
approach.
Daniel.
From f66a63b504ff582f261a506c54ceab8c0e77a98c Mon Sep 17 00:00:00 2001
From: Daniel Thompson <[email protected]>
Date: Thu, 13 Jan 2022 09:34:45 +0000
Subject: [PATCH] arm: mm: Implement copy_from_kernel_nofault_allowed()
Currently copy_from_kernel_nofault() can actually fault (due to software
PAN) if we attempt userspace access. In any case, the documented
behaviour for this function is to return -ERANGE if we attempt an access
outside of kernel space.
Implementing copy_from_kernel_nofault_allowed() solves both these
problems.
Signed-off-by: Daniel Thompson <[email protected]>
---
arch/arm/mm/Makefile | 2 +-
arch/arm/mm/maccess.c | 9 +++++++++
2 files changed, 10 insertions(+), 1 deletion(-)
create mode 100644 arch/arm/mm/maccess.c
diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
index 3510503bc5e6..d1c5f4f256de 100644
--- a/arch/arm/mm/Makefile
+++ b/arch/arm/mm/Makefile
@@ -3,7 +3,7 @@
# Makefile for the linux arm-specific parts of the memory manager.
#
-obj-y := extable.o fault.o init.o iomap.o
+obj-y := extable.o fault.o init.o iomap.o maccess.o
obj-y += dma-mapping$(MMUEXT).o
obj-$(CONFIG_MMU) += fault-armv.o flush.o idmap.o ioremap.o \
mmap.o pgd.o mmu.o pageattr.o
diff --git a/arch/arm/mm/maccess.c b/arch/arm/mm/maccess.c
new file mode 100644
index 000000000000..0251062cb40d
--- /dev/null
+++ b/arch/arm/mm/maccess.c
@@ -0,0 +1,9 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/uaccess.h>
+#include <linux/kernel.h>
+
+bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size)
+{
+ return (unsigned long)unsafe_src >= TASK_SIZE;
+}
--
2.33.1
On Thu, Jan 13, 2022 at 10:47 AM Daniel Thompson
<[email protected]> wrote:
> On Wed, Jan 12, 2022 at 06:08:17PM +0000, Russell King (Oracle) wrote:
>
> > The kernel attempted to access an address that is in the userspace
> > domain (NULL pointer) and took an exception.
> >
> > I suppose we should handle a domain fault more gracefully - what are
> > the required semantics if the kernel attempts a userspace access
> > using one of the _nofault() accessors?
>
> I think the best answer might well be that, if the arch provides
> implementations of hooks such as copy_from_kernel_nofault_allowed()
> then the kernel should never attempt a userspace access using the
> _nofault() accessors. That means they can do whatever they like!
>
> In other words something like the patch below looks like a promising
> approach.
Right, it seems this is the same as on x86.
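(For comparison, x86's hook is roughly the sketch below -- quoted from
memory of arch/x86/mm/maccess.c, so treat the details as an assumption:)
~~~
/* x86 (32-bit variant, roughly): reject anything below the kernel
 * address space so _nofault() copies never touch userspace. */
bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size)
{
	return (unsigned long)unsafe_src >= TASK_SIZE_MAX;
}
~~~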
> From f66a63b504ff582f261a506c54ceab8c0e77a98c Mon Sep 17 00:00:00 2001
> From: Daniel Thompson <[email protected]>
> Date: Thu, 13 Jan 2022 09:34:45 +0000
> Subject: [PATCH] arm: mm: Implement copy_from_kernel_nofault_allowed()
>
> Currently copy_from_kernel_nofault() can actually fault (due to software
> PAN) if we attempt userspace access. In any case, the documented
> behaviour for this function is to return -ERANGE if we attempt an access
> outside of kernel space.
>
> Implementing copy_from_kernel_nofault_allowed() solves both these
> problems.
>
> Signed-off-by: Daniel Thompson <[email protected]>
Reviewed-by: Arnd Bergmann <[email protected]>
On Thu, Jan 13, 2022 at 12:14:50PM +0100, Arnd Bergmann wrote:
> On Thu, Jan 13, 2022 at 10:47 AM Daniel Thompson
> <[email protected]> wrote:
> > On Wed, Jan 12, 2022 at 06:08:17PM +0000, Russell King (Oracle) wrote:
> >
> > > The kernel attempted to access an address that is in the userspace
> > > domain (NULL pointer) and took an exception.
> > >
> > > I suppose we should handle a domain fault more gracefully - what are
> > > the required semantics if the kernel attempts a userspace access
> > > using one of the _nofault() accessors?
> >
> > I think the best answer might well be that, if the arch provides
> > implementations of hooks such as copy_from_kernel_nofault_allowed()
> > then the kernel should never attempt a userspace access using the
> > _nofault() accessors. That means they can do whatever they like!
> >
> > In other words something like the patch below looks like a promising
> > approach.
>
> Right, it seems this is the same as on x86.
Hmnn...
Looking a bit deeper into copy_from_kernel_nofault(), there is an odd
asymmetry with copy_to_kernel_nofault(): there is a
copy_from_kernel_nofault_allowed() but no corresponding
copy_to_kernel_nofault_allowed(), which means we cannot defend memory
pokes using a helper function.
I checked the behaviour of copy_to_kernel_nofault() on arm, arm64, mips,
powerpc, riscv and x86 kernels (which is pretty much everything where I
know how to fire up qemu). All except arm gracefully handle an
attempt to write to userspace (well, NULL actually) with
copy_to_kernel_nofault(), so I think there are still a few more changes
needed to fully fix this.
Looks like we would need a slightly more assertive change, either adding
a copy_to_kernel_nofault_allowed() or modifying the arm dabt handlers to
avoid faults on userspace access.
Any views on which is better?
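(For the first option, a hypothetical copy_to_kernel_nofault_allowed()
would presumably just mirror the _from_ variant -- a sketch only, since
the generic code has no such hook today and would need to grow one and
call it from copy_to_kernel_nofault():)
~~~
/* Hypothetical mirror of copy_from_kernel_nofault_allowed(): refuse
 * destinations below TASK_SIZE so kernel pokes never hit userspace. */
bool copy_to_kernel_nofault_allowed(const void *unsafe_dst, size_t size)
{
	return (unsigned long)unsafe_dst >= TASK_SIZE;
}
~~~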
Daniel.
>
> > From f66a63b504ff582f261a506c54ceab8c0e77a98c Mon Sep 17 00:00:00 2001
> > From: Daniel Thompson <[email protected]>
> > Date: Thu, 13 Jan 2022 09:34:45 +0000
> > Subject: [PATCH] arm: mm: Implement copy_from_kernel_nofault_allowed()
> >
> > Currently copy_from_kernel_nofault() can actually fault (due to software
> > PAN) if we attempt userspace access. In any case, the documented
> > behaviour for this function is to return -ERANGE if we attempt an access
> > outside of kernel space.
> >
> > Implementing copy_from_kernel_nofault_allowed() solves both these
> > problems.
> >
> > Signed-off-by: Daniel Thompson <[email protected]>
>
> Reviewed-by: Arnd Bergmann <[email protected]>
Hi,
It seems that this problem has not been fully solved yet.
I found that "echo t > /proc/sysrq-trigger" causes the same fault,
because print_worker_info() also calls copy_from_kernel_nofault(),
and "worker->current_pwq" can be NULL at the time of the copy.
Stack trace:
[ 15.303013] 8<--- cut here ---
[ 15.303315] Unhandled fault: page domain fault (0x01b) at 0x00000004
[ 15.303538] [00000004] *pgd=6338f831, *pte=00000000, *ppte=00000000
[ 15.304367] Internal error: : 1b [#1] SMP ARM
[ 15.304721] Modules linked in:
[ 15.305107] CPU: 0 PID: 89 Comm: sh Not tainted 5.19.0-rc5-dirty #332
[ 15.305373] Hardware name: ARM-Versatile Express
[ 15.305529] PC is at copy_from_kernel_nofault+0xf0/0x174
[ 15.305712] LR is at copy_from_kernel_nofault+0x30/0x174
[ 15.305873] pc : [<c0448ea4>] lr : [<c0448de4>] psr: 20000013
[ 15.306078] sp : eac4dde8 ip : 0000bff4 fp : eac4de74
[ 15.306233] r10: 00000007 r9 : 00000000 r8 : c1a09700
[ 15.306397] r7 : c1a04cc8 r6 : 00000004 r5 : eac4de18 r4 : 00000004
[ 15.306586] r3 : 00000000 r2 : c2440000 r1 : 00000004 r0 : 00000001
[ 15.306831] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM
Segment none
[ 15.307120] Control: 10c5387d Table: 633f006a DAC: 00000051
...
[ 15.318121] copy_from_kernel_nofault from print_worker_info+0xd0/0x15c
[ 15.318343] print_worker_info from sched_show_task+0x134/0x180
[ 15.318534] sched_show_task from show_state_filter+0x74/0xa8
[ 15.318714] show_state_filter from sysrq_handle_showstate+0xc/0x14
[ 15.318902] sysrq_handle_showstate from __handle_sysrq+0x88/0x138
[ 15.319173] __handle_sysrq from write_sysrq_trigger+0x4c/0x5c
[ 15.319356] write_sysrq_trigger from proc_reg_write+0xa8/0xd0
[ 15.319541] proc_reg_write from vfs_write+0xb4/0x388
[ 15.319708] vfs_write from ksys_write+0x58/0xd0
[ 15.319851] ksys_write from ret_fast_syscall+0x0/0x54
> On Thu, Jan 13, 2022 at 12:14:50PM +0100, Arnd Bergmann wrote:
> > On Thu, Jan 13, 2022 at 10:47 AM Daniel Thompson
> > <[email protected]> wrote:
> > > On Wed, Jan 12, 2022 at 06:08:17PM +0000, Russell King (Oracle) wrote:
> > >
> > > > The kernel attempted to access an address that is in the userspace
> > > > domain (NULL pointer) and took an exception.
> > > >
> > > > I suppose we should handle a domain fault more gracefully - what are
> > > > the required semantics if the kernel attempts a userspace access
> > > > using one of the _nofault() accessors?
> > >
> > > I think the best answer might well be that, if the arch provides
> > > implementations of hooks such as copy_from_kernel_nofault_allowed()
> > > then the kernel should never attempt a userspace access using the
> > > _nofault() accessors. That means they can do whatever they like!
> > >
> > > In other words something like the patch below looks like a promising
> > > approach.
> >
> > Right, it seems this is the same as on x86.
>
> Hmnn...
>
> Looking a bit deeper into copy_from_kernel_nofault(), there is an odd
> asymmetry with copy_to_kernel_nofault(): there is a
> copy_from_kernel_nofault_allowed() but no corresponding
> copy_to_kernel_nofault_allowed(), which means we cannot defend memory
> pokes using a helper function.
>
> I checked the behaviour of copy_to_kernel_nofault() on arm, arm64, mips,
> powerpc, riscv and x86 kernels (which is pretty much everything where I
> know how to fire up qemu). All except arm gracefully handle an
> attempt to write to userspace (well, NULL actually) with
> copy_to_kernel_nofault(), so I think there are still a few more changes
> needed to fully fix this.
>
> Looks like we would need a slightly more assertive change, either adding
> a copy_to_kernel_nofault_allowed() or modifying the arm dabt handlers to
> avoid faults on userspace access.
>
> Any views on which is better?
>
I've tested copy_from_kernel_nofault_allowed() and agree that it's a
simple and effective enough solution. There is only one small gap
compared to the other architectures: it returns -ERANGE where it
arguably should be -EFAULT (as the other architectures do).
Anyway, if we want to modify the FSR handlers instead, I guess that's
also easy, because we do nothing special for a Domain Fault right now.
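(For reference, the -ERANGE comes from the generic helper itself;
roughly, from mm/maccess.c:)
~~~
long copy_from_kernel_nofault(void *dst, const void *src, size_t size)
{
	/* Arch hook rejected the address: report -ERANGE, not -EFAULT. */
	if (!copy_from_kernel_nofault_allowed(src, size))
		return -ERANGE;

	pagefault_disable();
	/* ... __get_kernel_nofault() copy loop, -EFAULT on a real fault ... */
	pagefault_enable();
	return 0;
}
~~~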
>
> Daniel.
>
> >
> > > From f66a63b504ff582f261a506c54ceab8c0e77a98c Mon Sep 17 00:00:00 2001
> > > From: Daniel Thompson <[email protected]>
> > > Date: Thu, 13 Jan 2022 09:34:45 +0000
> > > Subject: [PATCH] arm: mm: Implement copy_from_kernel_nofault_allowed()
> > >
> > > Currently copy_from_kernel_nofault() can actually fault (due to software
> > > PAN) if we attempt userspace access. In any case, the documented
> > > behaviour for this function is to return -ERANGE if we attempt an access
> > > outside of kernel space.
> > >
> > > Implementing copy_from_kernel_nofault_allowed() solves both these
> > > problems.
> > >
> > > Signed-off-by: Daniel Thompson <[email protected]>
> >
> > Reviewed-by: Arnd Bergmann <[email protected]>
Tested-by: Chen Zhongjin <[email protected]>
Best,
Chen
On Mon, Jul 26, 2021 at 04:11:35PM +0200, Arnd Bergmann wrote:
> From: Arnd Bergmann <[email protected]>
>
> The system call number is used in a couple of places, in particular
> ptrace, seccomp and /proc/<pid>/syscall.
*thread necromancy*
Hi!
So, it seems like the seccomp selftests broke in a few places due to
this change (back in v5.15). I really thought kernelci.org was running
the seccomp tests, but it seems like the coverage is spotty.
Specifically, the syscall_restart selftest fails, as well as syscall_errno
and syscall_faked (both via seccomp and PTRACE), starting with this patch.
> The last one apparently never worked reliably on ARM for tasks that are
> not currently getting traced.
>
> Storing the syscall number in the normal entry path makes it work,
> as well as allowing us to see if the current system call is for OABI
> compat mode, which is the next thing I want to hook into.
>
> Since the thread_info->syscall field is not just the number any more, it
> is now renamed to abi_syscall. In kernels that enable both OABI and EABI,
> the upper bits of this field encode 0x900000 (__NR_OABI_SYSCALL_BASE)
> for OABI tasks, while normal EABI tasks do not set the upper bits. This
> makes it possible to implement the in_oabi_syscall() helper later.
>
> All other users of thread_info->syscall go through the syscall_get_nr()
> helper, which in turn filters out the ABI bits.
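(For reference, the helper described above ends up looking roughly like
the sketch below on arm; the field and mask names are taken from the
description, so treat them as assumptions:)
~~~
/* Sketch of arm's syscall_get_nr(): strip the OABI marker bits so
 * callers only ever see the plain syscall number. */
static inline int syscall_get_nr(struct task_struct *task,
				 struct pt_regs *regs)
{
	if (IS_ENABLED(CONFIG_OABI_COMPAT))
		return task_thread_info(task)->abi_syscall & __NR_SYSCALL_MASK;

	return task_thread_info(task)->abi_syscall;
}
~~~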
While I've reproduced the bisect done by mediatek, I'm still poking
around in here to figure out what's gone wrong. There was a recent patch
to fix this, but it looks like it's not complete:
https://lore.kernel.org/all/[email protected]/
With the above applied, syscall_errno and syscall_faked start working
again, but not the syscall_restart test.
> Note that the ABI information is lost with PTRACE_SET_SYSCALL, so one
> cannot set the internal number to a particular version, but this was
> already the case. We could change it to let gdb encode the ABI type along
> with the syscall in a CONFIG_OABI_COMPAT-enabled kernel, but that itself
> would be a (backwards-compatible) ABI change, so I don't do it here.
>
> Signed-off-by: Arnd Bergmann <[email protected]>
Another issue of note, which may just be "by design" for arm32, is that
an invalid syscall (or, at least, a negative syscall) results in SIGILL,
rather than an errno=ENOSYS failure. This seems to have been true at least
as far back as v5.8 (where this was cleaned up for at least arm64 and
s390). There was a seccomp test added for it in v5.9, but it has been
failing for arm32 since then. :(
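(A minimal userspace sketch of what that test checks:)
~~~
/* Out-of-range syscall numbers should fail with ENOSYS, not kill the
 * task; behavior per the discussion above. */
#include <errno.h>
#include <unistd.h>

int main(void)
{
	long ret = syscall(-1);
	/* arm64/s390 (v5.8+): ret == -1 and errno == ENOSYS */
	/* arm32: the process receives SIGILL instead */
	return (ret == -1 && errno == ENOSYS) ? 0 : 1;
}
~~~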
I mention this because the behavior of the syscall_restart test looks
like that of an invalid syscall: on restart, a SIGILL is caught instead
of the syscall correctly continuing.
Anyway, I'll keep debugging this, but figured I'd mention it in case
anyone else had been seeing issues in here.
-Kees
--
Kees Cook
On Thu, Aug 03, 2023 at 04:17:24PM -0700, Kees Cook wrote:
> Anyway, I'll keep debugging this, but figured I'd mention it in case
> anyone else had been seeing issues in here.
Okay, I think I have a working patch now. Sent here:
https://lore.kernel.org/lkml/[email protected]/
--
Kees Cook
On Fri, Aug 4, 2023, at 01:17, Kees Cook wrote:
> On Mon, Jul 26, 2021 at 04:11:35PM +0200, Arnd Bergmann wrote:
>> From: Arnd Bergmann <[email protected]>
>>
>> The system call number is used in a couple of places, in particular
>> ptrace, seccomp and /proc/<pid>/syscall.
>
> *thread necromancy*
>
> Hi!
>
> So, it seems like the seccomp selftests broke in a few places due to
> this change (back in v5.15). I really thought kernelci.org was running
> the seccomp tests, but it seems like the coverage is spotty.
>
> Specifically, the syscall_restart selftest fails, as well as syscall_errno
> and syscall_faked (both via seccomp and PTRACE), starting with this patch.
Thanks for tracking this down!
>> The last one apparently never worked reliably on ARM for tasks that are
>> not currently getting traced.
>>
>> Storing the syscall number in the normal entry path makes it work,
>> as well as allowing us to see if the current system call is for OABI
>> compat mode, which is the next thing I want to hook into.
>>
>> Since the thread_info->syscall field is not just the number any more, it
>> is now renamed to abi_syscall. In kernels that enable both OABI and EABI,
>> the upper bits of this field encode 0x900000 (__NR_OABI_SYSCALL_BASE)
>> for OABI tasks, while normal EABI tasks do not set the upper bits. This
>> makes it possible to implement the in_oabi_syscall() helper later.
>>
>> All other users of thread_info->syscall go through the syscall_get_nr()
>> helper, which in turn filters out the ABI bits.
>
> While I've reproduced the bisect done by mediatek, I'm still poking
> around in here to figure out what's gone wrong. There was a recent patch
> to fix this, but it looks like it's not complete:
> https://lore.kernel.org/all/[email protected]/
>
> With the above applied, syscall_errno and syscall_faked start working
> again, but not the syscall_restart test.
Right, I also see you addressed this better in your follow-up patch;
I'll comment there.
>> Note that the ABI information is lost with PTRACE_SET_SYSCALL, so one
>> cannot set the internal number to a particular version, but this was
>> already the case. We could change it to let gdb encode the ABI type along
>> with the syscall in a CONFIG_OABI_COMPAT-enabled kernel, but that itself
>> would be a (backwards-compatible) ABI change, so I don't do it here.
>>
>> Signed-off-by: Arnd Bergmann <[email protected]>
>
> Another issue of note, which may just be "by design" for arm32, is that
> an invalid syscall (or, at least, a negative syscall) results in SIGILL,
> rather than an errno=ENOSYS failure. This seems to have been true at least
> as far back as v5.8 (where this was cleaned up for at least arm64 and
> s390). There was a seccomp test added for it in v5.9, but it has been
> failing for arm32 since then. :(
>
> I mention this because the behavior of the syscall_restart test looks
> like an invalid syscall: on restart a SIGILL is caught instead of the
> syscall correctly continuing.
The odd arm behavior came up on IRC recently, and I saw that this is
what arm has always done, but I could not figure out why it is done
that way. I tried to see where s390 and arm64 changed the behavior,
but I couldn't find it. Do you have the commit IDs?
Arnd