2021-05-19 07:44:34

by Arnd Bergmann

Subject: [PATCH v3 0/4] compat: remove compat_alloc_user_space callers

From: Arnd Bergmann <[email protected]>

Going through compat_alloc_user_space() to convert indirect system call
arguments tends to add complexity compared to handling the native and
compat logic in the same code.

I have also resent one other trivial patch for the atomisp staging
driver, and there is a longer series for networking ioctls that I need
to revalidate before submitting it again.

Sorry for the long delay before resending this. Once everyone is happy
with the latest version, I would add it to the asm-generic tree for
a 5.14 merge.

Arnd
---

Changes in v3:

- fix whitespace as pointed out by Christoph Hellwig
- minor build fixes
- rebase to v5.13-rc1

Changes in v2:

- address review comments from Christoph Hellwig
- split syscall removal into a separate patch
- replace __X32_COND_SYSCALL() with individual macros for x32

Link: https://lore.kernel.org/lkml/[email protected]/

Arnd Bergmann (4):
kexec: simplify compat_sys_kexec_load
mm: simplify compat_sys_move_pages
mm: simplify compat numa syscalls
compat: remove some compat entry points

arch/arm64/include/asm/unistd32.h | 12 +-
arch/mips/kernel/syscalls/syscall_n32.tbl | 12 +-
arch/mips/kernel/syscalls/syscall_o32.tbl | 12 +-
arch/parisc/kernel/syscalls/syscall.tbl | 10 +-
arch/powerpc/kernel/syscalls/syscall.tbl | 12 +-
arch/s390/kernel/syscalls/syscall.tbl | 12 +-
arch/sparc/kernel/syscalls/syscall.tbl | 12 +-
arch/x86/entry/syscall_x32.c | 2 +
arch/x86/entry/syscalls/syscall_32.tbl | 6 +-
arch/x86/entry/syscalls/syscall_64.tbl | 4 +-
include/linux/compat.h | 43 +----
include/linux/kexec.h | 2 -
include/uapi/asm-generic/unistd.h | 12 +-
kernel/kexec.c | 90 ++++------
kernel/sys_ni.c | 5 -
mm/mempolicy.c | 195 +++++-----------------
mm/migrate.c | 50 +++---
17 files changed, 164 insertions(+), 327 deletions(-)

--
2.29.2

Cc: Christoph Hellwig <[email protected]>
Cc: linux-arch <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Linux ARM <[email protected]>
Cc: [email protected]
Cc: Linux-MM <[email protected]>
Cc: [email protected]


2021-05-19 07:44:36

by Arnd Bergmann

Subject: [PATCH v3 1/4] kexec: simplify compat_sys_kexec_load

From: Arnd Bergmann <[email protected]>

The compat version of sys_kexec_load() uses compat_alloc_user_space to
convert the user-provided arguments into the native format.

Move the conversion into the regular implementation with
an in_compat_syscall() check to simplify it and avoid the
compat_alloc_user_space() call.

compat_sys_kexec_load() now behaves the same as sys_kexec_load().
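
As a rough illustration of the conversion described above, here is a
userspace model of widening an array of 32-bit segment descriptors into
the native layout one entry at a time. The struct and function names
(compat_segment, native_segment, widen_segments) are hypothetical
stand-ins, and plain memory access replaces copy_from_user(); this is a
sketch of the idea, not the kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the 32-bit segment layout. */
struct compat_segment {
	uint32_t buf;	/* 32-bit user pointer */
	uint32_t bufsz;
	uint32_t mem;
	uint32_t memsz;
};

/* Hypothetical stand-in for the native segment layout. */
struct native_segment {
	const void *buf;
	size_t bufsz;
	unsigned long mem;
	size_t memsz;
};

/*
 * Widen n entries from the 32-bit layout into the native one.
 * Each field is zero-extended; the pointer goes through uintptr_t,
 * which plays the role that compat_ptr() has in the kernel.
 */
static int widen_segments(struct native_segment *dst,
			  const struct compat_segment *src, size_t n)
{
	size_t i;

	assert(dst != NULL && src != NULL);

	for (i = 0; i < n; i++) {
		dst[i] = (struct native_segment) {
			.buf   = (const void *)(uintptr_t)src[i].buf,
			.bufsz = src[i].bufsz,
			.mem   = src[i].mem,
			.memsz = src[i].memsz,
		};
	}

	return 0;
}
```

Converting directly into the destination array is what removes the
need for an intermediate native-format copy in user memory.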

Signed-off-by: Arnd Bergmann <[email protected]>
---
include/linux/kexec.h | 2 -
kernel/kexec.c | 95 +++++++++++++++++++------------------------
2 files changed, 42 insertions(+), 55 deletions(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 0c994ae37729..f61e310d7a85 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -88,14 +88,12 @@ struct kexec_segment {
size_t memsz;
};

-#ifdef CONFIG_COMPAT
struct compat_kexec_segment {
compat_uptr_t buf;
compat_size_t bufsz;
compat_ulong_t mem; /* User space sees this as a (void *) ... */
compat_size_t memsz;
};
-#endif

#ifdef CONFIG_KEXEC_FILE
struct purgatory_info {
diff --git a/kernel/kexec.c b/kernel/kexec.c
index c82c6c06f051..6618b1d9f00b 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -19,21 +19,46 @@

#include "kexec_internal.h"

+static int copy_user_compat_segment_list(struct kimage *image,
+ unsigned long nr_segments,
+ void __user *segments)
+{
+ struct compat_kexec_segment __user *cs = segments;
+ struct compat_kexec_segment segment;
+ int i;
+
+ for (i = 0; i < nr_segments; i++) {
+ if (copy_from_user(&segment, &cs[i], sizeof(segment)))
+ return -EFAULT;
+
+ image->segment[i] = (struct kexec_segment) {
+ .buf = compat_ptr(segment.buf),
+ .bufsz = segment.bufsz,
+ .mem = segment.mem,
+ .memsz = segment.memsz,
+ };
+ }
+
+ return 0;
+}
+
+
static int copy_user_segment_list(struct kimage *image,
unsigned long nr_segments,
struct kexec_segment __user *segments)
{
- int ret;
size_t segment_bytes;

/* Read in the segments */
image->nr_segments = nr_segments;
segment_bytes = nr_segments * sizeof(*segments);
- ret = copy_from_user(image->segment, segments, segment_bytes);
- if (ret)
- ret = -EFAULT;
+ if (in_compat_syscall())
+ return copy_user_compat_segment_list(image, nr_segments, segments);

- return ret;
+ if (copy_from_user(image->segment, segments, segment_bytes))
+ return -EFAULT;
+
+ return 0;
}

static int kimage_alloc_init(struct kimage **rimage, unsigned long entry,
@@ -233,8 +258,9 @@ static inline int kexec_load_check(unsigned long nr_segments,
return 0;
}

-SYSCALL_DEFINE4(kexec_load, unsigned long, entry, unsigned long, nr_segments,
- struct kexec_segment __user *, segments, unsigned long, flags)
+static int kernel_kexec_load(unsigned long entry, unsigned long nr_segments,
+ struct kexec_segment __user * segments,
+ unsigned long flags)
{
int result;

@@ -265,57 +291,20 @@ SYSCALL_DEFINE4(kexec_load, unsigned long, entry, unsigned long, nr_segments,
return result;
}

+SYSCALL_DEFINE4(kexec_load, unsigned long, entry, unsigned long, nr_segments,
+ struct kexec_segment __user *, segments, unsigned long, flags)
+{
+ return kernel_kexec_load(entry, nr_segments, segments, flags);
+}
+
#ifdef CONFIG_COMPAT
COMPAT_SYSCALL_DEFINE4(kexec_load, compat_ulong_t, entry,
compat_ulong_t, nr_segments,
struct compat_kexec_segment __user *, segments,
compat_ulong_t, flags)
{
- struct compat_kexec_segment in;
- struct kexec_segment out, __user *ksegments;
- unsigned long i, result;
-
- result = kexec_load_check(nr_segments, flags);
- if (result)
- return result;
-
- /* Don't allow clients that don't understand the native
- * architecture to do anything.
- */
- if ((flags & KEXEC_ARCH_MASK) == KEXEC_ARCH_DEFAULT)
- return -EINVAL;
-
- ksegments = compat_alloc_user_space(nr_segments * sizeof(out));
- for (i = 0; i < nr_segments; i++) {
- result = copy_from_user(&in, &segments[i], sizeof(in));
- if (result)
- return -EFAULT;
-
- out.buf = compat_ptr(in.buf);
- out.bufsz = in.bufsz;
- out.mem = in.mem;
- out.memsz = in.memsz;
-
- result = copy_to_user(&ksegments[i], &out, sizeof(out));
- if (result)
- return -EFAULT;
- }
-
- /* Because we write directly to the reserved memory
- * region when loading crash kernels we need a mutex here to
- * prevent multiple crash kernels from attempting to load
- * simultaneously, and to prevent a crash kernel from loading
- * over the top of a in use crash kernel.
- *
- * KISS: always take the mutex.
- */
- if (!mutex_trylock(&kexec_mutex))
- return -EBUSY;
-
- result = do_kexec_load(entry, nr_segments, ksegments, flags);
-
- mutex_unlock(&kexec_mutex);
-
- return result;
+ return kernel_kexec_load(entry, nr_segments,
+ (struct kexec_segment __user *)segments,
+ flags);
}
#endif
--
2.29.2


2021-05-19 07:44:37

by Arnd Bergmann

Subject: [PATCH v3 2/4] mm: simplify compat_sys_move_pages

From: Arnd Bergmann <[email protected]>

The compat move_pages() implementation uses compat_alloc_user_space()
for converting the pointer array. Moving the compat handling into
the function itself is a bit simpler and lets us avoid the
compat_alloc_user_space() call.
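
The chunked approach can be modelled in userspace roughly as follows.
CHUNK_NR, widen_chunk() and walk_pages() are hypothetical names chosen
for this sketch (the kernel uses DO_PAGES_STAT_CHUNK_NR and get_user()),
and the "work" done per chunk is just a sum so that the result can be
checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK_NR 4	/* hypothetical chunk size for this sketch */

/* Widen up to CHUNK_NR 32-bit pointers into a native pointer array. */
static void widen_chunk(const void *chunk[], const uint32_t *pages32,
			size_t chunk_nr)
{
	size_t i;

	assert(chunk_nr <= CHUNK_NR);

	for (i = 0; i < chunk_nr; i++)
		chunk[i] = (const void *)(uintptr_t)pages32[i];
}

/*
 * Walk the whole 32-bit array one chunk at a time, using only a small
 * on-stack buffer. Returns the number of entries visited and stores
 * the sum of all widened addresses in *sum.
 */
static size_t walk_pages(const uint32_t *pages32, size_t nr_pages,
			 uintptr_t *sum)
{
	const void *chunk[CHUNK_NR];
	size_t done = 0;

	*sum = 0;
	while (done < nr_pages) {
		size_t n = nr_pages - done;
		size_t i;

		if (n > CHUNK_NR)
			n = CHUNK_NR;

		widen_chunk(chunk, pages32 + done, n);
		for (i = 0; i < n; i++)
			*sum += (uintptr_t)chunk[i];

		done += n;
	}

	return done;
}
```

Because each chunk is widened into a fixed-size buffer, no scratch
area proportional to nr_pages is needed.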

Signed-off-by: Arnd Bergmann <[email protected]>
---
mm/migrate.c | 45 ++++++++++++++++++++++++++++++---------------
1 file changed, 30 insertions(+), 15 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index b234c3f3acb7..a68d07f19a1a 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1855,6 +1855,23 @@ static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages,
mmap_read_unlock(mm);
}

+static int put_compat_pages_array(const void __user *chunk_pages[],
+ const void __user * __user *pages,
+ unsigned long chunk_nr)
+{
+ compat_uptr_t __user *pages32 = (compat_uptr_t __user *)pages;
+ compat_uptr_t p;
+ int i;
+
+ for (i = 0; i < chunk_nr; i++) {
+ if (get_user(p, pages32 + i))
+ return -EFAULT;
+ chunk_pages[i] = compat_ptr(p);
+ }
+
+ return 0;
+}
+
/*
* Determine the nodes of a user array of pages and store it in
* a user array of status.
@@ -1874,8 +1891,15 @@ static int do_pages_stat(struct mm_struct *mm, unsigned long nr_pages,
if (chunk_nr > DO_PAGES_STAT_CHUNK_NR)
chunk_nr = DO_PAGES_STAT_CHUNK_NR;

- if (copy_from_user(chunk_pages, pages, chunk_nr * sizeof(*chunk_pages)))
- break;
+ if (in_compat_syscall()) {
+ if (put_compat_pages_array(chunk_pages, pages,
+ chunk_nr))
+ break;
+ } else {
+ if (copy_from_user(chunk_pages, pages,
+ chunk_nr * sizeof(*chunk_pages)))
+ break;
+ }

do_pages_stat_array(mm, chunk_nr, chunk_pages, chunk_status);

@@ -1980,23 +2004,14 @@ SYSCALL_DEFINE6(move_pages, pid_t, pid, unsigned long, nr_pages,

#ifdef CONFIG_COMPAT
COMPAT_SYSCALL_DEFINE6(move_pages, pid_t, pid, compat_ulong_t, nr_pages,
- compat_uptr_t __user *, pages32,
+ compat_uptr_t __user *, pages,
const int __user *, nodes,
int __user *, status,
int, flags)
{
- const void __user * __user *pages;
- int i;
-
- pages = compat_alloc_user_space(nr_pages * sizeof(void *));
- for (i = 0; i < nr_pages; i++) {
- compat_uptr_t p;
-
- if (get_user(p, pages32 + i) ||
- put_user(compat_ptr(p), pages + i))
- return -EFAULT;
- }
- return kernel_move_pages(pid, nr_pages, pages, nodes, status, flags);
+ return kernel_move_pages(pid, nr_pages,
+ (const void __user *__user *)pages,
+ nodes, status, flags);
}
#endif /* CONFIG_COMPAT */

--
2.29.2


2021-05-19 07:44:39

by Arnd Bergmann

Subject: [PATCH v3 3/4] mm: simplify compat numa syscalls

From: Arnd Bergmann <[email protected]>

The compat implementations for mbind, get_mempolicy, set_mempolicy
and migrate_pages are just there to handle the subtly different
layout of bitmaps on 32-bit hosts.

The compat implementation, however, lacks some of the checks that
are present in the native one, in particular the check that the
extra bits are all zero when user space passes a larger mask than
the kernel supports. Worse, those extra bits do not get cleared
when copying into or out of the kernel, which can lead to incorrect
data as well.

Unify the implementation to handle the compat bitmap layout directly
in the get_nodes() and copy_nodes_to_user() helpers. Splitting out
the get_bitmap() helper from get_nodes() also helps readability of the
native case.

On x86, two additional problems are addressed by this: compat tasks can
pass a bitmap at the end of a mapping, causing a fault when reading
across the page boundary for a 64-bit word. x32 tasks might also run
into problems with get_mempolicy corrupting data when an odd number of
32-bit words gets passed.
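
To make the bitmap handling more concrete, here is a small userspace
model of reading a bitmap made of 32-bit words into 64-bit words. The
function name get_bitmap_from_compat() is hypothetical, uint64_t stands
in for the native unsigned long, and direct array access replaces the
user copy. It reads exactly as many 32-bit words as the bit count
requires, so an odd word count never causes a read past the end, and it
clears the bits above nbits in the last word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assemble 64-bit words from 32-bit ones: word 2*i supplies the low
 * half and word 2*i+1 (if it exists) the high half. Bits at or above
 * nbits in the final 64-bit word are cleared.
 */
static void get_bitmap_from_compat(uint64_t *mask, const uint32_t *umask,
				   size_t nbits)
{
	size_t nwords32 = (nbits + 31) / 32;
	size_t nwords64 = (nbits + 63) / 64;
	size_t i;

	assert(mask != NULL && umask != NULL);

	for (i = 0; i < nwords64; i++) {
		uint64_t lo = umask[2 * i];
		/* Do not touch a 32-bit word the caller never provided. */
		uint64_t hi = (2 * i + 1 < nwords32) ? umask[2 * i + 1] : 0;

		mask[i] = lo | (hi << 32);
	}

	if (nbits % 64)
		mask[nwords64 - 1] &= (UINT64_C(1) << (nbits % 64)) - 1;
}
```

With nbits = 70 and three 32-bit input words, only those three words
are read and the second 64-bit word keeps just its low six bits.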

On parisc the migrate_pages() system call apparently had the wrong
calling convention, as big-endian architectures expect the words
inside of a bitmap to be swapped. This is not a problem though
since parisc has no NUMA support.
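
The word-order issue on big-endian machines can be shown with a
host-independent userspace sketch. The helper names (put_be32,
get_be64, naive_read, converted_read) are made up for this example, and
the big-endian memory layout is built by hand so that the result is the
same on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit word in big-endian byte order. */
static void put_be32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
}

/* Load a 64-bit word from big-endian byte order. */
static uint64_t get_be64(const unsigned char *p)
{
	uint64_t v = 0;
	int i;

	assert(p != NULL);

	for (i = 0; i < 8; i++)
		v = (v << 8) | p[i];

	return v;
}

/*
 * What a big-endian 64-bit reader sees when it treats two consecutive
 * 32-bit bitmap words as a single 64-bit word.
 */
static uint64_t naive_read(uint32_t w0, uint32_t w1)
{
	unsigned char mem[8];

	put_be32(mem, w0);
	put_be32(mem + 4, w1);

	return get_be64(mem);
}

/* Word-by-word conversion: the first 32-bit word holds bits 0..31. */
static uint64_t converted_read(uint32_t w0, uint32_t w1)
{
	return (uint64_t)w0 | ((uint64_t)w1 << 32);
}
```

For a bitmap with only bit 0 set, naive_read(1, 0) places that bit at
position 32, while converted_read(1, 0) keeps it at position 0.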

Signed-off-by: Arnd Bergmann <[email protected]>
---
include/linux/compat.h | 17 ++--
mm/mempolicy.c | 174 +++++++++++++----------------------------
2 files changed, 62 insertions(+), 129 deletions(-)

diff --git a/include/linux/compat.h b/include/linux/compat.h
index 98dd7b324c35..69d98fa66247 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -395,14 +395,6 @@ struct compat_kexec_segment;
struct compat_mq_attr;
struct compat_msgbuf;

-#define BITS_PER_COMPAT_LONG (8*sizeof(compat_long_t))
-
-#define BITS_TO_COMPAT_LONGS(bits) DIV_ROUND_UP(bits, BITS_PER_COMPAT_LONG)
-
-long compat_get_bitmap(unsigned long *mask, const compat_ulong_t __user *umask,
- unsigned long bitmap_size);
-long compat_put_bitmap(compat_ulong_t __user *umask, unsigned long *mask,
- unsigned long bitmap_size);
void copy_siginfo_to_external32(struct compat_siginfo *to,
const struct kernel_siginfo *from);
int copy_siginfo_from_user32(kernel_siginfo_t *to,
@@ -978,6 +970,15 @@ static inline bool in_compat_syscall(void) { return false; }

#endif /* CONFIG_COMPAT */

+#define BITS_PER_COMPAT_LONG (8*sizeof(compat_long_t))
+
+#define BITS_TO_COMPAT_LONGS(bits) DIV_ROUND_UP(bits, BITS_PER_COMPAT_LONG)
+
+long compat_get_bitmap(unsigned long *mask, const compat_ulong_t __user *umask,
+ unsigned long bitmap_size);
+long compat_put_bitmap(compat_ulong_t __user *umask, unsigned long *mask,
+ unsigned long bitmap_size);
+
/*
* Some legacy ABIs like the i386 one use less than natural alignment for 64-bit
* types, and will need special compat treatment for that. Most architectures
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index d79fa299b70c..a3ecd5b922be 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1381,16 +1381,32 @@ static long do_mbind(unsigned long start, unsigned long len,
/*
* User space interface with variable sized bitmaps for nodelists.
*/
+static int get_bitmap(unsigned long *mask, const unsigned long __user *nmask,
+ unsigned long maxnode)
+{
+ unsigned long nlongs = BITS_TO_LONGS(maxnode);
+ int ret;
+
+ if (in_compat_syscall())
+ ret = compat_get_bitmap(mask,
+ (const compat_ulong_t __user *)nmask,
+ maxnode);
+ else
+ ret = copy_from_user(mask, nmask, nlongs*sizeof(unsigned long));
+
+ if (ret)
+ return -EFAULT;
+
+ if (maxnode % BITS_PER_LONG)
+ mask[nlongs-1] &= (1UL << (maxnode % BITS_PER_LONG)) - 1;
+
+ return 0;
+}

/* Copy a node mask from user space. */
static int get_nodes(nodemask_t *nodes, const unsigned long __user *nmask,
unsigned long maxnode)
{
- unsigned long k;
- unsigned long t;
- unsigned long nlongs;
- unsigned long endmask;
-
--maxnode;
nodes_clear(*nodes);
if (maxnode == 0 || !nmask)
@@ -1398,49 +1414,29 @@ static int get_nodes(nodemask_t *nodes, const unsigned long __user *nmask,
if (maxnode > PAGE_SIZE*BITS_PER_BYTE)
return -EINVAL;

- nlongs = BITS_TO_LONGS(maxnode);
- if ((maxnode % BITS_PER_LONG) == 0)
- endmask = ~0UL;
- else
- endmask = (1UL << (maxnode % BITS_PER_LONG)) - 1;
-
/*
* When the user specified more nodes than supported just check
- * if the non supported part is all zero.
- *
- * If maxnode have more longs than MAX_NUMNODES, check
- * the bits in that area first. And then go through to
- * check the rest bits which equal or bigger than MAX_NUMNODES.
- * Otherwise, just check bits [MAX_NUMNODES, maxnode).
+ * if the non supported part is all zero, one word at a time,
+ * starting at the end.
*/
- if (nlongs > BITS_TO_LONGS(MAX_NUMNODES)) {
- for (k = BITS_TO_LONGS(MAX_NUMNODES); k < nlongs; k++) {
- if (get_user(t, nmask + k))
- return -EFAULT;
- if (k == nlongs - 1) {
- if (t & endmask)
- return -EINVAL;
- } else if (t)
- return -EINVAL;
- }
- nlongs = BITS_TO_LONGS(MAX_NUMNODES);
- endmask = ~0UL;
- }
+ while (maxnode > MAX_NUMNODES) {
+ unsigned long bits = min_t(unsigned long, maxnode, BITS_PER_LONG);
+ unsigned long t;

- if (maxnode > MAX_NUMNODES && MAX_NUMNODES % BITS_PER_LONG != 0) {
- unsigned long valid_mask = endmask;
-
- valid_mask &= ~((1UL << (MAX_NUMNODES % BITS_PER_LONG)) - 1);
- if (get_user(t, nmask + nlongs - 1))
+ if (get_bitmap(&t, &nmask[maxnode / BITS_PER_LONG], bits))
return -EFAULT;
- if (t & valid_mask)
+
+ if (maxnode - bits >= MAX_NUMNODES) {
+ maxnode -= bits;
+ } else {
+ maxnode = MAX_NUMNODES;
+ t &= ~((1UL << (MAX_NUMNODES % BITS_PER_LONG)) - 1);
+ }
+ if (t)
return -EINVAL;
}

- if (copy_from_user(nodes_addr(*nodes), nmask, nlongs*sizeof(unsigned long)))
- return -EFAULT;
- nodes_addr(*nodes)[nlongs-1] &= endmask;
- return 0;
+ return get_bitmap(nodes_addr(*nodes), nmask, maxnode);
}

/* Copy a kernel node mask to user space */
@@ -1449,6 +1445,10 @@ static int copy_nodes_to_user(unsigned long __user *mask, unsigned long maxnode,
{
unsigned long copy = ALIGN(maxnode-1, 64) / 8;
unsigned int nbytes = BITS_TO_LONGS(nr_node_ids) * sizeof(long);
+ bool compat = in_compat_syscall();
+
+ if (compat)
+ nbytes = BITS_TO_COMPAT_LONGS(nr_node_ids) * sizeof(compat_long_t);

if (copy > nbytes) {
if (copy > PAGE_SIZE)
@@ -1457,6 +1457,11 @@ static int copy_nodes_to_user(unsigned long __user *mask, unsigned long maxnode,
return -EFAULT;
copy = nbytes;
}
+
+ if (compat)
+ return compat_put_bitmap((compat_ulong_t __user *)mask,
+ nodes_addr(*nodes), maxnode);
+
return copy_to_user(mask, nodes_addr(*nodes), copy) ? -EFAULT : 0;
}

@@ -1655,72 +1660,22 @@ COMPAT_SYSCALL_DEFINE5(get_mempolicy, int __user *, policy,
compat_ulong_t, maxnode,
compat_ulong_t, addr, compat_ulong_t, flags)
{
- long err;
- unsigned long __user *nm = NULL;
- unsigned long nr_bits, alloc_size;
- DECLARE_BITMAP(bm, MAX_NUMNODES);
-
- nr_bits = min_t(unsigned long, maxnode-1, nr_node_ids);
- alloc_size = ALIGN(nr_bits, BITS_PER_LONG) / 8;
-
- if (nmask)
- nm = compat_alloc_user_space(alloc_size);
-
- err = kernel_get_mempolicy(policy, nm, nr_bits+1, addr, flags);
-
- if (!err && nmask) {
- unsigned long copy_size;
- copy_size = min_t(unsigned long, sizeof(bm), alloc_size);
- err = copy_from_user(bm, nm, copy_size);
- /* ensure entire bitmap is zeroed */
- err |= clear_user(nmask, ALIGN(maxnode-1, 8) / 8);
- err |= compat_put_bitmap(nmask, bm, nr_bits);
- }
-
- return err;
+ return kernel_get_mempolicy(policy, (unsigned long __user *)nmask,
+ maxnode, addr, flags);
}

COMPAT_SYSCALL_DEFINE3(set_mempolicy, int, mode, compat_ulong_t __user *, nmask,
compat_ulong_t, maxnode)
{
- unsigned long __user *nm = NULL;
- unsigned long nr_bits, alloc_size;
- DECLARE_BITMAP(bm, MAX_NUMNODES);
-
- nr_bits = min_t(unsigned long, maxnode-1, MAX_NUMNODES);
- alloc_size = ALIGN(nr_bits, BITS_PER_LONG) / 8;
-
- if (nmask) {
- if (compat_get_bitmap(bm, nmask, nr_bits))
- return -EFAULT;
- nm = compat_alloc_user_space(alloc_size);
- if (copy_to_user(nm, bm, alloc_size))
- return -EFAULT;
- }
-
- return kernel_set_mempolicy(mode, nm, nr_bits+1);
+ return kernel_set_mempolicy(mode, (unsigned long __user *)nmask, maxnode);
}

COMPAT_SYSCALL_DEFINE6(mbind, compat_ulong_t, start, compat_ulong_t, len,
compat_ulong_t, mode, compat_ulong_t __user *, nmask,
compat_ulong_t, maxnode, compat_ulong_t, flags)
{
- unsigned long __user *nm = NULL;
- unsigned long nr_bits, alloc_size;
- nodemask_t bm;
-
- nr_bits = min_t(unsigned long, maxnode-1, MAX_NUMNODES);
- alloc_size = ALIGN(nr_bits, BITS_PER_LONG) / 8;
-
- if (nmask) {
- if (compat_get_bitmap(nodes_addr(bm), nmask, nr_bits))
- return -EFAULT;
- nm = compat_alloc_user_space(alloc_size);
- if (copy_to_user(nm, nodes_addr(bm), alloc_size))
- return -EFAULT;
- }
-
- return kernel_mbind(start, len, mode, nm, nr_bits+1, flags);
+ return kernel_mbind(start, len, mode, (unsigned long __user *)nmask,
+ maxnode, flags);
}

COMPAT_SYSCALL_DEFINE4(migrate_pages, compat_pid_t, pid,
@@ -1728,32 +1683,9 @@ COMPAT_SYSCALL_DEFINE4(migrate_pages, compat_pid_t, pid,
const compat_ulong_t __user *, old_nodes,
const compat_ulong_t __user *, new_nodes)
{
- unsigned long __user *old = NULL;
- unsigned long __user *new = NULL;
- nodemask_t tmp_mask;
- unsigned long nr_bits;
- unsigned long size;
-
- nr_bits = min_t(unsigned long, maxnode - 1, MAX_NUMNODES);
- size = ALIGN(nr_bits, BITS_PER_LONG) / 8;
- if (old_nodes) {
- if (compat_get_bitmap(nodes_addr(tmp_mask), old_nodes, nr_bits))
- return -EFAULT;
- old = compat_alloc_user_space(new_nodes ? size * 2 : size);
- if (new_nodes)
- new = old + size / sizeof(unsigned long);
- if (copy_to_user(old, nodes_addr(tmp_mask), size))
- return -EFAULT;
- }
- if (new_nodes) {
- if (compat_get_bitmap(nodes_addr(tmp_mask), new_nodes, nr_bits))
- return -EFAULT;
- if (new == NULL)
- new = compat_alloc_user_space(size);
- if (copy_to_user(new, nodes_addr(tmp_mask), size))
- return -EFAULT;
- }
- return kernel_migrate_pages(pid, nr_bits + 1, old, new);
+ return kernel_migrate_pages(pid, maxnode,
+ (const unsigned long __user *)old_nodes,
+ (const unsigned long __user *)new_nodes);
}

#endif /* CONFIG_COMPAT */
--
2.29.2


2021-05-19 08:00:57

by Arnd Bergmann

Subject: [PATCH v3 4/4] compat: remove some compat entry points

From: Arnd Bergmann <[email protected]>

These are all handled correctly when calling the native
system call entry point, so remove the special cases.

Signed-off-by: Arnd Bergmann <[email protected]>
---
arch/arm64/include/asm/unistd32.h | 12 ++++----
arch/mips/kernel/syscalls/syscall_n32.tbl | 12 ++++----
arch/mips/kernel/syscalls/syscall_o32.tbl | 12 ++++----
arch/parisc/kernel/syscalls/syscall.tbl | 10 +++---
arch/powerpc/kernel/syscalls/syscall.tbl | 12 ++++----
arch/s390/kernel/syscalls/syscall.tbl | 12 ++++----
arch/sparc/kernel/syscalls/syscall.tbl | 12 ++++----
arch/x86/entry/syscall_x32.c | 2 ++
arch/x86/entry/syscalls/syscall_32.tbl | 6 ++--
arch/x86/entry/syscalls/syscall_64.tbl | 4 +--
include/linux/compat.h | 26 ----------------
include/uapi/asm-generic/unistd.h | 12 ++++----
kernel/kexec.c | 23 ++------------
kernel/sys_ni.c | 5 ---
mm/mempolicy.c | 37 -----------------------
mm/migrate.c | 13 --------
16 files changed, 56 insertions(+), 154 deletions(-)

diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 7859749d6628..0966ee946636 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -649,11 +649,11 @@ __SYSCALL(__NR_inotify_add_watch, sys_inotify_add_watch)
#define __NR_inotify_rm_watch 318
__SYSCALL(__NR_inotify_rm_watch, sys_inotify_rm_watch)
#define __NR_mbind 319
-__SYSCALL(__NR_mbind, compat_sys_mbind)
+__SYSCALL(__NR_mbind, sys_mbind)
#define __NR_get_mempolicy 320
-__SYSCALL(__NR_get_mempolicy, compat_sys_get_mempolicy)
+__SYSCALL(__NR_get_mempolicy, sys_get_mempolicy)
#define __NR_set_mempolicy 321
-__SYSCALL(__NR_set_mempolicy, compat_sys_set_mempolicy)
+__SYSCALL(__NR_set_mempolicy, sys_set_mempolicy)
#define __NR_openat 322
__SYSCALL(__NR_openat, compat_sys_openat)
#define __NR_mkdirat 323
@@ -699,13 +699,13 @@ __SYSCALL(__NR_tee, sys_tee)
#define __NR_vmsplice 343
__SYSCALL(__NR_vmsplice, sys_vmsplice)
#define __NR_move_pages 344
-__SYSCALL(__NR_move_pages, compat_sys_move_pages)
+__SYSCALL(__NR_move_pages, sys_move_pages)
#define __NR_getcpu 345
__SYSCALL(__NR_getcpu, sys_getcpu)
#define __NR_epoll_pwait 346
__SYSCALL(__NR_epoll_pwait, compat_sys_epoll_pwait)
#define __NR_kexec_load 347
-__SYSCALL(__NR_kexec_load, compat_sys_kexec_load)
+__SYSCALL(__NR_kexec_load, sys_kexec_load)
#define __NR_utimensat 348
__SYSCALL(__NR_utimensat, sys_utimensat_time32)
#define __NR_signalfd 349
@@ -811,7 +811,7 @@ __SYSCALL(__NR_rseq, sys_rseq)
#define __NR_io_pgetevents 399
__SYSCALL(__NR_io_pgetevents, compat_sys_io_pgetevents)
#define __NR_migrate_pages 400
-__SYSCALL(__NR_migrate_pages, compat_sys_migrate_pages)
+__SYSCALL(__NR_migrate_pages, sys_migrate_pages)
#define __NR_kexec_file_load 401
__SYSCALL(__NR_kexec_file_load, sys_kexec_file_load)
/* 402 is unused */
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index 5e0096657251..8aea1d9c9189 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -239,9 +239,9 @@
228 n32 clock_nanosleep sys_clock_nanosleep_time32
229 n32 tgkill sys_tgkill
230 n32 utimes sys_utimes_time32
-231 n32 mbind compat_sys_mbind
-232 n32 get_mempolicy compat_sys_get_mempolicy
-233 n32 set_mempolicy compat_sys_set_mempolicy
+231 n32 mbind sys_mbind
+232 n32 get_mempolicy sys_get_mempolicy
+233 n32 set_mempolicy sys_set_mempolicy
234 n32 mq_open compat_sys_mq_open
235 n32 mq_unlink sys_mq_unlink
236 n32 mq_timedsend sys_mq_timedsend_time32
@@ -258,7 +258,7 @@
247 n32 inotify_init sys_inotify_init
248 n32 inotify_add_watch sys_inotify_add_watch
249 n32 inotify_rm_watch sys_inotify_rm_watch
-250 n32 migrate_pages compat_sys_migrate_pages
+250 n32 migrate_pages sys_migrate_pages
251 n32 openat sys_openat
252 n32 mkdirat sys_mkdirat
253 n32 mknodat sys_mknodat
@@ -279,10 +279,10 @@
268 n32 sync_file_range sys_sync_file_range
269 n32 tee sys_tee
270 n32 vmsplice sys_vmsplice
-271 n32 move_pages compat_sys_move_pages
+271 n32 move_pages sys_move_pages
272 n32 set_robust_list compat_sys_set_robust_list
273 n32 get_robust_list compat_sys_get_robust_list
-274 n32 kexec_load compat_sys_kexec_load
+274 n32 kexec_load sys_kexec_load
275 n32 getcpu sys_getcpu
276 n32 epoll_pwait compat_sys_epoll_pwait
277 n32 ioprio_set sys_ioprio_set
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 39d6e71e57b6..3645a7713480 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -279,9 +279,9 @@
265 o32 clock_nanosleep sys_clock_nanosleep_time32
266 o32 tgkill sys_tgkill
267 o32 utimes sys_utimes_time32
-268 o32 mbind sys_mbind compat_sys_mbind
-269 o32 get_mempolicy sys_get_mempolicy compat_sys_get_mempolicy
-270 o32 set_mempolicy sys_set_mempolicy compat_sys_set_mempolicy
+268 o32 mbind sys_mbind
+269 o32 get_mempolicy sys_get_mempolicy
+270 o32 set_mempolicy sys_set_mempolicy
271 o32 mq_open sys_mq_open compat_sys_mq_open
272 o32 mq_unlink sys_mq_unlink
273 o32 mq_timedsend sys_mq_timedsend_time32
@@ -298,7 +298,7 @@
284 o32 inotify_init sys_inotify_init
285 o32 inotify_add_watch sys_inotify_add_watch
286 o32 inotify_rm_watch sys_inotify_rm_watch
-287 o32 migrate_pages sys_migrate_pages compat_sys_migrate_pages
+287 o32 migrate_pages sys_migrate_pages
288 o32 openat sys_openat compat_sys_openat
289 o32 mkdirat sys_mkdirat
290 o32 mknodat sys_mknodat
@@ -319,10 +319,10 @@
305 o32 sync_file_range sys_sync_file_range sys32_sync_file_range
306 o32 tee sys_tee
307 o32 vmsplice sys_vmsplice
-308 o32 move_pages sys_move_pages compat_sys_move_pages
+308 o32 move_pages sys_move_pages
309 o32 set_robust_list sys_set_robust_list compat_sys_set_robust_list
310 o32 get_robust_list sys_get_robust_list compat_sys_get_robust_list
-311 o32 kexec_load sys_kexec_load compat_sys_kexec_load
+311 o32 kexec_load sys_kexec_load
312 o32 getcpu sys_getcpu
313 o32 epoll_pwait sys_epoll_pwait compat_sys_epoll_pwait
314 o32 ioprio_set sys_ioprio_set
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index 5ac80b83d745..814b2a6a64c2 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -292,9 +292,9 @@
258 32 clock_nanosleep sys_clock_nanosleep_time32
258 64 clock_nanosleep sys_clock_nanosleep
259 common tgkill sys_tgkill
-260 common mbind sys_mbind compat_sys_mbind
-261 common get_mempolicy sys_get_mempolicy compat_sys_get_mempolicy
-262 common set_mempolicy sys_set_mempolicy compat_sys_set_mempolicy
+260 common mbind sys_mbind
+261 common get_mempolicy sys_get_mempolicy
+262 common set_mempolicy sys_set_mempolicy
# 263 was vserver
264 common add_key sys_add_key
265 common request_key sys_request_key
@@ -331,12 +331,12 @@
292 64 sync_file_range sys_sync_file_range
293 common tee sys_tee
294 common vmsplice sys_vmsplice
-295 common move_pages sys_move_pages compat_sys_move_pages
+295 common move_pages sys_move_pages
296 common getcpu sys_getcpu
297 common epoll_pwait sys_epoll_pwait compat_sys_epoll_pwait
298 common statfs64 sys_statfs64 compat_sys_statfs64
299 common fstatfs64 sys_fstatfs64 compat_sys_fstatfs64
-300 common kexec_load sys_kexec_load compat_sys_kexec_load
+300 common kexec_load sys_kexec_load
301 32 utimensat sys_utimensat_time32
301 64 utimensat sys_utimensat
302 common signalfd sys_signalfd compat_sys_signalfd
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index 2e68fbb57cc6..13d51ae3f94d 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -330,10 +330,10 @@
256 64 sys_debug_setcontext sys_ni_syscall
256 spu sys_debug_setcontext sys_ni_syscall
# 257 reserved for vserver
-258 nospu migrate_pages sys_migrate_pages compat_sys_migrate_pages
-259 nospu mbind sys_mbind compat_sys_mbind
-260 nospu get_mempolicy sys_get_mempolicy compat_sys_get_mempolicy
-261 nospu set_mempolicy sys_set_mempolicy compat_sys_set_mempolicy
+258 nospu migrate_pages sys_migrate_pages
+259 nospu mbind sys_mbind
+260 nospu get_mempolicy sys_get_mempolicy
+261 nospu set_mempolicy sys_set_mempolicy
262 nospu mq_open sys_mq_open compat_sys_mq_open
263 nospu mq_unlink sys_mq_unlink
264 32 mq_timedsend sys_mq_timedsend_time32
@@ -342,7 +342,7 @@
265 64 mq_timedreceive sys_mq_timedreceive
266 nospu mq_notify sys_mq_notify compat_sys_mq_notify
267 nospu mq_getsetattr sys_mq_getsetattr compat_sys_mq_getsetattr
-268 nospu kexec_load sys_kexec_load compat_sys_kexec_load
+268 nospu kexec_load sys_kexec_load
269 nospu add_key sys_add_key
270 nospu request_key sys_request_key
271 nospu keyctl sys_keyctl compat_sys_keyctl
@@ -381,7 +381,7 @@
298 common faccessat sys_faccessat
299 common get_robust_list sys_get_robust_list compat_sys_get_robust_list
300 common set_robust_list sys_set_robust_list compat_sys_set_robust_list
-301 common move_pages sys_move_pages compat_sys_move_pages
+301 common move_pages sys_move_pages
302 common getcpu sys_getcpu
303 nospu epoll_pwait sys_epoll_pwait compat_sys_epoll_pwait
304 32 utimensat sys_utimensat_time32
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index 7e4a2aba366d..9e4bd886458a 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -274,16 +274,16 @@
265 common statfs64 sys_statfs64 compat_sys_statfs64
266 common fstatfs64 sys_fstatfs64 compat_sys_fstatfs64
267 common remap_file_pages sys_remap_file_pages sys_remap_file_pages
-268 common mbind sys_mbind compat_sys_mbind
-269 common get_mempolicy sys_get_mempolicy compat_sys_get_mempolicy
-270 common set_mempolicy sys_set_mempolicy compat_sys_set_mempolicy
+268 common mbind sys_mbind sys_mbind
+269 common get_mempolicy sys_get_mempolicy sys_get_mempolicy
+270 common set_mempolicy sys_set_mempolicy sys_set_mempolicy
271 common mq_open sys_mq_open compat_sys_mq_open
272 common mq_unlink sys_mq_unlink sys_mq_unlink
273 common mq_timedsend sys_mq_timedsend sys_mq_timedsend_time32
274 common mq_timedreceive sys_mq_timedreceive sys_mq_timedreceive_time32
275 common mq_notify sys_mq_notify compat_sys_mq_notify
276 common mq_getsetattr sys_mq_getsetattr compat_sys_mq_getsetattr
-277 common kexec_load sys_kexec_load compat_sys_kexec_load
+277 common kexec_load sys_kexec_load sys_kexec_load
278 common add_key sys_add_key sys_add_key
279 common request_key sys_request_key sys_request_key
280 common keyctl sys_keyctl compat_sys_keyctl
@@ -293,7 +293,7 @@
284 common inotify_init sys_inotify_init sys_inotify_init
285 common inotify_add_watch sys_inotify_add_watch sys_inotify_add_watch
286 common inotify_rm_watch sys_inotify_rm_watch sys_inotify_rm_watch
-287 common migrate_pages sys_migrate_pages compat_sys_migrate_pages
+287 common migrate_pages sys_migrate_pages sys_migrate_pages
288 common openat sys_openat compat_sys_openat
289 common mkdirat sys_mkdirat sys_mkdirat
290 common mknodat sys_mknodat sys_mknodat
@@ -317,7 +317,7 @@
307 common sync_file_range sys_sync_file_range compat_sys_s390_sync_file_range
308 common tee sys_tee sys_tee
309 common vmsplice sys_vmsplice sys_vmsplice
-310 common move_pages sys_move_pages compat_sys_move_pages
+310 common move_pages sys_move_pages sys_move_pages
311 common getcpu sys_getcpu sys_getcpu
312 common epoll_pwait sys_epoll_pwait compat_sys_epoll_pwait
313 common utimes sys_utimes sys_utimes_time32
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index b9e1c0e735b7..2e43ef6533ab 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -365,12 +365,12 @@
299 common unshare sys_unshare
300 common set_robust_list sys_set_robust_list compat_sys_set_robust_list
301 common get_robust_list sys_get_robust_list compat_sys_get_robust_list
-302 common migrate_pages sys_migrate_pages compat_sys_migrate_pages
-303 common mbind sys_mbind compat_sys_mbind
-304 common get_mempolicy sys_get_mempolicy compat_sys_get_mempolicy
-305 common set_mempolicy sys_set_mempolicy compat_sys_set_mempolicy
-306 common kexec_load sys_kexec_load compat_sys_kexec_load
-307 common move_pages sys_move_pages compat_sys_move_pages
+302 common migrate_pages sys_migrate_pages
+303 common mbind sys_mbind
+304 common get_mempolicy sys_get_mempolicy
+305 common set_mempolicy sys_set_mempolicy
+306 common kexec_load sys_kexec_load
+307 common move_pages sys_move_pages
308 common getcpu sys_getcpu
309 common epoll_pwait sys_epoll_pwait compat_sys_epoll_pwait
310 32 utimensat sys_utimensat_time32
diff --git a/arch/x86/entry/syscall_x32.c b/arch/x86/entry/syscall_x32.c
index f2fe0a33bcfd..921473281497 100644
--- a/arch/x86/entry/syscall_x32.c
+++ b/arch/x86/entry/syscall_x32.c
@@ -19,6 +19,8 @@
#define __x32_sys_vmsplice __x64_sys_vmsplice
#define __x32_sys_process_vm_readv __x64_sys_process_vm_readv
#define __x32_sys_process_vm_writev __x64_sys_process_vm_writev
+#define __x32_sys_kexec_load __x64_sys_kexec_load
+#define __x32_sys_move_pages __x64_sys_move_pages

#define __SYSCALL_64(nr, sym)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 28a1423ce32e..c28c4c51c946 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -286,7 +286,7 @@
272 i386 fadvise64_64 sys_ia32_fadvise64_64
273 i386 vserver
274 i386 mbind sys_mbind
-275 i386 get_mempolicy sys_get_mempolicy compat_sys_get_mempolicy
+275 i386 get_mempolicy sys_get_mempolicy
276 i386 set_mempolicy sys_set_mempolicy
277 i386 mq_open sys_mq_open compat_sys_mq_open
278 i386 mq_unlink sys_mq_unlink
@@ -294,7 +294,7 @@
280 i386 mq_timedreceive sys_mq_timedreceive_time32
281 i386 mq_notify sys_mq_notify compat_sys_mq_notify
282 i386 mq_getsetattr sys_mq_getsetattr compat_sys_mq_getsetattr
-283 i386 kexec_load sys_kexec_load compat_sys_kexec_load
+283 i386 kexec_load sys_kexec_load
284 i386 waitid sys_waitid compat_sys_waitid
# 285 sys_setaltroot
286 i386 add_key sys_add_key
@@ -328,7 +328,7 @@
314 i386 sync_file_range sys_ia32_sync_file_range
315 i386 tee sys_tee
316 i386 vmsplice sys_vmsplice
-317 i386 move_pages sys_move_pages compat_sys_move_pages
+317 i386 move_pages sys_move_pages
318 i386 getcpu sys_getcpu
319 i386 epoll_pwait sys_epoll_pwait
320 i386 utimensat sys_utimensat_time32
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index ecd551b08d05..fa05090a924f 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -391,12 +391,12 @@
525 x32 sigaltstack compat_sys_sigaltstack
526 x32 timer_create compat_sys_timer_create
527 x32 mq_notify compat_sys_mq_notify
-528 x32 kexec_load compat_sys_kexec_load
+528 x32 kexec_load sys_kexec_load
529 x32 waitid compat_sys_waitid
530 x32 set_robust_list compat_sys_set_robust_list
531 x32 get_robust_list compat_sys_get_robust_list
532 x32 vmsplice sys_vmsplice
-533 x32 move_pages compat_sys_move_pages
+533 x32 move_pages sys_move_pages
534 x32 preadv compat_sys_preadv64
535 x32 pwritev compat_sys_pwritev64
536 x32 rt_tgsigqueueinfo compat_sys_rt_tgsigqueueinfo
diff --git a/include/linux/compat.h b/include/linux/compat.h
index 69d98fa66247..1560c9166677 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -689,12 +689,6 @@ asmlinkage long compat_sys_setitimer(int which,
struct old_itimerval32 __user *in,
struct old_itimerval32 __user *out);

-/* kernel/kexec.c */
-asmlinkage long compat_sys_kexec_load(compat_ulong_t entry,
- compat_ulong_t nr_segments,
- struct compat_kexec_segment __user *,
- compat_ulong_t flags);
-
/* kernel/posix-timers.c */
asmlinkage long compat_sys_timer_create(clockid_t which_clock,
struct compat_sigevent __user *timer_event_spec,
@@ -801,26 +795,6 @@ asmlinkage long compat_sys_execve(const char __user *filename, const compat_uptr
/* mm/fadvise.c: No generic prototype for fadvise64_64 */

/* mm/, CONFIG_MMU only */
-asmlinkage long compat_sys_mbind(compat_ulong_t start, compat_ulong_t len,
- compat_ulong_t mode,
- compat_ulong_t __user *nmask,
- compat_ulong_t maxnode, compat_ulong_t flags);
-asmlinkage long compat_sys_get_mempolicy(int __user *policy,
- compat_ulong_t __user *nmask,
- compat_ulong_t maxnode,
- compat_ulong_t addr,
- compat_ulong_t flags);
-asmlinkage long compat_sys_set_mempolicy(int mode, compat_ulong_t __user *nmask,
- compat_ulong_t maxnode);
-asmlinkage long compat_sys_migrate_pages(compat_pid_t pid,
- compat_ulong_t maxnode, const compat_ulong_t __user *old_nodes,
- const compat_ulong_t __user *new_nodes);
-asmlinkage long compat_sys_move_pages(pid_t pid, compat_ulong_t nr_pages,
- __u32 __user *pages,
- const int __user *nodes,
- int __user *status,
- int flags);
-
asmlinkage long compat_sys_rt_tgsigqueueinfo(compat_pid_t tgid,
compat_pid_t pid, int sig,
struct compat_siginfo __user *uinfo);
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 6de5a7fc066b..c604584f98b1 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -342,7 +342,7 @@ __SC_COMP(__NR_setitimer, sys_setitimer, compat_sys_setitimer)

/* kernel/kexec.c */
#define __NR_kexec_load 104
-__SC_COMP(__NR_kexec_load, sys_kexec_load, compat_sys_kexec_load)
+__SYSCALL(__NR_kexec_load, sys_kexec_load)

/* kernel/module.c */
#define __NR_init_module 105
@@ -673,15 +673,15 @@ __SYSCALL(__NR_madvise, sys_madvise)
#define __NR_remap_file_pages 234
__SYSCALL(__NR_remap_file_pages, sys_remap_file_pages)
#define __NR_mbind 235
-__SC_COMP(__NR_mbind, sys_mbind, compat_sys_mbind)
+__SYSCALL(__NR_mbind, sys_mbind)
#define __NR_get_mempolicy 236
-__SC_COMP(__NR_get_mempolicy, sys_get_mempolicy, compat_sys_get_mempolicy)
+__SYSCALL(__NR_get_mempolicy, sys_get_mempolicy)
#define __NR_set_mempolicy 237
-__SC_COMP(__NR_set_mempolicy, sys_set_mempolicy, compat_sys_set_mempolicy)
+__SYSCALL(__NR_set_mempolicy, sys_set_mempolicy)
#define __NR_migrate_pages 238
-__SC_COMP(__NR_migrate_pages, sys_migrate_pages, compat_sys_migrate_pages)
+__SYSCALL(__NR_migrate_pages, sys_migrate_pages)
#define __NR_move_pages 239
-__SC_COMP(__NR_move_pages, sys_move_pages, compat_sys_move_pages)
+__SYSCALL(__NR_move_pages, sys_move_pages)
#endif

#define __NR_rt_tgsigqueueinfo 240
diff --git a/kernel/kexec.c b/kernel/kexec.c
index 6618b1d9f00b..702e86bba6ad 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -258,9 +258,8 @@ static inline int kexec_load_check(unsigned long nr_segments,
return 0;
}

-static int kernel_kexec_load(unsigned long entry, unsigned long nr_segments,
- struct kexec_segment __user * segments,
- unsigned long flags)
+SYSCALL_DEFINE4(kexec_load, unsigned long, entry, unsigned long, nr_segments,
+ struct kexec_segment __user *, segments, unsigned long, flags)
{
int result;

@@ -290,21 +289,3 @@ static int kernel_kexec_load(unsigned long entry, unsigned long nr_segments,

return result;
}
-
-SYSCALL_DEFINE4(kexec_load, unsigned long, entry, unsigned long, nr_segments,
- struct kexec_segment __user *, segments, unsigned long, flags)
-{
- return kernel_kexec_load(entry, nr_segments, segments, flags);
-}
-
-#ifdef CONFIG_COMPAT
-COMPAT_SYSCALL_DEFINE4(kexec_load, compat_ulong_t, entry,
- compat_ulong_t, nr_segments,
- struct compat_kexec_segment __user *, segments,
- compat_ulong_t, flags)
-{
- return kernel_kexec_load(entry, nr_segments,
- (struct kexec_segment __user *)segments,
- flags);
-}
-#endif
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 0ea8128468c3..67a35449bd0d 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -291,15 +291,10 @@ COND_SYSCALL(madvise);
COND_SYSCALL(process_madvise);
COND_SYSCALL(remap_file_pages);
COND_SYSCALL(mbind);
-COND_SYSCALL_COMPAT(mbind);
COND_SYSCALL(get_mempolicy);
-COND_SYSCALL_COMPAT(get_mempolicy);
COND_SYSCALL(set_mempolicy);
-COND_SYSCALL_COMPAT(set_mempolicy);
COND_SYSCALL(migrate_pages);
-COND_SYSCALL_COMPAT(migrate_pages);
COND_SYSCALL(move_pages);
-COND_SYSCALL_COMPAT(move_pages);

COND_SYSCALL(perf_event_open);
COND_SYSCALL(accept4);
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index a3ecd5b922be..b04f8a9fe506 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1653,43 +1653,6 @@ SYSCALL_DEFINE5(get_mempolicy, int __user *, policy,
return kernel_get_mempolicy(policy, nmask, maxnode, addr, flags);
}

-#ifdef CONFIG_COMPAT
-
-COMPAT_SYSCALL_DEFINE5(get_mempolicy, int __user *, policy,
- compat_ulong_t __user *, nmask,
- compat_ulong_t, maxnode,
- compat_ulong_t, addr, compat_ulong_t, flags)
-{
- return kernel_get_mempolicy(policy, (unsigned long __user *)nmask,
- maxnode, addr, flags);
-}
-
-COMPAT_SYSCALL_DEFINE3(set_mempolicy, int, mode, compat_ulong_t __user *, nmask,
- compat_ulong_t, maxnode)
-{
- return kernel_set_mempolicy(mode, (unsigned long __user *)nmask, maxnode);
-}
-
-COMPAT_SYSCALL_DEFINE6(mbind, compat_ulong_t, start, compat_ulong_t, len,
- compat_ulong_t, mode, compat_ulong_t __user *, nmask,
- compat_ulong_t, maxnode, compat_ulong_t, flags)
-{
- return kernel_mbind(start, len, mode, (unsigned long __user *)nmask,
- maxnode, flags);
-}
-
-COMPAT_SYSCALL_DEFINE4(migrate_pages, compat_pid_t, pid,
- compat_ulong_t, maxnode,
- const compat_ulong_t __user *, old_nodes,
- const compat_ulong_t __user *, new_nodes)
-{
- return kernel_migrate_pages(pid, maxnode,
- (const unsigned long __user *)old_nodes,
- (const unsigned long __user *)new_nodes);
-}
-
-#endif /* CONFIG_COMPAT */
-
bool vma_migratable(struct vm_area_struct *vma)
{
if (vma->vm_flags & (VM_IO | VM_PFNMAP))
diff --git a/mm/migrate.c b/mm/migrate.c
index a68d07f19a1a..f1a11ac10144 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2002,19 +2002,6 @@ SYSCALL_DEFINE6(move_pages, pid_t, pid, unsigned long, nr_pages,
return kernel_move_pages(pid, nr_pages, pages, nodes, status, flags);
}

-#ifdef CONFIG_COMPAT
-COMPAT_SYSCALL_DEFINE6(move_pages, pid_t, pid, compat_ulong_t, nr_pages,
- compat_uptr_t __user *, pages,
- const int __user *, nodes,
- int __user *, status,
- int, flags)
-{
- return kernel_move_pages(pid, nr_pages,
- (const void __user *__user *)pages,
- nodes, status, flags);
-}
-#endif /* CONFIG_COMPAT */
-
#ifdef CONFIG_NUMA_BALANCING
/*
* Returns true if this is a safe migration target node for misplaced NUMA
--
2.29.2


2021-05-19 11:49:49

by Eric W. Biederman

Subject: Re: [PATCH v3 1/4] kexec: simplify compat_sys_kexec_load

Arnd Bergmann <[email protected]> writes:

> From: Arnd Bergmann <[email protected]>
>
> The compat version of sys_kexec_load() uses compat_alloc_user_space to
> convert the user-provided arguments into the native format.
>
> Move the conversion into the regular implementation with
> an in_compat_syscall() check to simplify it and avoid the
> compat_alloc_user_space() call.
>
> compat_sys_kexec_load() now behaves the same as sys_kexec_load().

Is it possible to do this without in_compat_syscall(),
and casting pointers to a wrong type?

We open ourselves up to bugs whenever we lie to the type system.

Skimming through the code it looks like it should be possible
to not need the in_compat_syscall and the casts to the wrong
type by changing the order of the code a little bit.

Eric


> Signed-off-by: Arnd Bergmann <[email protected]>
> ---
> include/linux/kexec.h | 2 -
> kernel/kexec.c | 95 +++++++++++++++++++------------------------
> 2 files changed, 42 insertions(+), 55 deletions(-)
>
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index 0c994ae37729..f61e310d7a85 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -88,14 +88,12 @@ struct kexec_segment {
> size_t memsz;
> };
>
> -#ifdef CONFIG_COMPAT
> struct compat_kexec_segment {
> compat_uptr_t buf;
> compat_size_t bufsz;
> compat_ulong_t mem; /* User space sees this as a (void *) ... */
> compat_size_t memsz;
> };
> -#endif
>
> #ifdef CONFIG_KEXEC_FILE
> struct purgatory_info {
> diff --git a/kernel/kexec.c b/kernel/kexec.c
> index c82c6c06f051..6618b1d9f00b 100644
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -19,21 +19,46 @@
>
> #include "kexec_internal.h"
>
> +static int copy_user_compat_segment_list(struct kimage *image,
> + unsigned long nr_segments,
> + void __user *segments)
> +{
> + struct compat_kexec_segment __user *cs = segments;
> + struct compat_kexec_segment segment;
> + int i;
> +
> + for (i = 0; i < nr_segments; i++) {
> + if (copy_from_user(&segment, &cs[i], sizeof(segment)))
> + return -EFAULT;
> +
> + image->segment[i] = (struct kexec_segment) {
> + .buf = compat_ptr(segment.buf),
> + .bufsz = segment.bufsz,
> + .mem = segment.mem,
> + .memsz = segment.memsz,
> + };
> + }
> +
> + return 0;
> +}
> +
> +
> static int copy_user_segment_list(struct kimage *image,
> unsigned long nr_segments,
> struct kexec_segment __user *segments)
> {
> - int ret;
> size_t segment_bytes;
>
> /* Read in the segments */
> image->nr_segments = nr_segments;
> segment_bytes = nr_segments * sizeof(*segments);
> - ret = copy_from_user(image->segment, segments, segment_bytes);
> - if (ret)
> - ret = -EFAULT;
> + if (in_compat_syscall())
> + return copy_user_compat_segment_list(image, nr_segments, segments);
>
> - return ret;
> + if (copy_from_user(image->segment, segments, segment_bytes))
> + return -EFAULT;
> +
> + return 0;
> }
>
> static int kimage_alloc_init(struct kimage **rimage, unsigned long entry,
> @@ -233,8 +258,9 @@ static inline int kexec_load_check(unsigned long nr_segments,
> return 0;
> }
>
> -SYSCALL_DEFINE4(kexec_load, unsigned long, entry, unsigned long, nr_segments,
> - struct kexec_segment __user *, segments, unsigned long, flags)
> +static int kernel_kexec_load(unsigned long entry, unsigned long nr_segments,
> + struct kexec_segment __user * segments,
> + unsigned long flags)
> {
> int result;
>
> @@ -265,57 +291,20 @@ SYSCALL_DEFINE4(kexec_load, unsigned long, entry, unsigned long, nr_segments,
> return result;
> }
>
> +SYSCALL_DEFINE4(kexec_load, unsigned long, entry, unsigned long, nr_segments,
> + struct kexec_segment __user *, segments, unsigned long, flags)
> +{
> + return kernel_kexec_load(entry, nr_segments, segments, flags);
> +}
> +
> #ifdef CONFIG_COMPAT
> COMPAT_SYSCALL_DEFINE4(kexec_load, compat_ulong_t, entry,
> compat_ulong_t, nr_segments,
> struct compat_kexec_segment __user *, segments,
> compat_ulong_t, flags)
> {
> - struct compat_kexec_segment in;
> - struct kexec_segment out, __user *ksegments;
> - unsigned long i, result;
> -
> - result = kexec_load_check(nr_segments, flags);
> - if (result)
> - return result;
> -
> - /* Don't allow clients that don't understand the native
> - * architecture to do anything.
> - */
> - if ((flags & KEXEC_ARCH_MASK) == KEXEC_ARCH_DEFAULT)
> - return -EINVAL;
> -
> - ksegments = compat_alloc_user_space(nr_segments * sizeof(out));
> - for (i = 0; i < nr_segments; i++) {
> - result = copy_from_user(&in, &segments[i], sizeof(in));
> - if (result)
> - return -EFAULT;
> -
> - out.buf = compat_ptr(in.buf);
> - out.bufsz = in.bufsz;
> - out.mem = in.mem;
> - out.memsz = in.memsz;
> -
> - result = copy_to_user(&ksegments[i], &out, sizeof(out));
> - if (result)
> - return -EFAULT;
> - }
> -
> - /* Because we write directly to the reserved memory
> - * region when loading crash kernels we need a mutex here to
> - * prevent multiple crash kernels from attempting to load
> - * simultaneously, and to prevent a crash kernel from loading
> - * over the top of a in use crash kernel.
> - *
> - * KISS: always take the mutex.
> - */
> - if (!mutex_trylock(&kexec_mutex))
> - return -EBUSY;
> -
> - result = do_kexec_load(entry, nr_segments, ksegments, flags);
> -
> - mutex_unlock(&kexec_mutex);
> -
> - return result;
> + return kernel_kexec_load(entry, nr_segments,
> + (struct kexec_segment __user *)segments,
> + flags);
> }
> #endif

2021-05-19 13:54:24

by Christoph Hellwig

Subject: Re: [PATCH v3 1/4] kexec: simplify compat_sys_kexec_load

On Mon, May 17, 2021 at 10:57:24PM -0500, Eric W. Biederman wrote:
> We open ourselves up to bugs whenever we lie to the type system.
>
> Skimming through the code it looks like it should be possible
> to not need the in_compat_syscall and the casts to the wrong
> type by changing the order of the code a little bit.

What kind of bug do you expect? We must only copy from user addresses
once anyway. I've never seen bugs due to the use of in_compat_syscall,
but plenty due to cruft code trying to avoid it.

2021-05-19 13:54:24

by Christoph Hellwig

Subject: Re: [PATCH v3 1/4] kexec: simplify compat_sys_kexec_load

> + if (in_compat_syscall())
> + return copy_user_compat_segment_list(image, nr_segments, segments);

Annoying overly long line here.

Otherwise:

Reviewed-by: Christoph Hellwig <[email protected]>


2021-05-19 15:22:47

by Christoph Hellwig

Subject: Re: [PATCH v3 2/4] mm: simplify compat_sys_move_pages

Looks good,

Reviewed-by: Christoph Hellwig <[email protected]>

2021-05-19 15:22:59

by Christoph Hellwig

Subject: Re: [PATCH v3 3/4] mm: simplify compat numa syscalls

Except for the various annoying overly long lines this looks fine to me:

Reviewed-by: Christoph Hellwig <[email protected]>

2021-05-19 15:23:33

by Christoph Hellwig

Subject: Re: [PATCH v3 4/4] compat: remove some compat entry points

Looks good,

Reviewed-by: Christoph Hellwig <[email protected]>

2021-05-19 16:25:38

by Arnd Bergmann

Subject: Re: [PATCH v3 1/4] kexec: simplify compat_sys_kexec_load

On Tue, May 18, 2021 at 8:38 AM Christoph Hellwig <[email protected]> wrote:
>
> > + if (in_compat_syscall())
> > + return copy_user_compat_segment_list(image, nr_segments, segments);
>
> Annoying overly long line here.

Oops, I was sure I had fixed all of these when you pointed this out before.
I probably rebased a slightly older branch that did not have the fixes.

> Otherwise:
>
> Reviewed-by: Christoph Hellwig <[email protected]>

Thanks,

Arnd

2021-05-19 16:26:08

by Arnd Bergmann

Subject: Re: [PATCH v3 1/4] kexec: simplify compat_sys_kexec_load

On Tue, May 18, 2021 at 8:40 AM Christoph Hellwig <[email protected]> wrote:
>
> On Mon, May 17, 2021 at 10:57:24PM -0500, Eric W. Biederman wrote:
> > We open ourselves up to bugs whenever we lie to the type system.
> >
> > Skimming through the code it looks like it should be possible
> > to not need the in_compat_syscall and the casts to the wrong
> > type by changing the order of the code a little bit.

There are obviously other ways of doing the same. The reason for doing it
this specific way is so I can eliminate the compat entry point entirely in
patch 4/4.

> What kind of bug do you expect? We must only copy from user addresses
> once anyway. I've never seen bugs due to the use of in_compat_syscall,
> but plenty due to cruft code trying to avoid it.

Right, I've used the same approach of passing a native-typed __user pointer
and converting it in a copy_from_user/copy_to_user wrapper in a number of
other places, as this tends to produce the most readable version by
concentrating the tricky logic in the one place that already has to be careful.

Most of the bugs I've seen with compat code are from duplicated code paths
that diverge over time when a bugfix for the native version is applied
incorrectly or not at all to the compat version.

Arnd
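
For readers following the thread, the wrapper approach described above can be
modelled in userspace roughly as follows. This is only a sketch under stated
assumptions: struct native_segment, struct compat_segment,
model_copy_from_user(), model_compat_ptr() and copy_segment_list() are
hypothetical stand-ins invented for illustration, not the kernel's types or
helpers, and the compat flag is passed in explicitly rather than obtained
from in_compat_syscall().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace stand-in for the native segment layout. */
struct native_segment {
	const void *buf;
	size_t bufsz;
	unsigned long mem;
	size_t memsz;
};

/* Hypothetical stand-in for the 32-bit layout seen by compat tasks. */
struct compat_segment {
	uint32_t buf;	/* 32-bit user pointer */
	uint32_t bufsz;
	uint32_t mem;
	uint32_t memsz;
};

/* Model of copy_from_user(): nonzero means the copy failed. */
static int model_copy_from_user(void *dst, const void *src, size_t n)
{
	if (!src)
		return 1;
	memcpy(dst, src, n);
	return 0;
}

/* Model of compat_ptr(): widen a 32-bit user pointer value. */
static const void *model_compat_ptr(uint32_t uptr)
{
	return (const void *)(uintptr_t)uptr;
}

/*
 * One copy routine serves both callers. The native path is a single
 * bulk copy; the compat path reads each 32-bit entry exactly once and
 * widens it field by field. All layout-specific logic lives here.
 */
static int copy_segment_list(struct native_segment *dst, const void *usegs,
			     size_t nr, int compat)
{
	assert(dst != NULL);

	if (!compat) {
		if (model_copy_from_user(dst, usegs, nr * sizeof(*dst)))
			return -EFAULT;
		return 0;
	}

	const struct compat_segment *cs = usegs;

	for (size_t i = 0; i < nr; i++) {
		struct compat_segment seg;

		if (model_copy_from_user(&seg, cs ? &cs[i] : NULL, sizeof(seg)))
			return -EFAULT;

		dst[i] = (struct native_segment) {
			.buf = model_compat_ptr(seg.buf),
			.bufsz = seg.bufsz,
			.mem = seg.mem,
			.memsz = seg.memsz,
		};
	}

	return 0;
}
```

The point of the shape is that the caller never needs a second entry point:
it hands over an opaque user pointer, and only the copy routine knows which
layout sits behind it.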

2021-05-19 18:12:20

by Eric W. Biederman

Subject: Re: [PATCH v3 1/4] kexec: simplify compat_sys_kexec_load

Arnd Bergmann <[email protected]> writes:

> From: Arnd Bergmann <[email protected]>
>
> The compat version of sys_kexec_load() uses compat_alloc_user_space to
> convert the user-provided arguments into the native format.
>
> Move the conversion into the regular implementation with
> an in_compat_syscall() check to simplify it and avoid the
> compat_alloc_user_space() call.
>
> compat_sys_kexec_load() now behaves the same as sys_kexec_load().

Nacked-by: "Eric W. Biederman" <[email protected]>

The patch is wrong.

The logic of the compat entry point and that of the ordinary entry point
are by necessity different. This patch unifies the logic and breaks the
compat entry point.

The fundamental necessity is that the code being loaded needs to know
which mode the kernel is running in so it can safely transition to the
new kernel.

The two entry points fundamentally need different logic, that difference
was not preserved, and the goal of this patchset was to unify that which
fundamentally needs to be different. Given all that, I don't think this
patch series makes any sense for kexec.

Eric




>
> Signed-off-by: Arnd Bergmann <[email protected]>
> ---
> include/linux/kexec.h | 2 -
> kernel/kexec.c | 95 +++++++++++++++++++------------------------
> 2 files changed, 42 insertions(+), 55 deletions(-)
>
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index 0c994ae37729..f61e310d7a85 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -88,14 +88,12 @@ struct kexec_segment {
> size_t memsz;
> };
>
> -#ifdef CONFIG_COMPAT
> struct compat_kexec_segment {
> compat_uptr_t buf;
> compat_size_t bufsz;
> compat_ulong_t mem; /* User space sees this as a (void *) ... */
> compat_size_t memsz;
> };
> -#endif
>
> #ifdef CONFIG_KEXEC_FILE
> struct purgatory_info {
> diff --git a/kernel/kexec.c b/kernel/kexec.c
> index c82c6c06f051..6618b1d9f00b 100644
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -19,21 +19,46 @@
>
> #include "kexec_internal.h"
>
> +static int copy_user_compat_segment_list(struct kimage *image,
> + unsigned long nr_segments,
> + void __user *segments)
> +{
> + struct compat_kexec_segment __user *cs = segments;
> + struct compat_kexec_segment segment;
> + int i;
> +
> + for (i = 0; i < nr_segments; i++) {
> + if (copy_from_user(&segment, &cs[i], sizeof(segment)))
> + return -EFAULT;
> +
> + image->segment[i] = (struct kexec_segment) {
> + .buf = compat_ptr(segment.buf),
> + .bufsz = segment.bufsz,
> + .mem = segment.mem,
> + .memsz = segment.memsz,
> + };
> + }
> +
> + return 0;
> +}
> +
> +
> static int copy_user_segment_list(struct kimage *image,
> unsigned long nr_segments,
> struct kexec_segment __user *segments)
> {
> - int ret;
> size_t segment_bytes;
>
> /* Read in the segments */
> image->nr_segments = nr_segments;
> segment_bytes = nr_segments * sizeof(*segments);
> - ret = copy_from_user(image->segment, segments, segment_bytes);
> - if (ret)
> - ret = -EFAULT;
> + if (in_compat_syscall())
> + return copy_user_compat_segment_list(image, nr_segments, segments);
>
> - return ret;
> + if (copy_from_user(image->segment, segments, segment_bytes))
> + return -EFAULT;
> +
> + return 0;
> }
>
> static int kimage_alloc_init(struct kimage **rimage, unsigned long entry,
> @@ -233,8 +258,9 @@ static inline int kexec_load_check(unsigned long nr_segments,
> return 0;
> }
>
> -SYSCALL_DEFINE4(kexec_load, unsigned long, entry, unsigned long, nr_segments,
> - struct kexec_segment __user *, segments, unsigned long, flags)
> +static int kernel_kexec_load(unsigned long entry, unsigned long nr_segments,
> + struct kexec_segment __user * segments,
> + unsigned long flags)
> {
> int result;
>
> @@ -265,57 +291,20 @@ SYSCALL_DEFINE4(kexec_load, unsigned long, entry, unsigned long, nr_segments,
> return result;
> }
>
> +SYSCALL_DEFINE4(kexec_load, unsigned long, entry, unsigned long, nr_segments,
> + struct kexec_segment __user *, segments, unsigned long, flags)
> +{
> + return kernel_kexec_load(entry, nr_segments, segments, flags);
> +}
> +
> #ifdef CONFIG_COMPAT
> COMPAT_SYSCALL_DEFINE4(kexec_load, compat_ulong_t, entry,
> compat_ulong_t, nr_segments,
> struct compat_kexec_segment __user *, segments,
> compat_ulong_t, flags)
> {
> - struct compat_kexec_segment in;
> - struct kexec_segment out, __user *ksegments;
> - unsigned long i, result;
> -
> - result = kexec_load_check(nr_segments, flags);
> - if (result)
> - return result;
> -
> - /* Don't allow clients that don't understand the native
> - * architecture to do anything.
> - */
> - if ((flags & KEXEC_ARCH_MASK) == KEXEC_ARCH_DEFAULT)
> - return -EINVAL;
> -
> - ksegments = compat_alloc_user_space(nr_segments * sizeof(out));
> - for (i = 0; i < nr_segments; i++) {
> - result = copy_from_user(&in, &segments[i], sizeof(in));
> - if (result)
> - return -EFAULT;
> -
> - out.buf = compat_ptr(in.buf);
> - out.bufsz = in.bufsz;
> - out.mem = in.mem;
> - out.memsz = in.memsz;
> -
> - result = copy_to_user(&ksegments[i], &out, sizeof(out));
> - if (result)
> - return -EFAULT;
> - }
> -
> - /* Because we write directly to the reserved memory
> - * region when loading crash kernels we need a mutex here to
> - * prevent multiple crash kernels from attempting to load
> - * simultaneously, and to prevent a crash kernel from loading
> - * over the top of a in use crash kernel.
> - *
> - * KISS: always take the mutex.
> - */
> - if (!mutex_trylock(&kexec_mutex))
> - return -EBUSY;
> -
> - result = do_kexec_load(entry, nr_segments, ksegments, flags);
> -
> - mutex_unlock(&kexec_mutex);
> -
> - return result;
> + return kernel_kexec_load(entry, nr_segments,
> + (struct kexec_segment __user *)segments,
> + flags);
> }
> #endif

2021-05-19 18:14:26

by Arnd Bergmann

Subject: Re: [PATCH v3 1/4] kexec: simplify compat_sys_kexec_load

On Tue, May 18, 2021 at 3:41 PM Eric W. Biederman <[email protected]> wrote:
>
> Arnd Bergmann <[email protected]> writes:
>
> > From: Arnd Bergmann <[email protected]>
> >
> > The compat version of sys_kexec_load() uses compat_alloc_user_space to
> > convert the user-provided arguments into the native format.
> >
> > Move the conversion into the regular implementation with
> > an in_compat_syscall() check to simplify it and avoid the
> > compat_alloc_user_space() call.
> >
> > compat_sys_kexec_load() now behaves the same as sys_kexec_load().
>
> Nacked-by: "Eric W. Biederman" <[email protected]>
>
> The patch is wrong.
>
> The logic of the compat entry point and that of the ordinary entry point
> are by necessity different. This patch unifies the logic and breaks the
> compat entry point.
>
> The fundamental necessity is that the code being loaded needs to know
> which mode the kernel is running in so it can safely transition to the
> new kernel.
>
> The two entry points fundamentally need different logic, that difference
> was not preserved, and the goal of this patchset was to unify that which
> fundamentally needs to be different. Given all that, I don't think this
> patch series makes any sense for kexec.

Sorry, I'm not following that explanation. Can you clarify what different
modes of the kernel you are referring to here, and how my patch
changes this?

The only difference I can see between the native and compat entry
points is the layout of the kexec_segment structure, and that is
obviously preserved by my patch.

Arnd

2021-05-19 18:16:00

by Arnd Bergmann

Subject: Re: [PATCH v3 1/4] kexec: simplify compat_sys_kexec_load

On Tue, May 18, 2021 at 4:05 PM Arnd Bergmann <[email protected]> wrote:
>
> On Tue, May 18, 2021 at 3:41 PM Eric W. Biederman <[email protected]> wrote:
> >
> > Arnd Bergmann <[email protected]> writes:
> >
> > > From: Arnd Bergmann <[email protected]>
> > >
> > > The compat version of sys_kexec_load() uses compat_alloc_user_space to
> > > convert the user-provided arguments into the native format.
> > >
> > > Move the conversion into the regular implementation with
> > > an in_compat_syscall() check to simplify it and avoid the
> > > compat_alloc_user_space() call.
> > >
> > > compat_sys_kexec_load() now behaves the same as sys_kexec_load().
> >
> > Nacked-by: "Eric W. Biederman" <[email protected]>
> >
> > The patch is wrong.
> >
> > The logic of the compat entry point and that of the ordinary entry point
> > are by necessity different. This patch unifies the logic and breaks the
> > compat entry point.
> >
> > The fundamental necessity is that the code being loaded needs to know
> > which mode the kernel is running in so it can safely transition to the
> > new kernel.
> >
> > The two entry points fundamentally need different logic, that difference
> > was not preserved, and the goal of this patchset was to unify that which
> > fundamentally needs to be different. Given all that, I don't think this
> > patch series makes any sense for kexec.
>
> Sorry, I'm not following that explanation. Can you clarify what different
> modes of the kernel you are referring to here, and how my patch
> changes this?

I think I figured it out now myself after comparing the two functions:

--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -269,7 +269,8 @@ SYSCALL_DEFINE4(kexec_load, unsigned long, entry,
unsigned long, nr_segments,

/* Verify we are on the appropriate architecture */
if (((flags & KEXEC_ARCH_MASK) != KEXEC_ARCH) &&
- ((flags & KEXEC_ARCH_MASK) != KEXEC_ARCH_DEFAULT))
+ (in_compat_syscall() ||
+ ((flags & KEXEC_ARCH_MASK) != KEXEC_ARCH_DEFAULT)))
return -EINVAL;

/* Because we write directly to the reserved memory

Not sure if that's the best way of doing it, but it looks like folding this
in restores the current behavior.

Arnd
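
To make the effect of that folded-in condition easier to see, here is a small
userspace model of the predicate. It is a sketch only: check_arch() and its
explicit compat parameter are hypothetical (the kernel code calls
in_compat_syscall() and compares against the per-architecture KEXEC_ARCH),
and the constants are written out locally for illustration, assuming the
usual layout with the architecture in the upper 16 bits of flags.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Local illustrative copies of the flag layout; treat as assumptions. */
#define KEXEC_ARCH_MASK		0xffff0000UL
#define KEXEC_ARCH_DEFAULT	(0UL << 16)
#define KEXEC_ARCH_386		(3UL << 16)
#define KEXEC_ARCH_X86_64	(62UL << 16)

/*
 * Model of the architecture check with the proposed change folded in.
 * native_arch plays the role of KEXEC_ARCH and compat plays the role
 * of in_compat_syscall().
 */
static int check_arch(unsigned long flags, unsigned long native_arch,
		      bool compat)
{
	unsigned long arch = flags & KEXEC_ARCH_MASK;

	/* The native architecture value must live entirely in the mask. */
	assert((native_arch & ~KEXEC_ARCH_MASK) == 0);

	/*
	 * A native caller may name the running architecture or leave it
	 * as DEFAULT. A compat caller must name the running architecture
	 * explicitly, so DEFAULT is rejected for it.
	 */
	if (arch != native_arch &&
	    (compat || arch != KEXEC_ARCH_DEFAULT))
		return -EINVAL;

	return 0;
}
```

With native_arch set to KEXEC_ARCH_X86_64, a native call passing
KEXEC_ARCH_DEFAULT is accepted, while a compat call passing
KEXEC_ARCH_DEFAULT gets -EINVAL; both kinds of caller are accepted when
they pass KEXEC_ARCH_X86_64 explicitly.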

2021-05-19 18:24:45

by Eric W. Biederman

Subject: Re: [PATCH v3 1/4] kexec: simplify compat_sys_kexec_load

Arnd Bergmann <[email protected]> writes:

> On Tue, May 18, 2021 at 4:05 PM Arnd Bergmann <[email protected]> wrote:
>>
>> On Tue, May 18, 2021 at 3:41 PM Eric W. Biederman <[email protected]> wrote:
>> >
>> > Arnd Bergmann <[email protected]> writes:
>> >
>> > > From: Arnd Bergmann <[email protected]>
>> > >
>> > > The compat version of sys_kexec_load() uses compat_alloc_user_space to
>> > > convert the user-provided arguments into the native format.
>> > >
>> > > Move the conversion into the regular implementation with
>> > > an in_compat_syscall() check to simplify it and avoid the
>> > > compat_alloc_user_space() call.
>> > >
>> > > compat_sys_kexec_load() now behaves the same as sys_kexec_load().
>> >
>> > Nacked-by: "Eric W. Biederman" <[email protected]>
>> >
>> > The patch is wrong.
>> >
>> > The logic of the compat entry point and that of the ordinary entry point
>> > are by necessity different. This patch unifies the logic and breaks the
>> > compat entry point.
>> >
>> > The fundamental necessity is that the code being loaded needs to know
>> > which mode the kernel is running in so it can safely transition to the
>> > new kernel.
>> >
>> > The two entry points fundamentally need different logic, that difference
>> > was not preserved, and the goal of this patchset was to unify that which
>> > fundamentally needs to be different. Given all that, I don't think this
>> > patch series makes any sense for kexec.
>>
>> Sorry, I'm not following that explanation. Can you clarify what different
>> modes of the kernel you are referring to here, and how my patch
>> changes this?
>
> I think I figured it out now myself after comparing the two functions:
>
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -269,7 +269,8 @@ SYSCALL_DEFINE4(kexec_load, unsigned long, entry,
> unsigned long, nr_segments,
>
> /* Verify we are on the appropriate architecture */
> if (((flags & KEXEC_ARCH_MASK) != KEXEC_ARCH) &&
> - ((flags & KEXEC_ARCH_MASK) != KEXEC_ARCH_DEFAULT))
> + (in_compat_syscall() ||
> + ((flags & KEXEC_ARCH_MASK) != KEXEC_ARCH_DEFAULT)))
> return -EINVAL;
>
> /* Because we write directly to the reserved memory
>
> Not sure if that's the best way of doing it, but it looks like folding this
> in restores the current behavior.

Yes. That is pretty much all there is.

I personally can't stand the sight of in_compat_syscall(), doubly so when
you have to lie to the type system with casts. The cognitive dissonance
I experience is extreme.

I will be happy to help you find another way to get rid of
compat_alloc_user_space(), but not that way.


There is a whole mess in there that was introduced when someone added
do_kexec_load while I was napping in 2017 that makes the system calls an
absolute mess. It all needs to be cleaned up.

Eric

2021-05-19 18:35:02

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v3 2/4] mm: simplify compat_sys_move_pages

From: Arnd Bergmann
> Sent: 17 May 2021 21:34
>
> The compat move_pages() implementation uses compat_alloc_user_space()
> for converting the pointer array. Moving the compat handling into
> the function itself is a bit simpler and lets us avoid the
> compat_alloc_user_space() call.
>
> Signed-off-by: Arnd Bergmann <[email protected]>
> ---
> mm/migrate.c | 45 ++++++++++++++++++++++++++++++---------------
> 1 file changed, 30 insertions(+), 15 deletions(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index b234c3f3acb7..a68d07f19a1a 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1855,6 +1855,23 @@ static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages,
> mmap_read_unlock(mm);
> }
>
> +static int put_compat_pages_array(const void __user *chunk_pages[],
> + const void __user * __user *pages,
> + unsigned long chunk_nr)
> +{

Should that be get_compat_pages_array() ?

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


2021-05-19 18:35:46

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v3 1/4] kexec: simplify compat_sys_kexec_load

From: Arnd Bergmann
> Sent: 17 May 2021 21:34
>
> The compat version of sys_kexec_load() uses compat_alloc_user_space to
> convert the user-provided arguments into the native format.
>
> Move the conversion into the regular implementation with
> an in_compat_syscall() check to simplify it and avoid the
> compat_alloc_user_space() call.
>
> compat_sys_kexec_load() now behaves the same as sys_kexec_load().
>
> Signed-off-by: Arnd Bergmann <[email protected]>
> ---
> include/linux/kexec.h | 2 -
> kernel/kexec.c | 95 +++++++++++++++++++------------------------
> 2 files changed, 42 insertions(+), 55 deletions(-)
>
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index 0c994ae37729..f61e310d7a85 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -88,14 +88,12 @@ struct kexec_segment {
> size_t memsz;
> };
>
> -#ifdef CONFIG_COMPAT
> struct compat_kexec_segment {
> compat_uptr_t buf;
> compat_size_t bufsz;
> compat_ulong_t mem; /* User space sees this as a (void *) ... */
> compat_size_t memsz;
> };
> -#endif
>
> #ifdef CONFIG_KEXEC_FILE
> struct purgatory_info {
> diff --git a/kernel/kexec.c b/kernel/kexec.c
> index c82c6c06f051..6618b1d9f00b 100644
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -19,21 +19,46 @@
>
> #include "kexec_internal.h"
>
> +static int copy_user_compat_segment_list(struct kimage *image,
> + unsigned long nr_segments,
> + void __user *segments)
> +{
> + struct compat_kexec_segment __user *cs = segments;
> + struct compat_kexec_segment segment;
> + int i;
> +
> + for (i = 0; i < nr_segments; i++) {
> + if (copy_from_user(&segment, &cs[i], sizeof(segment)))
> + return -EFAULT;

How many segments are there?
The multiple copy_from_user() will be slow.

> +
> + image->segment[i] = (struct kexec_segment) {
> + .buf = compat_ptr(segment.buf),
> + .bufsz = segment.bufsz,
> + .mem = segment.mem,
> + .memsz = segment.memsz,
> + };
> + }
> +
> + return 0;
> +}
> +
> +
> static int copy_user_segment_list(struct kimage *image,
> unsigned long nr_segments,
> struct kexec_segment __user *segments)
> {
> - int ret;
> size_t segment_bytes;
>
> /* Read in the segments */
> image->nr_segments = nr_segments;
> segment_bytes = nr_segments * sizeof(*segments);

Should there be a bound check on nr_segments?
I can't see one in the code in this patch.

> - ret = copy_from_user(image->segment, segments, segment_bytes);
> - if (ret)
> - ret = -EFAULT;
> + if (in_compat_syscall())
> + return copy_user_compat_segment_list(image, nr_segments, segments);
>
> - return ret;
> + if (copy_from_user(image->segment, segments, segment_bytes))
> + return -EFAULT;
> +
> + return 0;

An alternate sequence (which Eric will like even less!) is to
do a single copy_from_user() for the entire compat size array
into the 'normal' buffer and then do a reverse order conversion
of each array entry from 'compat' to '64 bit'.
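
For illustration only, that reverse-order conversion could look roughly
like the userspace sketch below. Nothing here is from the patch:
struct compat_seg, struct native_seg and widen_in_place() are made-up
stand-ins, and a kernel version would go through compat_ptr() for the
buf field rather than a plain widening assignment.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for compat_kexec_segment / kexec_segment. */
struct compat_seg {
	uint32_t buf;
	uint32_t bufsz;
	uint32_t mem;
	uint32_t memsz;
};

struct native_seg {
	uint64_t buf;
	uint64_t bufsz;
	uint64_t mem;
	uint64_t memsz;
};

/*
 * "segs" has room for nr native entries, but on entry its first
 * nr * sizeof(struct compat_seg) bytes hold nr packed compat entries,
 * as a single bulk copy would leave them. Widen them in place.
 */
static void widen_in_place(struct native_seg *segs, size_t nr)
{
	const unsigned char *raw = (const unsigned char *)segs;
	size_t i;

	/* Walk backwards so a widened entry never lands on unread input. */
	for (i = nr; i-- > 0; ) {
		struct compat_seg in;

		/* Copy out first: for i == 0 source and destination overlap. */
		memcpy(&in, raw + i * sizeof(in), sizeof(in));

		segs[i] = (struct native_seg) {
			.buf   = in.buf,
			.bufsz = in.bufsz,
			.mem   = in.mem,
			.memsz = in.memsz,
		};
	}
}
```

This relies on the compat entry being no larger than the native one, so
the widened entry i always starts at or beyond the packed entry i.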

David



2021-05-19 18:43:47

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH v3 1/4] kexec: simplify compat_sys_kexec_load


Arnd Bergmann <[email protected]> writes:

> On Tue, May 18, 2021 at 4:05 PM Arnd Bergmann <[email protected]> wrote:
>>
>> On Tue, May 18, 2021 at 3:41 PM Eric W. Biederman <[email protected]> wrote:
>> >
>> > Arnd Bergmann <[email protected]> writes:
>> >
>> > > From: Arnd Bergmann <[email protected]>
>> > >
>> > > The compat version of sys_kexec_load() uses compat_alloc_user_space to
>> > > convert the user-provided arguments into the native format.
>> > >
>> > > Move the conversion into the regular implementation with
>> > > an in_compat_syscall() check to simplify it and avoid the
>> > > compat_alloc_user_space() call.
>> > >
>> > > compat_sys_kexec_load() now behaves the same as sys_kexec_load().
>> >
>> > Nacked-by: "Eric W. Biederman" <[email protected]>
>> >
>> > The patch is wrong.
>> >
>> > The logic of the compat entry point and the ordinary entry point
>> > is by necessity different. This unifies the logic and breaks the compat
>> > entry point.
>> >
>> > The fundamental necessity is that the code being loaded needs to know
>> > which mode the kernel is running in so it can safely transition to the
>> > new kernel.
>> >
>> > Given that the two entry points fundamentally need different logic,
>> > that the difference was not preserved, and that the goal of this patchset
>> > was to unify that which fundamentally needs to be different, I don't
>> > think this patch series makes any sense for kexec.
>>
>> Sorry, I'm not following that explanation. Can you clarify what different
>> modes of the kernel you are referring to here, and how my patch
>> changes this?

I think something like the untested diff below is enough to get rid of
compat_alloc_user cleanly.

Certainly it should be enough to give an idea of what I am thinking.

diff --git a/kernel/kexec.c b/kernel/kexec.c
index c82c6c06f051..ce69a5d68023 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -19,26 +19,21 @@

#include "kexec_internal.h"

-static int copy_user_segment_list(struct kimage *image,
+static void copy_user_segment_list(struct kimage *image,
unsigned long nr_segments,
- struct kexec_segment __user *segments)
+ struct kexec_segment *segments)
{
- int ret;
size_t segment_bytes;

/* Read in the segments */
image->nr_segments = nr_segments;
segment_bytes = nr_segments * sizeof(*segments);
- ret = copy_from_user(image->segment, segments, segment_bytes);
- if (ret)
- ret = -EFAULT;
-
- return ret;
+ memcpy(image->segment, segments, segment_bytes);
}

static int kimage_alloc_init(struct kimage **rimage, unsigned long entry,
unsigned long nr_segments,
- struct kexec_segment __user *segments,
+ struct kexec_segment *segments,
unsigned long flags)
{
int ret;
@@ -59,9 +54,7 @@ static int kimage_alloc_init(struct kimage **rimage, unsigned long entry,

image->start = entry;

- ret = copy_user_segment_list(image, nr_segments, segments);
- if (ret)
- goto out_free_image;
+ copy_user_segment_list(image, nr_segments, segments);

if (kexec_on_panic) {
/* Enable special crash kernel control page alloc policy. */
@@ -103,8 +96,8 @@ static int kimage_alloc_init(struct kimage **rimage, unsigned long entry,
return ret;
}

-static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
- struct kexec_segment __user *segments, unsigned long flags)
+static int do_kexec_load_locked(unsigned long entry, unsigned long nr_segments,
+ struct kexec_segment *segments, unsigned long flags)
{
struct kimage **dest_image, *image;
unsigned long i;
@@ -174,6 +167,27 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
return ret;
}

+static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
+ struct kexec_segment *segments, unsigned long flags)
+{
+ int result;
+
+ /* Because we write directly to the reserved memory
+ * region when loading crash kernels we need a mutex here to
+ * prevent multiple crash kernels from attempting to load
+ * simultaneously, and to prevent a crash kernel from loading
+ * over the top of a in use crash kernel.
+ *
+ * KISS: always take the mutex.
+ */
+ if (!mutex_trylock(&kexec_mutex))
+ return -EBUSY;
+
+ result = do_kexec_load_locked(entry, nr_segments, segments, flags);
+ mutex_unlock(&kexec_mutex);
+ return result;
+}
+
/*
* Exec Kernel system call: for obvious reasons only root may call it.
*
@@ -224,6 +238,11 @@ static inline int kexec_load_check(unsigned long nr_segments,
if ((flags & KEXEC_FLAGS) != (flags & ~KEXEC_ARCH_MASK))
return -EINVAL;

+ /* Verify we are on the appropriate architecture */
+ if (((flags & KEXEC_ARCH_MASK) != KEXEC_ARCH) &&
+ ((flags & KEXEC_ARCH_MASK) != KEXEC_ARCH_DEFAULT))
+ return -EINVAL;
+
/* Put an artificial cap on the number
* of segments passed to kexec_load.
*/
@@ -236,33 +255,29 @@ static inline int kexec_load_check(unsigned long nr_segments,
SYSCALL_DEFINE4(kexec_load, unsigned long, entry, unsigned long, nr_segments,
struct kexec_segment __user *, segments, unsigned long, flags)
{
- int result;
+ struct kexec_segment *ksegments;
+ unsigned long bytes, result;

result = kexec_load_check(nr_segments, flags);
if (result)
return result;

- /* Verify we are on the appropriate architecture */
- if (((flags & KEXEC_ARCH_MASK) != KEXEC_ARCH) &&
- ((flags & KEXEC_ARCH_MASK) != KEXEC_ARCH_DEFAULT))
- return -EINVAL;
-
- /* Because we write directly to the reserved memory
- * region when loading crash kernels we need a mutex here to
- * prevent multiple crash kernels from attempting to load
- * simultaneously, and to prevent a crash kernel from loading
- * over the top of a in use crash kernel.
- *
- * KISS: always take the mutex.
- */
- if (!mutex_trylock(&kexec_mutex))
- return -EBUSY;
+ bytes = nr_segments * sizeof(ksegments[0]);
+ ksegments = kmalloc(bytes, GFP_KERNEL);
+ if (!ksegments)
+ return -ENOMEM;

+ result = copy_from_user(ksegments, segments, bytes);
+ if (result)
+ goto fail;
+
result = do_kexec_load(entry, nr_segments, segments, flags);
+ kfree(ksegments);

- mutex_unlock(&kexec_mutex);
-
+fail:
+ kfree(ksegments);
return result;
+
}

#ifdef CONFIG_COMPAT
@@ -272,9 +287,9 @@ COMPAT_SYSCALL_DEFINE4(kexec_load, compat_ulong_t, entry,
compat_ulong_t, flags)
{
struct compat_kexec_segment in;
- struct kexec_segment out, __user *ksegments;
- unsigned long i, result;
-
+ struct kexec_segment *ksegments;
+ unsigned long bytes, i, result;
+
result = kexec_load_check(nr_segments, flags);
if (result)
return result;
@@ -285,37 +300,26 @@ COMPAT_SYSCALL_DEFINE4(kexec_load, compat_ulong_t, entry,
if ((flags & KEXEC_ARCH_MASK) == KEXEC_ARCH_DEFAULT)
return -EINVAL;

- ksegments = compat_alloc_user_space(nr_segments * sizeof(out));
+ bytes = nr_segments * sizeof(ksegments[0]);
+ ksegments = kmalloc(bytes, GFP_KERNEL);
+ if (!ksegments)
+ return -ENOMEM;
+
for (i = 0; i < nr_segments; i++) {
result = copy_from_user(&in, &segments[i], sizeof(in));
if (result)
- return -EFAULT;
-
- out.buf = compat_ptr(in.buf);
- out.bufsz = in.bufsz;
- out.mem = in.mem;
- out.memsz = in.memsz;
+ goto fail;

- result = copy_to_user(&ksegments[i], &out, sizeof(out));
- if (result)
- return -EFAULT;
+ ksegments[i].buf = compat_ptr(in.buf);
+ ksegments[i].bufsz = in.bufsz;
+ ksegments[i].mem = in.mem;
+ ksegments[i].memsz = in.memsz;
}

- /* Because we write directly to the reserved memory
- * region when loading crash kernels we need a mutex here to
- * prevent multiple crash kernels from attempting to load
- * simultaneously, and to prevent a crash kernel from loading
- * over the top of a in use crash kernel.
- *
- * KISS: always take the mutex.
- */
- if (!mutex_trylock(&kexec_mutex))
- return -EBUSY;
-
result = do_kexec_load(entry, nr_segments, ksegments, flags);

- mutex_unlock(&kexec_mutex);
-
+fail:
+ kfree(ksegments);
return result;
}
#endif
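
The shape the diff above is aiming for (copy the user array into a
kernel buffer once, hand only that copy to the loader, release it on a
single exit path) can be modelled in userspace roughly as follows. All
model_* names, struct seg and MODEL_SEGMENT_MAX are hypothetical;
malloc()/free() stand in for kmalloc()/kfree(), and the fake
copy_from_user() only mimics the "returns bytes not copied" convention.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_SEGMENT_MAX 16	/* arbitrary cap for this model */

struct seg {
	unsigned long buf, bufsz, mem, memsz;
};

/* Stand-in for copy_from_user(): returns the number of bytes NOT copied. */
static unsigned long model_copy_from_user(void *to, const void *from,
					  unsigned long n)
{
	if (!from)
		return n;	/* pretend the user pointer faulted */
	memcpy(to, from, n);
	return 0;
}

/* Stand-in for do_kexec_load(): only ever sees the kernel-side copy. */
static int model_do_load(const struct seg *ksegs, unsigned long nr)
{
	unsigned long i;

	for (i = 0; i < nr; i++)
		if (ksegs[i].bufsz > ksegs[i].memsz)
			return -EINVAL;
	return 0;
}

static int model_kexec_load(const struct seg *user_segs, unsigned long nr)
{
	struct seg *ksegs;
	size_t bytes;
	int result;

	/* Bound nr before it is used to size the allocation. */
	if (nr > MODEL_SEGMENT_MAX)
		return -EINVAL;

	bytes = nr * sizeof(*ksegs);
	ksegs = malloc(bytes ? bytes : 1);	/* malloc(0) may return NULL */
	if (!ksegs)
		return -ENOMEM;

	/* A non-zero return is a byte count, so map it to an errno. */
	if (model_copy_from_user(ksegs, user_segs, bytes)) {
		result = -EFAULT;
		goto out;
	}

	result = model_do_load(ksegs, nr);	/* kernel copy, not user_segs */
out:
	free(ksegs);				/* the only free on every path */
	return result;
}
```

Compared with the untested diff, the model passes the kernel copy rather
than the user pointer to the loader, frees the buffer exactly once, and
turns a failed copy into -EFAULT instead of returning a byte count.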

2021-05-19 19:26:35

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v3 1/4] kexec: simplify compat_sys_kexec_load

On Wed, May 19, 2021 at 12:45 AM Eric W. Biederman
<[email protected]> wrote:
> Arnd Bergmann <[email protected]> writes:
> > On Tue, May 18, 2021 at 4:05 PM Arnd Bergmann <[email protected]> wrote:
> >> On Tue, May 18, 2021 at 3:41 PM Eric W. Biederman <[email protected]> wrote:
>
> I think something like the untested diff below is enough to get rid of
> compat_alloc_user cleanly.
>
> Certainly it should be enough to give an idea of what I am thinking.

Yes, that looks sufficient to me. I had started a slightly different approach
by trying to move the kimage_alloc_init() into the top-level entry points to
avoid the extra kmalloc, but that got rather complicated, and your patch is
simpler overall.

The allocation could still be combined with kexec_load_check() into a new
function to reduce the number of duplicate lines, but if you think the current
version is ok, then I'll leave this part as it is.

I've fixed a duplicate kfree() and some whitespace damage, and rebased the
rest of my series on top of this to give it a spin on the build test boxes.
I'll send a v4 series once I have made sure there are no build-time regressions.

Can I add your Signed-off-by for the patch?
Is there a set of tests I should run on it?

Arnd

2021-05-19 20:34:53

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] compat: remove some compat entry points

On Mon, May 17 2021 at 22:33, Arnd Bergmann wrote:
> From: Arnd Bergmann <[email protected]>
>
> These are all handled correctly when calling the native
> system call entry point, so remove the special cases.
> arch/x86/entry/syscall_x32.c | 2 ++
> arch/x86/entry/syscalls/syscall_32.tbl | 6 ++--
> arch/x86/entry/syscalls/syscall_64.tbl | 4 +--

That conflicts with

https://lore.kernel.org/lkml/[email protected]/

which I'm picking up. We have more changes in that area coming in.

Thanks,

tglx



2021-05-19 21:15:31

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v3 2/4] mm: simplify compat_sys_move_pages

On Tue, May 18, 2021 at 10:49 PM David Laight <[email protected]> wrote:
> >
> > +static int put_compat_pages_array(const void __user *chunk_pages[],
> > + const void __user * __user *pages,
> > + unsigned long chunk_nr)
> > +{
>
> Should that be get_compat_pages_array() ?

Nice catch, thanks!

Fixed now.

Arnd

2021-05-19 21:18:27

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] compat: remove some compat entry points

On Wed, May 19, 2021 at 10:33 PM Thomas Gleixner <[email protected]> wrote:
>
> On Mon, May 17 2021 at 22:33, Arnd Bergmann wrote:
> > From: Arnd Bergmann <[email protected]>
> >
> > These are all handled correctly when calling the native
> > system call entry point, so remove the special cases.
> > arch/x86/entry/syscall_x32.c | 2 ++
> > arch/x86/entry/syscalls/syscall_32.tbl | 6 ++--
> > arch/x86/entry/syscalls/syscall_64.tbl | 4 +--
>
> That conflicts with
>
> https://lore.kernel.org/lkml/[email protected]/
>
> which I'm picking up. We have more changes in that area coming in.

Ok, thanks for the heads-up. I'll try a merge or rebase to see how this can be
handled. If both the drivers/net and drivers/media get picked up for 5.14, maybe
the rebased patches can go through -mm on top, along with the final
removal of compat_alloc_user_space()/copy_in_user(). If not, I suppose these
four patches can also wait another release.

Arnd

2021-05-20 14:42:14

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] compat: remove some compat entry points

On Wed, May 19, 2021 at 11:00 PM Arnd Bergmann <[email protected]> wrote:
>
> On Wed, May 19, 2021 at 10:33 PM Thomas Gleixner <[email protected]> wrote:
> >
> > On Mon, May 17 2021 at 22:33, Arnd Bergmann wrote:
> > > From: Arnd Bergmann <[email protected]>
> > >
> > > These are all handled correctly when calling the native
> > > system call entry point, so remove the special cases.
> > > arch/x86/entry/syscall_x32.c | 2 ++
> > > arch/x86/entry/syscalls/syscall_32.tbl | 6 ++--
> > > arch/x86/entry/syscalls/syscall_64.tbl | 4 +--
> >
> > That conflicts with
> >
> > https://lore.kernel.org/lkml/[email protected]/
> >
> > which I'm picking up. We have more changes in that area coming in.
>
> Ok, thanks for the heads-up. I'll try a merge or rebase to see how this can be
> handled. If both the drivers/net and drivers/media get picked up for 5.14, maybe
> the rebased patches can go through -mm on top, along with the final
> removal of compat_alloc_user_space()/copy_in_user(). If not, I suppose these
> four patches can also wait another release.

On second thought, this patch 4/4 is not even required here to kill off
compat_alloc_user_space, so the easiest alternative might be to merge the
other patches first, and then do this part together with the removal of
the unused functions in a follow-up series.

Arnd