2016-12-30 16:14:56

by Dmitry Safonov

Subject: [RFC 0/4] x86: keep TASK_SIZE in sync with mm->task_size

At the moment, we have the following task_size-related things:
- the TASK_SIZE_OF() macro, which is unused;
- current->mm->task_size, which is used in half of the code, and the
  TASK_SIZE macro, which is used in the other half;
- TIF_ADDR32, which is used to detect a 32-bit address space and is
  x86-specific, whereas some other arches misuse TIF_32BIT;
- the ADDR_LIMIT_32BIT personality, which is used on arm/alpha;
- ADDR_LIMIT_3GB, which is x86-specific and can be used to change a
  running task's TASK_SIZE 3GB <-> 4GB.

This patch set removes the unused definition of TASK_SIZE_OF (1) and
defines the TASK_SIZE macro as current->mm->task_size (3).
I would suggest defining TASK_SIZE this way in the generic version as
well, but so far I have tested it only on x86.
It also frees a thread info flag (2) and adds arch_prctl() codes on
x86_64 to get/set the current virtual address space size (4). As this
is currently needed only by CRIU, it is hidden under the
CHECKPOINT_RESTORE config.
I hope these patches help to clean up the task_size-related code at
least a bit (and help me restore vaddr limits).
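
As a rough illustration of (3), the lookup that TASK_SIZE would perform can
be modelled in plain userspace C. This is only a sketch: struct mm_model,
struct task_model, model_task_size() and MODEL_TASK_SIZE_MAX are
hypothetical names, and the 47-bit limit with a 4096-byte page is an
assumption taken from the x86_64 case described in this series.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed x86_64 limit: 47 bits minus one 4096-byte page. */
#define MODEL_TASK_SIZE_MAX ((1ULL << 47) - 4096)

/* Hypothetical stand-ins for mm_struct and task_struct. */
struct mm_model {
	unsigned long long task_size;
};

struct task_model {
	struct mm_model *mm;	/* NULL for a task without a user mm */
};

/* Mirrors the proposed macro: use mm->task_size when an mm exists. */
unsigned long long model_task_size(const struct task_model *t)
{
	return t->mm ? t->mm->task_size : MODEL_TASK_SIZE_MAX;
}

/* Small usage example: a 3GB task and a task with no mm. */
void model_task_size_example(void)
{
	struct mm_model mm = { 0xc0000000ULL };
	struct task_model user = { &mm };
	struct task_model no_mm = { NULL };

	assert(model_task_size(&user) == 0xc0000000ULL);
	assert(model_task_size(&no_mm) == MODEL_TASK_SIZE_MAX);
}
```

The point of the model is that there is a single source of truth: whatever
is stored in the mm is what every TASK_SIZE user sees.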

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: [email protected]

Dmitry Safonov (4):
mm: remove unused TASK_SIZE_OF()
x86/thread_info: kill TIF_ADDR32 in favour of ADDR_LIMIT_32BIT
x86/mm: define TASK_SIZE as current->mm->task_size
x86/arch_prctl: add ARCH_{GET,SET}_TASK_SIZE

arch/arm64/include/asm/memory.h | 2 --
arch/mips/include/asm/processor.h | 3 ---
arch/parisc/include/asm/processor.h | 3 +--
arch/powerpc/include/asm/processor.h | 3 +--
arch/s390/include/asm/processor.h | 3 +--
arch/sparc/include/asm/processor_64.h | 3 ---
arch/x86/include/asm/elf.h | 7 +++++--
arch/x86/include/asm/processor.h | 19 +++++++++----------
arch/x86/include/asm/thread_info.h | 4 +---
arch/x86/include/uapi/asm/prctl.h | 3 +++
arch/x86/kernel/process_64.c | 17 +++++++++++++++--
arch/x86/kernel/sys_x86_64.c | 4 ++--
arch/x86/um/asm/segment.h | 2 +-
arch/x86/xen/mmu.c | 4 ++--
fs/exec.c | 17 +++++++++++------
include/linux/sched.h | 4 ----
16 files changed, 52 insertions(+), 46 deletions(-)

--
2.11.0


2016-12-30 16:15:00

by Dmitry Safonov

Subject: [RFC 4/4] x86/arch_prctl: add ARCH_{GET,SET}_TASK_SIZE

Add arch_prctl getters/setters for the size of a task's virtual address
space. This adds the ability to change a task's virtual address space limit.
I need this to correctly restore virtual address space limits in CRIU.
Currently, on x86 there are three task sizes: 3GB for some old 32-bit Java
apps, 4GB for ordinary 32-bit compat apps, and 47 bits for native
x86_64 processes.
32-bit applications are restored by CRIU with the help of a 64-bit
clone()-d child, and on restore we need to put the correct address space
limits back. Otherwise a restored 32-bit application may mmap() an address
above 4GB, and as this address will not fit into a 4-byte pointer, it
will silently alias and corrupt the pointer that has the same lower 4 bytes.
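
The aliasing described above can be shown with plain integer arithmetic.
This is a userspace sketch, not kernel code; truncate_to_compat_ptr() and
aliases_in_compat() are hypothetical helpers introduced only for the
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* What a 32-bit task keeps of a 64-bit address: only the low 32 bits. */
uint32_t truncate_to_compat_ptr(uint64_t addr)
{
	return (uint32_t)addr;
}

/* Returns 1 if two distinct addresses collapse to the same 4-byte pointer. */
int aliases_in_compat(uint64_t a, uint64_t b)
{
	return a != b &&
	       truncate_to_compat_ptr(a) == truncate_to_compat_ptr(b);
}

/* Small usage example: 4GB + 0x1000 aliases the low mapping at 0x1000. */
void compat_alias_example(void)
{
	assert(truncate_to_compat_ptr(0x100001000ULL) == 0x1000u);
	assert(aliases_in_compat(0x100001000ULL, 0x1000ULL));
}
```

Any mapping placed above 4GB therefore looks, to the 32-bit task, like a
pointer into whatever already lives at the truncated address.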

Signed-off-by: Dmitry Safonov <[email protected]>
---
arch/x86/include/uapi/asm/prctl.h | 3 +++
arch/x86/kernel/process_64.c | 13 +++++++++++++
2 files changed, 16 insertions(+)

diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h
index 835aa51c7f6e..122a8ce5b051 100644
--- a/arch/x86/include/uapi/asm/prctl.h
+++ b/arch/x86/include/uapi/asm/prctl.h
@@ -6,6 +6,9 @@
#define ARCH_GET_FS 0x1003
#define ARCH_GET_GS 0x1004

+#define ARCH_SET_TASK_SIZE 0x1005
+#define ARCH_GET_TASK_SIZE 0x1006
+
#define ARCH_MAP_VDSO_X32 0x2001
#define ARCH_MAP_VDSO_32 0x2002
#define ARCH_MAP_VDSO_64 0x2003
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 8ce30d40bb33..ed6a792f7932 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -599,6 +599,19 @@ long do_arch_prctl(struct task_struct *task, int code, unsigned long addr)
}

#ifdef CONFIG_CHECKPOINT_RESTORE
+ case ARCH_SET_TASK_SIZE:
+ if (addr >= TASK_SIZE_MAX)
+ return -EINVAL;
+ if (find_vma(current->mm, addr) != 0)
+ return -ENOMEM;
+ current->mm->task_size = addr;
+ break;
+
+ case ARCH_GET_TASK_SIZE:
+ ret = put_user(current->mm->task_size,
+ (unsigned long __user *)addr);
+ break;
+
# ifdef CONFIG_X86_X32_ABI
case ARCH_MAP_VDSO_X32:
return prctl_map_vdso(&vdso_image_x32, addr);
--
2.11.0

2016-12-30 19:34:38

by Dmitry Safonov

Subject: [RFC 2/4] x86/thread_info: kill TIF_ADDR32 in favour of ADDR_LIMIT_32BIT

This thread flag is completely x86-specific; consolidate it with the
ADDR_LIMIT_32BIT personality, which is defined but not used on x86.
This frees one of the thread flags and brings the personality handling
in line with other arches.
After this commit, ADDR_LIMIT_32BIT is set by the kernel automatically
in COMPAT_SET_PERSONALITY() for 32-bit ELF files and for 32-bit a.out.
It is cleared in SET_PERSONALITY() for 64-bit ELFs.
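
The set/clear behaviour can be sketched in userspace C as simple bit
operations on the personality word. The struct and the model_* functions
are hypothetical stand-ins; the ADDR_LIMIT_32BIT value is copied from
include/uapi/linux/personality.h, and the two masks are the ones used by
STACK_RND_MASK in the diff below.

```c
#include <assert.h>
#include <stdbool.h>

/* Value as defined in include/uapi/linux/personality.h. */
#define ADDR_LIMIT_32BIT 0x0800000U

/* Hypothetical stand-in for the personality word of a task. */
struct personality_model {
	unsigned int personality;
};

/* Models the 32-bit exec path: mark the address space as 32-bit. */
void model_set_personality_ia32(struct personality_model *t)
{
	t->personality |= ADDR_LIMIT_32BIT;
}

/* Models the 64-bit exec path: drop the 32-bit limit. */
void model_set_personality_64bit(struct personality_model *t)
{
	t->personality &= ~ADDR_LIMIT_32BIT;
}

/* Replacement for the former test_thread_flag(TIF_ADDR32) check. */
bool model_is_addr32(const struct personality_model *t)
{
	return (t->personality & ADDR_LIMIT_32BIT) != 0;
}

/* Same selection as STACK_RND_MASK: 8MB for 32-bit, 1GB for 64-bit. */
unsigned long model_stack_rnd_mask(const struct personality_model *t)
{
	return model_is_addr32(t) ? 0x7ffUL : 0x3fffffUL;
}

/* Small usage example: exec a 32-bit binary, then a 64-bit one. */
void personality_model_example(void)
{
	struct personality_model t = { 0 };

	model_set_personality_ia32(&t);
	assert(model_stack_rnd_mask(&t) == 0x7ffUL);
	model_set_personality_64bit(&t);
	assert(model_stack_rnd_mask(&t) == 0x3fffffUL);
}
```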

Signed-off-by: Dmitry Safonov <[email protected]>
---
arch/x86/include/asm/elf.h | 7 +++++--
arch/x86/include/asm/processor.h | 2 +-
arch/x86/include/asm/thread_info.h | 4 +---
arch/x86/kernel/process_64.c | 4 ++--
arch/x86/kernel/sys_x86_64.c | 4 ++--
5 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index e7f155c3045e..02f39b363e61 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -5,6 +5,8 @@
* ELF register definitions..
*/
#include <linux/thread_info.h>
+#include <linux/personality.h>
+#include <linux/sched.h>

#include <asm/ptrace.h>
#include <asm/user.h>
@@ -295,7 +297,8 @@ do { \
#else /* CONFIG_X86_32 */

/* 1GB for 64bit, 8MB for 32bit */
-#define STACK_RND_MASK (test_thread_flag(TIF_ADDR32) ? 0x7ff : 0x3fffff)
+#define STACK_RND_MASK \
+ ((current->personality & ADDR_LIMIT_32BIT) ? 0x7ff : 0x3fffff)

#define ARCH_DLINFO \
do { \
@@ -346,7 +349,7 @@ static inline int mmap_is_ia32(void)
{
return IS_ENABLED(CONFIG_X86_32) ||
(IS_ENABLED(CONFIG_COMPAT) &&
- test_thread_flag(TIF_ADDR32));
+ (current->personality & ADDR_LIMIT_32BIT));
}

/* Do not change the values. See get_align_mask() */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 090a860b792a..dbc7dec5fa84 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -817,7 +817,7 @@ static inline void spin_lock_prefetch(const void *x)
#define IA32_PAGE_OFFSET ((current->personality & ADDR_LIMIT_3GB) ? \
0xc0000000 : 0xFFFFe000)

-#define TASK_SIZE (test_thread_flag(TIF_ADDR32) ? \
+#define TASK_SIZE (current->personality & ADDR_LIMIT_32BIT ? \
IA32_PAGE_OFFSET : TASK_SIZE_MAX)

#define STACK_TOP TASK_SIZE
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index ad6f5eb07a95..6a5763e6ca1b 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -97,8 +97,7 @@ struct thread_info {
#define TIF_BLOCKSTEP 25 /* set when we want DEBUGCTLMSR_BTF */
#define TIF_LAZY_MMU_UPDATES 27 /* task is updating the mmu lazily */
#define TIF_SYSCALL_TRACEPOINT 28 /* syscall tracepoint instrumentation */
-#define TIF_ADDR32 29 /* 32-bit address space on 64 bits */
-#define TIF_X32 30 /* 32-bit native x86-64 binary */
+#define TIF_X32 29 /* 32-bit native x86-64 binary */

#define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE)
#define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME)
@@ -119,7 +118,6 @@ struct thread_info {
#define _TIF_BLOCKSTEP (1 << TIF_BLOCKSTEP)
#define _TIF_LAZY_MMU_UPDATES (1 << TIF_LAZY_MMU_UPDATES)
#define _TIF_SYSCALL_TRACEPOINT (1 << TIF_SYSCALL_TRACEPOINT)
-#define _TIF_ADDR32 (1 << TIF_ADDR32)
#define _TIF_X32 (1 << TIF_X32)

/*
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index a61e141b6891..8ce30d40bb33 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -489,8 +489,8 @@ void set_personality_64bit(void)

/* Make sure to be in 64bit mode */
clear_thread_flag(TIF_IA32);
- clear_thread_flag(TIF_ADDR32);
clear_thread_flag(TIF_X32);
+ current->personality &= ~ADDR_LIMIT_32BIT;

/* Ensure the corresponding mm is not marked. */
if (current->mm)
@@ -508,7 +508,7 @@ void set_personality_ia32(bool x32)
/* inherit personality from parent */

/* Make sure to be in 32bit mode */
- set_thread_flag(TIF_ADDR32);
+ current->personality |= ADDR_LIMIT_32BIT;

/* Mark the associated mm as containing 32-bit tasks. */
if (x32) {
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index a55ed63b9f91..e836a7318f1f 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -100,7 +100,7 @@ SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
static void find_start_end(unsigned long flags, unsigned long *begin,
unsigned long *end)
{
- if (!test_thread_flag(TIF_ADDR32) && (flags & MAP_32BIT)) {
+ if (!(current->personality & ADDR_LIMIT_32BIT) && (flags & MAP_32BIT)) {
/* This is usually used needed to map code in small
model, so it needs to be in the first 31bit. Limit
it to that. This means we need to move the
@@ -175,7 +175,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
return addr;

/* for MAP_32BIT mappings we force the legacy mmap base */
- if (!test_thread_flag(TIF_ADDR32) && (flags & MAP_32BIT))
+ if (!(current->personality & ADDR_LIMIT_32BIT) && (flags & MAP_32BIT))
goto bottomup;

/* requesting a specific address */
--
2.11.0

2016-12-30 23:33:55

by Dmitry Safonov

Subject: [RFC 3/4] x86/mm: define TASK_SIZE as current->mm->task_size

Keep the task's virtual address space size in an mm_struct field, which
has existed for a long time: it is initialized in setup_new_exec()
depending on the new task's personality.
This way TASK_SIZE will always be the same as current->mm->task_size.
Previously, TASK_SIZE and current->mm->task_size could have different
values: e.g., a 32-bit process can unset the ADDR_LIMIT_3GB personality
(with the personality syscall), so TASK_SIZE becomes 4GB, which is
larger than mm->task_size = 3GB.
As both TASK_SIZE *and* current->mm->task_size are used frequently in
the code, this difference creates subtle situations. For example,
one can mmap addresses > 3GB, but they will be hidden in
/proc/pid/pagemap, as it checks mm->task_size.
I've moved the initialization of mm->task_size earlier in
setup_new_exec(), as arch_pick_mmap_layout() initializes
mmap_legacy_base with TASK_UNMAPPED_BASE, which depends on TASK_SIZE.
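
The mismatch described above can be reproduced in a small userspace model.
This is a sketch under assumptions: struct compat_task_model and the
model_* functions are hypothetical, the ADDR_LIMIT_3GB value is copied from
include/uapi/linux/personality.h, and the two limits are the ones used by
IA32_PAGE_OFFSET.

```c
#include <assert.h>

/* Value as defined in include/uapi/linux/personality.h. */
#define ADDR_LIMIT_3GB 0x8000000U

/* Hypothetical model of a 32-bit task before this patch. */
struct compat_task_model {
	unsigned int personality;
	unsigned long long mm_task_size;	/* cached at exec time */
};

/* Old TASK_SIZE for a 32-bit task: re-evaluated from personality. */
unsigned long long model_old_task_size(const struct compat_task_model *t)
{
	return (t->personality & ADDR_LIMIT_3GB) ? 0xc0000000ULL
						  : 0xFFFFe000ULL;
}

/* Models exec: mm->task_size is sampled once from TASK_SIZE. */
void model_exec(struct compat_task_model *t, unsigned int personality)
{
	t->personality = personality;
	t->mm_task_size = model_old_task_size(t);
}

/* Models personality(2) clearing ADDR_LIMIT_3GB: only the flag changes. */
void model_clear_3gb(struct compat_task_model *t)
{
	t->personality &= ~ADDR_LIMIT_3GB;
}

/* Small usage example: the two values agree at exec, then diverge. */
void task_size_mismatch_example(void)
{
	struct compat_task_model t;

	model_exec(&t, ADDR_LIMIT_3GB);
	assert(model_old_task_size(&t) == t.mm_task_size);
	model_clear_3gb(&t);
	assert(model_old_task_size(&t) > t.mm_task_size);
}
```

With TASK_SIZE read from the mm instead, the second assertion could no
longer hold, because both users would see the cached value.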

Signed-off-by: Dmitry Safonov <[email protected]>
---
arch/x86/include/asm/processor.h | 17 +++++++++--------
arch/x86/um/asm/segment.h | 2 +-
arch/x86/xen/mmu.c | 4 ++--
fs/exec.c | 17 +++++++++++------
4 files changed, 23 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index dbc7dec5fa84..47ebde614f06 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -768,10 +768,8 @@ static inline void spin_lock_prefetch(const void *x)
/*
* User space process size: 3GB (default).
*/
-#define TASK_SIZE PAGE_OFFSET
-#define TASK_SIZE_MAX TASK_SIZE
-#define STACK_TOP TASK_SIZE
-#define STACK_TOP_MAX STACK_TOP
+#define TASK_SIZE_MAX PAGE_OFFSET
+#define INIT_TASK_SIZE TASK_SIZE_MAX

#define INIT_THREAD { \
.sp0 = TOP_OF_INIT_STACK, \
@@ -817,12 +815,9 @@ static inline void spin_lock_prefetch(const void *x)
#define IA32_PAGE_OFFSET ((current->personality & ADDR_LIMIT_3GB) ? \
0xc0000000 : 0xFFFFe000)

-#define TASK_SIZE (current->personality & ADDR_LIMIT_32BIT ? \
+#define INIT_TASK_SIZE (current->personality & ADDR_LIMIT_32BIT ? \
IA32_PAGE_OFFSET : TASK_SIZE_MAX)

-#define STACK_TOP TASK_SIZE
-#define STACK_TOP_MAX TASK_SIZE_MAX
-
#define INIT_THREAD { \
.sp0 = TOP_OF_INIT_STACK, \
.addr_limit = KERNEL_DS, \
@@ -833,6 +828,12 @@ extern unsigned long KSTK_ESP(struct task_struct *task);

#endif /* CONFIG_X86_64 */

+#define TASK_SIZE \
+ ((current->mm) ? current->mm->task_size : TASK_SIZE_MAX)
+
+#define STACK_TOP TASK_SIZE
+#define STACK_TOP_MAX TASK_SIZE_MAX
+
extern unsigned long thread_saved_pc(struct task_struct *tsk);

extern void start_thread(struct pt_regs *regs, unsigned long new_ip,
diff --git a/arch/x86/um/asm/segment.h b/arch/x86/um/asm/segment.h
index 41dd5e1f3cd7..3a9aa9f050df 100644
--- a/arch/x86/um/asm/segment.h
+++ b/arch/x86/um/asm/segment.h
@@ -13,6 +13,6 @@ typedef struct {

#define MAKE_MM_SEG(s) ((mm_segment_t) { (s) })
#define KERNEL_DS MAKE_MM_SEG(~0UL)
-#define USER_DS MAKE_MM_SEG(TASK_SIZE)
+#define USER_DS MAKE_MM_SEG(TASK_SIZE_MAX)

#endif
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 7d5afdb417cc..264ca3b7be58 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -830,7 +830,7 @@ static void __xen_pgd_pin(struct mm_struct *mm, pgd_t *pgd)
#else /* CONFIG_X86_32 */
#ifdef CONFIG_X86_PAE
/* Need to make sure unshared kernel PMD is pinnable */
- xen_pin_page(mm, pgd_page(pgd[pgd_index(TASK_SIZE)]),
+ xen_pin_page(mm, pgd_page(pgd[pgd_index(TASK_SIZE_MAX)]),
PT_PMD);
#endif
xen_do_pin(MMUEXT_PIN_L3_TABLE, PFN_DOWN(__pa(pgd)));
@@ -949,7 +949,7 @@ static void __xen_pgd_unpin(struct mm_struct *mm, pgd_t *pgd)

#ifdef CONFIG_X86_PAE
/* Need to make sure unshared kernel PMD is unpinned */
- xen_unpin_page(mm, pgd_page(pgd[pgd_index(TASK_SIZE)]),
+ xen_unpin_page(mm, pgd_page(pgd[pgd_index(TASK_SIZE_MAX)]),
PT_PMD);
#endif

diff --git a/fs/exec.c b/fs/exec.c
index e57946610733..826b73600fc2 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1303,8 +1303,19 @@ void would_dump(struct linux_binprm *bprm, struct file *file)
}
EXPORT_SYMBOL(would_dump);

+#ifndef INIT_TASK_SIZE
+#define INIT_TASK_SIZE TASK_SIZE
+#endif
+
void setup_new_exec(struct linux_binprm * bprm)
{
+
+ /* Set the new mm task size. We have to do that late because it may
+ * depend on TIF_32BIT which is only updated in flush_thread() on
+ * some architectures like powerpc
+ */
+ current->mm->task_size = INIT_TASK_SIZE;
+
arch_pick_mmap_layout(current->mm);

/* This is the point of no return */
@@ -1318,12 +1329,6 @@ void setup_new_exec(struct linux_binprm * bprm)
perf_event_exec();
__set_task_comm(current, kbasename(bprm->filename), true);

- /* Set the new mm task size. We have to do that late because it may
- * depend on TIF_32BIT which is only updated in flush_thread() on
- * some architectures like powerpc
- */
- current->mm->task_size = TASK_SIZE;
-
/* install the new credentials */
if (!uid_eq(bprm->cred->uid, current_euid()) ||
!gid_eq(bprm->cred->gid, current_egid())) {
--
2.11.0

2016-12-31 01:36:54

by Andy Lutomirski

Subject: Re: [RFC 1/4] mm: remove unused TASK_SIZE_OF()

On Fri, Dec 30, 2016 at 7:56 AM, Dmitry Safonov <[email protected]> wrote:
> All users of TASK_SIZE_OF(tsk) have migrated to mm->task_size or
> TASK_SIZE_MAX since:
> commit d696ca016d57 ("x86/fsgsbase/64: Use TASK_SIZE_MAX for
> FSBASE/GSBASE upper limits"),
> commit a06db751c321 ("pagemap: check permissions and capabilities at
> open time"),

I like this.

Reviewed-by: Andy Lutomirski <[email protected]> # for x86

2016-12-31 01:38:40

by Andy Lutomirski

Subject: Re: [RFC 2/4] x86/thread_info: kill TIF_ADDR32 in favour of ADDR_LIMIT_32BIT

On Fri, Dec 30, 2016 at 7:56 AM, Dmitry Safonov <[email protected]> wrote:
> This thread flag is completely x86-specific, consolidate it with
> ADDR_LIMIT_32BIT personality which is defined but not used on x86.
> It will free one of thread flags and consolidate personality with
> other arches.
> After this commit ADDR_LIMIT_32BIT is set by the kernel automatically
> in COMPAT_SET_PERSONALITY() for 32-bit ELF files and for 32-bit a.out.
> It's cleared in SET_PERSONALITY() for 64-bit ELFs.

I'm okay with this as a plain cleanup, but I'm not convinced that this
is really the right long-term solution. See next email.

2016-12-31 02:02:37

by Andy Lutomirski

Subject: Re: [RFC 4/4] x86/arch_prctl: add ARCH_{GET,SET}_TASK_SIZE

On Fri, Dec 30, 2016 at 7:56 AM, Dmitry Safonov <[email protected]> wrote:
> Add arch_prctl getters/setters for size of virtual address space of task.
> This adds ability to change task's virtual address space limit.
> I need this for correctly restore virtual address space limits in CRIU.
> Currently, on x86 there are three task sizes: 3GB for some old 32 bit java
> apps, 4Gb for ordinary 32-bit compatible apps and 47-bits for native
> x86_64 processes.
> 32-bit applications are restored by CRIU with the help of 64-bit clone()-d
> child, and on restore we need to place correct address space limitations
> back - otherwise 32-bit restored application may mmap() address over
> 4Gb space and as this address will not fit into 4-byte pointer, it
> will silently reuse/corrupt the pointer that has the same lower 4-bytes.
>

I agree we need something like this, but this particular justification
is a bit bogus. If 32-bit mmap() returns an address above 2^32, then
I think it's a straight-up bug. The address space limit shouldn't
have anything to do with it -- the kernel *knows* that it's a
"compat" syscall.

2016-12-31 06:35:33

by Dmitry Safonov

Subject: [RFC 1/4] mm: remove unused TASK_SIZE_OF()

All users of TASK_SIZE_OF(tsk) have migrated to mm->task_size or
TASK_SIZE_MAX since:
commit d696ca016d57 ("x86/fsgsbase/64: Use TASK_SIZE_MAX for
FSBASE/GSBASE upper limits"),
commit a06db751c321 ("pagemap: check permissions and capabilities at
open time").

Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Dmitry Safonov <[email protected]>
---
arch/arm64/include/asm/memory.h | 2 --
arch/mips/include/asm/processor.h | 3 ---
arch/parisc/include/asm/processor.h | 3 +--
arch/powerpc/include/asm/processor.h | 3 +--
arch/s390/include/asm/processor.h | 3 +--
arch/sparc/include/asm/processor_64.h | 3 ---
arch/x86/include/asm/processor.h | 2 --
include/linux/sched.h | 4 ----
8 files changed, 3 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index bfe632808d77..329bb4fd543c 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -80,8 +80,6 @@
#define TASK_SIZE_32 UL(0x100000000)
#define TASK_SIZE (test_thread_flag(TIF_32BIT) ? \
TASK_SIZE_32 : TASK_SIZE_64)
-#define TASK_SIZE_OF(tsk) (test_tsk_thread_flag(tsk, TIF_32BIT) ? \
- TASK_SIZE_32 : TASK_SIZE_64)
#else
#define TASK_SIZE TASK_SIZE_64
#endif /* CONFIG_COMPAT */
diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h
index 95b8c471f572..c2827a5507d4 100644
--- a/arch/mips/include/asm/processor.h
+++ b/arch/mips/include/asm/processor.h
@@ -73,9 +73,6 @@ extern unsigned int vced_count, vcei_count;
#define TASK_SIZE (test_thread_flag(TIF_32BIT_ADDR) ? TASK_SIZE32 : TASK_SIZE64)
#define STACK_TOP_MAX TASK_SIZE64

-#define TASK_SIZE_OF(tsk) \
- (test_tsk_thread_flag(tsk, TIF_32BIT_ADDR) ? TASK_SIZE32 : TASK_SIZE64)
-
#define TASK_IS_32BIT_ADDR test_thread_flag(TIF_32BIT_ADDR)

#endif
diff --git a/arch/parisc/include/asm/processor.h b/arch/parisc/include/asm/processor.h
index a3661ee6b060..8b51ddae8e4a 100644
--- a/arch/parisc/include/asm/processor.h
+++ b/arch/parisc/include/asm/processor.h
@@ -32,8 +32,7 @@

#define HAVE_ARCH_PICK_MMAP_LAYOUT

-#define TASK_SIZE_OF(tsk) ((tsk)->thread.task_size)
-#define TASK_SIZE TASK_SIZE_OF(current)
+#define TASK_SIZE (current->thread.task_size)
#define TASK_UNMAPPED_BASE (current->thread.map_base)

#define DEFAULT_TASK_SIZE32 (0xFFF00000UL)
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 1ba814436c73..04e575ead590 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -111,9 +111,8 @@ void release_thread(struct task_struct *);
*/
#define TASK_SIZE_USER32 (0x0000000100000000UL - (1*PAGE_SIZE))

-#define TASK_SIZE_OF(tsk) (test_tsk_thread_flag(tsk, TIF_32BIT) ? \
+#define TASK_SIZE (test_thread_flag(TIF_32BIT) ? \
TASK_SIZE_USER32 : TASK_SIZE_USER64)
-#define TASK_SIZE TASK_SIZE_OF(current)

/* This decides where the kernel will search for a free chunk of vm
* space during mmap's.
diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index 6bca916a5ba0..c53e8e2a51ac 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -89,10 +89,9 @@ extern void execve_tail(void);
* User space process size: 2GB for 31 bit, 4TB or 8PT for 64 bit.
*/

-#define TASK_SIZE_OF(tsk) ((tsk)->mm->context.asce_limit)
#define TASK_UNMAPPED_BASE (test_thread_flag(TIF_31BIT) ? \
(1UL << 30) : (1UL << 41))
-#define TASK_SIZE TASK_SIZE_OF(current)
+#define TASK_SIZE (current->mm->context.asce_limit)
#define TASK_MAX_SIZE (1UL << 53)

#define STACK_TOP (1UL << (test_thread_flag(TIF_31BIT) ? 31:42))
diff --git a/arch/sparc/include/asm/processor_64.h b/arch/sparc/include/asm/processor_64.h
index 6448cfc8292f..6ce1a75d7a24 100644
--- a/arch/sparc/include/asm/processor_64.h
+++ b/arch/sparc/include/asm/processor_64.h
@@ -36,9 +36,6 @@
#define VPTE_SIZE (1 << (VA_BITS - PAGE_SHIFT + 3))
#endif

-#define TASK_SIZE_OF(tsk) \
- (test_tsk_thread_flag(tsk,TIF_32BIT) ? \
- (1UL << 32UL) : ((unsigned long)-VPTE_SIZE))
#define TASK_SIZE \
(test_thread_flag(TIF_32BIT) ? \
(1UL << 32UL) : ((unsigned long)-VPTE_SIZE))
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index eaf100508c36..090a860b792a 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -819,8 +819,6 @@ static inline void spin_lock_prefetch(const void *x)

#define TASK_SIZE (test_thread_flag(TIF_ADDR32) ? \
IA32_PAGE_OFFSET : TASK_SIZE_MAX)
-#define TASK_SIZE_OF(child) ((test_tsk_thread_flag(child, TIF_ADDR32)) ? \
- IA32_PAGE_OFFSET : TASK_SIZE_MAX)

#define STACK_TOP TASK_SIZE
#define STACK_TOP_MAX TASK_SIZE_MAX
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4d1905245c7a..7a2e2f3f38a3 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -3610,10 +3610,6 @@ static inline void inc_syscw(struct task_struct *tsk)
}
#endif

-#ifndef TASK_SIZE_OF
-#define TASK_SIZE_OF(tsk) TASK_SIZE
-#endif
-
#ifdef CONFIG_MEMCG
extern void mm_update_next_owner(struct mm_struct *mm);
#else
--
2.11.0

2016-12-31 11:10:43

by Dmitry Safonov

Subject: Re: [RFC 4/4] x86/arch_prctl: add ARCH_{GET,SET}_TASK_SIZE

Hi Andy,
thanks for reviewing!

2016-12-31 5:02 GMT+03:00 Andy Lutomirski <[email protected]>:
> On Fri, Dec 30, 2016 at 7:56 AM, Dmitry Safonov <[email protected]> wrote:
>> Add arch_prctl getters/setters for size of virtual address space of task.
>> This adds ability to change task's virtual address space limit.
>> I need this for correctly restore virtual address space limits in CRIU.
>> Currently, on x86 there are three task sizes: 3GB for some old 32 bit java
>> apps, 4Gb for ordinary 32-bit compatible apps and 47-bits for native
>> x86_64 processes.
>> 32-bit applications are restored by CRIU with the help of 64-bit clone()-d
>> child, and on restore we need to place correct address space limitations
>> back - otherwise 32-bit restored application may mmap() address over
>> 4Gb space and as this address will not fit into 4-byte pointer, it
>> will silently reuse/corrupt the pointer that has the same lower 4-bytes.
>>
>
> I agree we need something like this, but this particular justification
> is a bit bogus. If 32-bit mmap() returns an address above 2^32, then
> I think it's a straight-up bug. The address space limit shouldn't
> have anything to do with it -- the kernel *knows* that it's a
> "compat" syscall.

Yep, I guess I didn't realize that the real problem here is that a
compat syscall returns an address above 4GB, not the address
space limits. Thanks, will look into that.

--
Dmitry