2017-03-21 16:41:08

by Dmitry Safonov

[permalink] [raw]
Subject: [PATCHv2] x86/mm: set x32 syscall bit in SET_PERSONALITY()

After my changes to mmap(), its code now relies on the bitness of
performing syscall. According to that, it chooses the base of allocation:
mmap_base for 64-bit mmap() and mmap_compat_base for 32-bit syscall.
It was done by:
commit 1b028f784e8c ("x86/mm: Introduce mmap_compat_base() for
32-bit mmap()").

The code afterwards relies on in_compat_syscall() returning true for
32-bit syscalls. It's usually so while we're in context of application
that does 32-bit syscalls. But during exec() it is not valid for x32 ELF.
The reason is that the application hasn't yet done any syscall, so x32
bit has not being set.
That results in -ENOMEM for x32 ELF files as there fired BAD_ADDR()
in elf_map(), that is called from do_execve()->load_elf_binary().
For i386 ELFs it works as SET_PERSONALITY() sets TS_COMPAT flag.

I suggest to set x32 bit before first return to userspace, during
setting personality at exec(). This way we can rely on
in_compat_syscall() during exec().

Fixes: commit 1b028f784e8c ("x86/mm: Introduce mmap_compat_base() for
32-bit mmap()")
Cc: [email protected]
Cc: [email protected]
Cc: Andrei Vagin <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: [email protected]
Cc: H. Peter Anvin <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Reported-by: Adam Borowski <[email protected]>
Signed-off-by: Dmitry Safonov <[email protected]>
---
v2:
- specifying mmap() allocation path which failed during exec()
- fix comment style

arch/x86/kernel/process_64.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index d6b784a5520d..d3d4d9abcaf8 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -519,8 +519,14 @@ void set_personality_ia32(bool x32)
if (current->mm)
current->mm->context.ia32_compat = TIF_X32;
current->personality &= ~READ_IMPLIES_EXEC;
- /* in_compat_syscall() uses the presence of the x32
- syscall bit flag to determine compat status */
+ /*
+ * in_compat_syscall() uses the presence of the x32
+ * syscall bit flag to determine compat status.
+ * On the bitness of syscall relies x86 mmap() code,
+ * so set x32 syscall bit right here to make
+ * in_compat_syscall() work during exec().
+ */
+ task_pt_regs(current)->orig_ax |= __X32_SYSCALL_BIT;
current->thread.status &= ~TS_COMPAT;
} else {
set_thread_flag(TIF_IA32);
--
2.12.0


2017-03-21 17:17:30

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [PATCHv2] x86/mm: set x32 syscall bit in SET_PERSONALITY()

On Tue, Mar 21, 2017 at 07:37:12PM +0300, Dmitry Safonov wrote:
...
> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> index d6b784a5520d..d3d4d9abcaf8 100644
> --- a/arch/x86/kernel/process_64.c
> +++ b/arch/x86/kernel/process_64.c
> @@ -519,8 +519,14 @@ void set_personality_ia32(bool x32)
> if (current->mm)
> current->mm->context.ia32_compat = TIF_X32;
> current->personality &= ~READ_IMPLIES_EXEC;
> - /* in_compat_syscall() uses the presence of the x32
> - syscall bit flag to determine compat status */
> + /*
> + * in_compat_syscall() uses the presence of the x32
> + * syscall bit flag to determine compat status.
> + * On the bitness of syscall relies x86 mmap() code,
> + * so set x32 syscall bit right here to make
> + * in_compat_syscall() work during exec().
> + */
> + task_pt_regs(current)->orig_ax |= __X32_SYSCALL_BIT;
> current->thread.status &= ~TS_COMPAT;

Hi! I must admit I didn't follow close the overall series (so can't
comment much here :) but I have a slightly unrelated question -- is
there a way to figure out if task is running in x32 mode say with
some ptrace or procfs sign?

2017-03-21 17:28:29

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCHv2] x86/mm: set x32 syscall bit in SET_PERSONALITY()

On March 21, 2017 9:37:12 AM PDT, Dmitry Safonov <[email protected]> wrote:
>After my changes to mmap(), its code now relies on the bitness of
>performing syscall. According to that, it chooses the base of
>allocation:
>mmap_base for 64-bit mmap() and mmap_compat_base for 32-bit syscall.
>It was done by:
> commit 1b028f784e8c ("x86/mm: Introduce mmap_compat_base() for
>32-bit mmap()").
>
>The code afterwards relies on in_compat_syscall() returning true for
>32-bit syscalls. It's usually so while we're in context of application
>that does 32-bit syscalls. But during exec() it is not valid for x32
>ELF.
>The reason is that the application hasn't yet done any syscall, so x32
>bit has not being set.
>That results in -ENOMEM for x32 ELF files as there fired BAD_ADDR()
>in elf_map(), that is called from do_execve()->load_elf_binary().
>For i386 ELFs it works as SET_PERSONALITY() sets TS_COMPAT flag.
>
>I suggest to set x32 bit before first return to userspace, during
>setting personality at exec(). This way we can rely on
>in_compat_syscall() during exec().
>
>Fixes: commit 1b028f784e8c ("x86/mm: Introduce mmap_compat_base() for
>32-bit mmap()")
>Cc: [email protected]
>Cc: [email protected]
>Cc: Andrei Vagin <[email protected]>
>Cc: Cyrill Gorcunov <[email protected]>
>Cc: Borislav Petkov <[email protected]>
>Cc: "Kirill A. Shutemov" <[email protected]>
>Cc: [email protected]
>Cc: H. Peter Anvin <[email protected]>
>Cc: Andy Lutomirski <[email protected]>
>Cc: Ingo Molnar <[email protected]>
>Cc: Thomas Gleixner <[email protected]>
>Reported-by: Adam Borowski <[email protected]>
>Signed-off-by: Dmitry Safonov <[email protected]>
>---
>v2:
>- specifying mmap() allocation path which failed during exec()
>- fix comment style
>
> arch/x86/kernel/process_64.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
>diff --git a/arch/x86/kernel/process_64.c
>b/arch/x86/kernel/process_64.c
>index d6b784a5520d..d3d4d9abcaf8 100644
>--- a/arch/x86/kernel/process_64.c
>+++ b/arch/x86/kernel/process_64.c
>@@ -519,8 +519,14 @@ void set_personality_ia32(bool x32)
> if (current->mm)
> current->mm->context.ia32_compat = TIF_X32;
> current->personality &= ~READ_IMPLIES_EXEC;
>- /* in_compat_syscall() uses the presence of the x32
>- syscall bit flag to determine compat status */
>+ /*
>+ * in_compat_syscall() uses the presence of the x32
>+ * syscall bit flag to determine compat status.
>+ * On the bitness of syscall relies x86 mmap() code,
>+ * so set x32 syscall bit right here to make
>+ * in_compat_syscall() work during exec().
>+ */
>+ task_pt_regs(current)->orig_ax |= __X32_SYSCALL_BIT;
> current->thread.status &= ~TS_COMPAT;
> } else {
> set_thread_flag(TIF_IA32);

You also need to clear the bit for an x32 -> x86-64 exec. Otherwise it seems okay to me.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

2017-03-21 17:33:00

by Dmitry Safonov

[permalink] [raw]
Subject: Re: [PATCHv2] x86/mm: set x32 syscall bit in SET_PERSONALITY()

On 03/21/2017 08:27 PM, [email protected] wrote:
> On March 21, 2017 9:37:12 AM PDT, Dmitry Safonov <[email protected]> wrote:
>> After my changes to mmap(), its code now relies on the bitness of
>> performing syscall. According to that, it chooses the base of
>> allocation:
>> mmap_base for 64-bit mmap() and mmap_compat_base for 32-bit syscall.
>> It was done by:
>> commit 1b028f784e8c ("x86/mm: Introduce mmap_compat_base() for
>> 32-bit mmap()").
>>
>> The code afterwards relies on in_compat_syscall() returning true for
>> 32-bit syscalls. It's usually so while we're in context of application
>> that does 32-bit syscalls. But during exec() it is not valid for x32
>> ELF.
>> The reason is that the application hasn't yet done any syscall, so x32
>> bit has not being set.
>> That results in -ENOMEM for x32 ELF files as there fired BAD_ADDR()
>> in elf_map(), that is called from do_execve()->load_elf_binary().
>> For i386 ELFs it works as SET_PERSONALITY() sets TS_COMPAT flag.
>>
>> I suggest to set x32 bit before first return to userspace, during
>> setting personality at exec(). This way we can rely on
>> in_compat_syscall() during exec().
>>
>> Fixes: commit 1b028f784e8c ("x86/mm: Introduce mmap_compat_base() for
>> 32-bit mmap()")
>> Cc: [email protected]
>> Cc: [email protected]
>> Cc: Andrei Vagin <[email protected]>
>> Cc: Cyrill Gorcunov <[email protected]>
>> Cc: Borislav Petkov <[email protected]>
>> Cc: "Kirill A. Shutemov" <[email protected]>
>> Cc: [email protected]
>> Cc: H. Peter Anvin <[email protected]>
>> Cc: Andy Lutomirski <[email protected]>
>> Cc: Ingo Molnar <[email protected]>
>> Cc: Thomas Gleixner <[email protected]>
>> Reported-by: Adam Borowski <[email protected]>
>> Signed-off-by: Dmitry Safonov <[email protected]>
>> ---
>> v2:
>> - specifying mmap() allocation path which failed during exec()
>> - fix comment style
>>
>> arch/x86/kernel/process_64.c | 10 ++++++++--
>> 1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kernel/process_64.c
>> b/arch/x86/kernel/process_64.c
>> index d6b784a5520d..d3d4d9abcaf8 100644
>> --- a/arch/x86/kernel/process_64.c
>> +++ b/arch/x86/kernel/process_64.c
>> @@ -519,8 +519,14 @@ void set_personality_ia32(bool x32)
>> if (current->mm)
>> current->mm->context.ia32_compat = TIF_X32;
>> current->personality &= ~READ_IMPLIES_EXEC;
>> - /* in_compat_syscall() uses the presence of the x32
>> - syscall bit flag to determine compat status */
>> + /*
>> + * in_compat_syscall() uses the presence of the x32
>> + * syscall bit flag to determine compat status.
>> + * On the bitness of syscall relies x86 mmap() code,
>> + * so set x32 syscall bit right here to make
>> + * in_compat_syscall() work during exec().
>> + */
>> + task_pt_regs(current)->orig_ax |= __X32_SYSCALL_BIT;
>> current->thread.status &= ~TS_COMPAT;
>> } else {
>> set_thread_flag(TIF_IA32);
>
> You also need to clear the bit for an x32 -> x86-64 exec. Otherwise it seems okay to me.

Oh, indeed!
Thanks for catching, I'll send v3 with it.

--
Dmitry

2017-03-21 17:46:52

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCHv2] x86/mm: set x32 syscall bit in SET_PERSONALITY()

On Tue, Mar 21, 2017 at 10:17 AM, Cyrill Gorcunov <[email protected]> wrote:
> On Tue, Mar 21, 2017 at 07:37:12PM +0300, Dmitry Safonov wrote:
> ...
>> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
>> index d6b784a5520d..d3d4d9abcaf8 100644
>> --- a/arch/x86/kernel/process_64.c
>> +++ b/arch/x86/kernel/process_64.c
>> @@ -519,8 +519,14 @@ void set_personality_ia32(bool x32)
>> if (current->mm)
>> current->mm->context.ia32_compat = TIF_X32;
>> current->personality &= ~READ_IMPLIES_EXEC;
>> - /* in_compat_syscall() uses the presence of the x32
>> - syscall bit flag to determine compat status */
>> + /*
>> + * in_compat_syscall() uses the presence of the x32
>> + * syscall bit flag to determine compat status.
>> + * On the bitness of syscall relies x86 mmap() code,
>> + * so set x32 syscall bit right here to make
>> + * in_compat_syscall() work during exec().
>> + */
>> + task_pt_regs(current)->orig_ax |= __X32_SYSCALL_BIT;
>> current->thread.status &= ~TS_COMPAT;
>
> Hi! I must admit I didn't follow close the overall series (so can't
> comment much here :) but I have a slightly unrelated question -- is
> there a way to figure out if task is running in x32 mode say with
> some ptrace or procfs sign?

You should be able to figure out of a *syscall* is x32 by simply
looking at bit 30 in the syscall number. (This is unlike i386, which
is currently not reflected in ptrace.)

Do we actually have an x32 per-task mode at all? If so, maybe we can
just remove it on top of Dmitry's series.

2017-03-21 18:05:37

by Cyrill Gorcunov

[permalink] [raw]
Subject: [Q] Figuring out task mode

/I renamed the mail's subject/

On Tue, Mar 21, 2017 at 10:45:57AM -0700, Andy Lutomirski wrote:
> >> + task_pt_regs(current)->orig_ax |= __X32_SYSCALL_BIT;
> >> current->thread.status &= ~TS_COMPAT;
> >
> > Hi! I must admit I didn't follow close the overall series (so can't
> > comment much here :) but I have a slightly unrelated question -- is
> > there a way to figure out if task is running in x32 mode say with
> > some ptrace or procfs sign?
>
> You should be able to figure out of a *syscall* is x32 by simply
> looking at bit 30 in the syscall number. (This is unlike i386, which
> is currently not reflected in ptrace.)

Yes, syscall number will help but from criu perpspective (until
Dima's patches are merged into mainlie) we need to figure out
if we can dump x32 tasks without running parasite code inside,
ie via plain ptrace call or some procfs output. But looks like
it's impossible for now.

> Do we actually have an x32 per-task mode at all? If so, maybe we can
> just remove it on top of Dmitry's series.

Don't think so, x32 should be set upon exec and without Dima's series
it is immutable I think.

2017-03-21 18:28:47

by Dmitry Safonov

[permalink] [raw]
Subject: Re: [PATCHv2] x86/mm: set x32 syscall bit in SET_PERSONALITY()

On 03/21/2017 08:45 PM, Andy Lutomirski wrote:
> On Tue, Mar 21, 2017 at 10:17 AM, Cyrill Gorcunov <[email protected]> wrote:
>> On Tue, Mar 21, 2017 at 07:37:12PM +0300, Dmitry Safonov wrote:
>> ...
>>> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
>>> index d6b784a5520d..d3d4d9abcaf8 100644
>>> --- a/arch/x86/kernel/process_64.c
>>> +++ b/arch/x86/kernel/process_64.c
>>> @@ -519,8 +519,14 @@ void set_personality_ia32(bool x32)
>>> if (current->mm)
>>> current->mm->context.ia32_compat = TIF_X32;
>>> current->personality &= ~READ_IMPLIES_EXEC;
>>> - /* in_compat_syscall() uses the presence of the x32
>>> - syscall bit flag to determine compat status */
>>> + /*
>>> + * in_compat_syscall() uses the presence of the x32
>>> + * syscall bit flag to determine compat status.
>>> + * On the bitness of syscall relies x86 mmap() code,
>>> + * so set x32 syscall bit right here to make
>>> + * in_compat_syscall() work during exec().
>>> + */
>>> + task_pt_regs(current)->orig_ax |= __X32_SYSCALL_BIT;
>>> current->thread.status &= ~TS_COMPAT;
>>
>> Hi! I must admit I didn't follow close the overall series (so can't
>> comment much here :) but I have a slightly unrelated question -- is
>> there a way to figure out if task is running in x32 mode say with
>> some ptrace or procfs sign?
>
> You should be able to figure out of a *syscall* is x32 by simply
> looking at bit 30 in the syscall number. (This is unlike i386, which
> is currently not reflected in ptrace.)

The process could be stopped with PTRACE_SEIZE and I think, it'll not
have x32 syscall bit at that moment.

I guess the question comes from that we're releasing CRIU 3.0 with
32-bit C/R and some other cool stuff, but we don't support x32 yet.
As we don't want release a thing that we aren't properly testing.
So for a while we should error on dumping x32 applications.

I think, the best way for now is to check physicall address of vdso
from /proc/.../pagemap. If it's CONFIG_VDSO=n kernel, I guess we could
also add check for %ds from ptrace's register set. For x32 it's set to
__USER_DS, while for native it's 0 (looking at start_thread() and
compat_start_thread()). The application can simply change it without
any consequence - so it's not very reliable, we could only warn at
catching it, not rely on this.

>
> Do we actually have an x32 per-task mode at all? If so, maybe we can
> just remove it on top of Dmitry's series.

--
Dmitry

2017-03-21 18:41:33

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [PATCHv2] x86/mm: set x32 syscall bit in SET_PERSONALITY()

On Tue, Mar 21, 2017 at 09:09:40PM +0300, Dmitry Safonov wrote:
>
> I guess the question comes from that we're releasing CRIU 3.0 with
> 32-bit C/R and some other cool stuff, but we don't support x32 yet.
> As we don't want release a thing that we aren't properly testing.
> So for a while we should error on dumping x32 applications.

yes

> I think, the best way for now is to check physicall address of vdso
> from /proc/.../pagemap. If it's CONFIG_VDSO=n kernel, I guess we could
> also add check for %ds from ptrace's register set. For x32 it's set to
> __USER_DS, while for native it's 0 (looking at start_thread() and
> compat_start_thread()). The application can simply change it without
> any consequence - so it's not very reliable, we could only warn at
> catching it, not rely on this.

indeed, thanks!

2017-03-21 18:52:04

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCHv2] x86/mm: set x32 syscall bit in SET_PERSONALITY()

On March 21, 2017 11:40:58 AM PDT, Cyrill Gorcunov <[email protected]> wrote:
>On Tue, Mar 21, 2017 at 09:09:40PM +0300, Dmitry Safonov wrote:
>>
>> I guess the question comes from that we're releasing CRIU 3.0 with
>> 32-bit C/R and some other cool stuff, but we don't support x32 yet.
>> As we don't want release a thing that we aren't properly testing.
>> So for a while we should error on dumping x32 applications.
>
>yes
>
>> I think, the best way for now is to check physicall address of vdso
>> from /proc/.../pagemap. If it's CONFIG_VDSO=n kernel, I guess we
>could
>> also add check for %ds from ptrace's register set. For x32 it's set
>to
>> __USER_DS, while for native it's 0 (looking at start_thread() and
>> compat_start_thread()). The application can simply change it without
>> any consequence - so it's not very reliable, we could only warn at
>> catching it, not rely on this.
>
>indeed, thanks!

I proposed to the ptrace people a virtual register for this and a few other things, but it got bikeshed to death.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

2017-03-21 18:51:31

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCHv2] x86/mm: set x32 syscall bit in SET_PERSONALITY()

On March 21, 2017 10:45:57 AM PDT, Andy Lutomirski <[email protected]> wrote:
>On Tue, Mar 21, 2017 at 10:17 AM, Cyrill Gorcunov <[email protected]>
>wrote:
>> On Tue, Mar 21, 2017 at 07:37:12PM +0300, Dmitry Safonov wrote:
>> ...
>>> diff --git a/arch/x86/kernel/process_64.c
>b/arch/x86/kernel/process_64.c
>>> index d6b784a5520d..d3d4d9abcaf8 100644
>>> --- a/arch/x86/kernel/process_64.c
>>> +++ b/arch/x86/kernel/process_64.c
>>> @@ -519,8 +519,14 @@ void set_personality_ia32(bool x32)
>>> if (current->mm)
>>> current->mm->context.ia32_compat = TIF_X32;
>>> current->personality &= ~READ_IMPLIES_EXEC;
>>> - /* in_compat_syscall() uses the presence of the x32
>>> - syscall bit flag to determine compat status */
>>> + /*
>>> + * in_compat_syscall() uses the presence of the x32
>>> + * syscall bit flag to determine compat status.
>>> + * On the bitness of syscall relies x86 mmap() code,
>>> + * so set x32 syscall bit right here to make
>>> + * in_compat_syscall() work during exec().
>>> + */
>>> + task_pt_regs(current)->orig_ax |= __X32_SYSCALL_BIT;
>>> current->thread.status &= ~TS_COMPAT;
>>
>> Hi! I must admit I didn't follow close the overall series (so can't
>> comment much here :) but I have a slightly unrelated question -- is
>> there a way to figure out if task is running in x32 mode say with
>> some ptrace or procfs sign?
>
>You should be able to figure out of a *syscall* is x32 by simply
>looking at bit 30 in the syscall number. (This is unlike i386, which
>is currently not reflected in ptrace.)
>
>Do we actually have an x32 per-task mode at all? If so, maybe we can
>just remove it on top of Dmitry's series.

We do, for things like signal delivery mostly. We have tried relying on it as little as possible, intentionally.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

2017-03-21 19:14:09

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [PATCHv2] x86/mm: set x32 syscall bit in SET_PERSONALITY()

On Tue, Mar 21, 2017 at 11:51:09AM -0700, [email protected] wrote:
> >
> >indeed, thanks!
>
> I proposed to the ptrace people a virtual register for this and a few other things, but it got bikeshed to death.

Any mail reference left? Would like to read it.

2017-03-21 19:21:05

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCHv2] x86/mm: set x32 syscall bit in SET_PERSONALITY()

On March 21, 2017 12:07:13 PM PDT, Cyrill Gorcunov <[email protected]> wrote:
>On Tue, Mar 21, 2017 at 11:51:09AM -0700, [email protected] wrote:
>> >
>> >indeed, thanks!
>>
>> I proposed to the ptrace people a virtual register for this and a few
>other things, but it got bikeshed to death.
>
>Any mail reference left? Would like to read it.

Not sure...
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

2017-03-21 19:26:48

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [PATCHv2] x86/mm: set x32 syscall bit in SET_PERSONALITY()

On Tue, Mar 21, 2017 at 10:19:01PM +0300, Dmitry Safonov wrote:
> >
> > indeed, thanks!
>
> Also, even more simple-minded: for now we could just check binary magic
> from /proc/.../exe, for now stopping on x32 binaries.

File may not exist and elfheader wiped out as well.

2017-03-21 19:32:56

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCHv2] x86/mm: set x32 syscall bit in SET_PERSONALITY()

On Tue, Mar 21, 2017 at 11:09 AM, Dmitry Safonov <[email protected]> wrote:
> On 03/21/2017 08:45 PM, Andy Lutomirski wrote:
>>
>> On Tue, Mar 21, 2017 at 10:17 AM, Cyrill Gorcunov <[email protected]>
>> wrote:
>>>
>>> On Tue, Mar 21, 2017 at 07:37:12PM +0300, Dmitry Safonov wrote:
>>> ...
>>>>
>>>> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
>>>> index d6b784a5520d..d3d4d9abcaf8 100644
>>>> --- a/arch/x86/kernel/process_64.c
>>>> +++ b/arch/x86/kernel/process_64.c
>>>> @@ -519,8 +519,14 @@ void set_personality_ia32(bool x32)
>>>> if (current->mm)
>>>> current->mm->context.ia32_compat = TIF_X32;
>>>> current->personality &= ~READ_IMPLIES_EXEC;
>>>> - /* in_compat_syscall() uses the presence of the x32
>>>> - syscall bit flag to determine compat status */
>>>> + /*
>>>> + * in_compat_syscall() uses the presence of the x32
>>>> + * syscall bit flag to determine compat status.
>>>> + * On the bitness of syscall relies x86 mmap() code,
>>>> + * so set x32 syscall bit right here to make
>>>> + * in_compat_syscall() work during exec().
>>>> + */
>>>> + task_pt_regs(current)->orig_ax |= __X32_SYSCALL_BIT;
>>>> current->thread.status &= ~TS_COMPAT;
>>>
>>>
>>> Hi! I must admit I didn't follow close the overall series (so can't
>>> comment much here :) but I have a slightly unrelated question -- is
>>> there a way to figure out if task is running in x32 mode say with
>>> some ptrace or procfs sign?
>>
>>
>> You should be able to figure out of a *syscall* is x32 by simply
>> looking at bit 30 in the syscall number. (This is unlike i386, which
>> is currently not reflected in ptrace.)
>
>
> The process could be stopped with PTRACE_SEIZE and I think, it'll not
> have x32 syscall bit at that moment.
>
> I guess the question comes from that we're releasing CRIU 3.0 with
> 32-bit C/R and some other cool stuff, but we don't support x32 yet.
> As we don't want release a thing that we aren't properly testing.
> So for a while we should error on dumping x32 applications.

I'm curious: shouldn't x32 CRIU just work? What goes wrong?

--Andy

2017-03-21 19:36:39

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [PATCHv2] x86/mm: set x32 syscall bit in SET_PERSONALITY()

On Tue, Mar 21, 2017 at 12:31:51PM -0700, Andy Lutomirski wrote:
...
> > I guess the question comes from that we're releasing CRIU 3.0 with
> > 32-bit C/R and some other cool stuff, but we don't support x32 yet.
> > As we don't want release a thing that we aren't properly testing.
> > So for a while we should error on dumping x32 applications.
>
> I'm curious: shouldn't x32 CRIU just work? What goes wrong?

Anything ;) We didn't tried as far as I know but i bet
somthing will be broken for sure.

2017-03-21 19:47:20

by Dmitry Safonov

[permalink] [raw]
Subject: Re: [PATCHv2] x86/mm: set x32 syscall bit in SET_PERSONALITY()

On 03/21/2017 10:31 PM, Andy Lutomirski wrote:
> On Tue, Mar 21, 2017 at 11:09 AM, Dmitry Safonov <[email protected]> wrote:
>> On 03/21/2017 08:45 PM, Andy Lutomirski wrote:
>>>
>>> On Tue, Mar 21, 2017 at 10:17 AM, Cyrill Gorcunov <[email protected]>
>>> wrote:
>>>>
>>>> On Tue, Mar 21, 2017 at 07:37:12PM +0300, Dmitry Safonov wrote:
>>>> ...
>>>>>
>>>>> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
>>>>> index d6b784a5520d..d3d4d9abcaf8 100644
>>>>> --- a/arch/x86/kernel/process_64.c
>>>>> +++ b/arch/x86/kernel/process_64.c
>>>>> @@ -519,8 +519,14 @@ void set_personality_ia32(bool x32)
>>>>> if (current->mm)
>>>>> current->mm->context.ia32_compat = TIF_X32;
>>>>> current->personality &= ~READ_IMPLIES_EXEC;
>>>>> - /* in_compat_syscall() uses the presence of the x32
>>>>> - syscall bit flag to determine compat status */
>>>>> + /*
>>>>> + * in_compat_syscall() uses the presence of the x32
>>>>> + * syscall bit flag to determine compat status.
>>>>> + * On the bitness of syscall relies x86 mmap() code,
>>>>> + * so set x32 syscall bit right here to make
>>>>> + * in_compat_syscall() work during exec().
>>>>> + */
>>>>> + task_pt_regs(current)->orig_ax |= __X32_SYSCALL_BIT;
>>>>> current->thread.status &= ~TS_COMPAT;
>>>>
>>>>
>>>> Hi! I must admit I didn't follow close the overall series (so can't
>>>> comment much here :) but I have a slightly unrelated question -- is
>>>> there a way to figure out if task is running in x32 mode say with
>>>> some ptrace or procfs sign?
>>>
>>>
>>> You should be able to figure out of a *syscall* is x32 by simply
>>> looking at bit 30 in the syscall number. (This is unlike i386, which
>>> is currently not reflected in ptrace.)
>>
>>
>> The process could be stopped with PTRACE_SEIZE and I think, it'll not
>> have x32 syscall bit at that moment.
>>
>> I guess the question comes from that we're releasing CRIU 3.0 with
>> 32-bit C/R and some other cool stuff, but we don't support x32 yet.
>> As we don't want release a thing that we aren't properly testing.
>> So for a while we should error on dumping x32 applications.
>
> I'm curious: shouldn't x32 CRIU just work? What goes wrong?

I also think, it should be quite easy to add, as we have arch_prctl()
for vdso and etc.
But there are things, which will not work if we just dump application
as 64-bit.

For example, what comes to mind: sys_get_robust_list(), it has different
pointers for 64-bit or for x32/ia32 applications: robust_list
and compat_robust_list. So during C/R we should sometimes call
compatible syscalls for x32 applications to dump/restore, as for futex
list e.g., native will return NULL or empty list.

>
> --Andy
>


--
Dmitry

2017-03-21 20:07:47

by Dmitry Safonov

[permalink] [raw]
Subject: Re: [PATCHv2] x86/mm: set x32 syscall bit in SET_PERSONALITY()

On 03/21/2017 10:42 PM, Dmitry Safonov wrote:
> On 03/21/2017 10:31 PM, Andy Lutomirski wrote:
>> On Tue, Mar 21, 2017 at 11:09 AM, Dmitry Safonov
>> <[email protected]> wrote:
>>> On 03/21/2017 08:45 PM, Andy Lutomirski wrote:
>>>>
>>>> On Tue, Mar 21, 2017 at 10:17 AM, Cyrill Gorcunov <[email protected]>
>>>> wrote:
>>>>>
>>>>> On Tue, Mar 21, 2017 at 07:37:12PM +0300, Dmitry Safonov wrote:
>>>>> ...
>>>>>>
>>>>>> diff --git a/arch/x86/kernel/process_64.c
>>>>>> b/arch/x86/kernel/process_64.c
>>>>>> index d6b784a5520d..d3d4d9abcaf8 100644
>>>>>> --- a/arch/x86/kernel/process_64.c
>>>>>> +++ b/arch/x86/kernel/process_64.c
>>>>>> @@ -519,8 +519,14 @@ void set_personality_ia32(bool x32)
>>>>>> if (current->mm)
>>>>>> current->mm->context.ia32_compat = TIF_X32;
>>>>>> current->personality &= ~READ_IMPLIES_EXEC;
>>>>>> - /* in_compat_syscall() uses the presence of the x32
>>>>>> - syscall bit flag to determine compat status */
>>>>>> + /*
>>>>>> + * in_compat_syscall() uses the presence of the x32
>>>>>> + * syscall bit flag to determine compat status.
>>>>>> + * On the bitness of syscall relies x86 mmap() code,
>>>>>> + * so set x32 syscall bit right here to make
>>>>>> + * in_compat_syscall() work during exec().
>>>>>> + */
>>>>>> + task_pt_regs(current)->orig_ax |= __X32_SYSCALL_BIT;
>>>>>> current->thread.status &= ~TS_COMPAT;
>>>>>
>>>>>
>>>>> Hi! I must admit I didn't follow close the overall series (so can't
>>>>> comment much here :) but I have a slightly unrelated question -- is
>>>>> there a way to figure out if task is running in x32 mode say with
>>>>> some ptrace or procfs sign?
>>>>
>>>>
>>>> You should be able to figure out of a *syscall* is x32 by simply
>>>> looking at bit 30 in the syscall number. (This is unlike i386, which
>>>> is currently not reflected in ptrace.)
>>>
>>>
>>> The process could be stopped with PTRACE_SEIZE and I think, it'll not
>>> have x32 syscall bit at that moment.
>>>
>>> I guess the question comes from that we're releasing CRIU 3.0 with
>>> 32-bit C/R and some other cool stuff, but we don't support x32 yet.
>>> As we don't want release a thing that we aren't properly testing.
>>> So for a while we should error on dumping x32 applications.
>>
>> I'm curious: shouldn't x32 CRIU just work? What goes wrong?
>
> I also think, it should be quite easy to add, as we have arch_prctl()
> for vdso and etc.
> But there are things, which will not work if we just dump application
> as 64-bit.
>
> For example, what comes to mind: sys_get_robust_list(), it has different
> pointers for 64-bit or for x32/ia32 applications: robust_list
> and compat_robust_list. So during C/R we should sometimes call
> compatible syscalls for x32 applications to dump/restore, as for futex
> list e.g., native will return NULL or empty list.

Maybe we should just save both pointers with CRIU for simplicity.
Which will add additional syscall for most applications that define only
one of compat/native lists.

I think, there are some other things like that, but it's end of the day
and nothing crosses my mind.

Anyway, I wouldn't want release anything without adding it to regular
tests, so that would need also some time to do. And a funny thing: there
are many folks which runs 32-bit containers on x86_64 to save memory,
but they use ia32, not x32. Maybe because of envoironment which is
easier to get (for x32 there are no templates for example). Maybe just
because.

So, yet I haven't saw any request for x32 C/R while for ia32 there is
crowded room. And quite many people for arm/arm64 where kernel doesn't
support yet vdso mremap() and CRIU doesn't run test for them regulary
yet.

But well, x32 still should be quite easy to add, say, for the next
release after ia32 C/R not sure if it'll be planned though.


--
Dmitry

2017-03-21 20:58:47

by Dmitry Safonov

[permalink] [raw]
Subject: Re: [PATCHv2] x86/mm: set x32 syscall bit in SET_PERSONALITY()

On 03/21/2017 09:40 PM, Cyrill Gorcunov wrote:
> On Tue, Mar 21, 2017 at 09:09:40PM +0300, Dmitry Safonov wrote:
>>
>> I guess the question comes from that we're releasing CRIU 3.0 with
>> 32-bit C/R and some other cool stuff, but we don't support x32 yet.
>> As we don't want release a thing that we aren't properly testing.
>> So for a while we should error on dumping x32 applications.
>
> yes
>
>> I think, the best way for now is to check physicall address of vdso
>> from /proc/.../pagemap. If it's CONFIG_VDSO=n kernel, I guess we could
>> also add check for %ds from ptrace's register set. For x32 it's set to
>> __USER_DS, while for native it's 0 (looking at start_thread() and
>> compat_start_thread()). The application can simply change it without
>> any consequence - so it's not very reliable, we could only warn at
>> catching it, not rely on this.
>
> indeed, thanks!

Also, even more simple-minded: for now we could just check binary magic
from /proc/.../exe, for now stopping on x32 binaries.

--
Dmitry

2017-03-21 21:14:03

by Dmitry Safonov

[permalink] [raw]
Subject: Re: [PATCHv2] x86/mm: set x32 syscall bit in SET_PERSONALITY()

On 03/21/2017 10:24 PM, Cyrill Gorcunov wrote:
> On Tue, Mar 21, 2017 at 10:19:01PM +0300, Dmitry Safonov wrote:
>>>
>>> indeed, thanks!
>>
>> Also, even more simple-minded: for now we could just check binary magic
>> from /proc/.../exe, for now stopping on x32 binaries.
>
> File may not exist and elfheader wiped out as well.

Yep, not very reliable.

--
Dmitry

2017-03-21 23:56:07

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [Q] Figuring out task mode

On Tue, Mar 21, 2017 at 11:05 AM, Cyrill Gorcunov <[email protected]> wrote:
> /I renamed the mail's subject/
>
> On Tue, Mar 21, 2017 at 10:45:57AM -0700, Andy Lutomirski wrote:
>> >> + task_pt_regs(current)->orig_ax |= __X32_SYSCALL_BIT;
>> >> current->thread.status &= ~TS_COMPAT;
>> >
>> > Hi! I must admit I didn't follow close the overall series (so can't
>> > comment much here :) but I have a slightly unrelated question -- is
>> > there a way to figure out if task is running in x32 mode say with
>> > some ptrace or procfs sign?
>>
>> You should be able to figure out of a *syscall* is x32 by simply
>> looking at bit 30 in the syscall number. (This is unlike i386, which
>> is currently not reflected in ptrace.)
>
> Yes, syscall number will help but from criu perpspective (until
> Dima's patches are merged into mainlie) we need to figure out
> if we can dump x32 tasks without running parasite code inside,
> ie via plain ptrace call or some procfs output. But looks like
> it's impossible for now.
>
>> Do we actually have an x32 per-task mode at all? If so, maybe we can
>> just remove it on top of Dmitry's series.
>
> Don't think so, x32 should be set upon exec and without Dima's series
> it is immutable I think.

What I mean is: why should the kernel care about per-task X32 state
*at all*? On top of Dmitry's series, TIF_X32 appears to be used to
determine which vDSO to map, which mm layout to use, *and nothing
else*. Want to write a trivial patch to get rid of it entirely?

Ideally we could get rid of mm->context.ia32_compat, too. The only
interesting use it has is MPX, and we should probably instead track
mm->context.mpx_layout and determine *that* from the prctl() bitness.