LinuxLists.cc - [PATCH] x86-64/entry: add instruction suffix to SYSRET

2019-12-10 10:49:00

Subject: [PATCH] x86-64/entry: add instruction suffix to SYSRET

Omitting suffixes from instructions in AT&T mode is bad practice when
operand size cannot be determined by the assembler from register
operands, and is likely going to be warned about by upstream gas in the
future. Add the missing suffix here.

Signed-off-by: Jan Beulich <[email protected]>

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1728,7 +1728,7 @@ END(nmi)
SYM_CODE_START(ignore_sysret)
UNWIND_HINT_EMPTY
mov $-ENOSYS, %eax
- sysret
+ sysretl
SYM_CODE_END(ignore_sysret)
#endif

2019-12-10 15:30:46

by Andy Lutomirski

[permalink] [raw]

Subject: Re: [PATCH] x86-64/entry: add instruction suffix to SYSRET

> On Dec 10, 2019, at 2:48 AM, Jan Beulich <[email protected]> wrote:
>
> Omitting suffixes from instructions in AT&T mode is bad practice when
> operand size cannot be determined by the assembler from register
> operands, and is likely going to be warned about by upstream gas in the
> future. Add the missing suffix here.
>
> Signed-off-by: Jan Beulich <[email protected]>
>
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -1728,7 +1728,7 @@ END(nmi)
> SYM_CODE_START(ignore_sysret)
> UNWIND_HINT_EMPTY
> mov $-ENOSYS, %eax
> - sysret
> + sysretl

Isn’t the default sysretq? sysretl looks more correct, but that suggests that your changelog is wrong.

Is this code even reachable?

> SYM_CODE_END(ignore_sysret)
> #endif
>

2019-12-10 15:42:16

by Jan Beulich

[permalink] [raw]

Subject: Re: [PATCH] x86-64/entry: add instruction suffix to SYSRET

On 10.12.2019 16:29, Andy Lutomirski wrote:
>> On Dec 10, 2019, at 2:48 AM, Jan Beulich <[email protected]> wrote:
>>
>> Omitting suffixes from instructions in AT&T mode is bad practice when
>> operand size cannot be determined by the assembler from register
>> operands, and is likely going to be warned about by upstream gas in the
>> future. Add the missing suffix here.
>>
>> Signed-off-by: Jan Beulich <[email protected]>
>>
>> --- a/arch/x86/entry/entry_64.S
>> +++ b/arch/x86/entry/entry_64.S
>> @@ -1728,7 +1728,7 @@ END(nmi)
>> SYM_CODE_START(ignore_sysret)
>> UNWIND_HINT_EMPTY
>> mov $-ENOSYS, %eax
>> - sysret
>> + sysretl
>
> Isn’t the default sysretq? sysretl looks more correct, but that suggests
> that your changelog is wrong.

No, this is different from ret, and more like iret and lret.

> Is this code even reachable?

Yes afaict, supported by the comment ahead of the symbol. syscall_init()
puts its address into MSR_CSTAR when !IA32_EMULATION.

Jan

2019-12-12 21:57:25

by Andy Lutomirski

[permalink] [raw]

Subject: Re: [PATCH] x86-64/entry: add instruction suffix to SYSRET

On Tue, Dec 10, 2019 at 7:40 AM Jan Beulich <[email protected]> wrote:
>
> On 10.12.2019 16:29, Andy Lutomirski wrote:
> >> On Dec 10, 2019, at 2:48 AM, Jan Beulich <[email protected]> wrote:
> >>
> >> Omitting suffixes from instructions in AT&T mode is bad practice when
> >> operand size cannot be determined by the assembler from register
> >> operands, and is likely going to be warned about by upstream gas in the
> >> future. Add the missing suffix here.
> >>
> >> Signed-off-by: Jan Beulich <[email protected]>
> >>
> >> --- a/arch/x86/entry/entry_64.S
> >> +++ b/arch/x86/entry/entry_64.S
> >> @@ -1728,7 +1728,7 @@ END(nmi)
> >> SYM_CODE_START(ignore_sysret)
> >> UNWIND_HINT_EMPTY
> >> mov $-ENOSYS, %eax
> >> - sysret
> >> + sysretl
> >
> > Isn’t the default sysretq? sysretl looks more correct, but that suggests
> > that your changelog is wrong.
>
> No, this is different from ret, and more like iret and lret.
>
> > Is this code even reachable?
>
> Yes afaict, supported by the comment ahead of the symbol. syscall_init()
> puts its address into MSR_CSTAR when !IA32_EMULATION.
>

What I meant was: can a program actually get itself into 32-bit mode
to execute a 32-bit SYSCALL instruction?

Anyway, the change itself is Acked-by: Andy Lutomirski <[email protected]>

But let's please clarify the changelog:

ignore_sysret contains an unsuffixed 'sysret' instruction. gas
correctly interprets this as sysretl, but leaving it up to gas to
guess when there is no register operand that implies a size is bad
practice, and upstream gas is likely to warn about this in the future.
Use 'sysretl' explicitly. This does not change the assembled output.

--Andy

2019-12-13 09:56:14

by Jan Beulich

[permalink] [raw]

Subject: Re: [PATCH] x86-64/entry: add instruction suffix to SYSRET

On 12.12.2019 22:43, Andy Lutomirski wrote:
> On Tue, Dec 10, 2019 at 7:40 AM Jan Beulich <[email protected]> wrote:
>>
>> On 10.12.2019 16:29, Andy Lutomirski wrote:
>>>> On Dec 10, 2019, at 2:48 AM, Jan Beulich <[email protected]> wrote:
>>>>
>>>> Omitting suffixes from instructions in AT&T mode is bad practice when
>>>> operand size cannot be determined by the assembler from register
>>>> operands, and is likely going to be warned about by upstream gas in the
>>>> future. Add the missing suffix here.
>>>>
>>>> Signed-off-by: Jan Beulich <[email protected]>
>>>>
>>>> --- a/arch/x86/entry/entry_64.S
>>>> +++ b/arch/x86/entry/entry_64.S
>>>> @@ -1728,7 +1728,7 @@ END(nmi)
>>>> SYM_CODE_START(ignore_sysret)
>>>> UNWIND_HINT_EMPTY
>>>> mov $-ENOSYS, %eax
>>>> - sysret
>>>> + sysretl
>>>
>>> Isn’t the default sysretq? sysretl looks more correct, but that suggests
>>> that your changelog is wrong.
>>
>> No, this is different from ret, and more like iret and lret.
>>
>>> Is this code even reachable?
>>
>> Yes afaict, supported by the comment ahead of the symbol. syscall_init()
>> puts its address into MSR_CSTAR when !IA32_EMULATION.
>>
>
> What I meant was: can a program actually get itself into 32-bit mode
> to execute a 32-bit SYSCALL instruction?

Why not? It can set up a 32-bit code segment descriptor, far-branch
into it, and then execute SYSCALL. I can't see anything preventing
this in the logic involved in descriptor adjustment system calls. In
fact it looks to be at least partly the opposite - fill_ldt()
disallows creation of 64-bit code segments (oddly enough
fill_user_desc() then still copies the bit back, despite there
apparently being no way for it to get set).

> Anyway, the change itself is Acked-by: Andy Lutomirski <[email protected]>
>
> But let's please clarify the changelog:
>
> ignore_sysret contains an unsuffixed 'sysret' instruction. gas
> correctly interprets this as sysretl, but leaving it up to gas to
> guess when there is no register operand that implies a size is bad
> practice, and upstream gas is likely to warn about this in the future.
> Use 'sysretl' explicitly. This does not change the assembled output.

Fine with me, changed.

Jan

2019-12-13 22:07:24

by Andy Lutomirski

[permalink] [raw]

Subject: Re: [PATCH] x86-64/entry: add instruction suffix to SYSRET

On Fri, Dec 13, 2019 at 1:55 AM Jan Beulich <[email protected]> wrote:
>
> On 12.12.2019 22:43, Andy Lutomirski wrote:
> > On Tue, Dec 10, 2019 at 7:40 AM Jan Beulich <[email protected]> wrote:
> >>
> >> On 10.12.2019 16:29, Andy Lutomirski wrote:
> >>>> On Dec 10, 2019, at 2:48 AM, Jan Beulich <[email protected]> wrote:
> >>>>
> >>>> Omitting suffixes from instructions in AT&T mode is bad practice when
> >>>> operand size cannot be determined by the assembler from register
> >>>> operands, and is likely going to be warned about by upstream gas in the
> >>>> future. Add the missing suffix here.
> >>>>
> >>>> Signed-off-by: Jan Beulich <[email protected]>
> >>>>
> >>>> --- a/arch/x86/entry/entry_64.S
> >>>> +++ b/arch/x86/entry/entry_64.S
> >>>> @@ -1728,7 +1728,7 @@ END(nmi)
> >>>> SYM_CODE_START(ignore_sysret)
> >>>> UNWIND_HINT_EMPTY
> >>>> mov $-ENOSYS, %eax
> >>>> - sysret
> >>>> + sysretl
> >>>
> >>> Isn’t the default sysretq? sysretl looks more correct, but that suggests
> >>> that your changelog is wrong.
> >>
> >> No, this is different from ret, and more like iret and lret.
> >>
> >>> Is this code even reachable?
> >>
> >> Yes afaict, supported by the comment ahead of the symbol. syscall_init()
> >> puts its address into MSR_CSTAR when !IA32_EMULATION.
> >>
> >
> > What I meant was: can a program actually get itself into 32-bit mode
> > to execute a 32-bit SYSCALL instruction?
>
> Why not? It can set up a 32-bit code segment descriptor, far-branch
> into it, and then execute SYSCALL. I can't see anything preventing
> this in the logic involved in descriptor adjustment system calls. In
> fact it looks to be at least partly the opposite - fill_ldt()
> disallows creation of 64-bit code segments (oddly enough
> fill_user_desc() then still copies the bit back, despite there
> apparently being no way for it to get set).

Do we allow creation of 32-bit code segments on !IA32_EMULATION
kernels? I think we shouldn't, but I'm not really sure.

Anyway, this is irrelevant to the patch at hand.

--Andy

2019-12-19 02:40:18

by Andy Lutomirski

[permalink] [raw]

Subject: Re: [PATCH] x86-64/entry: add instruction suffix to SYSRET

On Mon, Dec 16, 2019 at 7:23 AM Brian Gerst <[email protected]> wrote:
>
> On Mon, Dec 16, 2019 at 5:12 AM Jan Beulich <[email protected]> wrote:
> >
> > On 13.12.2019 18:49, Andy Lutomirski wrote:
> > > On Fri, Dec 13, 2019 at 1:55 AM Jan Beulich <[email protected]> wrote:
> > >>
> > >> On 12.12.2019 22:43, Andy Lutomirski wrote:
> > >>> On Tue, Dec 10, 2019 at 7:40 AM Jan Beulich <[email protected]> wrote:
> > >>>>
> > >>>> On 10.12.2019 16:29, Andy Lutomirski wrote:
> > >>>>>> On Dec 10, 2019, at 2:48 AM, Jan Beulich <[email protected]> wrote:
> > >>>>>>
> > >>>>>> Omitting suffixes from instructions in AT&T mode is bad practice when
> > >>>>>> operand size cannot be determined by the assembler from register
> > >>>>>> operands, and is likely going to be warned about by upstream gas in the
> > >>>>>> future. Add the missing suffix here.
> > >>>>>>
> > >>>>>> Signed-off-by: Jan Beulich <[email protected]>
> > >>>>>>
> > >>>>>> --- a/arch/x86/entry/entry_64.S
> > >>>>>> +++ b/arch/x86/entry/entry_64.S
> > >>>>>> @@ -1728,7 +1728,7 @@ END(nmi)
> > >>>>>> SYM_CODE_START(ignore_sysret)
> > >>>>>> UNWIND_HINT_EMPTY
> > >>>>>> mov $-ENOSYS, %eax
> > >>>>>> - sysret
> > >>>>>> + sysretl
> > >>>>>
> > >>>>> Isn’t the default sysretq? sysretl looks more correct, but that suggests
> > >>>>> that your changelog is wrong.
> > >>>>
> > >>>> No, this is different from ret, and more like iret and lret.
> > >>>>
> > >>>>> Is this code even reachable?
> > >>>>
> > >>>> Yes afaict, supported by the comment ahead of the symbol. syscall_init()
> > >>>> puts its address into MSR_CSTAR when !IA32_EMULATION.
> > >>>>
> > >>>
> > >>> What I meant was: can a program actually get itself into 32-bit mode
> > >>> to execute a 32-bit SYSCALL instruction?
> > >>
> > >> Why not? It can set up a 32-bit code segment descriptor, far-branch
> > >> into it, and then execute SYSCALL. I can't see anything preventing
> > >> this in the logic involved in descriptor adjustment system calls. In
> > >> fact it looks to be at least partly the opposite - fill_ldt()
> > >> disallows creation of 64-bit code segments (oddly enough
> > >> fill_user_desc() then still copies the bit back, despite there
> > >> apparently being no way for it to get set).
> > >
> > > Do we allow creation of 32-bit code segments on !IA32_EMULATION
> > > kernels?
> >
> > As per above - I think so.
> >
> > > I think we shouldn't, but I'm not really sure.
> >
> > It may be a little exotic, but I can't see any reason to disallow
> > a 64-bit process to switch to compatibility mode temporarily. One
> > contrived use case could be to be able to invoke INTO or BOUND.
>
> I think it should be kept intact for future use by WINE. WINE is
> currently set up so that 32/16-bit Windows emulation needs a 32-bit
> build against 32-bit Linux libraries, using the kernel compat layer.
> With many distributions wanting to drop 32-bit support this has been a
> big sticking point. If WINE could be modified so that the core is
> always built as 64-bit with 32-bit compatibility handled entirely in
> userspace, that would remove its dependency on 32-bit Linux libraries
> and thus wouldn't require IA32_EMULATION.

I just read an article about WINE on Mac OS doing more or less this.
I'm wondering if we should wire up set_thread_area() on x86_64 for
uses like this. It even has a syscall number already -- it's just not
wired up.

Anyway, this isn't particularly high priority, IMO -- I don't think
many people set IA32_EMULATION=n. What we really ought to do is to
get rid of the special ignore_sysret path and instead go through the
normal syscall path but just do:

if (!IS_ENABLED(IA32_EMULATION))
return -ENOSYS;

or equivalent. The last thing we need is a whole special asm path
that essentially no one uses.