2021-05-10 18:58:00

by H. Peter Anvin

[permalink] [raw]
Subject: [RFC v2 PATCH 0/6] x86/entry: cleanups and consistent syscall number handling

From: "H. Peter Anvin (Intel)" <[email protected]>

This patchset:

1. Cleans up some duplications between <entry/calling.h> and <asm/ptrace-abi.h>.

2. Swaps the arguments to do_syscall_64() for consistency *and* speed.

3. Adds the maximum number of flags to MSR_SYSCALL_MASK; the previous
is more of a minimum. The more flags that are masked, the less the
likelihood of a control leak into the kernel.

4. Consistently treat the system call number as a signed int. This is
what syscall_get_nr() already does, and therefore what all
architecture-independent code (e.g. seccomp) already expects.

5. As per the defined semantics of syscall_get_nr(), only the value -1
is defined as a non-system call, so comparing >= 0 is
incorrect. Change to != -1.

6. Call sys_ni_syscall() for system calls which are out of range
except for -1, which is used by ptrace and seccomp as a "skip
system call" marker) just as for system call numbers that
correspond to holes in the table.

7. In <entry/calling.h>, factor the PUSH_AND_CLEAR_REGS macro into
separate PUSH_REGS and CLEAR_REGS macros which can be used
separately if desired. This will be used by the FRED entry code at
a later date.

Changes from v1:

* Only -1 should be a non-system call per the cross-architectural
definition of sys_ni_syscall().
* Fix/improve patch descriptions.

---
arch/x86/entry/calling.h | 45 ++++++--------------------
arch/x86/entry/common.c | 71 ++++++++++++++++++++++++++++--------------
arch/x86/entry/entry_64.S | 4 +--
arch/x86/include/asm/syscall.h | 13 ++++----
arch/x86/kernel/cpu/common.c | 12 +++++--
arch/x86/kernel/head_64.S | 6 ++--
6 files changed, 77 insertions(+), 74 deletions(-)


2021-05-10 18:58:12

by H. Peter Anvin

[permalink] [raw]
Subject: [RFC v2 PATCH 7/7] x86/entry: use int for syscall number; handle all invalid syscall nrs

From: "H. Peter Anvin (Intel)" <[email protected]>

Redefine the system call number consistently to be "int". The value -1
is a non-system call (which can be poked in by ptrace/seccomp to
indicate that no further processing should be done and that the return
value should be the current value in regs->ax, default to -ENOSYS; any
other value which does not correspond to a valid system call
unconditionally calls sys_ni_syscall() and returns -ENOSYS just like
any system call that corresponds to a hole in the system call table.

This is the defined semantics of syscall_get_nr(), so that is what all
the architecture-independent code already expects. As documented in
<asm-generic/syscall.h> (which is simply the documentation file for
<asm/syscall.h>):

/**
* syscall_get_nr - find what system call a task is executing
* @task: task of interest, must be blocked
* @regs: task_pt_regs() of @task
*
* If @task is executing a system call or is at system call
* tracing about to attempt one, returns the system call number.
* If @task is not executing a system call, i.e. it's blocked
* inside the kernel for a fault or signal, returns -1.
*
* Note this returns int even on 64-bit machines. Only 32 bits of
* system call number can be meaningful. If the actual arch value
* is 64 bits, this truncates to 32 bits so 0xffffffff means -1.
*
* It's only valid to call this when @task is known to be blocked.
*/
int syscall_get_nr(struct task_struct *task, struct pt_regs *regs);

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
---
arch/x86/entry/common.c | 79 +++++++++++++++++++++++-----------
arch/x86/entry/entry_64.S | 2 +-
arch/x86/include/asm/syscall.h | 2 +-
3 files changed, 55 insertions(+), 28 deletions(-)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 00da0f5420de..bf1ccaf101d7 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -36,61 +36,89 @@
#include <asm/irq_stack.h>

#ifdef CONFIG_X86_64
-__visible noinstr void do_syscall_64(struct pt_regs *regs, unsigned long nr)
+
+static __always_inline bool do_syscall_x64(struct pt_regs *regs, int nr)
+{
+ unsigned long unr = nr;
+
+ if (likely(unr < NR_syscalls)) {
+ unr = array_index_nospec(unr, NR_syscalls);
+ regs->ax = sys_call_table[unr](regs);
+ return true;
+ }
+ return false;
+}
+
+static __always_inline bool do_syscall_x32(struct pt_regs *regs, int nr)
+{
+ unsigned long xnr = nr;
+
+ xnr -= __X32_SYSCALL_BIT;
+
+ if (IS_ENABLED(CONFIG_X86_X32_ABI) &&
+ likely(xnr < X32_NR_syscalls)) {
+ xnr = array_index_nospec(xnr, X32_NR_syscalls);
+ regs->ax = x32_sys_call_table[xnr](regs);
+ return true;
+ }
+ return false;
+}
+
+__visible noinstr void do_syscall_64(struct pt_regs *regs, int nr)
{
add_random_kstack_offset();
nr = syscall_enter_from_user_mode(regs, nr);

instrumentation_begin();
- if (likely(nr < NR_syscalls)) {
- nr = array_index_nospec(nr, NR_syscalls);
- regs->ax = sys_call_table[nr](regs);
-#ifdef CONFIG_X86_X32_ABI
- } else if (likely((nr & __X32_SYSCALL_BIT) &&
- (nr & ~__X32_SYSCALL_BIT) < X32_NR_syscalls)) {
- nr = array_index_nospec(nr & ~__X32_SYSCALL_BIT,
- X32_NR_syscalls);
- regs->ax = x32_sys_call_table[nr](regs);
-#endif
+
+ if (!do_syscall_x64(regs, nr) &&
+ !do_syscall_x32(regs, nr) &&
+ nr != -1) {
+ /* Invalid system call, but still a system call? */
+ regs->ax = __x64_sys_ni_syscall(regs);
}
+
instrumentation_end();
syscall_exit_to_user_mode(regs);
}
#endif

#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
-static __always_inline unsigned int syscall_32_enter(struct pt_regs *regs)
+static __always_inline int syscall_32_enter(struct pt_regs *regs)
{
if (IS_ENABLED(CONFIG_IA32_EMULATION))
current_thread_info()->status |= TS_COMPAT;

- return (unsigned int)regs->orig_ax;
+ return (int)regs->orig_ax;
}

/*
* Invoke a 32-bit syscall. Called with IRQs on in CONTEXT_KERNEL.
*/
-static __always_inline void do_syscall_32_irqs_on(struct pt_regs *regs,
- unsigned int nr)
+static __always_inline void do_syscall_32_irqs_on(struct pt_regs *regs, int nr)
{
- if (likely(nr < IA32_NR_syscalls)) {
- nr = array_index_nospec(nr, IA32_NR_syscalls);
- regs->ax = ia32_sys_call_table[nr](regs);
+ unsigned long unr = nr;
+
+ if (likely(unr < IA32_NR_syscalls)) {
+ unr = array_index_nospec(unr, IA32_NR_syscalls);
+ regs->ax = ia32_sys_call_table[unr](regs);
+ } else if (nr != -1) {
+ regs->ax = __ia32_sys_ni_syscall(regs);
}
}

/* Handles int $0x80 */
__visible noinstr void do_int80_syscall_32(struct pt_regs *regs)
{
- unsigned int nr = syscall_32_enter(regs);
+ int nr = syscall_32_enter(regs);

add_random_kstack_offset();
/*
- * Subtlety here: if ptrace pokes something larger than 2^32-1 into
- * orig_ax, the unsigned int return value truncates it. This may
- * or may not be necessary, but it matches the old asm behavior.
+ * Subtlety here: if ptrace pokes something larger than 2^31-1 into
+ * orig_ax, the int return value truncates it. This matches
+ * the semantics of syscall_get_nr().
*/
- nr = (unsigned int)syscall_enter_from_user_mode(regs, nr);
+ nr = syscall_enter_from_user_mode(regs, nr);
instrumentation_begin();

do_syscall_32_irqs_on(regs, nr);
@@ -101,7 +129,7 @@ __visible noinstr void do_int80_syscall_32(struct pt_regs *regs)

static noinstr bool __do_fast_syscall_32(struct pt_regs *regs)
{
- unsigned int nr = syscall_32_enter(regs);
+ int nr = syscall_32_enter(regs);
int res;

add_random_kstack_offset();
@@ -136,8 +164,7 @@ static noinstr bool __do_fast_syscall_32(struct pt_regs *regs)
return false;
}

- /* The case truncates any ptrace induced syscall nr > 2^32 -1 */
- nr = (unsigned int)syscall_enter_from_user_mode_work(regs, nr);
+ nr = syscall_enter_from_user_mode_work(regs, nr);

/* Now this is just like a normal syscall. */
do_syscall_32_irqs_on(regs, nr);
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 1d9db15fdc69..85f04ea0e368 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -108,7 +108,7 @@ SYM_INNER_LABEL(entry_SYSCALL_64_after_hwframe, SYM_L_GLOBAL)

/* IRQs are off. */
movq %rsp, %rdi
- movq %rax, %rsi
+ movslq %eax, %rsi
call do_syscall_64 /* returns with IRQs disabled */

/*
diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
index f6593cafdbd9..f7e2d82d24fb 100644
--- a/arch/x86/include/asm/syscall.h
+++ b/arch/x86/include/asm/syscall.h
@@ -159,7 +159,7 @@ static inline int syscall_get_arch(struct task_struct *task)
? AUDIT_ARCH_I386 : AUDIT_ARCH_X86_64;
}

-void do_syscall_64(struct pt_regs *regs, unsigned long nr);
+void do_syscall_64(struct pt_regs *regs, int nr);
void do_int80_syscall_32(struct pt_regs *regs);
long do_fast_syscall_32(struct pt_regs *regs);

--
2.31.1

2021-05-12 08:53:32

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC v2 PATCH 7/7] x86/entry: use int for syscall number; handle all invalid syscall nrs


* H. Peter Anvin <[email protected]> wrote:

> From: "H. Peter Anvin (Intel)" <[email protected]>
>
> Redefine the system call number consistently to be "int". The value -1
> is a non-system call (which can be poked in by ptrace/seccomp to
> indicate that no further processing should be done and that the return
> value should be the current value in regs->ax, default to -ENOSYS; any
> other value which does not correspond to a valid system call
> unconditionally calls sys_ni_syscall() and returns -ENOSYS just like
> any system call that corresponds to a hole in the system call table.
>
> This is the defined semantics of syscall_get_nr(), so that is what all
> the architecture-independent code already expects. As documented in
> <asm-generic/syscall.h> (which is simply the documentation file for
> <asm/syscall.h>):
>
> /**
> * syscall_get_nr - find what system call a task is executing
> * @task: task of interest, must be blocked
> * @regs: task_pt_regs() of @task
> *
> * If @task is executing a system call or is at system call
> * tracing about to attempt one, returns the system call number.
> * If @task is not executing a system call, i.e. it's blocked
> * inside the kernel for a fault or signal, returns -1.
> *
> * Note this returns int even on 64-bit machines. Only 32 bits of
> * system call number can be meaningful. If the actual arch value
> * is 64 bits, this truncates to 32 bits so 0xffffffff means -1.
> *
> * It's only valid to call this when @task is known to be blocked.
> */
> int syscall_get_nr(struct task_struct *task, struct pt_regs *regs);

I've applied patches 1-6, thanks Peter!

Wrt. patch #7 - the changelog is hedging things a bit and the changes are
non-trivial. Does this patch (intend to) change any actual observable
behavior in the system call interface, and if yes, in which areas?

Or is this a pure cleanup with no observable changes expected?

Thanks,

Ingo

2021-05-12 12:09:53

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC v2 PATCH 7/7] x86/entry: use int for syscall number; handle all invalid syscall nrs

On Mon, May 10 2021 at 11:53, H. Peter Anvin wrote:
> Redefine the system call number consistently to be "int". The value -1
> is a non-system call (which can be poked in by ptrace/seccomp to
> indicate that no further processing should be done and that the return
> value should be the current value in regs->ax, default to -ENOSYS; any
> other value which does not correspond to a valid system call
> unconditionally calls sys_ni_syscall() and returns -ENOSYS just like
> any system call that corresponds to a hole in the system call table.

That sentence spawns 6 lines, has a unmatched ( inside and is confusing
at best. I know what you want to say, but heck...

> This is the defined semantics of syscall_get_nr(), so that is what all
> the architecture-independent code already expects. As documented in
> <asm-generic/syscall.h> (which is simply the documentation file for
> <asm/syscall.h>):
>
> /**
> * syscall_get_nr - find what system call a task is executing
> * @task: task of interest, must be blocked
> * @regs: task_pt_regs() of @task
> *
> * If @task is executing a system call or is at system call
> * tracing about to attempt one, returns the system call number.
> * If @task is not executing a system call, i.e. it's blocked
> * inside the kernel for a fault or signal, returns -1.
> *
> * Note this returns int even on 64-bit machines. Only 32 bits of
> * system call number can be meaningful. If the actual arch value
> * is 64 bits, this truncates to 32 bits so 0xffffffff means -1.
> *
> * It's only valid to call this when @task is known to be blocked.
> */
> int syscall_get_nr(struct task_struct *task, struct pt_regs *regs);

No need for copying this comment. Something like this is sufficient:

The syscall number has to be an 'int' as defined by syscall_get_nr().

Aside of that the subject says:

x86/entry: use int for syscall number; handle all invalid syscall nrs

which suggests that something is not handled correctly today. But the
changelog does not say anything about it.

>
> #ifdef CONFIG_X86_64
> -__visible noinstr void do_syscall_64(struct pt_regs *regs, unsigned long nr)
> +
> +static __always_inline bool do_syscall_x64(struct pt_regs *regs, int nr)
> +{
> + unsigned long unr = nr;

What's the point of this cast? Turn -1 into something larger than
NR_SYSCALLS, right? Comments exist for a reason.

Also why unsigned long? unsigned int is sufficient

> + if (likely(unr < NR_syscalls)) {
> + unr = array_index_nospec(unr, NR_syscalls);
> + regs->ax = sys_call_table[unr](regs);
> + return true;
> + }
> + return false;
> +}

Something like this:

static __always_inline bool do_syscall_x64(struct pt_regs *regs, unsigned int nr)
{
/* nr is unsigned so it catches
if (likely(nr < NR_syscalls)) {
nr = array_index_nospec(nr, NR_syscalls);
regs->ax = sys_call_table[nr](regs);
return true;
}
return false;
}

static __always_inline bool do_syscall_x32(struct pt_regs *regs, unsigned int nr)
{
/*
* If nr < __X32_SYSCALL_BIT then the result will be > __X32_SYSCALL_BIT
* due to unsigned math.
*/
nr -= __X32_SYSCALL_BIT;

if (IS_ENABLED(CONFIG_X86_X32_ABI) && likely(nr < X32_NR_syscalls)) {
nr = array_index_nospec(nr, X32_NR_syscalls);
regs->ax = x32_sys_call_table[nr](regs);
return true;
}
return false;
}

> index 1d9db15fdc69..85f04ea0e368 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -108,7 +108,7 @@ SYM_INNER_LABEL(entry_SYSCALL_64_after_hwframe, SYM_L_GLOBAL)
>
> /* IRQs are off. */
> movq %rsp, %rdi
> - movq %rax, %rsi
> + movslq %eax, %rsi

This is wrong.

syscall(long number,...);

So the above turns syscall(UINT_MAX + N, ...) into syscall(N, ...).

Thanks,

tglx


2021-05-12 19:45:58

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [RFC v2 PATCH 7/7] x86/entry: use int for syscall number; handle all invalid syscall nrs

On 5/12/21 1:51 AM, Ingo Molnar wrote:
>
> I've applied patches 1-6, thanks Peter!
>
> Wrt. patch #7 - the changelog is hedging things a bit and the changes are
> non-trivial. Does this patch (intend to) change any actual observable
> behavior in the system call interface, and if yes, in which areas?
>
> Or is this a pure cleanup with no observable changes expected?
>

I'll clean up the comments a bit [including per tglx' review.] I'm
writing this email in part to organize my own thoughts in how to better
explain the motivation for the patch, and the extent of visible differences.

Right now, *some* code will treat e.g. 0x0000000100000001 as a system
call and some will not. Some of the code, notably in ptrace, will treat
0x000000018000000 as a system call and some will not. Finally, right
now, e.g. 335 for x86-64 will force the exit code to be set to -ENOSYS
even if poked by ptrace, but 548 will not, because there is an
observable difference between an out of range system call and a system
call number that falls outside the range of the table.

The use of a non-system-call number in a system call can come in in a
few ways:

1. Via ptrace;
2. From seccomp;
3. By explicitly passing %eax = -1 to a system call.

#3 isn't really a problem *unless* it for some reason confuses ptrace or
seccomp users -- we could do an early-out for it.

For ptrace and seccomp, we enter with the return value (regs->ax) set to
-ENOSYS, regardless of if the system call number is valid or not.
ptrace/seccomp has the option of independently emulate a system call,
then set regs->orig_ax to -1 and provide any chosen return value in
regs->ax. In that case, we must *not* update regs->ax to -ENOSYS before
returning.

The arch-independent code all assumes that a system call is "int" that
the value -1 specifically and not just any negative value is used for a
non-system call. This is the case on x86 as well when arch-independent
code is involved. The arch-independent API is defined/documented (but
not *implemented*!) in <asm-generic/syscall.h>, and what this patch is
intended to do is to make the x86-specific code follow.

As everyone either does or should know, treating the same data in
different ways in different flows is a security hole just waiting to
happen.

Most of these are relatively recently introduced bugs in x86-64; the
original assembly code version zero-extended %rax, compared it against
the length of the system call table, and would unconditionally return
-ENOSYS otherwise. Then *at least* on two separate occasions someone
"optimized" it by removing "movl %eax, %eax" not understanding that this
is not a noop in x86-64 but a zero-extend (perhaps gas ought to be able
to allow movzlq as an alias...) introducing a critical security bug
which was one of the motivators for the SMAP CPU feature.

On x86-64, the ABI is that the callee is responsible for extending
parameters, so only examining the lower 32 bits is fully consistent with
any "int" argument to any system call, e.g. regs->di for write(2).

Andy L. and I had a fairly long discussion about this, and some of this
was updated after the first RFC, but I fully agree I'm not capturing it
all that well. I hope the above points clear things up better and I'll
rewrite this into a better patch description.

-hpa

2021-05-12 20:14:23

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [RFC v2 PATCH 7/7] x86/entry: use int for syscall number; handle all invalid syscall nrs

On 5/12/21 5:09 AM, Thomas Gleixner wrote:
>
>> index 1d9db15fdc69..85f04ea0e368 100644
>> --- a/arch/x86/entry/entry_64.S
>> +++ b/arch/x86/entry/entry_64.S
>> @@ -108,7 +108,7 @@ SYM_INNER_LABEL(entry_SYSCALL_64_after_hwframe, SYM_L_GLOBAL)
>>
>> /* IRQs are off. */
>> movq %rsp, %rdi
>> - movq %rax, %rsi
>> + movslq %eax, %rsi
>
> This is wrong.
>
> syscall(long number,...);
>
> So the above turns syscall(UINT_MAX + N, ...) into syscall(N, ...).
>

That is intentional, as (again) system calls are int. As stated in my
reply to Ingo, I'll clean the various descriptions and try to capture
the discussion better.

-hpa

2021-05-12 20:23:31

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC v2 PATCH 7/7] x86/entry: use int for syscall number; handle all invalid syscall nrs

On Wed, May 12 2021 at 11:21, H. Peter Anvin wrote:
> On 5/12/21 5:09 AM, Thomas Gleixner wrote:
>>
>>> index 1d9db15fdc69..85f04ea0e368 100644
>>> --- a/arch/x86/entry/entry_64.S
>>> +++ b/arch/x86/entry/entry_64.S
>>> @@ -108,7 +108,7 @@ SYM_INNER_LABEL(entry_SYSCALL_64_after_hwframe, SYM_L_GLOBAL)
>>>
>>> /* IRQs are off. */
>>> movq %rsp, %rdi
>>> - movq %rax, %rsi
>>> + movslq %eax, %rsi
>>
>> This is wrong.
>>
>> syscall(long number,...);
>>
>> So the above turns syscall(UINT_MAX + N, ...) into syscall(N, ...).
>>
>
> That is intentional, as (again) system calls are int.

They are 'int' kernel internally, but _NOT_ at the user space visible
side. Again: man syscall

syscall(long number,...);

So that results in a user ABI change.

> As stated in my reply to Ingo, I'll clean the various descriptions and
> try to capture the discussion better.

If we agree to go there then this wants to be a seperate commit which
does nothing else than changing this behaviour.

Thanks,

tglx

2021-05-12 22:24:46

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [RFC v2 PATCH 7/7] x86/entry: use int for syscall number; handle all invalid syscall nrs

On 5/12/21 11:34 AM, Thomas Gleixner wrote:
>>
>> That is intentional, as (again) system calls are int.
>
> They are 'int' kernel internally, but _NOT_ at the user space visible
> side. Again: man syscall
>
> syscall(long number,...);
>
> So that results in a user ABI change.
>
>> As stated in my reply to Ingo, I'll clean the various descriptions and
>> try to capture the discussion better.
>
> If we agree to go there then this wants to be a seperate commit which
> does nothing else than changing this behaviour.
>

Good idea.

As far as this being a user ABI change, this is actually a revert to the
original x86-64 ABI; see my message to Ingo.

-hpa

2021-05-12 22:36:53

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC v2 PATCH 7/7] x86/entry: use int for syscall number; handle all invalid syscall nrs

On Wed, May 12 2021 at 15:09, H. Peter Anvin wrote:
> On 5/12/21 11:34 AM, Thomas Gleixner wrote:
>>>
>>> That is intentional, as (again) system calls are int.
>>
>> They are 'int' kernel internally, but _NOT_ at the user space visible
>> side. Again: man syscall
>>
>> syscall(long number,...);
>>
>> So that results in a user ABI change.
>>
>>> As stated in my reply to Ingo, I'll clean the various descriptions and
>>> try to capture the discussion better.
>>
>> If we agree to go there then this wants to be a seperate commit which
>> does nothing else than changing this behaviour.
>>
>
> Good idea.
>
> As far as this being a user ABI change, this is actually a revert to the
> original x86-64 ABI; see my message to Ingo.

I'm not against that change, but it has to be well justified and the
reasoning wants to be in the changelog. You know the drill :)

Thanks,

tglx

2021-05-12 22:39:45

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [RFC v2 PATCH 7/7] x86/entry: use int for syscall number; handle all invalid syscall nrs

Yes, indeed. I hope my reply to Ingo clarifies it as I'm going to try to wordsmith that into a better piece of text.

On May 12, 2021 3:22:05 PM PDT, Thomas Gleixner <[email protected]> wrote:
>On Wed, May 12 2021 at 15:09, H. Peter Anvin wrote:
>> On 5/12/21 11:34 AM, Thomas Gleixner wrote:
>>>>
>>>> That is intentional, as (again) system calls are int.
>>>
>>> They are 'int' kernel internally, but _NOT_ at the user space
>visible
>>> side. Again: man syscall
>>>
>>> syscall(long number,...);
>>>
>>> So that results in a user ABI change.
>>>
>>>> As stated in my reply to Ingo, I'll clean the various descriptions
>and
>>>> try to capture the discussion better.
>>>
>>> If we agree to go there then this wants to be a seperate commit
>which
>>> does nothing else than changing this behaviour.
>>>
>>
>> Good idea.
>>
>> As far as this being a user ABI change, this is actually a revert to
>the
>> original x86-64 ABI; see my message to Ingo.
>
>I'm not against that change, but it has to be well justified and the
>reasoning wants to be in the changelog. You know the drill :)
>
>Thanks,
>
> tglx

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

2021-05-14 02:27:59

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [RFC v2 PATCH 7/7] x86/entry: use int for syscall number; handle all invalid syscall nrs

On 5/12/21 3:22 PM, Thomas Gleixner wrote:
>>
>> As far as this being a user ABI change, this is actually a revert to the
>> original x86-64 ABI; see my message to Ingo.
>
> I'm not against that change, but it has to be well justified and the
> reasoning wants to be in the changelog. You know the drill :)
>

FYI:

So in the process of breaking up and better document this patch, I have
looked at the syscall_numbering_64 (and have rewritten it to be more
complete.)

I found that running it under strace fails, as strace (possibly ptrace,
possibly the strace binary) causes %rax = 2^32 to be clobbered to zero
already...

More motivation, I guess.

-hpa


2021-05-14 13:03:44

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RFC v2 PATCH 7/7] x86/entry: use int for syscall number; handle all invalid syscall nrs

On 5/13/21 5:38 PM, H. Peter Anvin wrote:
> On 5/12/21 3:22 PM, Thomas Gleixner wrote:
>>>
>>> As far as this being a user ABI change, this is actually a revert to the
>>> original x86-64 ABI; see my message to Ingo.
>>
>> I'm not against that change, but it has to be well justified and the
>> reasoning wants to be in the changelog. You know the drill :)
>>
>
> FYI:
>
> So in the process of breaking up and better document this patch, I have
> looked at the syscall_numbering_64 (and have rewritten it to be more
> complete.)
>
> I found that running it under strace fails, as strace (possibly ptrace,
> possibly the strace binary) causes %rax = 2^32 to be clobbered to zero
> already...
>
> More motivation, I guess.
>

Indeed.

I would love to go back in time and switch to long, but there are plenty
of things that use int now. I suppose we could try to make it long for
real, but seccomp has u32 baked into its ABI.


2021-05-14 13:06:21

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [RFC v2 PATCH 7/7] x86/entry: use int for syscall number; handle all invalid syscall nrs

Yeah. Also, x32 long is 32 bits...

On May 13, 2021 8:18:37 PM PDT, Andy Lutomirski <[email protected]> wrote:
>On 5/13/21 5:38 PM, H. Peter Anvin wrote:
>> On 5/12/21 3:22 PM, Thomas Gleixner wrote:
>>>>
>>>> As far as this being a user ABI change, this is actually a revert
>to the
>>>> original x86-64 ABI; see my message to Ingo.
>>>
>>> I'm not against that change, but it has to be well justified and the
>>> reasoning wants to be in the changelog. You know the drill :)
>>>
>>
>> FYI:
>>
>> So in the process of breaking up and better document this patch, I
>have
>> looked at the syscall_numbering_64 (and have rewritten it to be more
>> complete.)
>>
>> I found that running it under strace fails, as strace (possibly
>ptrace,
>> possibly the strace binary) causes %rax = 2^32 to be clobbered to
>zero
>> already...
>>
>> More motivation, I guess.
>>
>
>Indeed.
>
>I would love to go back in time and switch to long, but there are
>plenty
>of things that use int now. I suppose we could try to make it long for
>real, but seccomp has u32 baked into its ABI.

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.