2015-06-11 11:48:34

by Denys Vlasenko

Subject: [PATCH] x86/asm/entry/32: Slightly better handling of syscall errors in auditing

"setbe %al" insn has a register merge stall: it needs to combine
previous %eax value with new value for the lowest byte.
Subsequent "movzbl %al,%edi" in turn depends on its completion.

This patch replaces the "setbe %al" + "movzbl %al,%edi" pair of insns
with "xor %edi,%edi" before the comparison and a conditional "inc %edi".

This results in the same value of %edi as the old code produced, but
the first insn has no dependencies, we end up with only one dependent
insn, which executes only if %eax contains an error return, and both
insns are shorter: 2 bytes each versus 3 bytes each.

(The old code was inherited from 32-bit code, where it allowed us to
avoid a conditional jump. Here we have to use a jump anyway.)
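
For reference, the two sequences side by side, with the encodings a
typical x86-64 assembler emits for them (hence the byte counts above):

    /* old: 3 + 3 bytes */
    setbe  %al          /* 0f 96 c0 */
    movzbl %al, %edi    /* 0f b6 f8 */

    /* new: 2 + 2 bytes */
    xor    %edi, %edi   /* 31 ff */
    inc    %edi         /* ff c7 */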

Signed-off-by: Denys Vlasenko <[email protected]>
CC: Linus Torvalds <[email protected]>
CC: Steven Rostedt <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Borislav Petkov <[email protected]>
CC: "H. Peter Anvin" <[email protected]>
CC: Andy Lutomirski <[email protected]>
CC: Oleg Nesterov <[email protected]>
CC: Frederic Weisbecker <[email protected]>
CC: Alexei Starovoitov <[email protected]>
CC: Will Drewry <[email protected]>
CC: Kees Cook <[email protected]>
CC: [email protected]
CC: [email protected]
---
arch/x86/entry/entry_64_compat.S | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index bb187a6..96f33a4 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -213,12 +213,13 @@ sysexit_from_sys_call:
jnz ia32_ret_from_sys_call
TRACE_IRQS_ON
ENABLE_INTERRUPTS(CLBR_NONE)
+ xor %edi, %edi
movl %eax, %esi /* second arg, syscall return value */
cmpl $-MAX_ERRNO, %eax /* is it an error ? */
jbe 1f
movslq %eax, %rsi /* if error sign extend to 64 bits */
-1: setbe %al /* 1 if error, 0 if not */
- movzbl %al, %edi /* zero-extend that into %edi */
+ inc %edi
+1: /* edi: 1 if error, 0 if not */
call __audit_syscall_exit
movq RAX(%rsp), %rax /* reload syscall return value */
movl $(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT), %edi
--
1.8.1.4


2015-06-12 23:24:56

by Andy Lutomirski

Subject: Re: [PATCH] x86/asm/entry/32: Slightly better handling of syscall errors in auditing

On Thu, Jun 11, 2015 at 4:47 AM, Denys Vlasenko <[email protected]> wrote:
> "setbe %al" insn has a register merge stall: it needs to combine
> previous %eax value with new value for the lowest byte.
> Subsequent "movzbl %al,%edi" in turn depends on its completion.
>
> This patch replaces the "setbe %al" + "movzbl %al,%edi" pair of insns
> with "xor %edi,%edi" before the comparison and a conditional "inc %edi".
>
> This results in the same value of %edi as the old code produced, but
> the first insn has no dependencies, we end up with only one dependent
> insn, which executes only if %eax contains an error return, and both
> insns are shorter: 2 bytes each versus 3 bytes each.
>
> (The old code was inherited from 32-bit code, where it allowed us to
> avoid a conditional jump. Here we have to use a jump anyway.)
>
> [...]
> @@ -213,12 +213,13 @@ sysexit_from_sys_call:
> jnz ia32_ret_from_sys_call
> TRACE_IRQS_ON
> ENABLE_INTERRUPTS(CLBR_NONE)
> + xor %edi, %edi
> movl %eax, %esi /* second arg, syscall return value */
> cmpl $-MAX_ERRNO, %eax /* is it an error ? */
> jbe 1f

We go here if !be, which is allegedly the error case. That confuses
me, because setbe will set al (and hence edi) if be, which is also
claimed to be the error case. Ignoring the comments for now...

> movslq %eax, %rsi /* if error sign extend to 64 bits */
> -1: setbe %al /* 1 if error, 0 if not */
> - movzbl %al, %edi /* zero-extend that into %edi */

Old code: edi == 1 if be and edi == 0 if !be.

> + inc %edi

New code: edi == 1 if !be and edi == 0 if be.
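
Spelling the patched sequence out:

    xor    %edi, %edi           /* edi = 0 */
    movl   %eax, %esi
    cmpl   $-MAX_ERRNO, %eax
    jbe    1f                   /* be: jump, edi stays 0 */
    movslq %eax, %rsi
    inc    %edi                 /* !be: fall through, edi becomes 1 */
1:                              /* i.e. edi == 1 iff !be */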

So I think that both the comment and the new code are wrong.

Am I just confused?

--Andy

> +1: /* edi: 1 if error, 0 if not */
> call __audit_syscall_exit
> movq RAX(%rsp), %rax /* reload syscall return value */
> movl $(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT), %edi

--
Andy Lutomirski
AMA Capital Management, LLC

2015-06-13 04:16:48

by H. Peter Anvin

Subject: Re: [PATCH] x86/asm/entry/32: Slightly better handling of syscall errors in auditing

I think you misunderstand partial register stalls. They happen (on some microarchitectures) when you write part of a register and then use the whole register.
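
For example (a hypothetical sequence, not from this patch):

    setbe %al          /* writes only AL */
    addl  %eax, %edx   /* reads the whole of EAX: AL must be merged back in first */

Writing AL and then reading only AL, or overwriting all of EAX, is fine.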

As you say, we do need the branch anyway, which is a good reason to do it, but the motivation is wrong.

Sent from my tablet, pardon any formatting problems.

> On Jun 11, 2015, at 04:47, Denys Vlasenko <[email protected]> wrote:
>
> "setbe %al" insn has a register merge stall: it needs to combine
> previous %eax value with new value for the lowest byte.
> Subsequent "movzbl %al,%edi" in turn depends on its completion.
>
> This patch replaces the "setbe %al" + "movzbl %al,%edi" pair of insns
> with "xor %edi,%edi" before the comparison and a conditional "inc %edi".
>
> This results in the same value of %edi as the old code produced, but
> the first insn has no dependencies, we end up with only one dependent
> insn, which executes only if %eax contains an error return, and both
> insns are shorter: 2 bytes each versus 3 bytes each.
>
> (The old code was inherited from 32-bit code, where it allowed us to
> avoid a conditional jump. Here we have to use a jump anyway.)
>
> [...]

2015-06-13 06:30:51

by Ingo Molnar

Subject: Re: [PATCH] x86/asm/entry/32: Slightly better handling of syscall errors in auditing


* H. Peter Anvin <[email protected]> wrote:

> I think you misunderstand partial register stalls. They happen (on some
> microarchitectures) when you write part of a register and then use the whole
> register.

Yes, there's no partial register stall in this or later code handling these
values.

> > "setbe %al" insn has a register merge stall: it needs to combine previous %eax
> > value with new value for the lowest byte. Subsequent "movzbl %al,%edi" in turn
> > depends on its completion.
> >
> > This patch replaces "setbe %al + movzbl %al,%edi" pair of insns with "xor
> > %edi,%edi" before the comparison, and conditional "inc %edi".

So here's the code in wider context:

> cmpl $-MAX_ERRNO, %eax /* is it an error ? */
> jbe 1f
> movslq %eax, %rsi /* if error sign extend to 64 bits */
> 1: setbe %al /* 1 if error, 0 if not */
> movzbl %al, %edi /* zero-extend that into %edi */

What happens here is that at the point the SETBE executes it needs to know the
previous 32-bit value of EAX (to merge the new AL value into). But that value
is available long before: the CMPL already consumed it to produce the CF and ZF
that the JBE tested, so there's no real additional dependency.

(The MOVSLQ of EAX will likewise already have the full value of EAX available,
because the CMPL before it already needed it.)

Furthermore, the following SETBE sets an entirely new value for the 8-bit AL. The
'entirely new value' will be handled by modern uarchs with register renaming (and
marking that it's a rename for the low byte of EAX), giving the new value a
separate, independent path to compute and use - and that renamed register value
will be moved into EDI (zero-extended).

The CPU might eventually have to merge the previous value of EAX with the new
value for AL, but there's no dependency on it in this piece of code. If there was
a dependency on the full value then _that_ would create a partial register stall.
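
Something like this (hypothetical, not in this code path) would:

    setbe %al           /* new value for AL only */
    movl  %eax, %edx    /* reads the full EAX: forces the merge of AL into EAX */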

And as it happens, there's no such subsequent dependency, because we call a C
function right away:

call __audit_syscall_exit

and RAX is a freely available register used as the return code. It's being
overwritten early in the __audit_syscall_exit() function's execution by a
full-width write:

28d4: 19 c0 sbb %eax,%eax

which will fully overwrite the previous partial value without extra dependencies.
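
(SBB of a register with itself computes reg - reg - CF, so:

    sbb %eax, %eax      /* %eax = 0 - CF: 0 or -1, a full 32-bit write */

either way the whole register is written in one go.)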

So the real motivation of the patch is to simplify the setting of EDI to 0 or 1 by
using a branch we already execute.

Thanks,

Ingo