2021-10-13 14:27:22

by Willy Tarreau

Subject: Re: [PATCH] tools/nolibc: x86: Remove `r8`, `r9` and `r10` from the clobber list

On Wed, Oct 13, 2021 at 04:20:55PM +0200, Borislav Petkov wrote:
> On Wed, Oct 13, 2021 at 04:07:23PM +0200, Willy Tarreau wrote:
> > Yes I agree with the "potentially" here. If it can potentially be (i.e.
> > the kernel is allowed by contract to later change the way it's currently
> > done) then we have to save them even if it means lower code efficiency.
> >
> > If, however, the kernel performs such savings on purpose because it is
> > willing to observe a stricter saving than the AMD64 ABI, we can follow
> > it but only once it's written down somewhere that it is by contract and
> > will not change.
>
> Right, and Micha noted that such a change to the document can be done.

great.

> And we're basically doing that registers restoring anyway, in POP_REGS.

That's what I based my analysis on when I wanted to verify Ammar's
finding. I would tend to think that if we're burning cycles popping
plenty of registers it's probably for a reason, maybe at least a good
one, which is that it's the only way to make sure we're not leaking
internal kernel data! This is not a concern for kernel->kernel nor
user->user calls but for user->kernel calls it definitely is one, and
I don't think we could relax that series of pop without causing leaks
anyway.

Willy


2021-10-13 16:28:41

by Michael Matz

Subject: Re: [PATCH] tools/nolibc: x86: Remove `r8`, `r9` and `r10` from the clobber list

Hello,

On Wed, 13 Oct 2021, Willy Tarreau wrote:

> On Wed, Oct 13, 2021 at 04:20:55PM +0200, Borislav Petkov wrote:
> > On Wed, Oct 13, 2021 at 04:07:23PM +0200, Willy Tarreau wrote:
> > > Yes I agree with the "potentially" here. If it can potentially be (i.e.
> > > the kernel is allowed by contract to later change the way it's currently
> > > done) then we have to save them even if it means lower code efficiency.
> > >
> > > If, however, the kernel performs such savings on purpose because it is
> > > willing to observe a stricter saving than the AMD64 ABI, we can follow
> > > it but only once it's written down somewhere that it is by contract and
> > > will not change.
> >
> > Right, and Micha noted that such a change to the document can be done.
>
> great.
>
> > And we're basically doing that registers restoring anyway, in POP_REGS.
>
> That's what I based my analysis on when I wanted to verify Ammar's
> finding. I would tend to think that if we're burning cycles popping
> plenty of registers it's probably for a reason, maybe at least a good
> one, which is that it's the only way to make sure we're not leaking
> internal kernel data! This is not a concern for kernel->kernel nor
> user->user calls but for user->kernel calls it definitely is one, and
> I don't think we could relax that series of pop without causing leaks
> anyway.

It might also be interesting to know that while the wording of the psABI
was indeed intended to imply that all argument registers are potentially
clobbered (like with normal calls) glibc's inline assembler to call
syscalls relies on most registers to actually be preserved:

# define REGISTERS_CLOBBERED_BY_SYSCALL "cc", "r11", "cx"
...
#define internal_syscall6(number, arg1, arg2, arg3, arg4, arg5, arg6) \
({ \
    unsigned long int resultvar; \
    TYPEFY (arg6, __arg6) = ARGIFY (arg6); \
    TYPEFY (arg5, __arg5) = ARGIFY (arg5); \
    TYPEFY (arg4, __arg4) = ARGIFY (arg4); \
    TYPEFY (arg3, __arg3) = ARGIFY (arg3); \
    TYPEFY (arg2, __arg2) = ARGIFY (arg2); \
    TYPEFY (arg1, __arg1) = ARGIFY (arg1); \
    register TYPEFY (arg6, _a6) asm ("r9") = __arg6; \
    register TYPEFY (arg5, _a5) asm ("r8") = __arg5; \
    register TYPEFY (arg4, _a4) asm ("r10") = __arg4; \
    register TYPEFY (arg3, _a3) asm ("rdx") = __arg3; \
    register TYPEFY (arg2, _a2) asm ("rsi") = __arg2; \
    register TYPEFY (arg1, _a1) asm ("rdi") = __arg1; \
    asm volatile ( \
        "syscall\n\t" \
        : "=a" (resultvar) \
        : "0" (number), "r" (_a1), "r" (_a2), "r" (_a3), "r" (_a4), \
          "r" (_a5), "r" (_a6) \
        : "memory", REGISTERS_CLOBBERED_BY_SYSCALL); \
    (long int) resultvar; \
})


Note in particular the missing clobbers or outputs of any of the argument
regs.

So, even though the psABI (might have) meant something else, as glibc is
doing the above we in fact have a de-facto standard that the kernel can't
clobber any of the argument regs. The wording and the linux x86-64
syscall implementation (and use in glibc) all come from the same time in
2001, so there never was a time when the kernel was not saving/restoring
the arg registers, so it can't stop now.
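
To see the preservation described above on a running system, one could use something like the following. This is only a sketch, assuming x86-64 Linux and GCC or Clang; the helper name arg_regs_preserved and the sentinel values are made up for illustration. It observes what the running kernel does and proves nothing about the contract.

```c
#include <sys/syscall.h>   /* SYS_getpid */

/*
 * Hypothetical helper (not from glibc or nolibc): load sentinel values
 * into r8, r9 and r10, issue a getpid syscall, and report whether the
 * three registers still hold the sentinels afterwards.
 * Assumes x86-64 Linux.
 */
static int arg_regs_preserved(void)
{
    register unsigned long r10 __asm__("r10") = 0x1010101010101010UL;
    register unsigned long r8  __asm__("r8")  = 0x0808080808080808UL;
    register unsigned long r9  __asm__("r9")  = 0x0909090909090909UL;
    long ret;

    __asm__ volatile (
        "syscall"
        /* "+r" makes the compiler read the registers back after the asm */
        : "=a" (ret), "+r" (r10), "+r" (r8), "+r" (r9)
        : "0" ((long) SYS_getpid)
        : "rcx", "r11", "memory", "cc");

    return ret > 0 &&
           r10 == 0x1010101010101010UL &&
           r8  == 0x0808080808080808UL &&
           r9  == 0x0909090909090909UL;
}
```

The three registers are declared as in/out operands on purpose: with plain inputs the compiler would be free to assume they are unchanged and fold the comparison away.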

In effect this means the psABI should be clarified to explicitly say that
the arg registers aren't clobbered, i.e. that the mentioned list of
clobbered regs is exhaustive rather than merely illustrative. I will do that.

When I was discussing this with Boris earlier I hadn't yet looked at glibc
use but only gave my interpretation from memory and reading. Obviously
reality trumps anything like that :-)

In short: Ammar's initial claim:

> Linux x86-64 syscall only clobbers rax, rcx and r11 (and "memory").
>
> - rax for the return value.
> - rcx to save the return address.
> - r11 to save the rflags.
>
> Other registers are preserved.

is accurate and I will clarify the psABI to make that explicit.
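
For illustration, a six-argument syscall wrapper that relies on exactly that claim could look roughly like this. It is a minimal sketch assuming x86-64 Linux and GCC or Clang; sketch_syscall6 is a hypothetical name and this is not the actual nolibc or glibc code.

```c
#include <sys/syscall.h>   /* SYS_* numbers */

/*
 * Hypothetical wrapper: issue a 6-argument syscall and declare only
 * rcx, r11, the flags and memory as clobbered. r8, r9 and r10 are
 * plain inputs, i.e. the compiler may assume they survive the syscall.
 */
static inline long sketch_syscall6(long nr, long a1, long a2, long a3,
                                   long a4, long a5, long a6)
{
    long ret;
    register long r10 __asm__("r10") = a4;
    register long r8  __asm__("r8")  = a5;
    register long r9  __asm__("r9")  = a6;

    __asm__ volatile (
        "syscall"
        : "=a" (ret)                     /* rax: return value */
        : "0" (nr),                      /* rax: syscall number */
          "D" (a1), "S" (a2), "d" (a3),  /* rdi, rsi, rdx */
          "r" (r10), "r" (r8), "r" (r9)
        : "rcx", "r11", "cc", "memory");

    return ret;
}
```

For example, sketch_syscall6(SYS_getpid, 0, 0, 0, 0, 0, 0) returns the caller's PID. Errors come back as a negative errno value in the return value, since no libc errno handling is done here.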


Ciao,
Michael.

2021-10-13 16:33:26

by Willy Tarreau

Subject: Re: [PATCH] tools/nolibc: x86: Remove `r8`, `r9` and `r10` from the clobber list

Hello Michael,

On Wed, Oct 13, 2021 at 04:24:28PM +0000, Michael Matz wrote:
(...)
> In short: Ammar's initial claim:
>
> > Linux x86-64 syscall only clobbers rax, rcx and r11 (and "memory").
> >
> > - rax for the return value.
> > - rcx to save the return address.
> > - r11 to save the rflags.
> >
> > Other registers are preserved.
>
> is accurate and I will clarify the psABI to make that explicit.

Many thanks for this very detailed explanation! Ammar, I'll take your
patch.

Thanks all,
Willy

2021-10-13 16:53:20

by Andy Lutomirski

Subject: Re: [PATCH] tools/nolibc: x86: Remove `r8`, `r9` and `r10` from the clobber list



On Wed, Oct 13, 2021, at 9:30 AM, Willy Tarreau wrote:
> Hello Michael,
>
> On Wed, Oct 13, 2021 at 04:24:28PM +0000, Michael Matz wrote:
> (...)
>> In short: Ammar's initial claim:
>>
>> > Linux x86-64 syscall only clobbers rax, rcx and r11 (and "memory").
>> >
>> > - rax for the return value.
>> > - rcx to save the return address.
>> > - r11 to save the rflags.
>> >
>> > Other registers are preserved.
>>
>> is accurate and I will clarify the psABI to make that explicit.
>
> Many thanks for this very detailed explanation! Ammar, I'll take your
> patch.

Acked-by: Andy Lutomirski <[email protected]>

>
> Thanks all,
> Willy

2021-10-19 09:08:59

by David Laight

Subject: RE: [PATCH] tools/nolibc: x86: Remove `r8`, `r9` and `r10` from the clobber list

From: Michael Matz
> Sent: 13 October 2021 17:24
>
> Hello,
>
> On Wed, 13 Oct 2021, Willy Tarreau wrote:
>
> > On Wed, Oct 13, 2021 at 04:20:55PM +0200, Borislav Petkov wrote:
> > > On Wed, Oct 13, 2021 at 04:07:23PM +0200, Willy Tarreau wrote:
> > > > Yes I agree with the "potentially" here. If it can potentially be (i.e.
> > > > the kernel is allowed by contract to later change the way it's currently
> > > > done) then we have to save them even if it means lower code efficiency.
> > > >
> > > > If, however, the kernel performs such savings on purpose because it is
> > > > willing to observe a stricter saving than the AMD64 ABI, we can follow
> > > > it but only once it's written down somewhere that it is by contract and
> > > > will not change.
> > >
> > > Right, and Micha noted that such a change to the document can be done.
> >
> > great.
> >
> > > And we're basically doing that registers restoring anyway, in POP_REGS.
> >
> > That's what I based my analysis on when I wanted to verify Ammar's
> > finding. I would tend to think that if we're burning cycles popping
> > plenty of registers it's probably for a reason, maybe at least a good
> > one, which is that it's the only way to make sure we're not leaking
> > internal kernel data! This is not a concern for kernel->kernel nor
> > user->user calls but for user->kernel calls it definitely is one, and
> > I don't think we could relax that series of pop without causing leaks
> > anyway.
>
> It might also be interesting to know that while the wording of the psABI
> was indeed intended to imply that all argument registers are potentially
> clobbered (like with normal calls) glibc's inline assembler to call
> syscalls relies on most registers to actually be preserved:
>
> # define REGISTERS_CLOBBERED_BY_SYSCALL "cc", "r11", "cx"
> ...
> #define internal_syscall6(number, arg1, arg2, arg3, arg4, arg5, arg6) \
> ({ \
>     unsigned long int resultvar; \
>     TYPEFY (arg6, __arg6) = ARGIFY (arg6); \
>     TYPEFY (arg5, __arg5) = ARGIFY (arg5); \
>     TYPEFY (arg4, __arg4) = ARGIFY (arg4); \
>     TYPEFY (arg3, __arg3) = ARGIFY (arg3); \
>     TYPEFY (arg2, __arg2) = ARGIFY (arg2); \
>     TYPEFY (arg1, __arg1) = ARGIFY (arg1); \
>     register TYPEFY (arg6, _a6) asm ("r9") = __arg6; \
>     register TYPEFY (arg5, _a5) asm ("r8") = __arg5; \
>     register TYPEFY (arg4, _a4) asm ("r10") = __arg4; \
>     register TYPEFY (arg3, _a3) asm ("rdx") = __arg3; \
>     register TYPEFY (arg2, _a2) asm ("rsi") = __arg2; \
>     register TYPEFY (arg1, _a1) asm ("rdi") = __arg1; \
>     asm volatile ( \
>         "syscall\n\t" \
>         : "=a" (resultvar) \
>         : "0" (number), "r" (_a1), "r" (_a2), "r" (_a3), "r" (_a4), \
>           "r" (_a5), "r" (_a6) \
>         : "memory", REGISTERS_CLOBBERED_BY_SYSCALL); \
>     (long int) resultvar; \
> })
>
>
> Note in particular the missing clobbers or outputs of any of the argument
> regs.

What about all the AVX registers?
These are normally caller-saved - so are unlikely to be live in a gcc stub.
But glibc is unlikely to keep the clobber list up to date - even if they
were ever added.

While the kernel can't return 'junk' in the AVX registers, it may be
significantly cheaper to zero the registers on at least some code paths.

The same is true for the rxx registers.
Executing 'xor %rxx,%rxx' is faster than 'pop %rxx'.
Especially since the xors can all be executed in parallel.

David
