2014-02-27 20:40:55

by Andy Lutomirski

Subject: Making a universal list of syscalls?

Currently, dealing with Linux syscalls in an architecture-independent
way is a mess. Here are some issues:

1. There's no clean way to map between syscall names and numbers on
different architectures. The kernel contains a number of tables (that
work differently for different architectures). strace has some arcane
mechanism. libseccomp has another.

2. There's no clean way to map between syscall argument registers and
logical syscall arguments. Each architecture knows how to do it, as
do strace and glibc, but I suspect that *everyone* else gets it wrong.
Especially on ARM.

3. Determining which architectures have which syscalls is a mess.
Recent kernel builds love to warn me that finit_module is missing on
x86_64. This is simply not true. I have no idea why.

4. Actually issuing a nontrivial syscall is annoying. syscall(2) can
do it for the native architecture (only).

5. Decoding ucontext from SIGSYS is a mess. I have prototype code
for libseccomp that can do it, but it gets the arguments wrong due to
ABI issues. See (2).

I'd like to see a master list in the kernel that lists, for every
syscall, the name, the number for each architecture that implements it
(using the AUDIT_ARCH semantics, probably), and the signature. The
build process could parse this table to replace the current per-arch
mess.
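
As a rough illustration only, such a table and the name/number lookups built on it might look like the C sketch below. The `ex_` names and the struct layout are hypothetical, not an existing kernel or libseccomp interface. The architecture tokens mirror the AUDIT_ARCH_* values from <linux/audit.h>, the ARM column assumes EABI, and only three syscalls are listed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Architecture tokens; values mirror AUDIT_ARCH_* in <linux/audit.h>. */
#define EX_ARCH_I386   0x40000003u
#define EX_ARCH_ARM    0x40000028u   /* EABI numbering assumed below */
#define EX_ARCH_X86_64 0xC000003Eu

#define EX_NARCH 3

/* Hypothetical layout: one row per canonical syscall name. */
struct ex_syscall {
    const char *name;
    const char *signature;
    struct { unsigned int arch; int nr; } nrs[EX_NARCH]; /* nr < 0: not implemented */
};

static const struct ex_syscall ex_table[] = {
    { "read", "ssize_t read(int fd, void *buf, size_t count)",
      { { EX_ARCH_I386, 3 }, { EX_ARCH_ARM, 3 }, { EX_ARCH_X86_64, 0 } } },
    { "write", "ssize_t write(int fd, const void *buf, size_t count)",
      { { EX_ARCH_I386, 4 }, { EX_ARCH_ARM, 4 }, { EX_ARCH_X86_64, 1 } } },
    { "finit_module", "int finit_module(int fd, const char *param_values, int flags)",
      { { EX_ARCH_I386, 350 }, { EX_ARCH_ARM, 379 }, { EX_ARCH_X86_64, 313 } } },
};

#define EX_NSYSCALLS (sizeof(ex_table) / sizeof(ex_table[0]))

/* Map (arch, name) to a syscall number; returns -1 if unknown or unimplemented. */
static int ex_name_to_nr(unsigned int arch, const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < EX_NSYSCALLS; i++) {
        if (strcmp(ex_table[i].name, name) != 0)
            continue;
        for (size_t j = 0; j < EX_NARCH; j++)
            if (ex_table[i].nrs[j].arch == arch)
                return ex_table[i].nrs[j].nr;
    }
    return -1;
}

/* Map (arch, nr) back to the canonical name; returns NULL if there is no match. */
static const char *ex_nr_to_name(unsigned int arch, int nr)
{
    if (nr < 0)
        return NULL;
    for (size_t i = 0; i < EX_NSYSCALLS; i++)
        for (size_t j = 0; j < EX_NARCH; j++)
            if (ex_table[i].nrs[j].arch == arch && ex_table[i].nrs[j].nr == nr)
                return ex_table[i].name;
    return NULL;
}
```

With this sketch, ex_name_to_nr(EX_ARCH_X86_64, "finit_module") yields 313 and ex_nr_to_name(EX_ARCH_I386, 4) yields "write". A real table would be generated at build time rather than written by hand.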

Issues here: some syscalls have different signatures on different
architectures. Maybe we could require that a canonical syscall name
would have the same signature everywhere, but architectures could
specify alternate names. So, for things like clone (?), there could
actually be a few syscalls that all have alternate names of "clone".

More importantly, we could add a library in tools that exposes this
information to userspace. Useful operations:

- For a given (arch, nr), indicate, for each logical argument, which
physical argument slot is used or, if the argument is split into a
high and low part, which pair of slots is used.

- For a given (nr, logical args), issue the syscall for the
architecture that built the library.

- For a given (arch, nr, logical args), issue the syscall if
possible. An x86_32 build could issue x86_64 syscalls with some
effort, and an x86_64 build could easily issue 32-bit syscalls.

- For a given arch, map between name and nr, and give access to the signature.
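
To make the first operation concrete, here is a minimal C sketch of mapping logical arguments to physical slots. It assumes a deliberately simplified model: each architecture is described only by its register width and by whether a split 64-bit argument must start on an even slot. All `ex_` names are hypothetical, and real ABIs have per-syscall exceptions that a real table would have to record explicitly.

```c
#include <assert.h>
#include <stddef.h>

#define EX_MAX_ARGS 6   /* Linux syscalls use at most six argument slots */

/* Logical argument kinds: fits one register, or always 64 bits wide. */
enum ex_kind { EX_WORD, EX_U64 };

/* Hypothetical, simplified description of an architecture's syscall ABI. */
struct ex_abi {
    unsigned int word_bits;  /* width of one argument register */
    int align_pairs;         /* nonzero: a split 64-bit arg starts on an even slot */
};

/*
 * Physical location of one logical argument. second == -1 means the
 * argument fits in a single slot. Which slot of a pair holds the low
 * half depends on endianness and is not modelled here.
 */
struct ex_slot { int first; int second; };

/*
 * Assign slots to nargs logical arguments. Returns the number of slots
 * consumed (including padding), or -1 if they do not fit in EX_MAX_ARGS.
 */
static int ex_assign_slots(const struct ex_abi *abi, const enum ex_kind *kinds,
                           size_t nargs, struct ex_slot *out)
{
    int next = 0;

    assert(nargs <= EX_MAX_ARGS);
    for (size_t i = 0; i < nargs; i++) {
        if (kinds[i] == EX_U64 && abi->word_bits < 64) {
            if (abi->align_pairs && (next % 2) != 0)
                next++;                     /* skip one padding slot */
            if (next + 2 > EX_MAX_ARGS)
                return -1;
            out[i].first = next;
            out[i].second = next + 1;
            next += 2;
        } else {
            if (next + 1 > EX_MAX_ARGS)
                return -1;
            out[i].first = next;
            out[i].second = -1;
            next += 1;
        }
    }
    return next;
}
```

For pread64(fd, buf, count, offset), described as { EX_WORD, EX_WORD, EX_WORD, EX_U64 }, this model puts the offset in slots 4 and 5 on 32-bit ARM EABI (word_bits 32, align_pairs 1, slot 3 is padding), in slots 3 and 4 on i386 (word_bits 32, align_pairs 0), and in slot 3 alone on x86_64.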

If this happened, presumably all architectures that supported it would
have to have valid AUDIT_ARCH support. That means that someone would
have to fix ARM OABI (sigh).

Thoughts?


--Andy


2014-02-27 20:53:52

by Eric Paris

Subject: Re: [libseccomp-discuss] Making a universal list of syscalls?

On Thu, 2014-02-27 at 12:40 -0800, Andy Lutomirski wrote:
> Currently, dealing with Linux syscalls in an architecture-independent
> way is a mess. Here are some issues:
>
> 1. There's no clean way to map between syscall names and numbers on
> different architectures. The kernel contains a number of tables (that
> work differently for different architectures). strace has some arcane
> mechanism. libseccomp has another.

Userspace audit has a third.

> I'd like to see a master list in the kernel that lists, for every
> syscall, the name, the number for each architecture that implements it
> (using the AUDIT_ARCH semantics, probably), and the signature. The
> build process could parse this table to replace the current per-arch
> mess.

I know for audit it would be huge if userspace didn't have to organically
grow this knowledge on its own! So +1 from me!

>
> Issues here: some syscalls have different signatures on different
> architectures. Maybe we could require that a canonical syscall name
> would have the same signature everywhere, but architectures could
> specify alternate names. So, for things like clone (?), there could
> actually be a few syscalls that all have alternate names of "clone".
>
> More importantly, we could add a library in tools that exposes this
> information to userspace. Useful operations:
>
> - For a given (arch, nr), indicate, for each logical argument, which
> physical argument slot is used or, if the argument is split into a
> high and low part, which pair of slots is used.
>
> - For a given (nr, logical args), issue the syscall for the
> architecture that built the library.
>
> - For a given (arch, nr, logical args), issue the syscall if
> possible. An x86_32 build could issue x86_64 syscalls with some
> effort, and an x86_64 build could easily issue 32-bit syscalls.
>
> - For a given arch, map between name and nr, and give access to the signature.
>
> If this happened, presumably all architectures that supported it would
> have to have valid AUDIT_ARCH support. That means that someone would
> have to fix ARM OABI (sigh).
>
> Thoughts?

2014-02-27 21:16:47

by Paul Moore

Subject: Re: [libseccomp-discuss] Making a universal list of syscalls?

On Thursday, February 27, 2014 12:40:32 PM Andy Lutomirski wrote:
> Currently, dealing with Linux syscalls in an architecture-independent
> way is a mess. Here are some issues:
>
> 1. There's no clean way to map between syscall names and numbers on
> different architectures. The kernel contains a number of tables (that
> work differently for different architectures). strace has some arcane
> mechanism. libseccomp has another.

This is a major pain point for libseccomp. What we have now is passable, and
it works, but I cringe each time I look at it because I worry about
maintaining it. I would be very happy if the kernel had some
header/file/whatever that could be used by userspace applications to map
syscall names/numbers for each architecture.

> 2. There's no clean way to map between syscall argument registers and
> logical syscall arguments. Each architecture knows how to do it, as
> do strace and glibc, but I suspect that *everyone* else gets it wrong.
> Especially on ARM.

I remember looking into this with libseccomp, around the ARM time frame with
Andy, and I believe I managed to reassure myself - not well, mind you - that
we were *ok* with seccomp/libseccomp. However, having an argument mapping
document/header/etc. would go a long way here.

> 3. Determining which architectures have which syscalls is a mess.
> Recent kernel builds love to warn me that finit_module is missing on
> x86_64. This is simply not true. I have no idea why.

Closely related to item #1. Also a major pain for libseccomp for the same
reasons.

> 5. Decoding ucontext from SIGSYS is a mess. I have prototype code
> for libseccomp that can do it, but it gets the arguments wrong due to
> ABI issues. See (2).

I've actually been sitting on some of Andy's libseccomp code for this for a
while now because the solution is very fiddly. Improvements here could make
life much easier for us and remove a lot of my hesitation in merging Andy's
code.

--
paul moore
security and virtualization @ redhat