MIME-Version: 1.0
From: Andy Lutomirski <luto@amacapital.net>
Date: Thu, 27 Feb 2014 12:40:32 -0800
Message-ID: <CALCETrWbjazg_kAY1uAB+cMTdaJqtyGrTwQWNFCSzZ56B0KEkw@mail.gmail.com>
Subject: Making a universal list of syscalls?
To: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        linux-arch <linux-arch@vger.kernel.org>,
        libseccomp-discuss@lists.sourceforge.net
Content-Type: text/plain; charset=ISO-8859-1
Sender: linux-kernel-owner@vger.kernel.org

Currently, dealing with Linux syscalls in an architecture-independent
way is a mess.  Here are some issues:

 1. There's no clean way to map between syscall names and numbers on
different architectures.  The kernel contains a number of tables (that
work differently for different architectures).  strace has some arcane
mechanism.  libseccomp has another.

 2. There's no clean way to map between syscall argument registers and
logical syscall arguments.  Each architecture knows how to do it, as
do strace and glibc, but I suspect that *everyone* else gets it wrong,
especially on ARM.

 3. Determining which architectures have which syscalls is a mess.
Recent kernel builds love to warn me that finit_module is missing on
x86_64.  This is simply not true.  I have no idea why the warning
appears.

 4. Actually issuing a nontrivial syscall is annoying.  syscall(2) can
do it for the native architecture (only).

 5. Decoding ucontext from SIGSYS is a mess.  I have prototype code
for libseccomp that can do it, but it gets the arguments wrong due to
ABI issues.  See (2).
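
As a rough illustration of (2) and (5), here is a minimal Python sketch
of the per-ABI register tables that every tool currently has to carry
around on its own.  The register assignments are the usual ones for
these three ABIs; the dictionary keys and the decode() helper are names
made up for this sketch, not anything from the kernel, strace, or
libseccomp.  Note that it only recovers physical argument slots; turning
those into logical arguments (for example a 64-bit offset split across
two 32-bit registers) still needs the per-syscall signature.

```python
# Syscall argument registers per ABI, in physical-slot order.
# The keys are informal labels chosen for this sketch.
ARG_REGS = {
    "x86_64":   ["rdi", "rsi", "rdx", "r10", "r8", "r9"],
    "i386":     ["ebx", "ecx", "edx", "esi", "edi", "ebp"],
    "arm_eabi": ["r0", "r1", "r2", "r3", "r4", "r5"],
}

# Register that carries the syscall number on entry.
NR_REG = {"x86_64": "rax", "i386": "eax", "arm_eabi": "r7"}


def decode(arch, regs):
    """Return (nr, [six physical argument slots]) from a register dict.

    Registers missing from `regs` are reported as 0.
    """
    nr = regs[NR_REG[arch]]
    slots = [regs.get(name, 0) for name in ARG_REGS[arch]]
    return nr, slots


if __name__ == "__main__":
    # write(1, buf, 5) as it would look in x86_64 registers.
    print(decode("x86_64", {"rax": 1, "rdi": 1, "rsi": 0x1000, "rdx": 5}))
```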

I'd like to see a master list in the kernel that lists, for every
syscall, the name, the number for each architecture that implements it
(using the AUDIT_ARCH semantics, probably), and the signature.  The
build process could parse this table to replace the current per-arch
mess.
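
One possible shape for such a master list, sketched in Python purely for
illustration.  The AUDIT_ARCH_* constants match <linux/audit.h> and the
syscall numbers are the existing ones for these three architectures; the
MASTER layout and the build_tables() helper are assumptions of this
sketch, not a proposed file format.

```python
# AUDIT_ARCH_* values as defined in <linux/audit.h>.
AUDIT_ARCH_I386 = 0x40000003
AUDIT_ARCH_ARM = 0x40000028
AUDIT_ARCH_X86_64 = 0xC000003E

# Hypothetical master list: name -> (argument list, {arch: nr}).
MASTER = {
    "read": (
        [("unsigned int", "fd"), ("char *", "buf"), ("size_t", "count")],
        {AUDIT_ARCH_X86_64: 0, AUDIT_ARCH_I386: 3, AUDIT_ARCH_ARM: 3},
    ),
    "write": (
        [("unsigned int", "fd"), ("const char *", "buf"), ("size_t", "count")],
        {AUDIT_ARCH_X86_64: 1, AUDIT_ARCH_I386: 4, AUDIT_ARCH_ARM: 4},
    ),
    "finit_module": (
        [("int", "fd"), ("const char *", "uargs"), ("int", "flags")],
        {AUDIT_ARCH_X86_64: 313, AUDIT_ARCH_I386: 350, AUDIT_ARCH_ARM: 379},
    ),
}


def build_tables(master):
    """Derive per-arch name->nr and nr->name tables from the master list."""
    by_name, by_nr = {}, {}
    for name, (_args, numbers) in master.items():
        for arch, nr in numbers.items():
            by_name.setdefault(arch, {})[name] = nr
            by_nr.setdefault(arch, {})[nr] = name
    return by_name, by_nr


BY_NAME, BY_NR = build_tables(MASTER)

if __name__ == "__main__":
    print(BY_NR[AUDIT_ARCH_X86_64][313])
    print(BY_NAME[AUDIT_ARCH_I386]["read"])
```

A syscall that an architecture lacks would simply have no entry for that
arch, which would also answer the "which architectures have which
syscalls" question from a single place.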

Issues here: some syscalls have different signatures on different
architectures.  Maybe we could require that a canonical syscall name
would have the same signature everywhere, but architectures could
specify alternate names.  So, for things like clone (?), there could
actually be a few syscalls that all have alternate names of "clone".

More importantly, we could add a library in tools that exposes this
information to userspace.  Useful operations:

 - For a given (arch, nr), indicate, for each logical argument, which
physical argument slot is used or, if the argument is split into a
high and low part, which pair of slots is used.

 - For a given (nr, logical args), issue the syscall for the
architecture that the library was built for.

 - For a given (arch, nr, logical args), issue the syscall if
possible.  An x86_32 build could issue x86_64 syscalls with some
effort, and an x86_64 build could easily issue 32-bit syscalls.

 - For a given arch, map between name and nr, and give access to the signature.
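
The first operation above (logical argument to physical slot, or pair of
slots) could look roughly like the sketch below.  It assumes a generic
rule: an argument wider than a slot takes two consecutive slots, and
some ABIs (ARM EABI, as I understand its calling convention) first skip
a slot so that the pair starts on an even index.  The assign_slots()
name and its parameters are hypothetical.  Which half of a pair holds
the low bits is not modeled, and syscalls whose arguments are reordered
on some architectures to avoid padding would need explicit entries in
the table rather than this rule.

```python
def assign_slots(arg_bits, slot_bits=32, align_pairs=False):
    """Map logical arguments to physical argument slots.

    arg_bits    -- width in bits of each logical argument, in order
    slot_bits   -- width of one physical slot (register) on this ABI
    align_pairs -- if True, a split argument must start on an even slot

    Returns one tuple per logical argument: (i,) for a single slot,
    or (i, i + 1) for an argument split across two slots.
    """
    result = []
    nxt = 0
    for bits in arg_bits:
        if bits <= slot_bits:
            result.append((nxt,))
            nxt += 1
        else:
            if align_pairs and nxt % 2:
                nxt += 1  # skip one padding slot to reach an even index
            result.append((nxt, nxt + 1))
            nxt += 2
    return result


if __name__ == "__main__":
    # pread64(fd, buf, count, offset) with a 64-bit offset.
    pread64 = [32, 32, 32, 64]
    print(assign_slots(pread64))                    # i386-style
    print(assign_slots(pread64, align_pairs=True))  # ARM-EABI-style
    print(assign_slots(pread64, slot_bits=64))      # 64-bit ABI
```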

If this happened, presumably all architectures that supported it would
have to have valid AUDIT_ARCH support.  That means that someone would
have to fix ARM OABI (sigh).

Thoughts?


--Andy