2022-04-20 22:55:02

by Jann Horn

Subject: Re: Explicitly defining the userspace API

On Wed, Apr 20, 2022 at 6:30 PM Spencer Baugh <[email protected]> wrote:
> Linux guarantees the stability of its userspace API, but the API
> itself is only informally described, primarily with English prose. I
> want to add an explicit, authoritative machine-readable definition of
> the Linux userspace API.
>
> As background, in a conventional libc like glibc, read(2) calls the
> Linux system call read, passing arguments in an architecture-specific
> way according to the specific details of read.
>
> The details of these syscalls are at best documented in manpages, and
> often defined only by the implementation. Anyone else who wants to
> work with a syscall, in any way, needs to duplicate all those details.
>
> So the most basic definition of the API would just represent the
> information already present in SYSCALL_DEFINE macros: the C types of
> arguments and return values.

FWIW, I believe ftrace already gets that basic information from the
SYSCALL_DEFINE macros via struct syscall_metadata, and exports it to
root-privileged userspace (although I think it won't actually tell you
what the syscall number is that way):

# cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_epoll_wait/format
name: sys_enter_epoll_wait
ID: 902
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;

field:int __syscall_nr; offset:8; size:4; signed:1;
field:int epfd; offset:16; size:8; signed:0;
field:struct epoll_event * events; offset:24; size:8; signed:0;
field:int maxevents; offset:32; size:8; signed:0;
field:int timeout; offset:40; size:8; signed:0;

print fmt: "epfd: 0x%08lx, events: 0x%08lx, maxevents: 0x%08lx, timeout: 0x%08lx", ((unsigned long)(REC->epfd)), ((unsigned long)(REC->events)), ((unsigned long)(REC->maxevents)), ((unsigned long)(REC->timeout))

You could probably also get that data from DWARF somehow.
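
To illustrate how a tool might consume the ftrace format text shown above, here is a minimal Python sketch. It is not an existing kernel or tracing utility: the names parse_format and syscall_args are hypothetical, and the regular expression assumes the "field:<decl>; offset:N; size:N; signed:N;" layout seen in the output above. Array fields (for example "char comm[16]") are not handled specially.

```python
import re

# Assumed layout of one field line, based on the output quoted above.
FIELD_RE = re.compile(
    r"field:(?P<decl>.+?);\s*offset:(?P<offset>\d+);"
    r"\s*size:(?P<size>\d+);\s*signed:(?P<signed>\d+);"
)

def parse_format(text):
    """Return (event_name, fields) parsed from an ftrace 'format' file."""
    name = None
    fields = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("name:"):
            name = line.split(":", 1)[1].strip()
            continue
        m = FIELD_RE.match(line)
        if not m:
            continue  # skips "ID:", "format:", "print fmt:" and blank lines
        # The last whitespace-separated token is the field name; the rest
        # is the C type as spelled by the kernel.
        ctype, _, ident = m.group("decl").strip().rpartition(" ")
        fields.append({
            "type": ctype,
            "name": ident,
            "offset": int(m.group("offset")),
            "size": int(m.group("size")),
            "signed": m.group("signed") == "1",
        })
    return name, fields

def syscall_args(fields):
    """Drop the common_* header fields and __syscall_nr, keep the arguments."""
    return [
        (f["type"], f["name"])
        for f in fields
        if not f["name"].startswith("common_") and f["name"] != "__syscall_nr"
    ]

# Excerpt of the output quoted above.
SAMPLE = """\
name: sys_enter_epoll_wait
ID: 902
format:
field:int common_pid; offset:4; size:4; signed:1;

field:int __syscall_nr; offset:8; size:4; signed:1;
field:int epfd; offset:16; size:8; signed:0;
field:struct epoll_event * events; offset:24; size:8; signed:0;
field:int maxevents; offset:32; size:8; signed:0;
field:int timeout; offset:40; size:8; signed:0;
"""

if __name__ == "__main__":
    event, fields = parse_format(SAMPLE)
    print(event, syscall_args(fields))
```

Note that the argument fields are all reported with size 8 and signed 0 regardless of their C type, so a consumer would probably have to rely on the type string rather than on the size and signedness columns.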


2022-04-22 17:12:06

by Arnd Bergmann

Subject: Re: Explicitly defining the userspace API

On Wed, Apr 20, 2022 at 7:18 PM Jann Horn <[email protected]> wrote:
>
> On Wed, Apr 20, 2022 at 6:30 PM Spencer Baugh <[email protected]> wrote:
> > Linux guarantees the stability of its userspace API, but the API
> > itself is only informally described, primarily with English prose. I
> > want to add an explicit, authoritative machine-readable definition of
> > the Linux userspace API.
> >
> > As background, in a conventional libc like glibc, read(2) calls the
> > Linux system call read, passing arguments in an architecture-specific
> > way according to the specific details of read.
> >
> > The details of these syscalls are at best documented in manpages, and
> > often defined only by the implementation. Anyone else who wants to
> > work with a syscall, in any way, needs to duplicate all those details.
> >
> > So the most basic definition of the API would just represent the
> > information already present in SYSCALL_DEFINE macros: the C types of
> > arguments and return values.
>
> FWIW, I believe ftrace already gets that basic information from the
> SYSCALL_DEFINE macros via struct syscall_metadata, and exports it to
> root-privileged userspace (although I think it won't actually tell you
> what the syscall number is that way):

One approach I have considered in the past is to change the
SYSCALL_DEFINE() macros so they live in include/linux/syscalls.h,
where they expand to the wrappers for argument sanitizing (clearing
the upper bits, etc.) and end up calling normal functions.

When combined with the information in syscall.tbl, this can help
provide a machine-readable list of implemented system calls and at the
same time ensure that the prototypes match those of the actual
functions.
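
As a rough illustration of combining syscall.tbl with prototype information, the Python sketch below parses table lines of the form "<number> <abi> <name> [<entry point>]" and joins them against a prototype map keyed by entry point. The function names parse_syscall_tbl and describe are hypothetical, the TBL text is a small illustrative excerpt in the style of the x86-64 table, and PROTOTYPES is hand-written here (using the epoll_wait arguments shown in the ftrace output earlier in the thread) rather than extracted from any header.

```python
def parse_syscall_tbl(text):
    """Parse syscall.tbl-style lines into (nr, abi, name, entry) tuples.

    entry is None when the line lists no entry point, i.e. the number is
    reserved but not implemented.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        parts = line.split()
        entry = parts[3] if len(parts) > 3 else None
        entries.append((int(parts[0]), parts[1], parts[2], entry))
    return entries

def describe(entries, prototypes):
    """Join table entries with prototypes.

    Returns (described, missing): described maps (abi, nr) to a record,
    missing lists entry points that have no known prototype.
    """
    described, missing = {}, []
    for nr, abi, name, entry in entries:
        if entry is None:
            continue  # not implemented, nothing to describe
        args = prototypes.get(entry)
        if args is None:
            missing.append(entry)
            continue
        described[(abi, nr)] = {"name": name, "entry": entry, "args": args}
    return described, missing

# Illustrative excerpt only, not a complete table.
TBL = """\
# <number> <abi> <name> <entry point>
0	common	read			sys_read
174	64	create_module
232	common	epoll_wait		sys_epoll_wait
"""

# Hand-written for this sketch; a real tool would derive this from the
# declarations in include/linux/syscalls.h.
PROTOTYPES = {
    "sys_epoll_wait": [
        ("int", "epfd"),
        ("struct epoll_event *", "events"),
        ("int", "maxevents"),
        ("int", "timeout"),
    ],
}

if __name__ == "__main__":
    described, missing = describe(parse_syscall_tbl(TBL), PROTOTYPES)
    print(described)
    print("entry points without a prototype:", missing)
```

The missing list is the interesting part of such a check: any entry point named in the table without a matching prototype would point at a place where the table and the declarations have drifted apart.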

The main missing bit for this is to convert asm-generic/unistd.h to
the syscall.tbl format, and to ensure that there is a unique mapping
between sys_*() function names and prototypes. The latter bit is
/almost/ there and should be easy to get right by renaming a couple
of nonstandard syscall entry points.
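
The unique-mapping requirement could be checked mechanically. The Python sketch below assumes the declarations have already been collected as (function name, prototype string) pairs from wherever they appear; find_conflicts and the sys_example_* names are hypothetical, and prototypes are compared only after collapsing whitespace, so differences in typedef spelling would still be reported as conflicts.

```python
from collections import defaultdict

def find_conflicts(declarations):
    """Return {name: [prototypes]} for names declared with more than one prototype.

    declarations is an iterable of (function_name, prototype_string) pairs.
    Prototypes are normalized by collapsing runs of whitespace only.
    """
    seen = defaultdict(set)
    for name, proto in declarations:
        seen[name].add(" ".join(proto.split()))
    return {name: sorted(protos) for name, protos in seen.items() if len(protos) > 1}

# Hypothetical declarations, as they might be gathered from several
# architectures; these are not real kernel entry points.
DECLS = [
    ("sys_example_a", "long sys_example_a(int fd, size_t len)"),
    ("sys_example_a", "long  sys_example_a(int fd,  size_t len)"),
    ("sys_example_b", "long sys_example_b(unsigned long addr)"),
    ("sys_example_b", "long sys_example_b(unsigned long addr, int flags)"),
]

if __name__ == "__main__":
    for name, protos in find_conflicts(DECLS).items():
        print(name, "has conflicting prototypes:")
        for p in protos:
            print("   ", p)
```

Any name reported by such a check would be a candidate for the kind of renaming mentioned above.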

Arnd