2022-04-22 06:35:30

by Spencer Baugh

Subject: Explicitly defining the userspace API


Linux guarantees the stability of its userspace API, but the API
itself is only informally described, primarily with English prose. I
want to add an explicit, authoritative machine-readable definition of
the Linux userspace API.

As background, in a conventional libc like glibc, read(2) calls the
Linux system call read, passing arguments in an architecture-specific
way according to the specific details of read.

The details of these syscalls are at best documented in manpages, and
often defined only by the implementation. Anyone else who wants to
work with a syscall, in any way, needs to duplicate all those details.

So the most basic definition of the API would just represent the
information already present in SYSCALL_DEFINE macros: the C types of
arguments and return values. More usefully, it would describe the
formats of those arguments and return values: that the first argument
to read is a file descriptor rather than an arbitrary integer, and
what flags are valid in the flags argument of openat, and that open
returns a file descriptor. A step beyond that would be describing, in
some limited way, the effects of syscalls; for example, that read
writes into the passed buffer the number of bytes that it returned.
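To make the idea concrete, here is a minimal sketch of the kind of record such a definition could hold for read, written as plain C data. Every name in it (enum arg_kind, enum ret_kind, struct syscall_desc, read_desc, find_arg) is hypothetical and not an existing kernel interface. The point is only that "this is an fd", "this is a buffer whose size is count", and "the return value says how much of buf was written" become data a tool can query.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical argument roles, richer than the bare C types that
 * SYSCALL_DEFINE records. */
enum arg_kind {
	ARG_FD,           /* a file descriptor, not an arbitrary integer */
	ARG_USER_BUF_OUT, /* a user buffer the kernel writes into */
	ARG_SIZE,         /* a byte count */
	ARG_FLAGS,        /* a bit set with a fixed set of valid bits */
};

/* Hypothetical return-value roles. */
enum ret_kind {
	RET_FD,            /* a new file descriptor, or a negative errno */
	RET_BYTES_WRITTEN, /* bytes written into an out buffer, or a negative errno */
};

struct arg_desc {
	const char *name;
	const char *c_type;
	enum arg_kind kind;
	int size_arg; /* for buffers: index of the argument giving the size, else -1 */
};

struct syscall_desc {
	const char *name;
	enum ret_kind ret;
	int ret_buf_arg; /* for RET_BYTES_WRITTEN: index of the buffer, else -1 */
	size_t nargs;
	struct arg_desc args[6];
};

/* A description of read(fd, buf, count). */
static const struct syscall_desc read_desc = {
	.name = "read",
	.ret = RET_BYTES_WRITTEN,
	.ret_buf_arg = 1,
	.nargs = 3,
	.args = {
		{ "fd",    "unsigned int",  ARG_FD,           -1 },
		{ "buf",   "char __user *", ARG_USER_BUF_OUT,  2 },
		{ "count", "size_t",        ARG_SIZE,         -1 },
	},
};

/* Return the index of the argument called name, or -1 if there is none. */
static int find_arg(const struct syscall_desc *d, const char *name)
{
	for (size_t i = 0; i < d->nargs; i++)
		if (strcmp(d->args[i].name, name) == 0)
			return (int)i;
	return -1;
}
```

A tool reading read_desc can follow ret_buf_arg to buf, and then size_arg to count, without any knowledge of the implementation.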

Even a basic machine-readable definition of the Linux userspace API
would have numerous benefits:

* Debugging tools which need to understand the format of syscalls and
their arguments in great detail, such as strace, are currently
primarily hand-written with great duplication of effort. Even a
basic description of syscalls would allow much of this code to be
generated instead.

* It often takes a long time for newly-added syscalls to be usable in
userspace. With an explicit definition of the Linux userspace API,
it would be easy to automatically generate functions for new
syscalls, which could be deployed quickly either as part of libc or
in a separate syscall library.

* Implementers of new languages currently almost always make syscalls
by going through libc. Supporting interoperability with C in this
way is a major burden, and the resulting interfaces are typically
highly unidiomatic for the new language. With an explicit definition
of the Linux API, it would be much easier for new languages to make
syscalls directly (rather than through libc) by automatically
generating syscall functions which are idiomatic to the new language;
for example, functions which preserve memory-safety and type-safety
in Rust.

* Reimplementers of the Linux API, such as Linuxulator, WSL1, and
gVisor, would be able to automatically generate stubs for the
interfaces they need to implement, reducing duplicated code and making
them conform better to the Linux API.

* Changes to Linux behavior that require a change in the API
definition would deserve greater scrutiny by maintainers, since such
a change might break userspace. This certainly could never catch all
possible API breaks, but it would be one more way to prevent
regressions.

* Any other tool which needs to understand the Linux API would
benefit, such as more esoteric projects to batch syscalls, intercept
and rewrite syscalls, forward syscalls to remote hosts, or any other
syscall manipulations.
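As a sketch of the point about new syscalls: given a description of read, a generator might emit a typed wrapper like the one below, which invokes the syscall by number through glibc's syscall(2) and the SYS_read constant from <sys/syscall.h>. The name gen_sys_read is hypothetical, and glibc's syscall() is hiding the per-architecture calling details that a real generator working without libc would have to emit itself.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* What a generator might emit from a description of read: a typed
 * wrapper that invokes the syscall by number. On failure, glibc's
 * syscall() returns -1 and stores the error in errno. */
static ssize_t gen_sys_read(unsigned int fd, void *buf, size_t count)
{
	return (ssize_t)syscall(SYS_read, fd, buf, count);
}
```

The same generator could emit wrappers for syscalls that libc does not expose yet, as long as the kernel headers in use define their SYS_ numbers.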

To write this definition, a new Linux-specific format for the
definition might need to be created. At a minimum, it will need to be
able to describe bit-level data formats, complex pointer-based
data structures, tagged unions, "overloaded" syscalls such as ioctl,
and architecture-specific divergences. Most existing formats and
languages for describing interfaces like this unfortunately lack these
capabilities.
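The bit-level and "overloaded" requirements show up even in ioctl request numbers. Requests built with the _IOR/_IOW macros from <linux/ioctl.h> pack a direction, a type byte, a command number and an argument size into one integer, and a description format would need to model that. The EXAMPLE_GET_COUNT request and the decode_ioctl helper below are hypothetical. Note also that many older ioctls use plain numbers and do not follow this encoding at all.

```c
#include <linux/ioctl.h>

/* A hypothetical request: "read an int", type byte 'x', command number 1. */
#define EXAMPLE_GET_COUNT _IOR('x', 1, int)

/* The four bit fields packed into an _IOC-style request number. */
struct ioctl_fields {
	unsigned int dir;  /* _IOC_NONE, _IOC_READ, _IOC_WRITE or both */
	unsigned int type; /* driver-chosen "magic" byte */
	unsigned int nr;   /* command number within that type */
	unsigned int size; /* size of the object the argument points to */
};

static struct ioctl_fields decode_ioctl(unsigned long req)
{
	struct ioctl_fields f = {
		.dir  = _IOC_DIR(req),
		.type = _IOC_TYPE(req),
		.nr   = _IOC_NR(req),
		.size = _IOC_SIZE(req),
	};
	return f;
}
```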

Whatever the format of the definition, the most important feature is
that it must be maintainable by existing Linux developers. One way to
achieve that might be to integrate it into the C code in some way,
building on top of SYSCALL_DEFINE. The API description can then be
automatically extracted from the C code into a more-easily-reusable
format, which can be used as input for other tools.
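One common C technique for keeping a single source of truth that can also be extracted is the "X-macro" pattern, sketched below. This is not how SYSCALL_DEFINE works today. SYSCALL_LIST, the annotation strings, and the helper names are all hypothetical; the sketch only shows one list being expanded both into C declarations and into a table that a tool can dump.

```c
#include <stdio.h>
#include <string.h>

/* Single source of truth: name, argument count and a hypothetical
 * annotation of the return value for each syscall. */
#define SYSCALL_LIST(X)                              \
	X(read,    3, "bytes_written_or_error(buf)") \
	X(close,   1, "zero_or_error")               \
	X(accept4, 4, "fd_or_error")

struct syscall_info {
	const char *name;
	int nargs;
	const char *ret;
};

/* Expansion 1: an enum of indices usable from C code. */
#define AS_ENUM(name, nargs, ret) DESC_##name,
enum syscall_id { SYSCALL_LIST(AS_ENUM) DESC_COUNT };
#undef AS_ENUM

/* Expansion 2: a table that an extraction step can walk. */
#define AS_ENTRY(name, nargs, ret) { #name, nargs, ret },
static const struct syscall_info syscall_table[] = { SYSCALL_LIST(AS_ENTRY) };
#undef AS_ENTRY

/* Find a syscall's entry by name; NULL if it is not listed. */
static const struct syscall_info *lookup(const char *name)
{
	for (size_t i = 0; i < (size_t)DESC_COUNT; i++)
		if (strcmp(syscall_table[i].name, name) == 0)
			return &syscall_table[i];
	return NULL;
}

/* Emit the table in a simple line-oriented, machine-readable format. */
static void dump_table(FILE *out)
{
	for (size_t i = 0; i < (size_t)DESC_COUNT; i++)
		fprintf(out, "%s %d %s\n", syscall_table[i].name,
			syscall_table[i].nargs, syscall_table[i].ret);
}
```

Calling dump_table(stdout) prints one line per entry, starting with "read 3 bytes_written_or_error(buf)".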

One step in this direction is Documentation/ABI, which specifies the
stability guarantees for different userspace APIs in a semi-formal
way. But it doesn't specify the actual content of those APIs, and it
doesn't cover individual syscalls at all.

Other related projects are system call tables like
https://marcin.juszkiewicz.com.pl/download/tables/syscalls.html, which
don't contain any more information than is already in SYSCALL_DEFINE.

Hopefully this sounds like a reasonable thing to do. I'm looking for
any comments or suggestions, or related projects I don't know about.


2022-04-22 17:38:42

by Marcin Juszkiewicz

Subject: Re: Explicitly defining the userspace API

On 20.04.2022 at 18:15, Spencer Baugh wrote:

> Other related projects are system call tables like
> https://marcin.juszkiewicz.com.pl/download/tables/syscalls.html, which
> don't contain any more information than is already in SYSCALL_DEFINE.

This project was made to provide a way of getting number<>name
information for system calls, and to answer "is it implemented?".

Nothing more, just simple info. So far it has helped many developers
and their projects.
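For reference, the kind of number<>name lookup these tables provide can be sketched in C with the SYS_* constants from <sys/syscall.h>, which give the numbers for the architecture being compiled for. The numbers differ between architectures, which is why per-architecture tables are useful. The helper names below are hypothetical, and the table covers only three syscalls.

```c
#include <stddef.h>
#include <string.h>
#include <sys/syscall.h>

struct syscall_name {
	long nr;
	const char *name;
};

/* Numbers come from <sys/syscall.h> for the architecture being compiled
 * for, so the same source yields the matching table on each of them. */
static const struct syscall_name syscall_names[] = {
	{ SYS_read,  "read"  },
	{ SYS_write, "write" },
	{ SYS_close, "close" },
};

#define N_SYSCALL_NAMES (sizeof(syscall_names) / sizeof(syscall_names[0]))

/* number -> name; NULL if the number is not in the table */
static const char *syscall_name_of(long nr)
{
	for (size_t i = 0; i < N_SYSCALL_NAMES; i++)
		if (syscall_names[i].nr == nr)
			return syscall_names[i].name;
	return NULL;
}

/* name -> number; -1 if the name is not in the table */
static long syscall_nr_of(const char *name)
{
	for (size_t i = 0; i < N_SYSCALL_NAMES; i++)
		if (strcmp(syscall_names[i].name, name) == 0)
			return syscall_names[i].nr;
	return -1;
}
```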

One day I got a request from the LoongArch port maintainers to add
their table, because systemd relies on it ;D

I also made a Python module for using it. So far no known users :D

https://marcin.juszkiewicz.com.pl/2021/09/14/python-package-for-system-calls-information/

2022-04-22 20:09:16

by Cyril Hrubis

Subject: Re: Explicitly defining the userspace API

Hi!
> Linux guarantees the stability of its userspace API, but the API
> itself is only informally described, primarily with English prose. I
> want to add an explicit, authoritative machine-readable definition of
> the Linux userspace API.

My background is in kernel testing; I have maintained the Linux Test
Project for more than a decade now. Over the years we have created many
"unit tests" for kernel syscalls that watch over the syscall API and
make sure that we get the right results for both valid and invalid
inputs. These tests can also be considered a form of documentation. The
same goes for some of the selftests that have been added to the kernel
repo in recent years. In a sense, these are the most detailed
descriptions of the interfaces we have.
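A minimal example of that style of test, assuming a plain POSIX environment: it checks a few documented properties of read(2) on a pipe, for both valid and invalid inputs. It is written in plain C rather than with LTP's own test API, and check_read_semantics is a hypothetical name.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Returns 0 if every check passes, otherwise the number of the first
 * failing check. */
static int check_read_semantics(void)
{
	int p[2];
	char buf[8];
	int rc = 0;

	if (pipe(p) != 0)
		return 1;

	/* Valid input: read returns the bytes available, which may be
	 * fewer than count, and stores them in buf. */
	if (write(p[1], "abc", 3) != 3)
		rc = 2;
	else if (read(p[0], buf, sizeof(buf)) != 3 || memcmp(buf, "abc", 3) != 0)
		rc = 3;
	/* A count of zero returns 0 (here, on an empty pipe, without blocking). */
	else if (read(p[0], buf, 0) != 0)
		rc = 4;

	close(p[0]);
	close(p[1]);
	if (rc != 0)
		return rc;

	/* Invalid input: a closed descriptor fails with EBADF. */
	errno = 0;
	if (read(p[0], buf, 1) != -1 || errno != EBADF)
		return 5;

	return 0;
}
```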

The main problem is that the kernel/userspace boundary is large: we
have thousands of tests, and I'm pretty sure that we don't cover even
half of it.

Also, some of the interfaces are too complex to be described in any
formal system at all, mostly the modern ones such as io_uring or BPF. I
have had a hard time even understanding how to use these, and I doubt I
would be able to build a formal system to describe them, especially
since io_uring is mostly syscall-less: we talk to the kernel through
shared buffers and atomic data updates.

> So the most basic definition of the API would just represent the
> information already present in SYSCALL_DEFINE macros: the C types of
> arguments and return values. More usefully, it would describe the
> formats of those arguments and return values: that the first argument
> to read is a file descriptor rather than an arbitrary integer, and
> what flags are valid in the flags argument of openat, and that open
> returns a file descriptor. A step beyond that would be describing, in
> some limited way, the effects of syscalls; for example, that read
> writes into the passed buffer the number of bytes that it returned.

Having this would be awesome; it is just one step away from actually
generating automated tests for the syscalls. However, my estimate is
that even if you started to work on this now, it would take a decade to
get somewhere, but maybe I'm too pessimistic.

Still, fingers crossed.

--
Cyril Hrubis

2022-04-22 21:32:25

by Greg Kroah-Hartman

Subject: Re: Explicitly defining the userspace API

On Wed, Apr 20, 2022 at 04:15:25PM +0000, Spencer Baugh wrote:
> So the most basic definition of the API would just represent the
> information already present in SYSCALL_DEFINE macros: the C types of
> arguments and return values. More usefully, it would describe the
> formats of those arguments and return values: that the first argument
> to read is a file descriptor rather than an arbitrary integer, and
> what flags are valid in the flags argument of openat, and that open
> returns a file descriptor. A step beyond that would be describing, in
> some limited way, the effects of syscalls; for example, that read
> writes into the passed buffer the number of bytes that it returned.

So how would you define read() in this format in a way that has not
already been attempted in the past? How are you going to define a
format that explains functionality in a way that is not just the
implementation in the end?

> One step in this direction is Documentation/ABI, which specifies the
> stability guarantees for different userspace APIs in a semi-formal
> way. But it doesn't specify the actual content of those APIs, and it
> doesn't cover individual syscalls at all.

The content is described in the Documentation/ABI/ entries; where do
you see that missing?

And you are correct, that place does not describe syscalls, or other
user/kernel interfaces that predate sysfs.

good luck!

greg k-h

2022-05-09 02:43:09

by Spencer Baugh

Subject: Re: Explicitly defining the userspace API

Greg KH writes:
> On Wed, Apr 20, 2022 at 04:15:25PM +0000, Spencer Baugh wrote:
>> So the most basic definition of the API would just represent the
>> information already present in SYSCALL_DEFINE macros: the C types of
>> arguments and return values. More usefully, it would describe the
>> formats of those arguments and return values: that the first argument
>> to read is a file descriptor rather than an arbitrary integer, and
>> what flags are valid in the flags argument of openat, and that open
>> returns a file descriptor. A step beyond that would be describing, in
>> some limited way, the effects of syscalls; for example, that read
>> writes into the passed buffer the number of bytes that it returned.
>
> So how would you define read() in this format in a way that has not
> already been attempted in the past?

I don't know of any past attempts at doing this (other than what has
already been mentioned in this thread, e.g. SYSCALL_DEFINE). What do
you have in mind?

> How are you going to define a format that explains functionality in a
> way that is not just the implementation in the end?

Lots of information can be expressed just with more specific types on
the function signature, even with regular C types. No need to expose
the implementation in any way.

For example, accept4's signature is:

    SYSCALL_DEFINE4(accept4, int, fd, struct sockaddr __user *, upeer_sockaddr,
                    int __user *, upeer_addrlen, int, flags)

Here, fd and flags have the same type, and nothing distinguishes
them. But, purely as an example (I'm not suggesting exactly this), one
could have:

    typedef int user_fd_t;
    typedef int accept_flags_t;

    SYSCALL_DEFINE4(accept4, user_fd_t, fd, struct sockaddr __user *, upeer_sockaddr,
                    int __user *, upeer_addrlen, accept_flags_t, flags)

Then a user could parse this SYSCALL_DEFINE and know that fd and flags
have different types with different possible valid values. user_fd_t
would be used by many different syscalls, accept_flags_t only by this
one.
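As a sketch of the "parse this SYSCALL_DEFINE" step, here is a deliberately naive parser for the argument list of a SYSCALL_DEFINEn invocation, splitting it into the syscall name and (type, name) pairs. It assumes that no type contains a comma (function-pointer types would break it), and every name in it is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <string.h>

#define MAX_ARGS 6 /* syscalls take at most six arguments */

struct parsed_arg { char type[64]; char name[32]; };
struct parsed_syscall { char name[32]; int nargs; struct parsed_arg args[MAX_ARGS]; };

/* Copy s into dst (of size n) with surrounding whitespace removed. */
static void copy_trimmed(char *dst, size_t n, const char *s)
{
	while (isspace((unsigned char)*s))
		s++;
	size_t len = strlen(s);
	while (len > 0 && isspace((unsigned char)s[len - 1]))
		len--;
	if (len >= n)
		len = n - 1;
	memcpy(dst, s, len);
	dst[len] = '\0';
}

/* Parse "name, type1, arg1, type2, arg2, ...".
 * Returns 0 on success, -1 on malformed input. */
static int parse_syscall_define(const char *list, struct parsed_syscall *out)
{
	char buf[512];
	char *save = NULL;
	char *tok;
	int i = 0;

	if (strlen(list) >= sizeof(buf))
		return -1;
	strcpy(buf, list);
	tok = strtok_r(buf, ",", &save);
	if (tok == NULL)
		return -1;
	copy_trimmed(out->name, sizeof(out->name), tok);
	out->nargs = 0;
	while ((tok = strtok_r(NULL, ",", &save)) != NULL) {
		if (i % 2 == 0) { /* a type */
			if (out->nargs == MAX_ARGS)
				return -1;
			copy_trimmed(out->args[out->nargs].type,
				     sizeof(out->args[0].type), tok);
		} else { /* the name that follows it */
			copy_trimmed(out->args[out->nargs].name,
				     sizeof(out->args[0].name), tok);
			out->nargs++;
		}
		i++;
	}
	return (i % 2 == 0) ? 0 : -1; /* every type needs a name */
}
```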

With just this, the user of this information would still need to know
what user_fd_t and accept_flags_t are. The next step would be describing
the valid values for accept_flags_t. Unfortunately, that's not something
that the C type system alone can express, but, again purely as an
example, one could have something like:

    FLAGS_DEFINE(accept_flags, int,
                 SOCK_CLOEXEC,
                 SOCK_NONBLOCK)

Then a user could parse this FLAGS_DEFINE and know what the range of
valid values for accept_flags_t is. This could also be used in the
kernel; for example, FLAGS_DEFINE could generate an accept_flags_valid
function, usable in accept4 as:

    if (!accept_flags_valid(flags))
            return -EINVAL;
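Here is one possible expansion of the hypothetical FLAGS_DEFINE, written so it compiles as ordinary userspace C. It keeps the allowed flags in a table that a tool could extract, and it generates accept_flags_valid. The macro and check_accept_flags are illustrative only. For comparison, accept4's implementation in net/socket.c currently performs an equivalent hand-written check, rejecting any flags outside SOCK_CLOEXEC | SOCK_NONBLOCK with -EINVAL.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical FLAGS_DEFINE(name, type, flag...): keeps the allowed flags
 * in a table that an extraction tool could read, and generates
 * name##_valid(), which rejects any bit outside those flags. */
#define FLAGS_DEFINE(name, type, ...)                                  \
	static const type name##_bits[] = { __VA_ARGS__ };             \
	static bool name##_valid(type v)                               \
	{                                                              \
		type mask = 0;                                         \
		for (size_t i = 0;                                     \
		     i < sizeof(name##_bits) / sizeof(name##_bits[0]); \
		     i++)                                              \
			mask |= name##_bits[i];                        \
		return (v & ~mask) == 0;                               \
	}

FLAGS_DEFINE(accept_flags, int,
	     SOCK_CLOEXEC,
	     SOCK_NONBLOCK)

/* The check from the message, wrapped in a userspace-testable function. */
static int check_accept_flags(int flags)
{
	if (!accept_flags_valid(flags))
		return -EINVAL;
	return 0;
}
```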

As for describing the buffer-writing behavior of read like I mentioned
before, here's a sketch of what that maybe could look like. The current
signature of read is:

    SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count)

One could imagine adding a type to the return value and changing this to
something like:

    #define bytes_written_or_error(written_buffer) int
    #define writable_user_buf(size_of_buffer) char __user *

    SYSCALL_DEFINE3_RET(bytes_written_or_error(buf),
                        read, unsigned int, fd,
                        writable_user_buf(count), buf, size_t, count)

A user could parse this and know at least partially how read uses the
passed-in buffer, without having to look at the implementation.
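To illustrate what a tool could do with the two hypothetical annotations, here is a sketch of the contract they imply, checked at the libc level, where errors come back as -1 (the raw kernel interface returns a negative errno instead). read_result_ok and checked_read are hypothetical helpers, not part of any existing tracer.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* The contract implied by writable_user_buf(count) and
 * bytes_written_or_error(buf), as seen through libc: the result is -1
 * (error) or a byte count in [0, count], and on success only the first
 * ret bytes of buf hold data produced by this call. */
static bool read_result_ok(ssize_t ret, size_t count)
{
	return ret == -1 || (ret >= 0 && (size_t)ret <= count);
}

/* Call read(2), record whether its result satisfies the contract, and
 * return the result unchanged. */
static ssize_t checked_read(int fd, void *buf, size_t count, bool *contract_ok)
{
	ssize_t ret = read(fd, buf, count);

	*contract_ok = read_result_ok(ret, count);
	return ret;
}
```

A tracer could use the same contract to decide how many bytes of buf to print after the call returns.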

Just for the sake of mentioning it, one could also imagine static
analysis which checks the kernel implementation against these
more-detailed types, which could catch bugs. But I'm not necessarily
proposing doing that - this is useful on its own even if it's not
checked by static analysis.

>> One step in this direction is Documentation/ABI, which specifies the
>> stability guarantees for different userspace APIs in a semi-formal
>> way. But it doesn't specify the actual content of those APIs, and it
>> doesn't cover individual syscalls at all.
>
> The content is described in the Documentation/ABI/ entries; where do
> you see that missing?

I meant that it doesn't describe the content of the APIs in a
machine-readable way. (It's still very useful of course!)

> And you are correct, that place does not describe syscalls, or other
> user/kernel interfaces that predate sysfs.
>
> good luck!

Thank you!