2018-11-10 18:52:49

by Daniel Colascione

Subject: Official Linux system wrapper library?

Now that glibc is basically not adding any new system call wrappers,
how about publishing an "official" system call glue library as part of
the kernel distribution, along with the uapi headers? I don't think
it's reasonable to expect people to keep using syscall(__NR_XXX) for
all new functionality, especially as the system grows increasingly
sophisticated capabilities (like the new mount API, and hopefully the
new process API) outside the strictures of the POSIX process.


2018-11-10 19:02:13

by Willy Tarreau

Subject: Re: Official Linux system wrapper library?

On Sat, Nov 10, 2018 at 10:52:06AM -0800, Daniel Colascione wrote:
> Now that glibc is basically not adding any new system call wrappers,
> how about publishing an "official" system call glue library as part of
> the kernel distribution, along with the uapi headers? I don't think
> it's reasonable to expect people to keep using syscall(__NR_XXX) for
> all new functionality, especially as the system grows increasingly
> sophisticated capabilities (like the new mount API, and hopefully the
> new process API) outside the strictures of the POSIX process.

It's partly related, but you may be interested in something I did that
is in the RCU tree. It's called "nolibc" and is a set of syscall
wrappers defined only in include files. It's not complete, but it is
enough to boot some small init wrappers. Mine can extract tar files
and do similar things with it. Here is the kernel port in the RCU
tree and an example of code using it:

https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/tree/tools/testing/selftests/rcutorture/bin/nolibc.h?h=rcu/next
https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/tree/tools/testing/selftests/rcutorture/bin/mkinitrd.sh?h=rcu/next

The original one is maintained here (not very active, since it works
well enough for my use cases now, even though it's far from being
complete):

http://git.formilux.org/?p=people/willy/nolibc.git

Maybe something along these lines could be done for the vast majority of
syscalls, with the thicker stuff left to glibc? That would allow
simple userland to build without glibc using only kernel headers,
or by occasionally defining a few extra bits of glue.

Willy

2018-11-10 19:07:25

by Daniel Colascione

Subject: Re: Official Linux system wrapper library?

On Sat, Nov 10, 2018 at 11:01 AM, Willy Tarreau <[email protected]> wrote:
> On Sat, Nov 10, 2018 at 10:52:06AM -0800, Daniel Colascione wrote:
>> Now that glibc is basically not adding any new system call wrappers,
>> how about publishing an "official" system call glue library as part of
>> the kernel distribution, along with the uapi headers? I don't think
>> it's reasonable to expect people to keep using syscall(__NR_XXX) for
>> all new functionality, especially as the system grows increasingly
>> sophisticated capabilities (like the new mount API, and hopefully the
>> new process API) outside the strictures of the POSIX process.
>
> It's partly related, but you may be interested in something I did that
> is in the the RCU tree. It's called "nolibc", it's a set of syscall
> wrappers defined only in include files. It's not complete, but still
> enough to boot some small init wrappers. Mine can extract tar files
> and do stuff like this with it. Here is the kernel port in the RCU
> tree and an example of code using it :
>
> https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/tree/tools/testing/selftests/rcutorture/bin/nolibc.h?h=rcu/next
> https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/tree/tools/testing/selftests/rcutorture/bin/mkinitrd.sh?h=rcu/next
>
> The original one is maintained here (not very active since it works
> well enough for my use cases now eventhough it's far from being
> complete) :
>
> http://git.formilux.org/?p=people/willy/nolibc.git
>
> Maybe something along this could be done for the vast majority of
> syscalls and the thicker stuff be left to glibc ? That would allow
> simple userland to build without glibc using only kernel headers,
> or by occasionally defining a few extra stuff or glue.

Reminds me of LSS: https://chromium.googlesource.com/linux-syscall-support/

I'm not a fan of this approach for general-purpose use. There's value
in having *some* common function-level indirection before actually
issuing system calls, e.g., for LD_PRELOAD stuff.

2018-11-10 19:21:07

by Greg Kroah-Hartman

Subject: Re: Official Linux system wrapper library?

On Sat, Nov 10, 2018 at 10:52:06AM -0800, Daniel Colascione wrote:
> Now that glibc is basically not adding any new system call wrappers,

Why are they not doing that anymore?

And there's no reason you have to use glibc, there are many other libcs
out there that hopefully are adding the new syscalls :)

> how about publishing an "official" system call glue library as part of
> the kernel distribution, along with the uapi headers? I don't think
> it's reasonable to expect people to keep using syscall(__NR_XXX) for
> all new functionality, especially as the system grows increasingly
> sophisticated capabilities (like the new mount API, and hopefully the
> new process API) outside the strictures of the POSIX process.

Patches are always welcome to be reviewed. But watch out that they
don't conflict with the libc headers. I know we had a "klibc" proposed
a long time ago but that died off for various reasons before it could
get merged.

Also, what about the basic work of making sure our uapi header files can
actually be used untouched by a libc? That isn't the case these days as
the bionic maintainers like to keep reminding me. That might be a good
thing to do _before_ trying to add new things like syscall wrappers.

thanks,

greg k-h

2018-11-10 19:34:30

by Willy Tarreau

Subject: Re: Official Linux system wrapper library?

On Sat, Nov 10, 2018 at 11:06:45AM -0800, Daniel Colascione wrote:
> Reminds me of LSS: https://chromium.googlesource.com/linux-syscall-support/

Interesting, thanks for the link, I would probably not have started mine
had I known this one :-)

> I'm not a fan of this approach for general-purpose use. There's value
> in having *some* common function-level indirection before actually
> issuing system calls, e.g., for LD_PRELOAD stuff.

I'm not speaking about general purpose replacement but more about
general purpose low level functions that glibc wrappers can safely
use and expose by default. This way general purpose applications
would still use glibc and those willing to use a lower level could
do it more easily by accessing the lower layer, without having to
define their own syscalls. If I could do something like this in my
code :

#ifndef HAVE_SYSCALL_SPLICE // exposed by glibc
# ifdef __linux_splice // exposed by kernel header
# define splice __linux_splice
# else
# error "no splice exposed by either libc or kernel headers"
# endif
#endif

It would be easier, safer and cleaner than what I used to do before:

#if !defined(HAVE_SYSCALL_SPLICE) && defined(__NR_splice)
static inline _syscall6(int, splice, int, fdin, loff_t *, off_in, int, fdout, loff_t *, off_out, size_t, len, unsigned long, flags);
#endif
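
For comparison, a fallback of the same kind can be written with the
generic syscall() function instead of the old _syscallN macros, which,
as far as I know, recent kernel headers no longer export. This is only
a sketch: sys_splice is a hypothetical name, and it assumes a Linux libc
that provides syscall() and defines SYS_splice.

```c
#define _GNU_SOURCE          /* for syscall() and loff_t */
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fallback wrapper; the name sys_splice is made up for this sketch. */
static inline ssize_t sys_splice(int fd_in, loff_t *off_in,
                                 int fd_out, loff_t *off_out,
                                 size_t len, unsigned int flags)
{
    /* syscall() returns -1 and sets errno on failure. */
    return syscall(SYS_splice, fd_in, off_in, fd_out, off_out, len, flags);
}
```

A caller uses it like the libc splice(): at least one of the two
descriptors must refer to a pipe, and a NULL offset means "use the
descriptor's own position".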

Willy

2018-11-10 20:01:54

by Vlastimil Babka

Subject: Re: Official Linux system wrapper library?

On 11/10/18 8:20 PM, Greg KH wrote:
> On Sat, Nov 10, 2018 at 10:52:06AM -0800, Daniel Colascione wrote:
>> Now that glibc is basically not adding any new system call wrappers,
>
> Why are they not doing that anymore?

FYI just noticed there's a topic relevant to this in LPC Toolchain MC:

https://linuxplumbersconf.org/event/2/contributions/149/

> And there's no reason you have to use glibc, there are many other libcs
> out there that hopefully are adding the new syscalls :)
>
>> how about publishing an "official" system call glue library as part of
>> the kernel distribution, along with the uapi headers? I don't think
>> it's reasonable to expect people to keep using syscall(__NR_XXX) for
>> all new functionality, especially as the system grows increasingly
>> sophisticated capabilities (like the new mount API, and hopefully the
>> new process API) outside the strictures of the POSIX process.
>
> Patches are always welcome to be reviewed. But watch out that they
> don't conflict with the libc headers. I know we had a "klibc" proposed
> a long time ago but that died off for various reasons before it could
> get merged.
>
> Also, what about the basic work of making sure our uapi header files can
> actually be used untouched by a libc? That isn't the case these days as
> the bionic maintainers like to keep reminding me. That might be a good
> thing to do _before_ trying to add new things like syscall wrappers.
>
> thanks,
>
> greg k-h
>


by Michael Kerrisk (man-pages)

Subject: Re: Official Linux system wrapper library?

[adding in glibc folk for comment]

On 11/10/18 7:52 PM, Daniel Colascione wrote:
> Now that glibc is basically not adding any new system call wrappers,
> how about publishing an "official" system call glue library as part of
> the kernel distribution, along with the uapi headers? I don't think
> it's reasonable to expect people to keep using syscall(__NR_XXX) for
> all new functionality, especially as the system grows increasingly
> sophisticated capabilities (like the new mount API, and hopefully the
> new process API) outside the strictures of the POSIX process.

As a quick glance at the glibc NEWS file shows, the above is not
quite true:

[[
Version 2.28
* The renameat2 function has been added...
* The statx function has been added...

Version 2.27
* Support for memory protection keys was added. The <sys/mman.h> header now
declares the functions pkey_alloc, pkey_free, pkey_mprotect...
* The copy_file_range function was added.

Version 2.26
* New wrappers for the Linux-specific system calls preadv2 and pwritev2.

Version 2.25
* The getrandom [function] have been added.
]]

I make that 11 system call wrappers added in the last 2 years.

That said, of course, there are many system calls that lack wrappers [1],
and the use of syscall() is undesirable.

The glibc folk do have their reasons for being conservative around
adding system calls (https://lwn.net/Articles/655028/). However, at
this point, I think one of the limiting factors is developer time
on the glibc project. Quite possibly, they just need some help to
add more (properly designed) wrappers faster.

Cheers,

Michael

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=6399 is a
longstanding example.

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

2018-11-11 08:18:18

by Willy Tarreau

Subject: Re: Official Linux system wrapper library?

On Sun, Nov 11, 2018 at 07:55:30AM +0100, Michael Kerrisk (man-pages) wrote:
> [1] https://sourceware.org/bugzilla/show_bug.cgi?id=6399 is a
> longstanding example.

This one was a sad read and shows that applications will continue to
suffer from glibc's prehistorical view on operating systems and will
continue to have to define their own syscall wrappers to exploit the
full potential of the modern operating systems they execute on. This
reminds me when one had to write their own spinlocks and atomics many
years ago. Seeing comments suggesting an application should open
/proc/$PID makes me really wonder if people actually want to use slow
and insecure applications designed this way. Bah, after all, this
wipes quite a bit of the shame I feel every time I do something to
bypass it :-/

The sad thing is that the energy wasted arguing in the bug above could
have been better spent designing and implementing a generic solution
to expose syscalls without depending on glibc's politics anymore.

Willy

2018-11-11 08:25:57

by Daniel Colascione

Subject: Re: Official Linux system wrapper library?

On Sun, Nov 11, 2018 at 12:17 AM, Willy Tarreau <[email protected]> wrote:
> On Sun, Nov 11, 2018 at 07:55:30AM +0100, Michael Kerrisk (man-pages) wrote:
> > [1] https://sourceware.org/bugzilla/show_bug.cgi?id=6399 is a
> > longstanding example.
>
> Bah, after all, this
> wipes quite a bit of the shame I feel every time I do something to
> bypass it :-/
>
> The sad thing is that the energy wasted arguing in the bug above could
> have been better spent designing and implementing a generic solution
> to expose syscalls without depending on glibc's politics anymore.
>
> Willy
>
> This one was a sad read and shows that applications will continue to
> suffer from glibc's prehistorical view on operating systems

Yes. I'm really not sure what glibc's current policies are meant to
accomplish. They don't serve any useful purpose. There seems to be
this weird subtext that glibc has leverage to change OS design, and it
really doesn't. It's a misplaced idealism and ends up just hurting
everyone.

>
> Seeing comments suggesting an application should open
> /proc/$PID makes me really wonder if people actually want to use slow
> and insecure applications designed this way.

That's a separate point. Yes, gettid should have a wrapper, but *also*
we should have an FD-based interface to processes, because outside
specialized contexts (e.g., parent-child waiting), the traditional
Unix process API really is impossible to use safely. But that's a
separate ongoing discussion.

2018-11-11 10:31:17

by Florian Weimer

Subject: Re: Official Linux system wrapper library?

* Willy Tarreau:

> On Sun, Nov 11, 2018 at 07:55:30AM +0100, Michael Kerrisk (man-pages) wrote:
>> [1] https://sourceware.org/bugzilla/show_bug.cgi?id=6399 is a
>> longstanding example.
>
> This one was a sad read and shows that applications will continue to
> suffer from glibc's prehistorical view on operating systems and will
> continue to have to define their own syscall wrappers to exploit the
> full potential of the modern operating systems they execute on.

What's modern about a 15-bit thread identifier?

I understand that using this interface is required in some cases (which
includes some system calls for which glibc does provide wrappers), but I
assumed that it was at least understood that these reusable IDs for
tasks were an extremely poor interface. Aren't the resulting bugs
common knowledge?

> This reminds me when one had to write their own spinlocks and atomics
> many years ago. Seeing comments suggesting an application should open
> /proc/$PID makes me really wonder if people actually want to use slow
> and insecure applications designed this way.

I don't understand. If you want a non-reusable identifier, you have to
go through the /proc interface anyway. I think the recommendation is to
use the PID/start time combination to get a unique process identifier or
something like that.
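
A sketch of the PID/start-time idea mentioned above. It assumes /proc
is mounted and that the start time is field 22 of /proc/<pid>/stat, as
documented in proc(5). The names proc_start_time, proc_ident and
self_ident are hypothetical and exist only for this illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns the start time of `pid` in clock ticks
 * since boot (field 22 of /proc/<pid>/stat), or 0 if it cannot be read.
 */
static unsigned long long proc_start_time(pid_t pid)
{
    char path[64], buf[1024];
    FILE *f;
    size_t n;
    char *p;
    int i;

    snprintf(path, sizeof(path), "/proc/%ld/stat", (long)pid);
    f = fopen(path, "r");
    if (!f)
        return 0;
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* Field 2 (comm) may contain spaces or ')', so resume after the last ')'. */
    p = strrchr(buf, ')');
    if (!p)
        return 0;
    p++;    /* now at the space that precedes field 3 */

    /* Step over fields 3..21; each step lands on the space before the next field. */
    for (i = 0; i < 19; i++) {
        p = strchr(p + 1, ' ');
        if (!p)
            return 0;
    }
    return strtoull(p + 1, NULL, 10);
}

/* Identity of a process as a (PID, start time) pair. */
struct proc_ident {
    pid_t pid;
    unsigned long long start_time;
};

/* Identity of the calling process. */
static struct proc_ident self_ident(void)
{
    struct proc_ident id = { getpid(), proc_start_time(getpid()) };
    return id;
}
```

Two such pairs refer to the same process only if both members match. A
reused PID will normally carry a different start time.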

I wanted to add gettid to glibc this cycle, but your comments suggest to
me that if we did this, we'd likely never get a proper non-reusable
thread identifier from the kernel. So I'm not sure what to do anymore.

Thanks,
Florian

2018-11-11 10:41:30

by Florian Weimer

Subject: Re: Official Linux system wrapper library?

* Daniel Colascione:

> On Sun, Nov 11, 2018 at 12:17 AM, Willy Tarreau <[email protected]> wrote:
>> On Sun, Nov 11, 2018 at 07:55:30AM +0100, Michael Kerrisk (man-pages) wrote:
>> > [1] https://sourceware.org/bugzilla/show_bug.cgi?id=6399 is a
>> > longstanding example.
>>
>> Bah, after all, this
>> wipes quite a bit of the shame I feel every time I do something to
>> bypass it :-/
>>
>> The sad thing is that the energy wasted arguing in the bug above could
>> have been better spent designing and implementing a generic solution
>> to expose syscalls without depending on glibc's politics anymore.
>>
>> Willy
>>
>> This one was a sad read and shows that applications will continue to
>> suffer from glibc's prehistorical view on operating systems
>
> Yes. I'm really not sure what glibc's current policies are meant to
> accomplish. They don't serve any useful purpose. There seems to be
> this weird subtext that glibc has leverage to change OS design, and it
> really doesn't. It's a misplaced idealism and ends up just hurting
> everyone.

I'm not sure what this comment tries to accomplish.

glibc tries to serve many masters: Current and past Linux kernel
interfaces, current Hurd kernel interfaces, different versions of POSIX
and C (and even C++), current C/C++ programming practice, historic C
programming practice, current and historic Linux userspace programming,
various platform ABIs, just to name a few.

These requirements are often in conflict.

>> Seeing comments suggesting an application should open
>> /proc/$PID makes me really wonder if people actually want to use slow
>> and insecure applications designed this way.
>
> That's a separate point. Yes, gettid should have a wrapper, but *also*
> we should have an FD-based interface to processes, because outside
> specialized contexts (e.g., parent-child waiting), the traditional
> Unix process API really is impossible to use safely. But that's a
> separate ongoing discussion.

A descriptor-based API would not help glibc that much because there is
an expectation encoded into many C programs that the C library does not
keep permanently open descriptors for its own internal use.

Thanks,
Florian

by Michael Kerrisk (man-pages)

Subject: Re: Official Linux system wrapper library?

On 11/11/18 9:17 AM, Willy Tarreau wrote:
> On Sun, Nov 11, 2018 at 07:55:30AM +0100, Michael Kerrisk (man-pages) wrote:
>> [1] https://sourceware.org/bugzilla/show_bug.cgi?id=6399 is a
>> longstanding example.
>
> This one was a sad read and shows that applications will continue to
> suffer from glibc's prehistorical view on operating systems and will
> continue to have to define their own syscall wrappers to exploit the
> full potential of the modern operating systems they execute on. This
> reminds me when one had to write their own spinlocks and atomics many
> years ago. Seeing comments suggesting an application should open
> /proc/$PID makes me really wonder if people actually want to use slow
> and insecure applications designed this way. Bah, after all, this
> wipes quite a bit of the shame I feel every time I do something to
> bypass it :-/
>
> The sad thing is that the energy wasted arguing in the bug above could
> have been better spent designing and implementing a generic solution
> to expose syscalls without depending on glibc's politics anymore.

I'm not sure I'd view the glibc position quite so harshly (although
it is disappointing to me that bug 6399 remains open). I think they
are simply short of people to work on this task. There was a lengthy
period where no syscall wrappers were being added (pretty much from
2.16 to 2.24, as far as I can tell), but that has changed.

And there is an expectation in some cases from the kernel side
that glibc will provide wrappers that build on (rather than just
wrap) some syscalls. And sometimes those wrappers are non-trivial.

A converse question that one could ask is: why did a culture
evolve whereby kernel developers don't take responsibility for
working with the major libc to ensure that wrappers are added as
part of the job of adding each new system call? Yes, I know, there
are some historical reasons (and even today, IMO, they do
themselves no favors by requiring a CLA), but glibc really is
a different place today, compared to where it was a few years
ago.

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

2018-11-11 11:02:56

by Florian Weimer

Subject: Re: Official Linux system wrapper library?

* Michael Kerrisk:

> I'm not sure I'd view the glibc position quite so harshly (although
> it is disappointing to me that bug 6399 remains open). I think they
> are simply short of people to work on this task. There was a lengthy
> period where no syscall wrappers were being added (pretty much from
> 2.16 to 2.24, as far as I can tell), but that has changed.

The people who objected to gettid may have disappeared from glibc
development. I thought this was the case with strlcpy/strlcat, but it
was not.

At present, it takes one semi-active glibc contributor to block addition
of a system call. The process to override a sustained objection has
never been used successfully, and it is a lot of work to get it even
started.

Thanks,
Florian

2018-11-11 11:03:49

by Willy Tarreau

Subject: Re: Official Linux system wrapper library?

On Sun, Nov 11, 2018 at 11:30:25AM +0100, Florian Weimer wrote:
> * Willy Tarreau:
>
> > On Sun, Nov 11, 2018 at 07:55:30AM +0100, Michael Kerrisk (man-pages) wrote:
> >> [1] https://sourceware.org/bugzilla/show_bug.cgi?id=6399 is a
> >> longstanding example.
> >
> > This one was a sad read and shows that applications will continue to
> > suffer from glibc's prehistorical view on operating systems and will
> > continue to have to define their own syscall wrappers to exploit the
> > full potential of the modern operating systems they execute on.
>
> What's modern about a 15-bit thread identifier?

It's 15-bit on 32-bit systems, and 22 on 64-bit, hence you can have
4 million threads and/or processes on a single system image provided
you have the resources for that of course.

> I understand that using this interface is required in some cases (which
> includes some system calls for which glibc does provide wrappers), but I
> assumed that it was at least understood that these reusable IDs for
> tasks were an extremely poor interface. Aren't the resulting bugs
> common knowledge?

Sure, just as are the bugs created by people trying to implement their
own syscall wrappers. It's not by denying access to some native system
interfaces that you will prevent users from accessing them, you'll just
force them to work around the restriction and make things even worse.

> > This reminds me when one had to write their own spinlocks and atomics
> > many years ago. Seeing comments suggesting an application should open
> > /proc/$PID makes me really wonder if people actually want to use slow
> > and insecure applications designed this way.
>
> I don't understand. If you want a non-reusable identifier, you have to
> go through the /proc interface anyway. I think the recommendation is to
> use the PID/start time combination to get a unique process identifier or
> something like that.

It depends what you want to achieve. If you just need the tid, the one
you'll pass to sched_setaffinity(), gettid() is fine. There are two issues
with abusing /proc to emulate syscalls :
- it's sometimes much slower than the equivalent syscall and can
encourage users to cache the resulting values when they should not
- either it's done upon process startup and it may not get a valid value
or may not work if /proc is not mounted yet (think init, mount etc),
or it's done upon first use and can break daemons which chroot()
themselves.

Syscalls don't have such limitations and are much safer to use. For other
things it's quite possible that you cannot rely on this syscall at all,
it's not a solution to everything, but it's a nice solution to all cases
where you need to access the system-wide identifier to pin a thread to a
given CPU set or renice it.
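
As a sketch of the renice case, assuming a libc without a gettid()
wrapper: the raw syscall returns the system-wide thread ID, and on
Linux passing that ID to setpriority() with PRIO_PROCESS changes the
nice value of that one thread. The names thread_id and renice_self are
hypothetical.

```c
#define _GNU_SOURCE          /* for syscall() */
#include <sys/resource.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the gettid() wrapper discussed in this thread. */
static pid_t thread_id(void)
{
    return (pid_t)syscall(SYS_gettid);
}

/* Set the nice value of the calling thread only; 0 on success, -1 with errno set. */
static int renice_self(int nice_value)
{
    return setpriority(PRIO_PROCESS, (id_t)thread_id(), nice_value);
}
```

The same thread ID could be handed to sched_setaffinity() to pin the
thread. No /proc access is involved in either case, so this keeps
working early at boot and inside a chroot.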

> I wanted to add gettid to glibc this cycle, but your comments suggest to
> me that if we did this, we'd likely never get a proper non-reusable
> thread identifier from the kernel. So I'm not sure what do anymore.

"Look people, I was about to do what we all refused to do for 10 years
now and Willy's comment made me change my mind, I'm sorry". The *real*
argument that most users could understand is "guys, we're sorry, but we
are running out of time and we won't work on this low priority stuff,
so someone else will have to take care of it".

In my opinion what matters is not whether or not people will use it
appropriately, but that its validity, side effects and wrong assumptions
are properly documented so that users don't shoot themselves in the foot.
But I guess that most of those defining it by themselves already figured
this out and are happy to use this available syscall when their application
wants to make use of certain features that are offered by their operating
system.

Thanks,
Willy

2018-11-11 11:10:14

by Florian Weimer

Subject: Re: Official Linux system wrapper library?

* Michael Kerrisk:

> [adding in glibc folk for comment]
>
> On 11/10/18 7:52 PM, Daniel Colascione wrote:
>> Now that glibc is basically not adding any new system call wrappers,
>> how about publishing an "official" system call glue library as part of
>> the kernel distribution, along with the uapi headers? I don't think
>> it's reasonable to expect people to keep using syscall(__NR_XXX) for
>> all new functionality, especially as the system grows increasingly
>> sophisticated capabilities (like the new mount API, and hopefully the
>> new process API) outside the strictures of the POSIX process.
>
> As a quick glance at the glibc NEWS file shows, the above is not
> quite true:
>
> [[
> Version 2.28
> * The renameat2 function has been added...
> * The statx function has been added...
>
> Version 2.27
> * Support for memory protection keys was added. The <sys/mman.h> header now
> declares the functions pkey_alloc, pkey_free, pkey_mprotect...
> * The copy_file_range function was added.
>
> Version 2.26
> * New wrappers for the Linux-specific system calls preadv2 and pwritev2.
>
> Version 2.25
> * The getrandom [function] have been added.
> ]]
>
> I make that 11 system call wrappers added in the last 2 years.

And you missed mlock2 and memfd_create.

In some cases, we used system calls before the kernel had them (because
the kernel does not add system calls consistently across architectures).

On the other hand, this is only half of the story because distributions
do not backport system call wrappers, even those that backport kernel
implementations (or just rebase the kernel). This is something that
could be fixed eventually, but it is related to another problem:

We had a patch for the membarrier system call, but the kernel developers
could not tell us what the system call does in terms of the C/C++
memory model, and the kernel developers and our concurrency expert could
not agree on documentation.

A lot of the new system calls lack clear specifications or are just
somewhat misdesigned. For example, pkey_alloc uses PKEY_DISABLE_WRITE
and PKEY_DISABLE_ACCESS flags (where the latter implies disabling both
read and write access), not something that matches the PROT_READ and
PROT_WRITE flags used by mmap/mprotect. This caused problems when POWER
support for pkey_alloc was added, and we are still working on resolving
that.

getrandom still causes boot delays because the kernel somehow fails to
seed its internal pool before starting PID 1 even on mainstream hardware
which has plenty of (true) randomness sources available, leading to
indefinite blocking of getrandom. It seems to me that people have
largely given up on fixing this in the upstream kernel.
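
One way a caller can avoid the indefinite blocking described above is
to pass GRND_NONBLOCK, in which case the call fails with EAGAIN while
the pool is not yet initialized. The following is only a sketch using
the raw system call, so it does not depend on the glibc 2.25 wrapper.
The names random_bytes_nonblock and random_pool_ready are hypothetical,
and the fallback flag value is the one from <linux/random.h>.

```c
#define _GNU_SOURCE          /* for syscall() */
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef GRND_NONBLOCK
#define GRND_NONBLOCK 0x0001    /* value from <linux/random.h> */
#endif

/*
 * Hypothetical helper: returns the number of bytes written to buf, or -1
 * with errno set (EAGAIN if the kernel pool is not yet initialized).
 */
static ssize_t random_bytes_nonblock(void *buf, size_t len)
{
    return syscall(SYS_getrandom, buf, len, GRND_NONBLOCK);
}

/* Returns 1 if the pool is initialized, 0 if not yet, -1 on any other error. */
static int random_pool_ready(void)
{
    unsigned char probe;

    if (random_bytes_nonblock(&probe, 1) == 1)
        return 1;
    return errno == EAGAIN ? 0 : -1;
}
```

This does not fix the seeding problem. It only lets early-boot code
decide for itself whether to wait, retry later, or fall back.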

For copy_file_range, we still have debates whether the system call (and
the glibc emulation) should preserve holes or not, and there are plans to
lift the cross-device restriction.

For renameat2, we already had a function in gnulib with the same name,
but which did not provide the atomic RENAME_NOREPLACE behavior for which
renameat2 was introduced.

These problems are relevant to the backporting question. One relatively
low-cost way to backport straight wrappers would be to put them as
hidden functions into libc_nonshared.a. But with these uncertainties,
this would be rather risky because fixing bugs of the wrappers would
then require relinking.

Thanks,
Florian

2018-11-11 11:12:31

by Willy Tarreau

Subject: Re: Official Linux system wrapper library?

On Sun, Nov 11, 2018 at 11:53:54AM +0100, Michael Kerrisk (man-pages) wrote:
> I'm not sure I'd view the glibc position quite so harshly (although
> it is disappointing to me that bug 6399 remains open). I think they
> are simply short of people to work on this task.

I think so as well and really have great respect for this limitation,
which differs from the technical arguments on the bugzilla trying to
find every single good reason why using this syscall was wrong.

(...)
> A converse question that one could ask is: why did a culture
> evolve whereby kernel developers don't take responsibility for
> working with the major libc to ensure that wrappers are added as
> part of the job of adding each new system call? Yes, I know, there
> are some historical reasons (and even today, IMO, they do
> themselves no favors by requiring a CLA), but glibc really is
> a different place today, compared to where it was a few years
> ago.

I think the issue is a bit more complex :
- linux doesn't support a single libc
- glibc doesn't support a single OS

We all know (believe?) that both statements above are
true, but in practice 99% of the time there's a 1:1 relation between
these two components. What we'd really need would be to have the libc
interface as part of the operating system itself. I'm perfectly fine
with glibc providing all the "high-level" stuff like strcpy(), FILE*
operations etc, and all this probably is mostly system-independent.
But the system interface could possibly be handled more easily in the
system itself, which would also provide a smoother adoption of new
syscalls and API updates. It would also limit the hassle required to
provide new syscalls, as if you start to have to contribute to two
projects at once for a single syscall, it becomes really painful.

But I don't know what changes that would require and it could really
turn out that in the end I'm totally wrong about the expected benefits.

Cheers,
Willy

2018-11-11 11:48:00

by Florian Weimer

Subject: Re: Official Linux system wrapper library?

* Willy Tarreau:

> I think the issue is a bit more complex :
> - linux doesn't support a single libc
> - glibc doesn't support a single OS
>
> In practice we all know (believe?) that both statements above are
> true but in practice 99% of the time there's a 1:1 relation between
> these two components.

Eh. Most Linux systems do not run glibc at all (and use cryptography
and other tricks to prevent users from installing it).

> What we'd really need would be to have the libc
> interface as part of the operating system itself. I'm perfectly fine
> with glibc providing all the "high-level" stuff like strcpy(), FILE*
> operations etc, and all this probably is mostly system-independent.

That's a bit messy, unfortunately.

The kernel does not know about TCB layout, so a lot of low-level
threading aspects are defined by userspace.

The kernel does not know about POSIX cancellation. Directly calling
system calls breaks support for that.

A lot of multi-threaded applications assume that most high-level
functionality remains usable even after fork in a multi-threaded
process. (This is increasingly a problem today with all those direct
calls to clone.) Unfortunately, this introduces rather tricky
low-level/high-level cross-subsystem issues, too.

> But the system interface could possibly be handled easier in the
> system itself, which would also provide a smoother adoption of new
> syscalls and API updates. It would also limit the hassle required to
> provide new syscalls, as if you start to have to contribute to two
> projects at once for a single syscall, it becomes really painful.

Sure, the duplication is unfortunate.

Several glibc contributors deeply care about standards compliance for
header files. The kernel developers care not, and the result is that we
copy definitions and declarations from the kernel header files, creating
additional problems.

We also want to use old kernel headers to compile glibc and still
implement features which are only defined by newer (upstream) kernels,
so that leads to more duplication.

Thanks,
Florian

2018-11-11 12:08:40

by Florian Weimer

Subject: Re: Official Linux system wrapper library?

* Willy Tarreau:

> On Sun, Nov 11, 2018 at 11:30:25AM +0100, Florian Weimer wrote:
>> * Willy Tarreau:
>>
>> > On Sun, Nov 11, 2018 at 07:55:30AM +0100, Michael Kerrisk (man-pages) wrote:
>> >> [1] https://sourceware.org/bugzilla/show_bug.cgi?id=6399 is a
>> >> longstanding example.
>> >
>> > This one was a sad read and shows that applications will continue to
>> > suffer from glibc's prehistorical view on operating systems and will
>> > continue to have to define their own syscall wrappers to exploit the
>> > full potential of the modern operating systems they execute on.
>>
>> What's modern about a 15-bit thread identifier?
>
> It's 15-bit on 32-bit systems, and 22 on 64-bit, hence you can have
> 4 million threads and/or processes on a single system image provided
> you have the resources for that of course.

I believe the default for pid_max is still 32768.

>> I understand that using this interface is required in some cases (which
>> includes some system calls for which glibc does provide wrappers), but I
>> assumed that it was at least understood that these reusable IDs for
>> tasks were an extremely poor interface. Aren't the resulting bugs
>> common knowledge?
>
> Sure, just as are the bugs created by people trying to implement their
> own syscall wrappers. It's not by denying access to some native system
> interfaces that you will prevent users from accessing them, you'll just
> force them to work around the restriction and make things even worse.

Well, once we have the fixed interface, it becomes easier to use if we
only expose that, and not the confusing interface which is described in
countless Stackoverflow answers. More choice isn't always good.

>> > This reminds me when one had to write their own spinlocks and atomics
>> > many years ago. Seeing comments suggesting an application should open
>> > /proc/$PID makes me really wonder if people actually want to use slow
>> > and insecure applications designed this way.
>>
>> I don't understand. If you want a non-reusable identifier, you have to
>> go through the /proc interface anyway. I think the recommendation is to
>> use the PID/start time combination to get a unique process identifier or
>> something like that.
>
> It depends what you want to achieve. If you just need the tid, the one
> you'll pass to sched_setaffinity(), gettid() is fine.

You can use pthread_setaffinity_np to control the affinity mask of a
thread without knowing its TID, and you can call sched_setaffinity on
the current thread without knowing its TID anyway.

And for sched_setattr, you need to call syscall anyway because there is
no wrapper, so calling gettid via syscall isn't that bad. (We can't add
wrappers for sched_setattr because it's not entirely clear how the
userspace ABI will evolve in the future.)
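
To illustrate the affinity calls mentioned above, here is a minimal sketch
that pins the calling thread without ever learning its TID. It assumes Linux
with glibc or a compatible libc; the helper names
pin_self_to_first_allowed_cpu and self_is_pinned_to are invented for the
example. It relies on sched_setaffinity accepting a pid of 0 to mean the
calling thread; pthread_setaffinity_np(pthread_self(), ...) would be the
handle-based equivalent.

```c
#define _GNU_SOURCE            /* for cpu_set_t, CPU_* macros, sched_setaffinity */
#include <assert.h>
#include <sched.h>

/* Pin the calling thread to the first CPU it is already allowed to run on.
 * Returns the chosen CPU number, or -1 on failure. */
static int pin_self_to_first_allowed_cpu(void)
{
    cpu_set_t allowed, one;

    /* pid 0 means "the calling thread": no gettid() required. */
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (!CPU_ISSET(cpu, &allowed))
            continue;
        CPU_ZERO(&one);
        CPU_SET(cpu, &one);
        if (sched_setaffinity(0, sizeof(one), &one) != 0)
            return -1;
        return cpu;
    }
    return -1;                 /* empty mask, should not happen */
}

/* Return 1 if the calling thread's affinity mask is exactly {cpu}. */
static int self_is_pinned_to(int cpu)
{
    cpu_set_t now;

    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    if (sched_getaffinity(0, sizeof(now), &now) != 0)
        return 0;
    return CPU_COUNT(&now) == 1 && CPU_ISSET(cpu, &now);
}
```

Choosing a CPU from the mask that is already in effect keeps the sketch
usable inside containers and cpusets, where CPU 0 may not be available.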

> There are two issues
> with abusing /proc to emulate syscalls :
> - it's sometimes much slower than the equivalent syscall and can
> encourage users to cache the resulting values when they should not
> - either it's done upon process startup and it may not get valid value
> or may not work if /proc is not mounted yet (think init, mount etc),
> or it's done upon first use and can break daemons which chroot()
> themselves.

Sure, but many kernel developers prefer /proc and file-based interfaces.
See getumask for a particularly illuminating example.

> Syscalls don't have such limitations and are much safer to use. For other
> things it's quite possible that you cannot rely on this syscall at all,
> it's not a solution to everything, but it's a nice solution to all cases
> where you need to access the system-wide identifier to pin a thread to a
> given CPU set or renice it.

Again, you don't need gettid for that at all. glibc has covered this
fully.

Surely there is a better justification for using gettid?

I suspect quite a few calls to the gettid system calls could actually be
getpid, and the programmer used __NR_gettid instead of __NR_getpid to
bypass the glibc PID cache. But the cache isn't used by the syscall
code path anyway, so it really does not matter.
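
For reference, the syscall() code path discussed above looks roughly like
this. It is a sketch assuming Linux and a libc that provides syscall() and
the SYS_* constants; raw_gettid and raw_getpid are names chosen for the
example, not existing libc functions.

```c
#define _GNU_SOURCE            /* for syscall() */
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Thread ID of the caller, asked directly from the kernel. */
static pid_t raw_gettid(void)
{
    long tid = syscall(SYS_gettid);

    assert(tid > 0);           /* gettid is documented as always succeeding */
    return (pid_t)tid;
}

/* Process (thread group) ID, fetched the same way so that no libc-side
 * caching can be involved. */
static pid_t raw_getpid(void)
{
    return (pid_t)syscall(SYS_getpid);
}
```

In the initial thread of a process both helpers return the same number; they
differ only in threads created later, which is the one case where gettid is
actually needed.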

>> I wanted to add gettid to glibc this cycle, but your comments suggest to
>> me that if we did this, we'd likely never get a proper non-reusable
>> thread identifier from the kernel. So I'm not sure what to do anymore.
>
> "Look people, I was about to do what we all refused to do for 10 years
> now and Willy's comment made me change my mind, I'm sorry". The *real*
> argument that most users could understand is "guys, we're sorry, but we
> are running out of time and we won't work on this low priority stuff,
> so someone else will have to take care of it".

I can assure you that in the past, a glibc patch for gettid would have
been rejected even if it were perfectly fine as far as the contribution
guidelines go (that is, copyright assignment, coding style, manual
update, ABI list update etc.). It's not a matter of resources or lack
thereof.

> In my opinion what matters is not whether or not people will use it
> appropriately, but that its validity, side effects and wrong assumptions
> are properly documented so that users don't shoot themselves in the foot.

Well, there I disagree. I think adding bad interfaces that confuse
developers is not a good idea, particularly if there is no compelling
use case. On the other hand, a userspace interface that is different
from what the kernel provides is confusing as well and leads to bugs
(see clone).

Thanks,
Florian

2018-11-11 12:11:14

by Willy Tarreau

Subject: Re: Official Linux system wrapper library?

On Sun, Nov 11, 2018 at 12:46:35PM +0100, Florian Weimer wrote:
> > In practice we all know (believe?) that both statements above are
> > true but in practice 99% of the time there's a 1:1 relation between
> > these two components.
>
> Eh. Most Linux systems do not run glibc at all (and use cryptography
> and other tricks to prevent users from installing it).

Good point on this one. I could even have thought that most syscalls
are added with glibc in mind but your counter-example above could
remain valid.

> > What we'd really need would be to have the libc
> > interface as part of the operating system itself. I'm perfectly fine
> > with glibc providing all the "high-level" stuff like strcpy(), FILE*
> > operations etc, and all this probably is mostly system-independent.
>
> That's a bit messy, unfortunately.
>
> The kernel does not know about TCB layout, so a lot of low-level
> threading aspects are defined by userspace.
>
> The kernel does not know about POSIX cancellation. Directly calling
> system calls breaks support for that.
>
> A lot of multi-threaded applications assume that most high-level
> functionality remains usable even after fork in a multi-threaded
> process. (This is increasingly a problem today with all those direct
> calls to clone.) Unfortunately, this introduces rather tricky
> low-level/high-level cross-subsystem issues, too.

But don't you think that moving a bit of this into the kernel
repository could improve the situation ? The corner cases could then
be detected when the feature is developed and be addressed either by
adapting the kernel side of the syscall or even by changing the design
before it's committed. Maybe a few extra syscalls are missing to
retrieve some critical info that would make things more reliable or
easier between userland and kernel, and that would become more obvious
with all the relevant parts at the same place ?

> > But the system interface could possibly be handled easier in the
> > system itself, which would also provide a smoother adoption of new
> > syscalls and API updates. It would also limit the hassle required to
> > provide new syscalls, as if you start to have to contribute to two
> > projects at once for a single syscall, it becomes really painful.
>
> Sure, the duplication is unfortunate.
>
> Several glibc contributors deeply care about standards compliance for
> header files.

For having suffered a lot from the libc-4 to libc-5 then libc-5 to glibc,
I certainly can understand their concerns about standards compliance.

> The kernel developers care not, and the result is that we
> copy definitions and declarations from the kernel header files, creating
> additional problems.

Probably that these standard compatibility issues should be addressed at
their root in the kernel header definitions in fact. Working around issues
always leads to a stall at some point, and it encourages the process not
to change.

> We also want to use old kernel headers to compile glibc and still
> implement features which are only defined by newer (upstream) kernels,
> so that leads to more duplication.

This one could possibly be got rid of. When I build glibc, I specify the
oldest supported kernel, which usually is older than or equal to the
headers used to build, but I don't expect that newer features will
magically work at all. Thus I normally build with the most recent
headers covering my needs.

Thanks,
Willy

2018-11-11 14:24:16

by Daniel Colascione

Subject: Re: Official Linux system wrapper library?

On Sun, Nov 11, 2018 at 3:09 AM, Florian Weimer wrote:
> We had a patch for the membarrier system call, but the kernel developers
> could not tell us what the system call does in terms of the C/C++
> memory model
[snip]
> A lot of the new system calls lack clear specifications or are just
> somewhat misdesigned. For example, pkey_alloc
[snip]
> getrandom still causes boot delays
[snip]
> For copy_file_range, we still have debates whether the system call (and
> the glibc emulation) should preserve holes or not,
[snip]

These objections illustrate my point. glibc development is not the
proper forum for raising post-hoc objections to system call design.
Withholding wrappers will not un-ship these system calls. Applications
are already using them, via syscall(2). Developers and users would be
better served by providing access to the system as it is, with
appropriate documentation caveats, than by holding out for some
alternate and more ideal set of system calls that may or may not
appear in the future. This resistance to exposing the capabilities of
the system as they are, even in flawed and warty form, is what I meant
by "misplaced idealism" in my previous message. If the kernel provides
a system call, libc should provide a C wrapper for it, even if in the
opinion of the libc maintainers, that system call is flawed.

I agree with the proposals mentioned above to split system interface
responsibility, having glibc handle higher-level concerns like stdio
while punting system call wrappers and other low-level facilities to a
kernel-provided userspace library that can move faster and more
explicitly conform to the Linux kernel's userspace ABI.

2018-11-12 01:53:15

by Paul Eggert

Subject: Re: Official Linux system wrapper library?

Daniel Colascione wrote:
> This resistance to exposing the capabilities of
> the system as they are, even in flawed and warty form, is what I meant
> by "misplaced idealism" in my previous message.

With my application-developer hat on I prefer some resistance to flaws and
warts, as the resistance gives me a better feel for which functions are
problematic and which can be used more reliably. If glibc is missing Linux
syscall functionality that I really need then I can use syscall (with the usual
caveats) and I've done that on occasion (and have regretted it later too :-). It
is helpful for glibc to prefer mild curation to slavishly copying an API that
can be a bit helter-skelter at times.

2018-11-12 02:04:34

by Carlos O'Donell

Subject: Re: Official Linux system wrapper library?

On 11/10/18 2:58 PM, Vlastimil Babka wrote:
> On 11/10/18 8:20 PM, Greg KH wrote:
>> On Sat, Nov 10, 2018 at 10:52:06AM -0800, Daniel Colascione wrote:
>>> Now that glibc is basically not adding any new system call wrappers,
>>
>> Why are they not doing that anymore?
>
> FYI just noticed there's a topic relevant to this in LPC Toolchain MC:
>
> https://linuxplumbersconf.org/event/2/contributions/149/

Yes, and Adhemerval put it there on purpose to continue the discussion
between glibc developers and kernel developers. Florian Weimer and I have
both provided input to that talk, so if something comes out of the talk
and you want to talk more, please just reach out.

I hope that kernel developers interested in this topic will attend
and discuss the various ways forward on certain interesting topics :-)

--
Cheers,
Carlos.

2018-11-12 02:25:20

by Carlos O'Donell

Subject: Re: Official Linux system wrapper library?

On 11/10/18 2:20 PM, Greg KH wrote:
> Also, what about the basic work of making sure our uapi header files can
> actually be used untouched by a libc? That isn't the case these days as
> the bionic maintainers like to keep reminding me. That might be a good
> thing to do _before_ trying to add new things like syscall wrappers.

I agree completely. There are many steps in the checklist to writing
a new syscall, heck we should probably have a checklist!

Socially the issue is difficult because the various communities only
marginally share the same network of developers, care about different
features, or the same features with different priorities.

That doesn't mean we shouldn't try to integrate better. As was pointed
out, various people from the userspace and toolchain communities are
going to LPC to do just this.

--
Cheers,
Carlos.

2018-11-12 02:37:12

by Greg Kroah-Hartman

Subject: Re: Official Linux system wrapper library?

On Sun, Nov 11, 2018 at 09:24:40PM -0500, Carlos O'Donell wrote:
> On 11/10/18 2:20 PM, Greg KH wrote:
> > Also, what about the basic work of making sure our uapi header files can
> > actually be used untouched by a libc? That isn't the case these days as
> > the bionic maintainers like to keep reminding me. That might be a good
> > thing to do _before_ trying to add new things like syscall wrappers.
>
> I agree completely. There are many steps in the checklist to writing
> a new syscall, heck we should probably have a checklist!

We should have a checklist. That's a great idea. Now to find someone
to write it... :)

I'll try to make it to the plumbers talk, but I think there's a
competing one I am supposed to be at at the same time.

greg k-h

2018-11-12 05:47:17

by Andy Lutomirski

Subject: Re: Official Linux system wrapper library?

On Sun, Nov 11, 2018 at 6:24 PM Carlos O'Donell wrote:
>
> On 11/10/18 2:20 PM, Greg KH wrote:
> > Also, what about the basic work of making sure our uapi header files can
> > actually be used untouched by a libc? That isn't the case these days as
> > the bionic maintainers like to keep reminding me. That might be a good
> > thing to do _before_ trying to add new things like syscall wrappers.
>
> I agree completely. There are many steps in the checklist to writing
> a new syscall, heck we should probably have a checklist!
>
> Socially the issue is difficult because the various communities only
> marginally share the same network of developers, care about different
> features, or the same features with different priorities.
>
> That doesn't mean we shouldn't try to integrate better. As was pointed
> out, various people from the userspace and toolchain communities are
> going to LPC to do just this.
>

If you all want my two cents, I think that we should approach this all
quite differently than trying to get glibc to add a wrapper for each
syscall. I think the kernel should contain a list or lists of syscalls
along with parameter names, types, and numbers, and this should get
processed during the kernel build to produce a few different
artifacts:

- A machine-readable version of the same data in a stable format.
Tools like strace should be able to consume it.

- A library called, perhaps, libinux, or maybe a header-only library.
It should have a wrapper for *every* syscall, and they should be
namespaced. Instead of renameat2(), it should expose
linux_renameat2(). Ideally it would use the UAPI header types, but
void * wouldn't be so bad for pointers.
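
A header-only, namespaced wrapper of the kind described in the list above
might look like the sketch below. The name linux_renameat2 comes from the
proposal; everything else is an assumption for illustration: it is written
by hand here rather than generated from a syscall table, and it borrows the
libc's syscall() and therefore the libc's errno convention.

```c
#define _GNU_SOURCE            /* for syscall() */
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical namespaced wrapper: one static inline function per system
 * call, with typed parameters, carrying a linux_ prefix so that it cannot
 * clash with whatever the libc chooses to export. */
static inline int linux_renameat2(int olddirfd, const char *oldpath,
                                  int newdirfd, const char *newpath,
                                  unsigned int flags)
{
    assert(oldpath != NULL && newpath != NULL);
    return (int)syscall(SYS_renameat2, olddirfd, oldpath,
                        newdirfd, newpath, flags);
}
```

A call such as linux_renameat2(AT_FDCWD, "a", AT_FDCWD, "b", 0) then reads
like any other libc function, and fails with -1 and errno set when the kernel
rejects the request.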

P.S. Does gcc even *have* the correct asm constraints to express
typeless syscalls? Ideally we'd want syscalls to have exactly the
same pointer escaping semantics as ordinary functions, so, if I do:

struct timeval tv;
/* typed expansion of linux_gettimeofday(&tv, NULL); */
asm volatile ("whatever" : "+m" (tv) : "D" (&tv));

it works. But if I want to use a generic wrapper that doesn't know
that the argument is a pointer, I do:

asm volatile ("whatever" :: "D" (&tv));

then gcc seems to not actually understand that the value pointed to by
&tv is modified by the syscall. glibc's syscall() function works
AFAICT because it's an external function, and gcc considers &tv to
have escaped and can't see the body of the syscall() function.
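
One common way to write such a generic wrapper is to add a "memory" clobber,
which tells gcc that the asm statement may read or write memory it cannot see
through the operands. The sketch below assumes x86-64 for the inline asm path
and falls back to the libc's syscall() elsewhere; generic_syscall2 and
seconds_since_epoch are names invented for the example. It is coarser than a
typed "+m" operand, since it forces gcc to treat all memory as possibly
changed.

```c
#define _GNU_SOURCE            /* for syscall() in the fallback path */
#include <assert.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <unistd.h>

/* The generic wrapper passes pointers as plain integers. */
static_assert(sizeof(long) == sizeof(void *), "pointer must fit in a long");

/* Two-argument system call that knows nothing about argument types. */
static long generic_syscall2(long nr, long a1, long a2)
{
#if defined(__x86_64__)
    long ret;

    __asm__ volatile ("syscall"
                      : "=a" (ret)
                      : "a" (nr), "D" (a1), "S" (a2)
                      /* the instruction itself overwrites rcx and r11;
                       * "memory" covers anything the kernel writes */
                      : "rcx", "r11", "memory");
    return ret;                /* raw convention: negative errno on failure */
#else
    return syscall(nr, a1, a2);
#endif
}

/* The kernel fills in tv behind the compiler's back. */
static long seconds_since_epoch(void)
{
    struct timeval tv = { 0, 0 };
    long rc = generic_syscall2(SYS_gettimeofday, (long)&tv, 0);

    return rc == 0 ? (long)tv.tv_sec : -1;
}
```

Without the clobber, gcc would be entitled to assume tv still holds the
zeroes it was initialised with after the asm statement.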

2018-11-12 08:11:55

by Florian Weimer

Subject: Re: Official Linux system wrapper library?

* Daniel Colascione:

> If the kernel provides a system call, libc should provide a C wrapper
> for it, even if in the opinion of the libc maintainers, that system
> call is flawed.

It's not that simple, I think. What about bdflush? socketcall?
getxpid? osf_gettimeofday? set_robust_list? There are quite a few
irregularities, and some editorial discretion appears to be unavoidable.

Even if we were to provide perfectly consistent system call wrappers
under separate names, we'd still expose different calling conventions
for things like off_t to applications, which would make using some of
the system calls quite difficult and surprisingly non-portable.

Thanks,
Florian

2018-11-12 12:26:27

by Florian Weimer

Subject: Re: Official Linux system wrapper library?

* Willy Tarreau:

>> > What we'd really need would be to have the libc
>> > interface as part of the operating system itself. I'm perfectly fine
>> > with glibc providing all the "high-level" stuff like strcpy(), FILE*
>> > operations etc, and all this probably is mostly system-independent.
>>
>> That's a bit messy, unfortunately.
>>
>> The kernel does not know about TCB layout, so a lot of low-level
>> threading aspects are defined by userspace.
>>
>> The kernel does not know about POSIX cancellation. Directly calling
>> system calls breaks support for that.
>>
>> A lot of multi-threaded applications assume that most high-level
>> functionality remains usable even after fork in a multi-threaded
>> process. (This is increasingly a problem today with all those direct
>> calls to clone.) Unfortunately, this introduces rather tricky
>> low-level/high-level cross-subsystem issues, too.
>
> But don't you think that moving a bit of this into the kernel
> repository could improve the situation ? The corner cases could then
> be detected when the feature is developed and be addressed either by
> adapting the kernel side of the syscall or even by changing the design
> before it's committed. Maybe a few extra syscalls are missing to
> retrieve some critical info that would make things more reliable or
> easier between userland and kernel, and that would become more obvious
> with all the relevant parts at the same place ?
>
>> > But the system interface could possibly be handled easier in the
>> > system itself, which would also provide a smoother adoption of new
>> > syscalls and API updates. It would also limit the hassle required to
>> > provide new syscalls, as if you start to have to contribute to two
>> > projects at once for a single syscall, it becomes really painful.
>>
>> Sure, the duplication is unfortunate.
>>
>> Several glibc contributors deeply care about standards compliance for
>> header files.
>
> For having suffered a lot from the libc-4 to libc-5 then libc-5 to glibc,
> I certainly can understand their concerns about standards compliance.

This is getting way off-topic, but:

The C standard does not care deeply about practical source code
compatibility. Behavior of valid syntax generally remains unchanged.
However, each revision adds many macros to existing header files, so
practical source code compatibility tends to be problematic. For glibc,
the current policy is to enable all optional features with _GNU_SOURCE,
so most projects receive the full dose of macros. (Unrelated to
standards, even new system call wrappers are problematic for source code
compatibility.)

For ABI compatibility, there are only ad-hoc standards anyway, so it's
mostly about us being careful when making changes.

>> The kernel developers care not, and the result is that we
>> copy definitions and declarations from the kernel header files, creating
>> additional problems.
>
> Probably that these standard compatibility issues should be addressed at
> their root in the kernel header definitions in fact. Working around issues
> always leads to a stall at some point, and it encourages the process not
> to change.

In the past, we couldn't even get agreement about little things such as
__u64, so I'm not hopeful. 8-(

Thanks,
Florian

2018-11-12 12:46:24

by Szabolcs Nagy

Subject: Re: Official Linux system wrapper library?

On 11/11/18 14:22, Daniel Colascione wrote:
> On Sun, Nov 11, 2018 at 3:09 AM, Florian Weimer wrote:
>> We had a patch for the membarrier system call, but the kernel developers
>> could not tell us what the system call does in terms of the C/C++
>> memory model
> [snip]
>> A lot of the new system calls lack clear specifications or are just
>> somewhat misdesigned. For example, pkey_alloc
> [snip]
>> getrandom still causes boot delays
> [snip]
>> For copy_file_range, we still have debates whether the system call (and
>> the glibc emulation) should preserve holes or not,
> [snip]
>
> These objections illustrate my point. glibc development is not the
> proper forum for raising post-hoc objections to system call design.
> Withholding wrappers will not un-ship these system calls. Applications
> are already using them, via syscall(2). Developers and users would be
> better served by providing access to the system as it is, with
> appropriate documentation caveats, than by holding out for some
> alternate and more ideal set of system calls that may or may not
> appear in the future. This resistance to exposing the capabilities of
> the system as they are, even in flawed and warty form, is what I meant
> by "misplaced idealism" in my previous message. If the kernel provides
> a system call, libc should provide a C wrapper for it, even if in the
> opinion of the libc maintainers, that system call is flawed.

flaws can be worked around.

it's just more work to do that, hence wrappers are delayed.

(while new flawed syscalls get added, there are missing
syscalls for implementing posix semantics or for better libc
quality, so are the priorities of linux right?)

> I agree with the proposals mentioned above to split system interface
> responsibility, having glibc handle higher-level concerns like stdio
> while punting system call wrappers and other low-level facilities to a
> kernel-provided userspace library that can move faster and more
> explicitly conform to the Linux kernel's userspace ABI.

consuming linux uapi headers is a huge problem (not just for
glibc): the libc has to repeat uapi definitions under appropriate
feature macros using proper libc types etc, this usually creates
conflict between linux and libc headers and a lot of duplicated
work at every linux release. the situation would be worse if all
new types were exposed for new syscalls when they appeared.

the proposal mentioned above does not solve this in any way.

2018-11-12 13:20:00

by Daniel Colascione

Subject: Re: Official Linux system wrapper library?

On Mon, Nov 12, 2018 at 12:11 AM, Florian Weimer wrote:
> * Daniel Colascione:
>
>> If the kernel provides a system call, libc should provide a C wrapper
>> for it, even if in the opinion of the libc maintainers, that system
>> call is flawed.
>
> It's not that simple, I think. What about bdflush? socketcall?
> getxpid? osf_gettimeofday? set_robust_list?

What about them? Mentioning that these system calls exist is not in
itself an argument.

> There are quite a few
> irregularities

So?

> and some editorial discretion appears to be unavoidable.

That's an assertion, not an argument, and I strongly disagree. *Why*
do you think "editorial discretion" is unavoidable? What privileges
glibc's judgement here? What would go wrong if socketcall and
set_robust_list and so on had wrappers? If applications chose to use
these lower-level wrappers instead of higher-level facilities, they
take on responsibility for using the APIs properly.

> Even if we were to provide perfectly consistent system call wrappers
> under separate names, we'd still expose different calling conventions
> for things like off_t to applications, which would make using some of
> the system calls quite difficult and surprisingly non-portable.

We can learn something from how Windows does things. On that system,
what we think of as "libc" is actually two parts. (More, actually, but
I'm simplifying.) At the lowest level, you have the semi-documented
ntdll.dll, which contains raw system call wrappers and arcane
kernel-userland glue. On top of ntdll live the "real" libc
(msvcrt.dll, kernel32.dll, etc.) that provide conventional
application-level glue. The tight integration between ntdll.dll and
the kernel allows Windows to do very impressive things. (For example,
on x86_64, Windows has no 32-bit ABI as far as the kernel is
concerned! You can still run 32-bit programs though, and that works
via ntdll.dll essentially shimming every system call and switching the
processor between long and compatibility mode as needed.) Normally,
you'd use the higher-level capabilities, but if you need something in
ntdll (e.g., if you're Cygwin) nothing stops your calling into the
lower-level system facilities directly. ntdll is tightly bound to the
kernel; the higher-level libc, not so.

We should adopt a similar approach. Shipping a lower-level
"liblinux.so" tightly bound to the kernel would not only let the
kernel bypass glibc's "editorial discretion" in exposing new
facilities to userspace, but would also allow for tighter user-kernel
integration than one can achieve with a simplistic syscall(2)-style
escape hatch. (For example, for a long time now, I've wanted to go
beyond POSIX and improve the system's signal handling API, and this
improvement requires userspace cooperation.) The vdso is probably too
small and simplistic to serve in this role; I'd want a real library.

2018-11-12 14:37:25

by Theodore Ts'o

Subject: Re: Official Linux system wrapper library?

On Mon, Nov 12, 2018 at 12:45:26PM +0000, Szabolcs Nagy wrote:
> >> A lot of the new system calls lack clear specifications or are just
> >> somewhat misdesigned. For example, pkey_alloc
> > [snip]
> >> getrandom still causes boot delays

I'll note that what some people consider misdesigns, others consider
"fix CVE's".

Some people may consider it more important to avoid boot delays;
others would consider internet-wide security problems, ala
https://factorable.net to be higher priority.
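
For readers unfamiliar with the trade-off: getrandom() blocks by default
until the kernel's random pool has been initialised, which is where the boot
delay comes from, and GRND_NONBLOCK turns that wait into an EAGAIN error. A
minimal sketch, assuming a libc that ships <sys/random.h> (glibc 2.25 or
later, for instance); try_random_bytes is a name made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/* Try to fill buf with random bytes without ever blocking.
 * Returns 1 on success, 0 if the pool is not initialised yet,
 * and -1 on any other error. */
static int try_random_bytes(void *buf, size_t len)
{
    /* Requests of up to 256 bytes are not cut short once the pool is
     * ready, so a single call is enough for small buffers. */
    assert(len <= 256);

    ssize_t n = getrandom(buf, len, GRND_NONBLOCK);
    if (n == (ssize_t)len)
        return 1;
    if (n < 0 && errno == EAGAIN)
        return 0;              /* caller decides: wait, or do without */
    return -1;
}
```

What the caller does on a 0 return is exactly the policy question under
dispute: waiting costs boot time, proceeding with weaker randomness costs
security.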

It's clear this is one area where I and some glibc developers have had
a difference of opinion. The bigger problem is that if a single glibc
developer is able to veto any new system call, maybe we *do* need to
have a kernel-provided library which bypasses glibc....

- Ted

2018-11-12 14:41:17

by Daniel Colascione

Subject: Re: Official Linux system wrapper library?

On Mon, Nov 12, 2018 at 6:35 AM, Theodore Y. Ts'o wrote:
> On Mon, Nov 12, 2018 at 12:45:26PM +0000, Szabolcs Nagy wrote:
>> >> A lot of the new system calls lack clear specifications or are just
>> >> somewhat misdesigned. For example, pkey_alloc
>> > [snip]
>> >> getrandom still causes boot delays
>
> I'll note that what some people consider misdesigns, others consider
> "fix CVE's".
>
> Some people may consider it more important to avoid boot delays;
> others would consider internet-wide security problems, ala
> https://factorable.net to be higher priority.
>
> It's clear this is one area where I and some glibc developers have had
> a difference of opinion. The bigger problem is that if a single glibc
> developer is able to veto any new system call, maybe we *do* need to
> have a kernel-provided library which bypasses glibc....

Historically speaking, the liberum veto has not led to good governance.

2018-11-12 16:09:23

by Jonathan Corbet

Subject: Re: Official Linux system wrapper library?

On Sun, 11 Nov 2018 18:36:30 -0800
Greg KH wrote:

> We should have a checklist. That's a great idea. Now to find someone
> to write it... :)

Do we think the LPC session might have the right people to create such a
thing? If so, I can try to put together a coherent presentation of the
result.

jon

2018-11-12 16:46:09

by Joseph Myers

Subject: Re: Official Linux system wrapper library?

On Sun, 11 Nov 2018, Florian Weimer wrote:

> People may have disappeared from glibc development who have objected to
> gettid. I thought this was the case with strlcpy/strlcat, but it was
> not.

Well, I know of two main people who were objecting to the notion of adding
bindings for all non-obsolescent syscalls, Linux-specific if not suitable
for adding to the OS-independent GNU API, and neither seems to have posted
in the past year.

> At present, it takes one semi-active glibc contributor to block addition
> of a system call. The process to override a sustained objection has
> never been used successfully, and it is a lot of work to get it even
> started.

We don't have such a process. (I've suggested, e.g. in conversation with
Carlos at the Cauldron, that we should have something involving a
supermajority vote of the GNU maintainers for glibc in cases where we're
unable to reach a consensus in the community as a whole.)

--
Joseph S. Myers

2018-11-12 17:02:24

by Joseph Myers

Subject: Re: Official Linux system wrapper library?

On Sun, 11 Nov 2018, Florian Weimer wrote:

> The kernel does not know about TCB layout, so a lot of low-level
> threading aspects are defined by userspace.
>
> The kernel does not know about POSIX cancellation. Directly calling
> system calls breaks support for that.

Indeed. Where cancellation is involved, glibc needs to know exactly what
instructions might be calling a cancellable syscall and what instructions
are before or after the syscall (see Adhemerval's patches for bug 12683).

This involves an ABI that is not just specific to a particular libc, but
specific to a particular libc version. So it's inherently unsuitable to
put cancellable syscalls in libc_nonshared.a, as well as unsuitable to put
them in any kernel-provided library.

The interface for setting errno may also be libc-specific, for any
syscalls involving setting errno.
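
To make the errno point concrete: on x86-64 the kernel reports failure as a
negative error number in the return value (other architectures may signal
errors differently), and it is the libc that turns this into -1 plus errno.
The sketch below shows that translation step; kernel_to_errno_style and
KERNEL_MAX_ERRNO are illustrative names, and where errno actually lives
(typically thread-local storage owned by the libc) is the part a
kernel-provided library could not know.

```c
#include <assert.h>
#include <errno.h>

/* Raw return values in [-4095, -1] denote errors; anything else,
 * including large "negative-looking" addresses, is a valid result. */
#define KERNEL_MAX_ERRNO 4095L

/* Convert a raw kernel return value to the libc convention. */
static long kernel_to_errno_style(long raw)
{
    if (raw < 0 && raw >= -KERNEL_MAX_ERRNO) {
        errno = (int)-raw;     /* errno's storage is defined by the libc */
        return -1;
    }
    return raw;
}
```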

Syscalls often involve types in their interfaces such as off_t and struct
timespec. libcs may have multiple different variants of those types; the
variants available, and the ways of selecting them, are libc-specific and
libc-version-specific. So for any syscall for which the proper userspace
interface involves one of those types, wrappers for it are inherently
specific to a particular libc and libc version. (See e.g. how preadv2 and
pwritev2 syscalls also have preadv64v2 and pwritev64v2 APIs in glibc, with
appropriate redirections based on __USE_FILE_OFFSET64, which is in turn
based on _FILE_OFFSET_BITS.)
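
The effect of _FILE_OFFSET_BITS can be seen from the application side with a
few lines. This is a sketch assuming glibc or a compatible libc; off_t_bits
is a helper invented for the example. The macro has to be defined before the
first libc header is included, because that is when the libc decides which
off_t variant and which function redirections to use.

```c
#define _FILE_OFFSET_BITS 64   /* must precede every libc header */
#include <assert.h>
#include <limits.h>
#include <sys/types.h>

/* With the macro above, off_t is the 64-bit variant even on 32-bit ABIs,
 * where it would otherwise default to 32 bits on glibc. */
static_assert(sizeof(off_t) * CHAR_BIT == 64,
              "off_t should be 64 bits wide with _FILE_OFFSET_BITS=64");

/* Width of off_t as seen by this translation unit. */
static int off_t_bits(void)
{
    return (int)(sizeof(off_t) * CHAR_BIT);
}
```

Two object files built with different settings disagree about the layout of
any interface that mentions off_t, which is why a wrapper for such a syscall
cannot be written once for every libc and every configuration.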

There are many ABI variants that are relevant to glibc but not to the
kernel. Some of these involve ABI tagging of object files to indicate
which ABI variant an object is built for (and those that don't have such
tagging ought to have it), to prevent accidental linking of objects for
different ABIs. How to build objects for different userspace ABIs is not
something the kernel should need to know anything about; it's most
naturally dealt with at the level of building compiler multilibs and libc.

glibc deliberately avoids depending at compile time on the existence of
libgcc_s.so to facilitate bootstrap builds (a stripped glibc binary built
with a C-only static-only inhibit_libc GCC that was built without glibc
should be identical to the result of a longer alternating sequence of GCC
and glibc builds). I don't think any kernel-provided library would be any
better to depend on.

What one might suggest is that when new syscalls are added, kernel
developers should at least obtain agreement on linux-api from libc people
about what the userspace interface to the syscall should be. That means
the userspace-level types (such as off_t and struct timespec), the choice
of error handling (returning an error number or setting errno), the name
of the header declaring the function, the name of the function, how the
syscall relates to thread cancellation, and whatever other issues may be
raised.

--
Joseph S. Myers
[email protected]

2018-11-12 17:37:12

by Joseph Myers

Subject: Re: Official Linux system wrapper library?

On Sun, 11 Nov 2018, Willy Tarreau wrote:

> > The kernel developers care not, and the result is that we
> > copy definitions and declarations from the kernel header files, creating
> > additional problems.
>
> These standard compatibility issues should probably be addressed at
> their root, in the kernel header definitions. Working around issues
> always leads to a stall at some point, and it encourages the process not
> to change.

But it's not a bug in the Linux kernel header files. The set of feature
test macros supported is libc-specific and libc-version-specific. The
internal macros defined as a result of the feature test macros, that
determine what features to expose, are also libc-specific and
libc-version-specific. (The __USE_* macros in glibc are not a stable API.
For example, we might move to using __GLIBC_USE for more features in place
of the defined/undefined __USE_* internal macros.)

If a feature is Linux-specific, and the userspace header for it is also
Linux-specific (as opposed to constants in standard headers such as
sys/mman.h, where you get all the namespace issues), that userspace header
*can* include uapi headers in many cases to get constants and structures -
if those uapi headers actually work in userspace without defining things
conflicting with libc types. E.g. <sys/fanotify.h> includes
<linux/fanotify.h>.

What *is*, in my view, a bug in the uapi headers is that some of them
don't work when included on their own. I'd expect #include
<linux/whatever.h> or #include <asm/whatever.h>, for any such header
installed by make headers_install, to compile on its own in userspace
without needing any other headers to be included first, unless some header
is specifically defined as being an internal part of another header which
is the one that should be included.

In glibc we have scripts/check-installed-headers.sh which verifies that
installed headers work when included like that in various language
standard and feature test macro modes - and with my bots running
build-many-glibcs.py, this property is effectively verified every few
hours for (currently) 79 different glibc configurations covering all
supported glibc ABIs. If the uapi headers are fixed to work on their own,
there should be similar continuous integration to make sure that this
continues to be the case in future.

Simply having uapi headers that reliably work when included on their own
would help with adding further test automation in glibc to verify
consistency of constant and structure definitions between glibc and uapi
headers. We have a few such checks (e.g. for signal numbers), but now
that we require Python 3 to build glibc I hope to convert those into more
general infrastructure for extracting information from headers and running
checks on the extracted information.

--
Joseph S. Myers
[email protected]

2018-11-12 17:43:28

by Zack Weinberg

Subject: Re: Official Linux system wrapper library?

Daniel Colascione <[email protected]> wrote:
> >> If the kernel provides a system call, libc should provide a C wrapper
> >> for it, even if in the opinion of the libc maintainers, that system
> >> call is flawed.

I would like to state general support for this principle; in fact, I
seriously considered preparing patches that made exactly this change,
about a year ago, posting them, and calling for objections. Then
$dayjob ate all my hacking time (and is still doing so, alas).

Nonetheless I do think there are exceptions, such as those that are
completely obsolete (bdflush, socketcall) and those that cannot be
used without stomping on glibc's own data structures (set_robust_list
is the only one of these I know about off the top of my head, but
there may well be others).

Daniel Colascione <[email protected]> wrote:
> We can learn something from how Windows does things. On that system,
> what we think of as "libc" is actually two parts. (More, actually, but
> I'm simplifying.) At the lowest level, you have the semi-documented
> ntdll.dll, which contains raw system call wrappers and arcane
> kernel-userland glue. On top of ntdll live the "real" libc
> (msvcrt.dll, kernel32.dll, etc.) that provide conventional
> application-level glue.

This is an appealing idea at first sight; there are several other
constituencies for it besides frustrated kernel hackers, such as
alternative system programming languages (Rust, Go) that want to
minimize dependencies on legacy "C library" functionality. If we
could find a clean way to do it, I would support it.

The trouble is that "raw system call wrappers and arcane
kernel-userland glue" turns out to be a lot more code, with a lot more
tentacles in both directions, than you might think. If you compare
the sizes of the text sections of `ntdll.dll` and `libc.so.6` you will
notice that the former is _bigger_. The reason for this, as far as I
can determine (without any access to Microsoft's internal
documentation or source code ;-) is that ntdll.dll contains the
dynamic linker-equivalent, a basic memory allocator, the stack
unwinder, and a good chunk of the core thread library. (It also has
stuff in it that's needed by programs that run early during boot and
can't use kernel32.dll, but that's not our problem.) I don't think
this is an accident or an engineering compromise. It is necessary for
the dynamic loader to understand threads, and the thread library to
understand shared library semantics. It is necessary for both of
those components to allocate memory. And both of those components are
naturally tightly coupled to the kernel, and in particular they have
to be up and running from the first user-space instruction executed in
a new process, so it's natural to put them in the component that is
responsible for talking directly to the kernel.

But the _consequence_ of this design is, ntdll.dll defines the
semantics of shared library loading, and the semantics of threads, for
the entire system. A hypothetical equivalent liblinuxabi.so.1 would
have to do the same. And that means you wouldn't get as much
decoupling from the C and POSIX standards -- both of which specify at
least part of those semantics -- as you want, and we would still be
having these arguments. For example, it would be every bit as
troublesome for liblinuxabi.so.1 to export set_robust_list as it would
be for libc.so.6 to do that.

You might be able to get out of most of the tangle by putting the
dynamic loader in a separate process, and that's _also_ an appealing
idea for several other reasons, but it would still need to understand
some of the thread-related data structures within the processes it
manipulated, so I don't think it would help enough to be worth it (in
a complete greenfields design where I get to ignore POSIX and rewrite
the kernel API from scratch, now, that might be a different story).

On a larger note, the fundamental complaint here is a project process
/ communication complaint. We haven't been communicating enough with
the kernel team; that is fair criticism. We can do better. But the
communication has to go both ways. When, for instance, we tell you
that membarrier needs to have its semantics nailed down in terms of
the C++17 memory model, that actually needs to happen. When we tell
you that we can't use UAPI headers directly unless you commit to
honoring all of the standard-sourced namespace constraints on
user-visible headers, that needs to end the argument unless and until
someone does commit to doing all of that work on the kernel side. (We
could discuss things we could do to make that work easier from your
end -- the __USE macros could stand to be better documented, for
instance -- but ultimately someone has to do the work.)

And, because this is a process / communication problem, you cannot
expect there to be a purely technical fix. Your position appears,
from where I'm sitting, to be something like "if we split glibc into
two pieces, then you and us will never have to talk to each other
again" which, I'm sorry, I can't see that working out in the long run.

> (For example, for a long time now, I've wanted to go
> beyond POSIX and improve the system's signal handling API, and this
> improvement requires userspace cooperation.)

This is also an appealing notion, but the first step should be to
eliminate all of the remaining uses for asynchronous signals: for
instance, give us process handles already! Once a program only ever
needs to call sigaction() to deal with
SIGSEGV/SIGBUS/SIGILL/SIGFPE/SIGTRAP, then we can think about
inventing a better replacement for that scenario.

zw

2018-11-12 17:55:14

by Greg Kroah-Hartman

Subject: Re: Official Linux system wrapper library?

On Mon, Nov 12, 2018 at 05:36:11PM +0000, Joseph Myers wrote:
> What *is*, in my view, a bug in the uapi headers is that some of them
> don't work when included on their own. I'd expect #include
> <linux/whatever.h> or #include <asm/whatever.h>, for any such header
> installed by make headers_install, to compile on its own in userspace
> without needing any other headers to be included first, unless some header
> is specifically defined as being an internal part of another header which
> is the one that should be included.

Yes, that is a bug, and people have been working on fixing that. We now
have a new build target:
make headers_check
to keep this all working properly.

Right now on Linus's latest tree I only see one failure when running
this:
./usr/include/linux/v4l2-controls.h:1105: found __[us]{8,16,32,64} type without #include <linux/types.h>
so we are getting better.

If there are still problems with this, please let us know and we will be
glad to resolve them.

thanks,

greg k-h

2018-11-12 18:12:11

by Joseph Myers

Subject: Re: Official Linux system wrapper library?

On Mon, 12 Nov 2018, Greg KH wrote:

> If there are still problems with this, please let us know and we will be
> glad to resolve them.

With headers installed from Linus's latest tree, I retried (for x86_64)
the case of a source file containing the single line

#include <linux/elfcore.h>

which (as previously discussed, and Arnd had an RFC patch) I want to use
in a glibc test of header consistency. It gives errors "unknown type name
'elf_greg_t'" etc. (for lots more types as well) - but even before getting
onto those errors, there's

asm/signal.h:127:2: error: unknown type name 'size_t'

from a header included from linux/elfcore.h. So this doesn't seem to be
working as I'd expect yet.

--
Joseph S. Myers
[email protected]

2018-11-12 18:15:34

by Randy Dunlap

Subject: Re: Official Linux system wrapper library?

On 11/12/18 10:09 AM, Joseph Myers wrote:
> On Mon, 12 Nov 2018, Greg KH wrote:
>
>> If there are still problems with this, please let us know and we will be
>> glad to resolve them.
>
> With headers installed from Linus's latest tree, I retried (for x86_64)
> the case of a source file containing the single line
>
> #include <linux/elfcore.h>
>
> which (as previously discussed, and Arnd had an RFC patch) I want to use
> in a glibc test of header consistency. It gives errors "unknown type name
> 'elf_greg_t'" etc. (for lots more types as well) - but even before getting
> onto those errors, there's
>
> asm/signal.h:127:2: error: unknown type name 'size_t'
>
> from a header included from linux/elfcore.h. So this doesn't seem to be
> working as I'd expect yet.


Yes, someone from Google (iirc) and also David Howells had some tests
that would point out all of the problems. I thought (expected) more follow-up
from them with patches...


--
~Randy

2018-11-12 18:29:35

by Daniel Colascione

Subject: Re: Official Linux system wrapper library?

On Mon, Nov 12, 2018 at 9:24 AM, Zack Weinberg <[email protected]> wrote:
> Daniel Colascione <[email protected]> wrote:
>> >> If the kernel provides a system call, libc should provide a C wrapper
>> >> for it, even if in the opinion of the libc maintainers, that system
>> >> call is flawed.
>
> I would like to state general support for this principle; in fact, I
> seriously considered preparing patches that made exactly this change,
> about a year ago, posting them, and calling for objections. Then
> $dayjob ate all my hacking time (and is still doing so, alas).
>
> Nonetheless I do think there are exceptions, such as those that are
> completely obsolete (bdflush, socketcall) and those that cannot be
> used without stomping on glibc's own data structures (set_robust_list
> is the only one of these I know about off the top of my head, but
> there may well be others).

If people want to stomp over glibc's data structures, let them. Maybe
a particular program, for whatever reason, wants to avoid glibc
mutexes entirely and do its own synchronization. It should be possible
to cleanly separate the users on a per-thread basis.

Besides, adhering to the principle that all system functionality is
provided is worth it even if (in the case of bdflush) there's not a
compelling use right now.

Consider bdflush: in kernel debugging, hijacking "useless" system
calls and setting breakpoints on them or temporarily wiring them to
custom functionality is sometimes useful, and there's no particular
reason to *prevent* a program from calling one of these routines,
especially since there's little cost to providing a wrapper and
noticeable value in completeness itself.

> Daniel Colascione <[email protected]> wrote:
>> We can learn something from how Windows does things. On that system,
>> what we think of as "libc" is actually two parts. (More, actually, but
>> I'm simplifying.) At the lowest level, you have the semi-documented
>> ntdll.dll, which contains raw system call wrappers and arcane
>> kernel-userland glue. On top of ntdll live the "real" libc
>> (msvcrt.dll, kernel32.dll, etc.) that provide conventional
>> application-level glue.
>
> This is an appealing idea at first sight; there are several other
> constituencies for it besides frustrated kernel hackers, such as
> alternative system programming languages (Rust, Go) that want to
> minimize dependencies on legacy "C library" functionality. If we
> could find a clean way to do it, I would support it.
>
> The trouble is that "raw system call wrappers and arcane
> kernel-userland glue" turns out to be a lot more code, with a lot more
> tentacles in both directions, than you might think. If you compare
> the sizes of the text sections of `ntdll.dll` and `libc.so.6` you will
> notice that the former is _bigger_. The reason for this, as far as I
> can determine (without any access to Microsoft's internal
> documentation or source code ;-) is that ntdll.dll contains the
> dynamic linker-equivalent, a basic memory allocator, the stack
> unwinder, and a good chunk of the core thread library. (It also has
> stuff in it that's needed by programs that run early during boot and
> can't use kernel32.dll, but that's not our problem.) I don't think
> this is an accident or an engineering compromise. It is necessary for
> the dynamic loader to understand threads, and the thread library to
> understand shared library semantics.

Sure, but I'm not proposing to include threads or dynamic
library loading in the minimal kernel glue library we're discussing.
That ntdll includes this functionality (and a thread pool, and various
other gunk) works for Windows, but it's not a necessary consequence of
our adopting a layering model that the lowest of *our* layers include
what the lowest layer on Windows includes. As I mentioned above,
there's room for a "minimal" kernel interface library that actually
touches relatively little of glibc's concerns.

> A hypothetical equivalent liblinuxabi.so.1 would
> have to do the same.

It depends on what you put into the library. I have in mind basic system
call wrappers and potential future userspace glue. The ABI I'm proposing
doesn't have to look like POSIX --- for example, it can indicate error
returns via a separate out parameter. (This approach is cleaner
anyway.) As for pthread cancellation, all that's required is to mark a
range of PC values as "after cancel check, before syscall
instruction". The Linux ABI library could export a function that libc
could call, passing in a program counter value (extracted from
ucontext_t in a signal handler), to determine whether that PC is
immediately before a cancellation check.
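
The PC-range query could be sketched roughly as follows. Everything here is hypothetical: the structure, the function name, and the idea that the ranges live in a simple array are assumptions for illustration, not an existing kernel or glibc interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: half-open range [start, end) of instruction addresses
 * lying after the cancellation check and before the syscall instruction. */
struct cancel_window {
    uintptr_t start;
    uintptr_t end;
};

/* Hypothetical export: true if pc falls inside any of the windows.
 * A libc signal handler would pass the PC taken from its ucontext_t. */
static bool pc_in_cancel_window(const struct cancel_window *windows,
                                size_t count, uintptr_t pc)
{
    assert(windows != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (pc >= windows[i].start && pc < windows[i].end)
            return true;
    }
    return false;
}
```

The lookup itself is trivial; the hard part discussed in this thread is agreeing on who owns the table and keeping it stable across library versions.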

What about off_t differences? Again, it doesn't matter. From the
*kernel's* point of view, there's one width of offset parameter per
system call per architecture. The library I'm proposing would expose
this parameter literally. If a higher-level libc wants to use a
preprocessor switch to conditionally support different offset widths,
that's fine, but there's no reason that a more literal kernel
interface library would have to do that.

> And that means you wouldn't get as much
> decoupling from the C and POSIX standards -- both of which specify at
> least part of those semantics -- as you want, and we would still be
> having these arguments. For example, it would be every bit as
> troublesome for liblinuxabi.so.1 to export set_robust_list as it would
> be for libc.so.6 to do that.

Why? Such an exported function would cause no trouble until called,
and there are legitimate reasons for calling such a function. Not
everyone, as mentioned, wants to write a program that relies on libc.

> You might be able to get out of most of the tangle by putting the
> dynamic loader in a separate process

I don't think that's a workable approach. The creation of a separate
process is a very observable side effect, and it seems unexpected that
something as simple as cat(1) would have this side effect. If
anything, parts of the dynamic linker should move into the *kernel* to
support things like applying relocations to clean pages, but that's a
separate discussion.

> and that's _also_ an appealing
> idea for several other reasons, but it would still need to understand
> some of the thread-related data structures within the processes it
> manipulated, so I don't think it would help enough to be worth it (in
> a complete greenfields design where I get to ignore POSIX and rewrite
> the kernel API from scratch, now, that might be a different story).
>
> On a larger note, the fundamental complaint here is a project process
> / communication complaint. We haven't been communicating enough with
> the kernel team, fair criticism. We can do better. But the
> communication has to go both ways. When, for instance, we tell you
> that membarrier needs to have its semantics nailed down in terms of
> the C++17 memory model, that actually needs to happen

I think you can think of membarrier as upgrading signal fences to thread fences.

> And, because this is a process / communication problem, you cannot
> expect there to be a purely technical fix. Your position appears,
> from where I'm sitting, to be something like "if we split glibc into
> two pieces, then you and us will never have to talk to each other
> again" which, I'm sorry, I can't see that working out in the long run.
>
>> (For example, for a long time now, I've wanted to go
>> beyond POSIX and improve the system's signal handling API, and this
>> improvement requires userspace cooperation.)
>
> This is also an appealing notion, but the first step should be to
> eliminate all of the remaining uses for asynchronous signals: for
> instance, give us process handles already! Once a program only ever
> needs to call sigaction() to deal with
> SIGSEGV/SIGBUS/SIGILL/SIGFPE/SIGTRAP, then we can think about
> inventing a better replacement for that scenario.

I too want process handles. (See my other patches.) But that's beside
the point.

This stance in the paragraph I've quoted is another example of glibc's
misplaced idealism. As I've elaborated elsewhere, people use signals
for many purposes today. The current signals API is extremely
difficult to use correctly in a process in which multiple unrelated
components want to take advantage of signal-handling functionality.
Users deserve a cleaner, modern, and safe API. It's not productive to
withhold improvements to the signal API and gate them on unrelated
features like process handles merely because, in the personal
judgement of the glibc maintainers, developers should use signals for
fewer things. This attitude is an unwarranted imposition on the entire
ecosystem. It should be possible to innovate in this area without
these blockers, one way or another.

2018-11-12 19:12:08

by Florian Weimer

Subject: Re: Official Linux system wrapper library?

* Daniel Colascione:

> What about off_t differences? Again, it doesn't matter. From the
> *kernel's* point of view, there's one width of offset parameter per
> system call per architecture. The library I'm proposing would expose
> this parameter literally.

Does this mean the application author needs to know when to split an
off_t argument into two, and when to pass it as a single argument, and
when to insert dummy arguments for alignment, depending on the
architecture?

>> And that means you wouldn't get as much
>> decoupling from the C and POSIX standards -- both of which specify at
>> least part of those semantics -- as you want, and we would still be
>> having these arguments. For example, it would be every bit as
>> troublesome for liblinuxabi.so.1 to export set_robust_list as it would
>> be for libc.so.6 to do that.
>
> Why? Such an exported function would cause no trouble until called,
> and there are legitimate reasons for calling such a function. Not
> everyone, as mentioned, wants to write a program that relies on libc.

For that use case, a machine-readable system call ABI specification is
the only reasonable approach: Some people want inline system calls,
others want dedicated routines per system call. The calling convention
for the dedicated functions will vary, and the way errors are handled as
well. Some want connect calls to be handled by socketcall if possible,
others prefer the direct call.

The nice thing here is that once you settled for a particular approach,
the functions are really small and will not change, so there is no real
need for dynamic linking. The challenge here is to come up with a
uniform description of the system call interface for all architectures,
and for application programmer's sanity, make sure that the kernel adds
generic system calls in a single version, across all architectures.
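
To give a feel for what a machine-readable description might carry, here is a deliberately tiny sketch expressed as a C table. The structure and field names are hypothetical, and a real specification would also need argument types, register and alignment rules per architecture, and the kernel version that introduced each call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/syscall.h>

/* Hypothetical description record for one system call. */
struct syscall_desc {
    const char *name;   /* symbolic name                      */
    long number;        /* number on the build architecture   */
    unsigned nargs;     /* count of kernel-level arguments    */
};

static const struct syscall_desc syscall_table[] = {
    { "read",  SYS_read,  3 },
    { "write", SYS_write, 3 },
    { "close", SYS_close, 1 },
};

/* Look up a description by name; returns NULL if it is not listed. */
static const struct syscall_desc *find_syscall(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof syscall_table / sizeof syscall_table[0]; i++) {
        if (strcmp(syscall_table[i].name, name) == 0)
            return &syscall_table[i];
    }
    return NULL;
}
```

Generators for inline calls, per-call functions, or tracing tools could all consume the same table, which is the appeal of the approach.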

> This stance in the paragraph I've quoted is another example of glibc's
> misplaced idealism. As I've elaborated elsewhere, people use signals
> for many purposes today. The current signals API is extremely
> difficult to use correctly in a process in which multiple unrelated
> components want to take advantage of signal-handling functionality.
> Users deserve a cleaner, modern, and safe API. It's not productive to
> withhold improvements to the signal API and gate them on unrelated
> features like process handles merely because, in the personal
> judgement of the glibc maintainers, developers should use signals for
> fewer things.

The two aren't unrelated. If you take asynchronous signals out of the
picture, the design becomes simpler and easier to use.

Thanks,
Florian

2018-11-12 19:27:31

by Daniel Colascione

Subject: Re: Official Linux system wrapper library?

On Mon, Nov 12, 2018 at 11:11 AM, Florian Weimer <[email protected]> wrote:
> * Daniel Colascione:
>
>> What about off_t differences? Again, it doesn't matter. From the
>> *kernel's* point of view, there's one width of offset parameter per
>> system call per architecture. The library I'm proposing would expose
>> this parameter literally.
>
> Does this mean the application author needs to know when to split an
> off_t argument into two, and when to pass it as a single argument, and
> when to insert dummy arguments for alignment, depending on the
> architecture?

No, I wouldn't make callers go to that trouble. I don't see any
barrier to common-sense local data transformations. These
transformations don't have external dependencies, after all. I want a
thin interface to the kernel, but not so thin as to be a direct
mapping onto register locations. I don't see value in that level of
correspondence.
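
One such local transformation is splitting a 64-bit offset into two 32-bit halves for a 32-bit syscall ABI. A minimal sketch, with hypothetical names; which half goes in which register, and whether a padding argument is needed for alignment, is architecture-specific and is not modelled here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair of register-sized halves of a 64-bit file offset. */
struct offset_halves {
    uint32_t lo;
    uint32_t hi;
};

/* Recombine the halves into the original 64-bit offset. */
static uint64_t join_offset(struct offset_halves h)
{
    return ((uint64_t)h.hi << 32) | h.lo;
}

/* Split a 64-bit offset into the two halves a 32-bit ABI would pass. */
static struct offset_halves split_offset(uint64_t offset)
{
    struct offset_halves h;
    h.lo = (uint32_t)(offset & 0xffffffffu);
    h.hi = (uint32_t)(offset >> 32);
    assert(join_offset(h) == offset);   /* the split loses nothing */
    return h;
}
```

The caller of the wrapper only ever sees the 64-bit value; the split stays inside the library.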

>>> And that means you wouldn't get as much
>>> decoupling from the C and POSIX standards -- both of which specify at
>>> least part of those semantics -- as you want, and we would still be
>>> having these arguments. For example, it would be every bit as
>>> troublesome for liblinuxabi.so.1 to export set_robust_list as it would
>>> be for libc.so.6 to do that.
>>
>> Why? Such an exported function would cause no trouble until called,
>> and there are legitimate reasons for calling such a function. Not
>> everyone, as mentioned, wants to write a program that relies on libc.
>
> For that use case, a machine-readable system call ABI specification is
> the only reasonable approach:

> The challenge here is to come up with a
> uniform description of the system call interface for all architectures,

This is another example in which we should remember the old aphorism
that the perfect is the enemy of the good. There's no reason that the
kernel couldn't simply provide a library with conventional functions
exported in the conventional way doing the conventional things that
functions do, one that would free users from relying on direct use of
syscall(2). If this library were to interact with errno and
cancelation properly, so much the better. There's no reason to avoid
this work in favor of some theoretically-elegant
abstract-function-description metadata-based approach that will likely
never materialize.

(Alternatively: just regard C as the uniform description language.)
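
As an illustration of the "conventional function" being asked for, here is a sketch of a wrapper around gettid, a call that programs had to reach through syscall(2) because glibc shipped no wrapper for it at the time of this thread. The name linuxabi_gettid is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: hides the syscall number and the untyped
 * syscall(2) interface behind an ordinary typed C function. */
static pid_t linuxabi_gettid(void)
{
    long ret = syscall(SYS_gettid);
    assert(ret > 0);    /* gettid is documented as always succeeding */
    return (pid_t)ret;
}
```

Called from the initial thread of a process, the result equals getpid(), which gives a quick sanity check.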

>> This stance in the paragraph I've quoted is another example of glibc's
>> misplaced idealism. As I've elaborated elsewhere, people use signals
>> for many purposes today. The current signals API is extremely
>> difficult to use correctly in a process in which multiple unrelated
>> components want to take advantage of signal-handling functionality.
>> Users deserve a cleaner, modern, and safe API. It's not productive to
>> withhold improvements to the signal API and gate them on unrelated
>> features like process handles merely because, in the personal
>> judgement of the glibc maintainers, developers should use signals for
>> fewer things.
>
> The two aren't unrelated. If you take asynchronous signals out of the
> picture, the design becomes simpler and easier to use.

The two features *are* unrelated. The design I've proposed works
equally well for synchronous and asynchronous signals, and limiting it
to synchronous signals doesn't simplify it. Even if it were the case
that the design were simpler and easier to use when limited to
synchronous signals --- which it isn't, unless you want to go in the
SEH direction, which is more, not less, complicated --- that wouldn't
be a reason to block the work until some form of process handle
landed. The objections I've seen have all essentially amounted to "we
don't think people should use signals".

2018-11-12 20:04:46

by Greg Kroah-Hartman

Subject: Re: Official Linux system wrapper library?

On Mon, Nov 12, 2018 at 09:08:28AM -0700, Jonathan Corbet wrote:
> On Sun, 11 Nov 2018 18:36:30 -0800
> Greg KH <[email protected]> wrote:
>
> > We should have a checklist. That's a great idea. Now to find someone
> > to write it... :)
>
> Do we think the LPC session might have the right people to create such a
> thing? If so, I can try to put together a coherent presentation of the
> result.

I do not know who will be there, sorry...

2018-11-12 22:35:57

by Joseph Myers

Subject: Re: Official Linux system wrapper library?

On Mon, 12 Nov 2018, Florian Weimer wrote:

> For that use case, a machine-readable system call ABI specification is
> the only reasonable approach: Some people want inline system calls,

I also think it's much more likely to be of use to glibc than any syscall
library provided by the kernel. I don't think a syscall library provided
by the kernel is likely to be of use for implementing glibc functions, but
some kind of textual ABI specification might at least be of use for
checking that syscall macro calls / syscalls.list entries are consistent
with what the kernel thinks its ABI is. (Hopefully there would be
automated tests on the kernel side as well of some kind of consistency
between the ABI specification and the kernel.)

strace is indeed a more obvious potential consumer of such a description
of syscall ABIs.

I'd think a syscall library would more likely be something a few
applications would use if they want to access a syscall that for whatever
reason glibc doesn't have a wrapper for yet - not something useful for
glibc itself to call or link against.

> and for application programmer's sanity, make sure that the kernel adds
> generic system calls in a single version, across all architectures.

That would be strongly desirable for glibc as well - a way of ensuring
that the kernel rapidly fails CI tests and does not get released if new
syscalls are only present on some architectures (including e.g. being
missing from some compat syscall tables, or defined in asm/unistd.h but
not in the actual syscall table, or vice versa - or some way of making
sure such inconsistencies cannot occur by eliminating duplicate copies of
the syscall list information in the sources).

When we have compatibility code in glibc for the absence of some syscall,
we can only eliminate that code when the oldest kernel version supported
by glibc is new enough to have the syscall on whichever glibc architecture
was slowest to introduce the syscall in the kernel - and that can often be
years after the first architectures gained support for that syscall in the
kernel.

--
Joseph S. Myers
[email protected]

2018-11-12 22:53:31

by Joseph Myers

Subject: Re: Official Linux system wrapper library?

On Mon, 12 Nov 2018, Daniel Colascione wrote:

> The two features *are* unrelated. The design I've proposed works
> equally well for synchronous and asynchronous signals, and limiting it

Whatever the design, I see no obvious reason why a kernel-provided
library, with all the problems that entails, should need to be involved,
rather than putting new APIs either in libc or in a completely separate
libsignal for libraries wanting to use such a system for cooperative
signal use.

(I can imagine *other* parts of the toolchain being involved, if e.g. you
want to have a good way of checking "is the address of the instruction
causing this signal in this library?" that works with static as well as
dynamic linking - for dynamic linking, I expect something could be done
using libc_nonshared and __dso_handle to identify code in the library
calling some registering function. And indeed there might also be new
kernel interfaces that help improve signal handling.)

In the absence of consensus for adding such a new API for signals to
glibc, it's unlikely one would get consensus for glibc to depend on some
other library providing such an API either. But you can always write a
library (which I think would most naturally be a completely separate
libsignal, not part of the kernel source tree) and seek to persuade
libraries they should be using it rather than interfering with global
state by registering normal signal handlers directly.

--
Joseph S. Myers
[email protected]

2018-11-12 23:11:42

by Daniel Colascione

[permalink] [raw]
Subject: Re: Official Linux system wrapper library?

On Mon, Nov 12, 2018 at 2:51 PM, Joseph Myers <[email protected]> wrote:
> I see no obvious reason why a kernel-provided
> library, with all the problems that entails, should need to be involved,
> rather than putting new APIs either in libc

I initially wanted to put the APIs in libc. I still do. But that's
proving to be impractical, for the reasons we're discussing on this
thread.

> or in a completely separate
> libsignal

A separate library can't prevent some use of sigaction elsewhere in
the program stomping on its handler. One of the key aspects of the
registered-handler design is that registered handlers get to run
*before* the legacy process-wide handler. The only non-hacky way to do
that is to put the signal handler registration logic in the same logic
component that houses the legacy signal registration machinery.
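
To make the ordering concrete, here is a rough sketch of the dispatch I have in mind. Every name in it (register_handler, install_dispatcher, the callback type) is hypothetical, and it ignores thread safety and per-signal tables. Note that nothing here stops a later sigaction() call from replacing dispatch(), which is exactly the weakness of doing this outside libc:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

#define MAX_REGISTERED 8

/* Hypothetical callback type: returns nonzero if it consumed the signal. */
typedef int (*reg_handler_fn)(int signo, siginfo_t *info, void *uctx);

static reg_handler_fn registered[MAX_REGISTERED];
static size_t n_registered;
static void (*legacy_handler)(int signo);   /* the process-wide handler */

/* Counters used only by the two example handlers below. */
static volatile sig_atomic_t claimed_count;
static volatile sig_atomic_t legacy_count;

static int register_handler(reg_handler_fn fn)
{
    if (n_registered == MAX_REGISTERED)
        return -1;
    registered[n_registered++] = fn;
    return 0;
}

/* Registered handlers run first; the legacy handler runs only if
 * none of them claims the signal. */
static void dispatch(int signo, siginfo_t *info, void *uctx)
{
    for (size_t i = 0; i < n_registered; i++)
        if (registered[i](signo, info, uctx))
            return;
    if (legacy_handler != NULL)
        legacy_handler(signo);
}

static int install_dispatcher(int signo, void (*legacy)(int))
{
    struct sigaction sa = { .sa_flags = SA_SIGINFO };
    sa.sa_sigaction = dispatch;
    sigemptyset(&sa.sa_mask);
    legacy_handler = legacy;
    return sigaction(signo, &sa, NULL);
}

/* Example registered handler: claims SIGUSR1 only. */
static int claim_sigusr1(int signo, siginfo_t *info, void *uctx)
{
    (void)info;
    (void)uctx;
    if (signo != SIGUSR1)
        return 0;
    claimed_count++;
    return 1;
}

/* Example legacy handler. */
static void legacy_example(int signo)
{
    (void)signo;
    legacy_count++;
}
```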

> (I can imagine *other* parts of the toolchain being involved, if e.g. you
> want to have a good way of checking "is the address of the instruction
> causing this signal in this library?" that works with static as well as
> dynamic linking - for dynamic linking, I expect something could be done
> using libc_nonshared and __dso_handle to identify code in the library
> calling some registering function. And indeed there might also be new
> kernel interfaces that help improve signal handling.)

Again: you're blocking a practical solution for the sake of some
elegant theoretical implementation that will never arrive, and so the
world remains in a poor state indefinitely. Incremental improvement is
good. Nothing about the registered signal handler approach precludes
this sort of enhancement in the future. The same goes for the system
call metadata database you've described: nice-to-have; shouldn't block
simpler and more immediately practical work.

> In the absence of consensus for adding such a new API for signals to
> glibc, it's unlikely one would get consensus for glibc to depend on some
> other library providing such an API either.

glibc would continue using unsupported legacy system call
interfaces in lieu of a supported low-level interface library?

2018-11-12 23:29:08

by Joseph Myers

[permalink] [raw]
Subject: Re: Official Linux system wrapper library?

On Mon, 12 Nov 2018, Daniel Colascione wrote:

> I initially wanted to put the APIs in libc. I still do. But that's
> proving to be impractical, for the reasons we're discussing on this
> thread.

Well, your proposed APIs didn't attract consensus among libc developers.

> > (I can imagine *other* parts of the toolchain being involved, if e.g. you
> > want to have a good way of checking "is the address of the instruction
> > causing this signal in this library?" that works with static as well as
> > dynamic linking - for dynamic linking, I expect something could be done
> > using libc_nonshared and __dso_handle to identify code in the library
> > calling some registering function. And indeed there might also be new
> > kernel interfaces that help improve signal handling.)
>
> Again: you're blocking a practical solution for the sake of some
> elegant theoretical implementation that will never arrive, and so the

I'm not - I'm observing various areas that might be open to improvements
related to signal handling, not saying improvements in one area are a
prerequisite to improvements in another. I'm exploring the problem and
solution space, and collectively exploring the problem and solution space
is an important part of trying to work out where there might be useful
future improvements related to the general issue of signal handling.

Exploring the problem and solution space can include coming to the
conclusion that an idea that seems obvious is in fact a bad idea, or in
fact orthogonal to other ideas that are independently useful - those
things are still useful in yielding a better rationale for taking a given
approach.

> > In the absence of consensus for adding such a new API for signals to
> > glibc, it's unlikely one would get consensus for glibc to depend on some
> > other library providing such an API either.
>
> glibc would continue using unsupported legacy system call
> interfaces in lieu of a supported low-level interface library?

The Linux kernel supports the interfaces that people actually use, on the
principle of not breaking userspace, not the interfaces that someone would
like to declare to be the supported ones. We'd use the interfaces that
seem suitable for use by glibc, and direct syscalls seem more suitable to
me than any kernel-provided userspace library.

Naturally a library invented in the kernel on the basis of not liking what
libc people are doing or not doing is unlikely to be suitable for use by
libc (and use together with libc of anything in it that interferes with
libc functionality such as sigaction might be explicitly discouraged by
libc maintainers, just as e.g. direct use of clone can be discouraged) -
whereas interfaces developed collaboratively with libc implementations and
getting consensus from those users are more likely to be of use to libc
implementations.

--
Joseph S. Myers
[email protected]

2018-11-13 15:17:11

by Carlos O'Donell

[permalink] [raw]
Subject: Re: Official Linux system wrapper library?

On 11/12/18 11:43 AM, Joseph Myers wrote:
> On Sun, 11 Nov 2018, Florian Weimer wrote:
>
>> People may have disappeared from glibc development who have objected to
>> gettid. I thought this was the case with strlcpy/strlcat, but it was
>> not.
>
> Well, I know of two main people who were objecting to the notion of adding
> bindings for all non-obsolescent syscalls, Linux-specific if not suitable
> for adding to the OS-independent GNU API, and neither seems to have posted
> in the past year.
>
>> At present, it takes one semi-active glibc contributor to block addition
>> of a system call. The process to override a sustained objection has
>> never been used successfully, and it is a lot of work to get it even
>> started.
>
> We don't have such a process. (I've suggested, e.g. in conversation with
> Carlos at the Cauldron, that we should have something involving a
> supermajority vote of the GNU maintainers for glibc in cases where we're
> unable to reach a consensus in the community as a whole.)

... and I need a good excuse to propose such a process :-)

--
Cheers,
Carlos.

2018-11-13 19:39:54

by Dave Martin

[permalink] [raw]
Subject: Re: Official Linux system wrapper library?

On Mon, Nov 12, 2018 at 05:19:14AM -0800, Daniel Colascione wrote:

[...]

> We can learn something from how Windows does things. On that system,
> what we think of as "libc" is actually two parts. (More, actually, but
> I'm simplifying.) At the lowest level, you have the semi-documented
> ntdll.dll, which contains raw system call wrappers and arcane
> kernel-userland glue. On top of ntdll live the "real" libc
> (msvcrt.dll, kernel32.dll, etc.) that provide conventional
> application-level glue. The tight integration between ntdll.dll and
> the kernel allows Windows to do very impressive things. (For example,
> on x86_64, Windows has no 32-bit ABI as far as the kernel is
> concerned! You can still run 32-bit programs though, and that works
> via ntdll.dll essentially shimming every system call and switching the
> processor between long and compatibility mode as needed.) Normally,
> you'd use the higher-level capabilities, but if you need something in
> ntdll (e.g., if you're Cygwin) nothing stops your calling into the
> lower-level system facilities directly. ntdll is tightly bound to the
> kernel; the higher-level libc, not so.
>
> We should adopt a similar approach. Shipping a lower-level
> "liblinux.so" tightly bound to the kernel would not only let the
> kernel bypass glibc's "editorial discretion" in exposing new
> facilities to userspace, but would also allow for tighter user-kernel
> integration that one can achieve with a simplistic syscall(2)-style
> escape hatch. (For example, for a long time now, I've wanted to go
> beyond POSIX and improve the system's signal handling API, and this
> improvement requires userspace cooperation.) The vdso is probably too
> small and simplistic to serve in this role; I'd want a real library.

Can you expand on your reasoning here?

Playing devil's advocate:

If the library is just exposing the syscall interface, I don't see
why it _couldn't_ fit into the vdso (or something vdso-like).

If a separate library, I'd be concerned that it would accumulate
value-add bloat over time, and the kernel ABI may start to creep since
most software wouldn't invoke the kernel directly any more. Even if
it's maintained in the kernel tree, its existence as an apparently
standalone component may encourage forking, leading to a potential
compatibility mess.

The vdso approach would mean we can guarantee that the library is
available and up to date at runtime, and may make it easier to keep
what's in it down to sane essentials.

Cheers
---Dave

2018-11-13 20:59:48

by Andy Lutomirski

[permalink] [raw]
Subject: Re: Official Linux system wrapper library?


> On Nov 13, 2018, at 11:39 AM, Dave Martin <[email protected]> wrote:
>
> On Mon, Nov 12, 2018 at 05:19:14AM -0800, Daniel Colascione wrote:
>
> [...]
>
>> We can learn something from how Windows does things. On that system,
>> what we think of as "libc" is actually two parts. (More, actually, but
>> I'm simplifying.) At the lowest level, you have the semi-documented
>> ntdll.dll, which contains raw system call wrappers and arcane
>> kernel-userland glue. On top of ntdll live the "real" libc
>> (msvcrt.dll, kernel32.dll, etc.) that provide conventional
>> application-level glue. The tight integration between ntdll.dll and
>> the kernel allows Windows to do very impressive things. (For example,
>> on x86_64, Windows has no 32-bit ABI as far as the kernel is
>> concerned! You can still run 32-bit programs though, and that works
>> via ntdll.dll essentially shimming every system call and switching the
>> processor between long and compatibility mode as needed.) Normally,
>> you'd use the higher-level capabilities, but if you need something in
>> ntdll (e.g., if you're Cygwin) nothing stops your calling into the
>> lower-level system facilities directly. ntdll is tightly bound to the
>> kernel; the higher-level libc, not so.
>>
>> We should adopt a similar approach. Shipping a lower-level
>> "liblinux.so" tightly bound to the kernel would not only let the
>> kernel bypass glibc's "editorial discretion" in exposing new
>> facilities to userspace, but would also allow for tighter user-kernel
>> integration that one can achieve with a simplistic syscall(2)-style
>> escape hatch. (For example, for a long time now, I've wanted to go
>> beyond POSIX and improve the system's signal handling API, and this
>> improvement requires userspace cooperation.) The vdso is probably too
>> small and simplistic to serve in this role; I'd want a real library.
>
> Can you expand on your reasoning here?
>
> Playing devil's advocate:
>
> If the library is just exposing the syscall interface, I don't see
> why it _couldn't_ fit into the vdso (or something vdso-like).
>
> If a separate library, I'd be concerned that it would accumulate
> value-add bloat over time, and the kernel ABI may start to creep since
> most software wouldn't invoke the kernel directly any more. Even if
> it's maintained in the kernel tree, its existence as an apparently
> standalone component may encourage forking, leading to a potential
> compatibility mess.
>
> The vdso approach would mean we can guarantee that the library is
> available and up to date at runtime, and may make it easier to keep
> what's in it down to sane essentials.

Hmm. Putting on my vDSO hat:

The vDSO could provide all kinds of nifty things. Better exception handling comes to mind. But it has two major limitations that severely restrict what it can do:

- It can’t allocate memory. We probably want to keep it that way.

- It can’t use TLS. Solving this without genuinely awful ABI issues may be extremely hard. We *could* require callers to pass a thread pointer in, I suppose.

Also, if we make the vDSO stateful, CRIU is going to have a blast. We might need to expose explicit save and restore abilities.

As a straw man use case, it would be neat if DSOs (or the loader, maybe) could register a list of exception fixups per DSO. The kernel could consult these lists before delivering a signal. ISTM it wouldn’t be so crazy if the vDSO handled registration, although it could use syscalls as well. If the vDSO did it, it would need somewhere to put the lists.
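
To sketch what such a per-DSO list might look like (purely hypothetical layout and names, not an existing kernel or vDSO interface): each entry maps a range of instruction addresses to a fixup address, and the lookup is a plain range search.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixup record: a fault with pc in [start, end)
 * should resume at fixup instead of raising a signal. */
struct fixup_entry {
    uintptr_t start;
    uintptr_t end;
    uintptr_t fixup;
};

/* Return the fixup address for pc, or 0 if no entry covers it. */
static uintptr_t find_fixup(const struct fixup_entry *tbl, size_t n,
                            uintptr_t pc)
{
    for (size_t i = 0; i < n; i++)
        if (pc >= tbl[i].start && pc < tbl[i].end)
            return tbl[i].fixup;
    return 0;
}
```

A real implementation would presumably sort the table and binary-search it; the linear scan just keeps the sketch short.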

2018-11-14 10:57:13

by Dave Martin

[permalink] [raw]
Subject: Re: Official Linux system wrapper library?

On Tue, Nov 13, 2018 at 12:58:39PM -0800, Andy Lutomirski wrote:
>
> > On Nov 13, 2018, at 11:39 AM, Dave Martin <[email protected]> wrote:
> >
> > On Mon, Nov 12, 2018 at 05:19:14AM -0800, Daniel Colascione wrote:
> >
> > [...]
> >
> >> We can learn something from how Windows does things. On that system,
> >> what we think of as "libc" is actually two parts. (More, actually, but
> >> I'm simplifying.) At the lowest level, you have the semi-documented
> >> ntdll.dll, which contains raw system call wrappers and arcane
> >> kernel-userland glue. On top of ntdll live the "real" libc
> >> (msvcrt.dll, kernel32.dll, etc.) that provide conventional
> >> application-level glue. The tight integration between ntdll.dll and
> >> the kernel allows Windows to do very impressive things. (For example,
> >> on x86_64, Windows has no 32-bit ABI as far as the kernel is
> >> concerned! You can still run 32-bit programs though, and that works
> >> via ntdll.dll essentially shimming every system call and switching the
> >> processor between long and compatibility mode as needed.) Normally,
> >> you'd use the higher-level capabilities, but if you need something in
> >> ntdll (e.g., if you're Cygwin) nothing stops your calling into the
> >> lower-level system facilities directly. ntdll is tightly bound to the
> >> kernel; the higher-level libc, not so.
> >>
> >> We should adopt a similar approach. Shipping a lower-level
> >> "liblinux.so" tightly bound to the kernel would not only let the
> >> kernel bypass glibc's "editorial discretion" in exposing new
> >> facilities to userspace, but would also allow for tighter user-kernel
> >> integration that one can achieve with a simplistic syscall(2)-style
> >> escape hatch. (For example, for a long time now, I've wanted to go
> >> beyond POSIX and improve the system's signal handling API, and this
> >> improvement requires userspace cooperation.) The vdso is probably too
> >> small and simplistic to serve in this role; I'd want a real library.
> >
> > Can you expand on your reasoning here?
> >
> > Playing devil's advocate:
> >
> > If the library is just exposing the syscall interface, I don't see
> > why it _couldn't_ fit into the vdso (or something vdso-like).
> >
> > If a separate library, I'd be concerned that it would accumulate
> > value-add bloat over time, and the kernel ABI may start to creep since
> > most software wouldn't invoke the kernel directly any more. Even if
> > it's maintained in the kernel tree, its existence as an apparently
> > standalone component may encourage forking, leading to a potential
> > compatibility mess.
> >
> > The vdso approach would mean we can guarantee that the library is
> > available and up to date at runtime, and may make it easier to keep
> > what's in it down to sane essentials.
>
> Hmm. Putting on my vDSO hat:
>
> The vDSO could provide all kinds of nifty things. Better exception
> handling comes to mind. But it has two major limitations that severely
> restrict what it can do:
>
> - It can’t allocate memory. We probably want to keep it that way.
>
> - It can’t use TLS. Solving this without genuinely awful ABI issues
> may be extremely hard. We *could* require callers to pass a thread
> pointer in, I suppose.
>
> Also, if we make the vDSO stateful, CRIU is going to have a blast. We
> might need to expose explicit save and restore abilities.
>
> As a straw man use case, it would be neat if DSOs (or the loader,
> maybe) could register a list of exception fixups per DSO. The kernel
> could consult these lists before delivering a signal. ISTM it wouldn’t
> be so crazy if the vDSO handled registration, although it could use
> syscalls as well. If the vDSO did it, it would need somewhere to put
> the lists.

Fair points, though this is rather what I meant by "sane essentials".
Because there are strict limits on what can be done in the vDSO, it may
be more bloat-resistant and more conservatively maintained.

This might provide a way to push some dumb compatibility kludge code
that receives little ongoing maintenance outside the privilege wall,
whereas it has to sit in the kernel proper today.

In theory we could opt to advertise new syscalls only via vDSO entry
points, and not maintain __NR_xxx values for them (which may or may
not upset ptrace users.) Anyway, I digress...

Cheers
---Dave

2018-11-14 11:41:35

by Florian Weimer

[permalink] [raw]
Subject: Re: Official Linux system wrapper library?

* Dave Martin:

> Fair points, though this is rather what I meant by "sane essentials".
> Because there are strict limits on what can be done in the vDSO, it may
> be more bloat-resistant and more conservatively maintained.
>
> This might provide a way to push some dumb compatibility kludge code
> that receives little ongoing maintenance outside the privilege wall,
> whereas it has to sit in the kernel proper today.
>
> In theory we could opt to advertise new syscalls only via vDSO entry
> points, and not maintain __NR_xxx values for them (which may or may
> not upset ptrace users.) Anyway, I digress...

Is the vDSO available across all architectures? (I don't think we use
it on all architectures in glibc.)

If not, a vDSO-based approach would merely lead to even more variance
between architectures, which can't be a good thing.
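
For what it's worth, userspace can already check at run time whether a vDSO is mapped. A minimal sketch, assuming a libc that provides getauxval(); the helper name have_vdso is made up:

```c
#include <sys/auxv.h>   /* getauxval(), AT_SYSINFO_EHDR */

/* Returns 1 if the kernel mapped a vDSO into this process, else 0.
 * AT_SYSINFO_EHDR holds the address of the vDSO's ELF header;
 * getauxval() returns 0 when the entry is absent. */
static int have_vdso(void)
{
    return getauxval(AT_SYSINFO_EHDR) != 0;
}
```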

Thanks,
Florian

2018-11-14 11:58:54

by Szabolcs Nagy

[permalink] [raw]
Subject: Re: Official Linux system wrapper library?

On 13/11/18 19:39, Dave Martin wrote:
> On Mon, Nov 12, 2018 at 05:19:14AM -0800, Daniel Colascione wrote:
>> We should adopt a similar approach. Shipping a lower-level
>> "liblinux.so" tightly bound to the kernel would not only let the
>> kernel bypass glibc's "editorial discretion" in exposing new
>> facilities to userspace, but would also allow for tighter user-kernel
>> integration that one can achieve with a simplistic syscall(2)-style
>> escape hatch. (For example, for a long time now, I've wanted to go
>> beyond POSIX and improve the system's signal handling API, and this
>> improvement requires userspace cooperation.) The vdso is probably too
>> small and simplistic to serve in this role; I'd want a real library.
>
> Can you expand on your reasoning here?

such lib creates a useless abi+api layer that
somebody has to maintain and document (with or
without vdso).

it obviously cannot work together with a posix
conforming libc implementation for which it would
require knowledge about

thread cancellation internals, potentially TLS
for errno, know libc types even ones that are
based on compile time feature macros (and expose
them in headers in a way that does not collide
with libc headers), abi variants the libc supports
(e.g. softfp, security hardened abi), libc
internal signals (for anything that's changing
signal masks), thread internals for syscalls that
require coordination between all user created
threads (setxid), libc internal state for syscalls
that create/destroy threads.

and thus such lib does not solve the problems
of users who actually requested wrappers for
new syscalls (since they want to call into libc
and create threads).

there is a lot of bikeshedding here by people who
don't understand the constraints nor the use-cases.

an actual proposal in the thread that i think is
worth considering is to make the linux syscall
design process involve libc devs so the c api is
designed together with the syscall abi.

unfortunately i still haven't seen a solution that
makes using linux uapi headers together with libc
headers reliable, continuously testing them in
isolation is useful, but that does not solve the
potential conflicts with libc definitions.

2018-11-14 12:04:57

by Adam Borowski

[permalink] [raw]
Subject: Re: Official Linux system wrapper library?

On Sun, Nov 11, 2018 at 12:46:35PM +0100, Florian Weimer wrote:
> A lot of multi-threaded applications assume that most high-level
> functionality remains usable even after fork in a multi-threaded
> process.

How would this be even possible? Currently fork kills all threads
(save for the caller).

Glibc's manpage also warns:

# After a fork() in a multithreaded program, the child can safely call only
# async-signal-safe functions (see signal-safety(7)) until such time as it
# calls execve(2).

Which makes sense as its malloc uses a mutex, and you can't take a breath
without a library call using malloc somewhere (or in C++, the language
itself).

So any functionality remaining usable after fork is pretty strictly
limited...


Meow!
--
⢀⣴⠾⠻⢶⣦⠀ I've read an article about how lively happy music boosts
⣾⠁⢰⠒⠀⣿⡁ productivity. You can read it, too, you just need the
⢿⡄⠘⠷⠚⠋⠀ right music while doing so. I recommend Skepticism
⠈⠳⣄⠀⠀⠀⠀ (funeral doom metal).

2018-11-14 12:11:09

by Florian Weimer

[permalink] [raw]
Subject: Re: Official Linux system wrapper library?

* Adam Borowski:

> On Sun, Nov 11, 2018 at 12:46:35PM +0100, Florian Weimer wrote:
>> A lot of multi-threaded applications assume that most high-level
>> functionality remains usable even after fork in a multi-threaded
>> process.
>
> How would this be even possible? Currently fork kills all threads
> (save for the caller).

glibc's fork acquires several locks around fork. Other mallocs install
fork handlers, too.
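
The usual pattern for such handlers looks roughly like this (a sketch only; heap_lock stands in for an allocator's internal lock and all names are hypothetical):

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for an allocator's internal lock. */
static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;

static void lock_heap(void)   { pthread_mutex_lock(&heap_lock); }
static void unlock_heap(void) { pthread_mutex_unlock(&heap_lock); }

/* Take the lock before fork() and release it in both parent and
 * child, so the child never inherits a lock held by a thread that
 * no longer exists. */
static int install_fork_handlers(void)
{
    return pthread_atfork(lock_heap, unlock_heap, unlock_heap);
}

/* Fork, let the child take and drop the lock, and return the
 * child's exit status (or -1 on error). */
static int fork_and_use_lock(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        pthread_mutex_lock(&heap_lock);
        pthread_mutex_unlock(&heap_lock);
        _exit(0);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```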

> Glibc's manpage also warns:
>
> # After a fork() in a multithreaded program, the child can safely call only
> # async-signal-safe functions (see signal-safety(7)) until such time as it
> # calls execve(2).
>
> Which makes sense as its malloc uses a mutex, and you can't take a breath
> without a library call using malloc somewhere (or in C++, the language
> itself).

Right, but applications require a working malloc after fork,
unfortunately. opendir is often used to enumerate file descriptors
which need closing, for example.

Thanks,
Florian

2018-11-14 14:47:27

by Andy Lutomirski

[permalink] [raw]
Subject: Re: Official Linux system wrapper library?




> On Nov 14, 2018, at 3:58 AM, Szabolcs Nagy <[email protected]> wrote:
>
>> On 13/11/18 19:39, Dave Martin wrote:
>>> On Mon, Nov 12, 2018 at 05:19:14AM -0800, Daniel Colascione wrote:
>>> We should adopt a similar approach. Shipping a lower-level
>>> "liblinux.so" tightly bound to the kernel would not only let the
>>> kernel bypass glibc's "editorial discretion" in exposing new
>>> facilities to userspace, but would also allow for tighter user-kernel
>>> integration that one can achieve with a simplistic syscall(2)-style
>>> escape hatch. (For example, for a long time now, I've wanted to go
>>> beyond POSIX and improve the system's signal handling API, and this
>>> improvement requires userspace cooperation.) The vdso is probably too
>>> small and simplistic to serve in this role; I'd want a real library.
>>
>> Can you expand on your reasoning here?
>
> such lib creates a useless abi+api layer that
> somebody has to maintain and document (with or
> without vdso).

I’m not so sure it’s useless. Historically, POSIX systems have, in practice and almost by definition, been very C focused, but the world is changing. A less crufty library could be useful for newer languages.

>
> it obviously cannot work together with a posix
> conform libc implementation for which it would
> require knowledge about
>
> thread cancellation internals,

Thread cancellation is a big mess, and we only really need to support it because of legacy code. The whole mechanism should IMO be considered extremely deprecated.

> potentially TLS
> for errno,

errno is IMO a libc thing, full stop. A lower level library should *not* support errno.

> know libc types even ones that are
> based on compile time feature macros (and expose
> them in headers in a way that does not collide
> with libc headers),

This one is tricky. I wonder if we could instead get a C compiler extension to let libc declare that a given struct is a layout-compatible variant of another.

> abi variants the libc supports
> (e.g. softfp, security hardened abi),

Hmm.

> libc
> internal signals (for anything that's changing
> signal masks),

This is nasty, but see my cancellation comment above.

> thread internals for syscalls that
> require coordination between all user created
> threads (setxid),

We should just deal with this in the kernel. The current state of affairs is nuts.

> libc internal state for syscalls
> that create/destroy threads.

I disagree. If you make or destroy threads behind libc’s back, I think you get to keep both pieces.


2018-11-14 14:59:07

by Carlos O'Donell

[permalink] [raw]
Subject: Re: Official Linux system wrapper library?

On 11/14/18 6:58 AM, Szabolcs Nagy wrote:
> an actual proposal in the thread that i think is
> worth considering is to make the linux syscall
> design process involve libc devs so the c api is
> designed together with the syscall abi.

Right, I see at least 2 actionable items:

* "The Checklist" which everyone making a syscall should
follow and we create the checklist with input from both
sides and it becomes the thing you reference e.g.
"Did you follow the checklist? Where is X?"

* Programmatic / Machine readable description of syscalls.
This way the kernel gives users the ability to autogenerate
all the wrappers *if they want to* in a consistent way that
matches this syscall description format.

--
Cheers,
Carlos.

2018-11-14 15:08:35

by Florian Weimer

[permalink] [raw]
Subject: Re: Official Linux system wrapper library?

* Andy Lutomirski:

> Thread cancellation is a big mess, and we only really need to support
> it because of legacy code. The whole mechanism should IMO be
> considered extremely deprecated.

The part regarding legacy code is not true: people write new code using
it all the time. It's true that this feature is difficult to use, and
it is often employed in cases where it is not needed or
counterproductive. However, there are cases where code becomes simpler.

Thanks,
Florian

2018-11-14 15:41:57

by Daniel Colascione

[permalink] [raw]
Subject: Re: Official Linux system wrapper library?

On Wed, Nov 14, 2018 at 3:58 AM, Szabolcs Nagy <[email protected]> wrote:
> On 13/11/18 19:39, Dave Martin wrote:
>> On Mon, Nov 12, 2018 at 05:19:14AM -0800, Daniel Colascione wrote:
>>> We should adopt a similar approach. Shipping a lower-level
>>> "liblinux.so" tightly bound to the kernel would not only let the
>>> kernel bypass glibc's "editorial discretion" in exposing new
>>> facilities to userspace, but would also allow for tighter user-kernel
>>> integration that one can achieve with a simplistic syscall(2)-style
>>> escape hatch. (For example, for a long time now, I've wanted to go
>>> beyond POSIX and improve the system's signal handling API, and this
>>> improvement requires userspace cooperation.) The vdso is probably too
>>> small and simplistic to serve in this role; I'd want a real library.
>>
>> Can you expand on your reasoning here?
>
> such lib creates a useless abi+api layer that
> somebody has to maintain and document (with or
> without vdso).

People already maintain the kernel man pages and do it very well.

> it obviously cannot work together with a posix
> conform libc implementation for which it would
> require knowledge about

You're incorrect on this point. See programs cobbled together out of
syscall(2) invocations today: despite lack of libc integration, things
do mostly work in practice. Calling through a library can't possibly
be worse, and in many ways can be much better.

> thread cancellation internals,

As I mentioned upthread, the only thing a libc needs in order to
support cancellation properly (at least the way glibc does it)
is a way to ask the kernel-provided userspace library whether a
particular program counter address belongs to a certain code sequence
immediately before the system call instruction, whatever that is.
Providing this facility is doable without deep knowledge of libc's
internals, and libc can use it without a deep knowledge of the
interface library.

> potentially TLS
> for errno

As someone else mentioned, errno is a libc construct. It's not *hard*
to support setting errno though: libc could just be required to supply
a well-defined libc_set_errno symbol that the kernel ABI library would
then use as needed.
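
Roughly like this, as a sketch: libc_set_errno is the hypothetical hook named above (defined here only so the example is complete), and klib_finish is a made-up name for the interface library's return-value conversion. It relies on the Linux convention that raw syscall returns in the range -4095..-1 are negated errno values:

```c
#include <errno.h>

/* Hypothetical hook that libc would export; a real libc would
 * store into its own thread-local errno here. */
void libc_set_errno(int err)
{
    errno = err;
}

/* Convert a raw kernel return value into the conventional C result.
 * Values in [-4095, -1] are negated errno codes; anything else,
 * including other negative numbers, is a valid result. */
static long klib_finish(long raw)
{
    if (raw < 0 && raw >= -4095) {
        libc_set_errno((int)-raw);
        return -1;
    }
    return raw;
}
```

Callers in languages that don't want errno at all could simply skip klib_finish and consume the raw value.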

> know libc types even ones that are
> based on compile time feature macros

This library would not have to do the things that libc does. Why would
it have to support libc's feature test macros at all?

> (and expose
> them in headers in a way that does not collide
> with libc headers)

The kernel should have a set of types and a symbol namespace
completely disjoint from libc's, with no compatibility hacks or macros
needed. (That might take some renaming kernel-side.) If libc wants to
provide a POSIX API, it can take on the responsibility for mapping the
kernel's structures to libc's, but within its namespace, the kernel
should be able to add types without fear of conflict.

> abi variants the libc supports
> (e.g. softfp, security hardened abi), libc
> internal signals (for anything that's changing
> signal masks), thread internals for syscalls that
> require coordination between all user created
> threads

Most proposed new system calls do not create threads, manipulate
signal masks, or muck with other internals, so these concerns just
don't apply. That's why syscall(2) mostly works in practice. Even if a
few new system calls *do* involve these internal details and require
closer libc coordination, the majority (e.g., the new mount API,
termios2) don't, and so can be exposed directly from the kernel
project without being blocked by glibc.

> (setxid),

A kernel-side fix here would be the cleanest approach.

> libc internal state for syscalls
> that create/destroy threads.
>
> and thus such lib does not solve the problems
> of users who actually requested wrappers for
> new syscalls (since they want to call into libc
> and create threads).
>
> there is a lot of bikeshedding here by people who
> don't understand the constraints nor the use-cases.

Conversely, there's a lot of doubt-sowing from the other side that
makes shipping a kernel-provided interface library seem harder than it
is. Most new system calls do not bear on the integration concerns that
you and others are raising, and whatever problems remain can be solved
with a narrow interface between libc and a new interface library, one
that would let both evolve independently.

> an actual proposal in the thread that i think is
> worth considering is to make the linux syscall
> design process involve libc devs so the c api is
> designed together with the syscall abi.

After looking at the history of gettid, signal multi-handler
registration, and other proposed improvements running into the brick
wall of glibc's API process, I think it's clear that requiring glibc
signoff on new kernel interfaces would simply lead to stagnation. It's
not as if we're approaching the problem from a position of ignorance.
The right answer is to move to an approximation of the BSD model and
bring the primary interface layer in-house.
There's a lot of evidence that this model works.

2018-11-14 17:20:20

by Arnd Bergmann

Subject: Re: Official Linux system wrapper library?

On Wed, Nov 14, 2018 at 6:58 AM Carlos O'Donell <[email protected]> wrote:
> On 11/14/18 6:58 AM, Szabolcs Nagy wrote:
> > an actual proposal in the thread that i think is
> > worth considering is to make the linux syscall
> > design process involve libc devs so the c api is
> > designed together with the syscall abi.
>
> * Programmatic / Machine readable description of syscalls.
> This way the kernel gives users the ability to autogenerate
> all the wrappers *if they want to* in a consistent way that
> matches this syscall description format.

Firoz Khan is in the process of doing part of this, by changing the
in-kernel per-architecture unistd.h and syscall.S files into an
architecture-independent machine-readable format that is used to
generate the existing files. The format will be similar to what
we have on arm/s390/x86 in the syscall.tbl files already.

This is of course only part of the picture: it answers the question
of which syscalls are implemented on an architecture, which number
they have and (ideally) whether they use a standard implementation
or a custom one, but it does not yet relate to the prototype.

Once this work is merged, we can follow up by coming up with a
way to add prototypes and to enforce that the user space wrapper
uses the same argument types as the in-kernel entry point.
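
To make the table idea concrete, here is a minimal sketch of reading one row of such a file and emitting the matching __NR_ define. The column layout (number, ABI, name, optional entry point) is an assumption modelled on the existing x86 syscall_64.tbl; the struct and function names are hypothetical and are not taken from the patches under discussion.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One row of a syscall table. The layout is ASSUMED to follow the
 * x86 syscall_64.tbl convention:
 *     <number> <abi> <name> [<entry point>]
 */
struct syscall_row {
    int  nr;
    char abi[16];
    char name[64];
    char entry[64];   /* left empty when the row has no entry point */
};

/* Parse one table line. Returns 1 on success and 0 for comments,
 * blank lines, and malformed rows. */
int parse_syscall_row(const char *line, struct syscall_row *out)
{
    assert(line != NULL && out != NULL);
    memset(out, 0, sizeof(*out));

    line += strspn(line, " \t");
    if (*line == '#' || *line == '\n' || *line == '\0')
        return 0;

    /* Field widths are bounded so the fixed-size buffers cannot overflow. */
    int n = sscanf(line, "%d %15s %63s %63s",
                   &out->nr, out->abi, out->name, out->entry);
    return n >= 3;
}

/* Format the define a generator might write into unistd.h. */
int format_nr_define(const struct syscall_row *row, char *buf, size_t len)
{
    assert(row != NULL && buf != NULL);
    return snprintf(buf, len, "#define __NR_%s %d", row->name, row->nr);
}
```

A table like this carries numbers and entry points only, which is why the prototype question above still needs separate work.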

Arnd

2018-11-14 17:42:40

by Joseph Myers

Subject: Re: Official Linux system wrapper library?

On Wed, 14 Nov 2018, Andy Lutomirski wrote:

> I’m not so sure it’s useless. Historically, POSIX systems have, in
> practice and almost by definition, been very C focused, but the world is
> changing. A less crufty library could be useful for newer languages:

Historically, there was once an attempt to rework POSIX into a separate
language-independent definition and language bindings (for C, Fortran, Ada
etc.), but I don't think it got anywhere, and it's probably doubtful
whether the idea was ever very practical. (See the introduction to
POSIX.1:1990, for example: "Future revisions are expected to contain
bindings for other programming languages as well as for the C language.
This will be accomplished by breaking this part of ISO/IEC 9945 into
multiple portions---one defining core requirements independent of any
programming language, and others composed of programming language
bindings.".)

> > thread internals for syscalls that
> > require coordination between all user created
> > threads (setxid),
>
> We should just deal with this in the kernel. The current state of
> affairs is nuts.

Yes, we should have a few new syscalls to set these ids at the process
level.

--
Joseph S. Myers
[email protected]

2018-11-14 18:13:59

by Paul Eggert

Subject: Re: Official Linux system wrapper library?

On 11/14/18 9:40 AM, Joseph Myers wrote:
> Historically, there was once an attempt to rework POSIX into a separate
> language-independent definition and language bindings (for C, Fortran, Ada
> etc.), but I don't think it got anywhere, and it's probably doubtful
> whether the idea was ever very practical.

That effort did produce IEEE Std 1003.5-1992 (Ada Bindings to IEEE Std
1003.1-1990), IEEE 1003.5b-1996 (Ada bindings for realtime extensions),
and IEEE Std 1003.9-1992 (F77 Bindings to IEEE Std 1003.1-1992). The Ada
group simply translated the POSIX standard from C into Ada, repeating
functional text and coming up with a "thick" standard; in contrast the
Fortran group did a "thin" standard that focused on Fortran mechanics
and deferred underlying functionality to the main POSIX standard. The
thin Fortran standard was harder to grok and was less successful in
practice.

As you write, these efforts were probably not worth the trouble. Non-C
language systems can provide a standard way to invoke C APIs, and then
let user-level programmers have at it. The performance advantages of
having a pure Ada/Fortran/etc. API for POSIX are so minor that it's not
worth the major hassle of standardizing and using a language-independent
POSIX API.


2018-11-14 18:16:58

by Joseph Myers

Subject: Re: Official Linux system wrapper library?

On Wed, 14 Nov 2018, Daniel Colascione wrote:

> > there is a lot of bikesheding here by people who
> > don't understand the constraints nor the use-cases.
>
> Conversely, there's a lot of doubt-sowing from the other side that

"other side" is the wrong concept here in the first place - it's supposed
to be a matter of cooperating projects trying to find good interfaces
together.

Any new feature from the kernel that is meant to be of use to libcs is
best designed in a way involving such cooperation (with multiple libcs).
I concur with Zack's assessment in
<https://sourceware.org/ml/libc-alpha/2018-11/msg00286.html> that a
technical fix to process / communication issues cannot work here. Any
feature (e.g. syscall library) with a design coming solely from the kernel
rather than a cooperative process is also likely to have an unsuitable
design meaning it doesn't get used. Once we have sufficient communication
to design suitable interfaces *together*, "avoiding the need to
communicate" becomes irrelevant as a design criterion anyway.

> After looking at the history of settid, signal multi-handler
> registration, and other proposed improvements running into the brick
> wall of glibc's API process, I think it's clear that requiring glibc
> signoff on new kernel interfaces would simply lead to stagnation. It's

That there was disagreement on some particular interface does not mean
there are problems with the basic principle of working with libc
maintainers (of multiple libcs, not just one!) to establish what the
intended userspace C API to some new kernel interface should be, and to
nail down the details of how the kernel interface is defined in the
process.

(And as noted elsewhere, I think the main people objecting to generally
having bindings for all non-obsolescent syscalls are no longer active in
glibc.)

If the semantics of some proposed kernel interface, both at the syscall
level and at the userspace C API level, are agreed e.g. by kernel and musl
people, I'd think the API agreement from musl would be a good indication
of the API also being suitable to add to glibc. It's not necessary to get
agreement from every libc on every API - but there should be agreement
from *some* libc that is careful about API review. If enough people with
good sense about libc APIs have judged some API for a new syscall
suitable, I expect other libcs can implement it even if it's not exactly
the API they'd come up with themselves.

(I haven't seen enough comments on libc / kernel API design from people I
know to be associated with bionic, uclibc-ng, etc., to judge if they also
pay similarly careful attention to working out what a good C API design
for some interface should be. Note that there are musl people active on
libc-alpha, which helps everyone arrive at a consensus on better C API
designs.)

> The right answer is a move to an approximation of the BSD model and
> bring the primary interface layer in-house.

I could equally say we should take the kernel in-house and develop it to
better support glibc - that if the kernel doesn't provide what we want, we
should add the features to GNU Linux-libre and say that's the supported
kernel for use with glibc. It's an equally absurd statement in a context
of multiple cooperating projects.

> There's a lot of evidence that this model works.

There's a lot of evidence that the model of separately maintained Linux
kernel and libc works (see: the number of devices using Linux kernels with
a range of different libc implementations that meet different needs).

--
Joseph S. Myers
[email protected]

2018-11-14 18:33:33

by Joseph Myers

Subject: Re: Official Linux system wrapper library?

On Wed, 14 Nov 2018, Arnd Bergmann wrote:

> Firoz Khan is in the process of doing part of this, by changing the
> in-kernel per-architecture unistd.h and syscall.S files into a
> architecture independent machine-readable format that is used to
> generate the existing files. The format will be similar to what
> we have on arm/s390/x86 in the syscall.tbl files already.

Will this also mean the following are unable to occur in future (both have
occurred in the past):

* A syscall added to unistd.h for an architecture, but not added to the
syscall table until sometime later?

* A syscall added to the native syscall table for some ABI (e.g. 32-bit
x86 or arm) but not added to the corresponding compat syscall table (e.g.
32-bit x86 binaries running on x86_64, 32-bit arm binaries running on
arm64) until sometime later?

Avoiding both of those complications is beneficial to libc (as is a third
thing, avoiding a syscall being added to different architectures in
different versions).

--
Joseph S. Myers
[email protected]

2018-11-14 18:37:41

by Daniel Colascione

Subject: Re: Official Linux system wrapper library?

On Wed, Nov 14, 2018 at 10:15 AM, Joseph Myers <[email protected]> wrote:
> Any
> feature (e.g. syscall library) with a design coming solely from the kernel
> rather than a cooperative process is also likely to have an unsuitable
> design meaning it doesn't get used.

Is that so? membarrier came directly from the kernel. It gets used and
appears to have a suitable design. That something isn't used by libc
doesn't mean that it doesn't get used in general.
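
For illustration, this is roughly how an application reaches membarrier today without any libc wrapper. It is a sketch only: sys_membarrier and membarrier_supported are hypothetical names, and the code assumes the uapi header linux/membarrier.h is installed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrapper over syscall(2). If the installed headers do not know
 * the syscall number, behave as an old kernel would: fail with ENOSYS. */
int sys_membarrier(int cmd, unsigned int flags)
{
#ifdef __NR_membarrier
    /* The trailing 0 is the cpu_id argument of newer kernels; older
     * kernels ignore it. */
    return (int)syscall(__NR_membarrier, cmd, flags, 0);
#else
    (void)cmd;
    (void)flags;
    errno = ENOSYS;
    return -1;
#endif
}

/* Returns 1 if the kernel supports cmd, 0 if it does not, and -1 if
 * the query itself failed (for example with ENOSYS). */
int membarrier_supported(int cmd)
{
    assert(cmd > 0);   /* MEMBARRIER_CMD_QUERY (0) is not a capability bit */

    int mask = sys_membarrier(MEMBARRIER_CMD_QUERY, 0);
    if (mask < 0)
        return -1;
    return (mask & cmd) != 0;
}
```

A caller would check membarrier_supported(MEMBARRIER_CMD_SHARED) once at startup and fall back to another mechanism when the result is not 1.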

> Once we have sufficient communication
> to design suitable interfaces *together*, "avoiding the need to
> communicate" becomes irrelevant as a design criterion anyway.

If that approach is going to work, the libc maintainership needs to
be more pragmatic, less idealistic, and less likely to block work on
purity grounds, e.g., we shouldn't do X because the dynamic linker
really should be out-of-process, we can't do Y because nobody should
be using signals, and we can't do Z because the kernel uses IDs that
have such-and-such ugly properties.

A good demonstration of a new commitment to pragmatism would be
merging the trivial wrappers for gettid(2).
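
To show how small the wrapper in question is, here is a sketch of what applications write by hand in the meantime. The name sys_gettid is hypothetical, chosen so that it cannot clash with a libc that does declare gettid().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the kernel thread ID of the calling thread. */
pid_t sys_gettid(void)
{
    long tid = syscall(SYS_gettid);

    /* gettid(2) is documented as always successful. */
    assert(tid > 0);
    return (pid_t)tid;
}
```

The result is what the tid-taking syscalls expect; in a single-threaded process it equals getpid().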

2018-11-14 18:48:53

by Joseph Myers

Subject: Re: Official Linux system wrapper library?

On Wed, 14 Nov 2018, Daniel Colascione wrote:

> A good demonstration of a new commitment to pragmatism would be
> merging the trivial wrappers for gettid(2).

I support the addition of gettid (for use with those syscalls that take
tids, and with appropriate documentation explaining the properties of
tids) - and, generally, wrappers for all non-obsolescent
architecture-independent Linux kernel syscalls, including ones that are
very Linux-specific, except maybe for a few interfaces fundamentally
inconsistent with glibc managing TLS etc. - they are, at least, no worse
as a source of APIs than all the old BSD / SVID interfaces we have from
when those were used as sources of APIs.

--
Joseph S. Myers
[email protected]

2018-11-15 05:31:40

by Theodore Ts'o

Subject: Re: Official Linux system wrapper library?

On Wed, Nov 14, 2018 at 06:47:57PM +0000, Joseph Myers wrote:
> On Wed, 14 Nov 2018, Daniel Colascione wrote:
>
> > A good demonstration of a new commitment to pragmatism would be
> > merging the trivial wrappers for gettid(2).
>
> I support the addition of gettid (for use with those syscalls that take
> tids, and with appropriate documentation explaining the properties of
> tids) - and, generally, wrappers for all non-obsolescent
> architecture-independent Linux kernel syscalls, including ones that are
> very Linux-specific, except maybe for a few interfaces fundamentally
> inconsistent with glibc managing TLS etc. - they are, at least, no worse
> as a source of APIs than all the old BSD / SVID interfaces we have from
> when those were used as sources of APIs.

That's great. But is it or is it not true (either de jure or de
facto) that "a single active glibc developer" can block a system call
from being supported by glibc by objecting? And if not, what is
the process for resolving a conflict?

- Ted

2018-11-15 10:35:50

by Dave Martin

Subject: Re: Official Linux system wrapper library?

On Wed, Nov 14, 2018 at 12:40:44PM +0100, Florian Weimer wrote:
> * Dave Martin:
>
> > Fair points, though this is rather what I meant by "sane essentials".
> > Because there are strict limits on what can be done in the vDSO, it may
> > be more bloat-resistant and more conservatively maintained.
> >
> > This might provide a way to push some dumb compatibility kludge code
> > that receives little ongoing maintenance outside the privilege wall,
> > whereas it has to sit in the kernel proper today.
> >
> > In theory we could opt to advertise new syscalls only via vDSO entry
> > points, and not maintain __NR_xxx values for them (which may or may
> > not upset ptrace users.) Anyway, I digress...
>
> Is the vDSO available across all architectures? (I don't think we use
> it on all architectures in glibc.)

It's probably not available on all arches.

> If not, a vDSO-based approach would merely lead to even more variance
> between architectures, which can't be a good thing.

That's a fair concern.

Channeling syscalls through the vDSO could allow for a uniform syscall
interface at the ELF linkage level, but only on those arches that have a
vDSO. There may be other issues too.

Also, I don't say that we should definitely do this, just that it's
a possibility.
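
As a rough illustration of the availability question, a process can check at run time whether the kernel mapped a vDSO at all. This sketch assumes a libc that provides getauxval() in sys/auxv.h; the function names are hypothetical.

```c
#include <assert.h>
#include <elf.h>
#include <string.h>
#include <sys/auxv.h>

/* Returns 1 if the bytes at image start with the ELF magic number. */
int looks_like_elf(const void *image)
{
    assert(image != NULL);
    return memcmp(image, ELFMAG, SELFMAG) == 0;
}

/* Returns 1 if the kernel mapped a vDSO into this process, else 0.
 * AT_SYSINFO_EHDR holds the address of the vDSO's ELF header, and
 * getauxval() returns 0 when the entry is absent. */
int vdso_present(void)
{
    unsigned long base = getauxval(AT_SYSINFO_EHDR);

    return base != 0 && looks_like_elf((const void *)base);
}
```

Any scheme that advertises syscalls only through vDSO entry points would need a fallback for the case where vdso_present() returns 0.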

Cheers
---Dave

2018-11-15 16:31:39

by Joseph Myers

Subject: Re: Official Linux system wrapper library?

On Thu, 15 Nov 2018, Theodore Y. Ts'o wrote:

> That's great. But is it or is it not true (either de jure or de
> facto) that "a single active glibc developer" can block a system call
> from being supported by glibc by objecting? And if not, under what is
> the process by resolving a conflict?

We use a consensus-building process as described at
<https://sourceware.org/glibc/wiki/Consensus>.

--
Joseph S. Myers
[email protected]

2018-11-15 17:09:56

by Theodore Ts'o

Subject: Re: Official Linux system wrapper library?

On Thu, Nov 15, 2018 at 04:29:43PM +0000, Joseph Myers wrote:
> On Thu, 15 Nov 2018, Theodore Y. Ts'o wrote:
>
> > That's great. But is it or is it not true (either de jure or de
> > facto) that "a single active glibc developer" can block a system call
> > from being supported by glibc by objecting? And if not, under what is
> > the process by resolving a conflict?
>
> We use a consensus-building process as described at
> <https://sourceware.org/glibc/wiki/Consensus>.

So can a single glibc developer block Consensus?

I've chaired IETF working groups, where the standard was "Rough
Consensus and Running Code". Strict Consensus very easily ends up
leading to the Liberum Veto, which did not serve the Polish-Lithuanian
Commonwealth well in the 17th-18th centuries....

- Ted

2018-11-15 17:16:20

by Joseph Myers

Subject: Re: Official Linux system wrapper library?

On Thu, 15 Nov 2018, Theodore Y. Ts'o wrote:

> On Thu, Nov 15, 2018 at 04:29:43PM +0000, Joseph Myers wrote:
> > On Thu, 15 Nov 2018, Theodore Y. Ts'o wrote:
> >
> > > That's great. But is it or is it not true (either de jure or de
> > > facto) that "a single active glibc developer" can block a system call
> > > from being supported by glibc by objecting? And if not, under what is
> > > the process by resolving a conflict?
> >
> > We use a consensus-building process as described at
> > <https://sourceware.org/glibc/wiki/Consensus>.
>
> So can a single glibc developer can block Consensus?

If it's a sustained objection - it still works an awful lot better than
how things worked before 2011/12. (See my suggestion of having a process
involving a supermajority vote of the GNU maintainers for glibc in the
rare cases where a consensus cannot be reached - but those are rare enough
that actually agreeing a process for such cases has never been a
priority.)

--
Joseph S. Myers
[email protected]

2018-11-15 20:36:54

by Carlos O'Donell

Subject: Re: Official Linux system wrapper library?

On 11/14/18 1:47 PM, Joseph Myers wrote:
> On Wed, 14 Nov 2018, Daniel Colascione wrote:
>
>> A good demonstration of a new commitment to pragmatism would be
>> merging the trivial wrappers for gettid(2).
>
> I support the addition of gettid (for use with those syscalls that take
> tids, and with appropriate documentation explaining the properties of
> tids) - and, generally, wrappers for all non-obsolescent
> architecture-independent Linux kernel syscalls, including ones that are
> very Linux-specific, except maybe for a few interfaces fundamentally
> inconsistent with glibc managing TLS etc. - they are, at least, no worse
> as a source of APIs than all the old BSD / SVID interfaces we have from
> when those were used as sources of APIs.

I agree. Documentation is important.

--
Cheers,
Carlos.

2018-11-15 21:02:28

by Carlos O'Donell

Subject: Re: Official Linux system wrapper library?

On 11/15/18 12:08 PM, Theodore Y. Ts'o wrote:
> On Thu, Nov 15, 2018 at 04:29:43PM +0000, Joseph Myers wrote:
>> On Thu, 15 Nov 2018, Theodore Y. Ts'o wrote:
>>
>>> That's great. But is it or is it not true (either de jure or de
>>> facto) that "a single active glibc developer" can block a system call
>>> from being supported by glibc by objecting? And if not, under what is
>>> the process by resolving a conflict?
>>
>> We use a consensus-building process as described at
>> <https://sourceware.org/glibc/wiki/Consensus>.
>
> So can a single glibc developer can block Consensus?

Yes.

I think the comparison to the "liberum veto" is not a fair
comparison to the way the glibc community works :-)

(1) Community consensus.

Consensus need not imply unanimity.

Consensus is only from the set of important and concerned
interests. The community gets to decide that you're a troll
that does no real work, and can therefore ignore you.

Consensus is blocked only by sustained objection (not just
normal objections, which are recorded as part of the
development process e.g. "I don't like it, but I leave it
up to you to decide").

Therefore an involved glibc developer can lodge a sustained
objection, and block consensus.

(2) The GNU package maintainers for glibc.

There are 8 GNU package maintainers for glibc.

The package maintainers created the consensus process to
empower the community, but they can act as a final
review committee to move issues where there are two
reasonable but competing view points.

As Joseph points out we haven't ever used the GNU package
maintainers to vote on a stuck issue, but I will arrange
it when the need arises. If you think we're at that point
with wrapper functions, just say so, but it doesn't seem
like it to me.

--
Cheers,
Carlos.

2018-11-16 21:26:24

by Alan Cox

Subject: Re: Official Linux system wrapper library?


> I think the issue is a bit more complex :
> - linux doesn't support a single libc
> - glibc doesn't support a single OS
>
> In practice we all know (believe?) that both statements above are
> true but in practice 99% of the time there's a 1:1 relation between
> these two components.

The top linux C library is probably the Android one. Given the number
of containers now running Alpine and the number of embedded devices it's
probably a good fight going on for 2nd, 3rd and 4th. It is certainly not
a Linux/Glibc world any more.

Alan

2018-11-24 08:35:18

by Florian Weimer

Subject: Re: Official Linux system wrapper library?

* Daniel Colascione:

> On Mon, Nov 12, 2018 at 12:11 AM, Florian Weimer <[email protected]> wrote:
>> * Daniel Colascione:
>>
>>> If the kernel provides a system call, libc should provide a C wrapper
>>> for it, even if in the opinion of the libc maintainers, that system
>>> call is flawed.
>>
>> It's not that simple, I think. What about bdflush? socketcall?
>> getxpid? osf_gettimeofday? set_robust_list?
>
> What about them? Mentioning that these system calls exist is not in
> itself an argument.

But socketcall does not exist on all architectures. Neither does
getpid, it's called getxpid on some architectures.

>> There are quite a few irregularities
>
> So?

I think it would be a poor approach to expose application developers to
these portability issues. We need to abstract over these differences at
a certain layer, and applications are too late.

>> and some editorial discretion appears to be unavoidable.
>
> That's an assertion, not an argument, and I strongly disagree. *Why*
> do you think "editorial discretion" is unavoidable?

We do not want application authors to write code which uses socketcall,
although it is the right system call for the BSD sockets API if you need
compatibility back to Linux 2.6.32 and before. If application authors
switched to socketcall, applications would not be portable (at the
source level) to new architectures which do not have socketcall.
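
A sketch of the kind of abstraction meant here: one function that hides whether the architecture offers a direct socket syscall or only the socketcall multiplexer. The names raw_socket and raw_socket_selftest are hypothetical. Choosing the direct call whenever the headers define it is an assumption that suits recent kernels only; a libc that must run on older kernels would have to decide differently.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_socket
#include <linux/net.h>   /* SYS_SOCKET, the socketcall(2) sub-call number */
#endif

/* Create a socket through whichever raw syscall the architecture has. */
int raw_socket(int domain, int type, int protocol)
{
#ifdef __NR_socket
    return (int)syscall(__NR_socket, domain, type, protocol);
#else
    /* socketcall(2) takes a sub-call number and a pointer to the
     * arguments packed as unsigned longs. */
    unsigned long args[3] = {
        (unsigned long)domain,
        (unsigned long)type,
        (unsigned long)protocol,
    };
    return (int)syscall(__NR_socketcall, SYS_SOCKET, args);
#endif
}

/* Smoke test: open and close an AF_UNIX stream socket. */
void raw_socket_selftest(void)
{
    int fd = raw_socket(AF_UNIX, SOCK_STREAM, 0);

    assert(fd >= 0);
    close(fd);
}
```

Application code calling raw_socket() compiles unchanged on both kinds of architecture, whereas code written directly against socketcall would not.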

We do not want to force application authors to call osf_gettimeofday
instead of gettimeofday on Alpha.

We do not want to encourage library authors to call set_robust_list
because doing so would break robust mutex support in any libc.

>> Even if we were to provide perfectly consistent system call wrappers
>> under separate names, we'd still expose different calling conventions
>> for things like off_t to applications, which would make using some of
>> the system calls quite difficult and surprisingly non-portable.
>
> We can learn something from how Windows does things. On that system,
> what we think of as "libc" is actually two parts. (More, actually, but
> I'm simplifying.) At the lowest level, you have the semi-documented
> ntdll.dll, which contains raw system call wrappers and arcane
> kernel-userland glue. On top of ntdll live the "real" libc
> (msvcrt.dll, kernel32.dll, etc.) that provide conventional
> application-level glue. The tight integration between ntdll.dll and
> the kernel allows Windows to do very impressive things.

> We should adopt a similar approach.

Most kernel developers claim that a stable userspace ABI is desirable.
With your proposal, we need to maintain three stable ABI layers instead
of two, without actually adding any functionality. That doesn't seem to
be a good way of using developer resources.

Thanks,
Florian

2018-11-24 08:36:39

by David Newall

Subject: Re: Official Linux system wrapper library?

On 24/11/18 12:04 am, Florian Weimer wrote:
> But socketcall does not exist on all architectures. Neither does
> getpid, it's called getxpid on some architectures.
> ...
> I think it would be a poor approach to expose application developers to
> these portability issues. We need to abstract over these differences at
> a certain layer, and applications are too late.

Interesting.  I think the opposite.  I think exposing the OS's
interfaces is exactly what a c-library should do.  It might also provide
alternative interfaces that work consistently across different
platforms, but in addition to, not instead of the OS interface.


2018-11-24 08:40:04

by Szabolcs Nagy

Subject: Re: Official Linux system wrapper library?

On 23/11/18 14:11, David Newall wrote:
> On 24/11/18 12:04 am, Florian Weimer wrote:
>> But socketcall does not exist on all architectures.  Neither does
>> getpid, it's called getxpid on some architectures.
>> ...
>> I think it would be a poor approach to expose application developers to
>> these portability issues.  We need to abstract over these differences at
>> a certain layer, and applications are too late.
>
> Interesting.  I think the opposite.  I think exposing the OS's interfaces is exactly what a c-library should do.  It might also provide
> alternative interfaces that work consistently across different platforms, but in addition to, not instead of the OS interface.

you don't understand the point of the c language if you think so.

2018-11-24 08:52:09

by Daniel Colascione

Subject: Re: Official Linux system wrapper library?

On Fri, Nov 23, 2018 at 5:34 AM Florian Weimer <[email protected]> wrote:
>
> * Daniel Colascione:
>
> > On Mon, Nov 12, 2018 at 12:11 AM, Florian Weimer <[email protected]> wrote:
> >> * Daniel Colascione:
> >>
> >>> If the kernel provides a system call, libc should provide a C wrapper
> >>> for it, even if in the opinion of the libc maintainers, that system
> >>> call is flawed.
> >>
> >> It's not that simple, I think. What about bdflush? socketcall?
> >> getxpid? osf_gettimeofday? set_robust_list?
> >
> > What about them? Mentioning that these system calls exist is not in
> > itself an argument.
>
> But socketcall does not exist on all architectures. Neither does
> getpid, it's called getxpid on some architectures.

So what? On systems on which a given system call does not exist,
attempts to link against that system call should fail, or attempts to
make that system call should fail at runtime with ENOSYS. That's
completely expected and unsurprising behavior, not some unavoidable
source of catastrophic confusion.

> >> There are quite a few irregularities
> >
> > So?
>
> I think it would be a poor approach to expose application developers to
> these portability issues. We need to abstract over these differences at
> a certain layer, and applications are too late.

And glibc is too early. The purpose of the lowest-level user library
is not to provide an OS-agnostic portability layer. There are other
projects much better-suited to providing portability, including the
excellent GLib, Gnulib, and Qt. The purpose of the lowest-level user
library is to expose the interfaces of the underlying system, whatever
they are. That's a basic tenet of layered interface design.

Due to historical accident, the same library (on most Linux systems)
serves as both the lowest-level user library and an implementation of
some antiquated portability constructs from ANSI C and POSIX. That
this library provides these old portability interfaces is not a reason
for that library to neglect its responsibility as the lowest-level
system interface library. If people find that every attempt to expose
even trivial new kernel interfaces turns into an endless trek through
a swamp of specious objection (see the gettid debacle), then it
becomes perfectly reasonable to find an alternate route over firmer
ground.

Other glibc developers (e.g., Joseph Myers) have expressed support for
adding long-missing system call wrappers, like gettid, as long as the
functions are adequately documented. Would you make a sustained
objection to these additions?

> >> and some editorial discretion appears to be unavoidable.
> >
> > That's an assertion, not an argument, and I strongly disagree. *Why*
> > do you think "editorial discretion" is unavoidable?
>
> We do not want application authors to write code which uses socketcall,

That's an opinion on portability, not an argument for the necessity of
"editorial discretion". That you think an application calling
socketcall would somehow be a bad idea is not a justification for not
providing this interface. Low-level libraries must focus on mechanism,
not policy, if a system is to be flexible enough to accommodate
unanticipated needs.

2018-11-24 08:55:26

by Dmitry V. Levin

Subject: Re: Official Linux system wrapper library?

On Fri, Nov 23, 2018 at 12:15:39PM -0800, Daniel Colascione wrote:
> On Fri, Nov 23, 2018 at 5:34 AM Florian Weimer wrote:
> > > On Mon, Nov 12, 2018 at 12:11 AM, Florian Weimer wrote:
> > >>
> > >>> If the kernel provides a system call, libc should provide a C wrapper
> > >>> for it, even if in the opinion of the libc maintainers, that system
> > >>> call is flawed.
> > >>
> > >> It's not that simple, I think. What about bdflush? socketcall?
> > >> getxpid? osf_gettimeofday? set_robust_list?
> > >
> > > What about them? Mentioning that these system calls exist is not in
> > > itself an argument.
> >
> > But socketcall does not exist on all architectures. Neither does
> > getpid, it's called getxpid on some architectures.
>
> So what? On systems on which a given system call does not exist,
> attempts to link against that system call should fail, or attempts to
> make that system call should fail at runtime with ENOSYS. That's
> completely expected and unsurprising behavior, not some unavoidable
> source of catastrophic confusion.

I'm sorry but you've just said that getpid() must either be unavailable or
fail on those architectures that provide no syscall with exactly the same
semantics as getpid syscall. Nobody is going to use a libc that doesn't
provide getpid() in a reliable way.

If you really need a 1-1 correspondence between syscalls and C wrappers,
there is syscall(3) with all associated portability issues.

If you need something else, please be more specific, i.e. be ready to give
a detailed answer about every syscall ever supported by the kernel,
on every supported architecture.

My first trivial question is, do you need C wrappers for
__NR_epoll_create, __NR_eventfd, __NR_inotify_init,
and __NR_signalfd syscalls?
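
The four calls listed were each superseded by a variant taking a flags argument (epoll_create1, eventfd2, inotify_init1, signalfd4), and newer architectures define only the newer numbers. Taking eventfd as the example, this sketch contrasts a strict 1:1 wrapper with one built on the newer call; the function names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Strict 1:1 wrapper: usable only where the legacy number exists. */
int legacy_eventfd(unsigned int initval)
{
#ifdef __NR_eventfd
    return (int)syscall(__NR_eventfd, initval);
#else
    (void)initval;
    errno = ENOSYS;   /* no such syscall on this architecture */
    return -1;
#endif
}

/* Portable variant: always go through the newer eventfd2 call. */
int portable_eventfd(unsigned int initval)
{
    return (int)syscall(__NR_eventfd2, initval, 0);
}

/* Smoke test: a new eventfd reads back its initial counter value. */
void portable_eventfd_selftest(void)
{
    uint64_t value = 0;
    int fd = portable_eventfd(3);

    assert(fd >= 0);
    ssize_t n = read(fd, &value, sizeof(value));
    assert(n == (ssize_t)sizeof(value));
    assert(value == 3);
    close(fd);
}
```

Only the second form gives callers the same behaviour on every architecture, which is the gap between wrapping syscalls 1:1 and providing a usable C API.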


--
ldv



2018-11-24 09:01:01

by David Newall

Subject: Re: Official Linux system wrapper library?

On 24/11/18 1:53 am, Szabolcs Nagy wrote:
> On 23/11/18 14:11, David Newall wrote:
>> On 24/11/18 12:04 am, Florian Weimer wrote:
>>> But socketcall does not exist on all architectures.  Neither does
>>> getpid, it's called getxpid on some architectures.
>>> ...
>>> I think it would be a poor approach to expose application developers to
>>> these portability issues.  We need to abstract over these differences at
>>> a certain layer, and applications are too late.
>> Interesting.  I think the opposite.  I think exposing the OS's interfaces is exactly what a c-library should do.  It might also provide
>> alternative interfaces that work consistently across different platforms, but in addition to, not instead of the OS interface.
> you don't understand the point of the c language if you think so.

I understand the point of C, thank you very much, and we're talking
about the C library, not the language.  I don't understand the point of
your rudeness.


2018-11-28 13:19:38

by David Laight

Subject: RE: Official Linux system wrapper library?

From: David Newall
> Sent: 23 November 2018 14:11
>
> On 24/11/18 12:04 am, Florian Weimer wrote:
> > But socketcall does not exist on all architectures. Neither does
> > getpid, it's called getxpid on some architectures.
> > ...
> > I think it would be a poor approach to expose application developers to
> > these portability issues. We need to abstract over these differences at
> > a certain layer, and applications are too late.
>
> Interesting.  I think the opposite.  I think exposing the OS's
> interfaces is exactly what a c-library should do.  It might also provide
> alternative interfaces that work consistently across different
> platforms, but in addition to, not instead of the OS interface.

Also, it really shouldn't implement broken workarounds for 'missing'
system calls.
At least one C library I've met converted pread() into lseek() and read().
That is just so broken it is better to fail to link or fail at runtime.

Never mind all the fun trying to read CLOCK_MONOTONIC.

David


2018-12-09 04:41:01

by Randy Dunlap

Subject: Re: Official Linux system wrapper library?

On 11/12/18 8:08 AM, Jonathan Corbet wrote:
> On Sun, 11 Nov 2018 18:36:30 -0800
> Greg KH <[email protected]> wrote:
>
>> We should have a checklist. That's a great idea. Now to find someone
>> to write it... :)
>
> Do we think the LPC session might have the right people to create such a
> thing? If so, I can try to put together a coherent presentation of the
> result.

Hi,
Did anything ever happen with this syscall checklist suggestion?

thnx,
--
~Randy

2018-12-10 16:34:51

by Jonathan Corbet

Subject: Re: Official Linux system wrapper library?

On Sat, 8 Dec 2018 20:38:56 -0800
Randy Dunlap <[email protected]> wrote:

> On 11/12/18 8:08 AM, Jonathan Corbet wrote:
> > On Sun, 11 Nov 2018 18:36:30 -0800
> > Greg KH <[email protected]> wrote:
> >
> >> We should have a checklist. That's a great idea. Now to find someone
> >> to write it... :)
> >
> > Do we think the LPC session might have the right people to create such a
> > thing? If so, I can try to put together a coherent presentation of the
> > result.
>
> Hi,
> Did anything ever happen with this syscall checklist suggestion?

No, we really didn't have the right people around to do that,
unfortunately.

jon

2018-12-10 18:15:50

by Carlos O'Donell

Subject: Re: Official Linux system wrapper library?

On 12/10/18 11:27 AM, Jonathan Corbet wrote:
> On Sat, 8 Dec 2018 20:38:56 -0800
> Randy Dunlap <[email protected]> wrote:
>
>> On 11/12/18 8:08 AM, Jonathan Corbet wrote:
>>> On Sun, 11 Nov 2018 18:36:30 -0800
>>> Greg KH <[email protected]> wrote:
>>>
>>>> We should have a checklist. That's a great idea. Now to find someone
>>>> to write it... :)
>>>
>>> Do we think the LPC session might have the right people to create such a
>>> thing? If so, I can try to put together a coherent presentation of the
>>> result.
>>
>> Hi,
>> Did anything ever happen with this syscall checklist suggestion?
>
> No, we really didn't have the right people around to do that,
> unfortunately.

We already have Documentation/process/adding-syscalls.rst.

The documentation there is quite thorough.

It lists things that people commonly forget e.g. email [email protected].

Would it be acceptable to attempt to collate per-libc information
into adding-syscalls.rst under a new section called
"Integration with libc"?

--
Cheers,
Carlos.

2018-12-11 02:07:51

by Randy Dunlap

Subject: Re: Official Linux system wrapper library?

On 12/10/18 9:39 AM, Carlos O'Donell wrote:
> On 12/10/18 11:27 AM, Jonathan Corbet wrote:
>> On Sat, 8 Dec 2018 20:38:56 -0800
>> Randy Dunlap <[email protected]> wrote:
>>
>>> On 11/12/18 8:08 AM, Jonathan Corbet wrote:
>>>> On Sun, 11 Nov 2018 18:36:30 -0800
>>>> Greg KH <[email protected]> wrote:
>>>>
>>>>> We should have a checklist. That's a great idea. Now to find someone
>>>>> to write it... :)
>>>>
>>>> Do we think the LPC session might have the right people to create such a
>>>> thing? If so, I can try to put together a coherent presentation of the
>>>> result.
>>>
>>> Hi,
>>> Did anything ever happen with this syscall checklist suggestion?
>>
>> No, we really didn't have the right people around to do that,
>> unfortunately.
>
> We already have Documentation/process/adding-syscalls.rst.
>
> The documentation there is quite thorough.
>
> It lists things that people commonly forget e.g. email [email protected].
>
> Would it be acceptable to attempt to collate per-libc information
> into the adding-syscalls.rst under a new section called:
>
> "Integration with libc"
>

I think that updates to adding-syscalls.rst would be sufficient,
instead of having a new/separate syscalls-checklist file.

thanks,
--
~Randy