2016-04-13 12:58:06

by Richard W.M. Jones

[permalink] [raw]
Subject: [PATCH v2 0/2] vfs: Define new syscall getumask.

v1 -> v2:

- Use current_umask() instead of current->fs->umask.

- Retested it.

----------------------------------------------------------------------

It's not possible to read the process umask without also modifying it,
which is what umask(2) does. A library cannot read umask safely,
especially if the main program might be multithreaded.

This patch series adds a trivial system call "getumask" which returns
the umask of the current process.

Another approach to this has been attempted before, adding something
to /proc, although it didn't go anywhere. See:

http://comments.gmane.org/gmane.linux.kernel/1292109

Another way to solve this would be to add a thread-safe getumask to
glibc. Since glibc could own the mutex, this would permit libraries
linked to this glibc to read umask safely.

I should also note that man-pages documents getumask(3), but no
version of glibc has ever implemented it.

Typical test script:

#include <stdio.h>
#include <stdlib.h>
#include <linux/unistd.h>
#include <sys/syscall.h>

int main(int argc, char *argv[])
{
int r = syscall(329);
if (r == -1) {
perror("getumask");
exit(1);
}
printf("umask = %o\n", r);
exit(0);
}

$ ./getumask
umask = 22

Rich.


2016-04-13 12:58:12

by Richard W.M. Jones

[permalink] [raw]
Subject: [PATCH v2 1/2] vfs: Define new syscall getumask.

Define a system call for reading the current umask value.

Signed-off-by: Richard W.M. Jones <[email protected]>
---
include/linux/syscalls.h | 1 +
kernel/sys.c | 5 +++++
2 files changed, 6 insertions(+)

diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index d795472..e96e88f 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -659,6 +659,7 @@ asmlinkage long sys_prlimit64(pid_t pid, unsigned int resource,
struct rlimit64 __user *old_rlim);
asmlinkage long sys_getrusage(int who, struct rusage __user *ru);
asmlinkage long sys_umask(int mask);
+asmlinkage long sys_getumask(void);

asmlinkage long sys_msgget(key_t key, int msgflg);
asmlinkage long sys_msgsnd(int msqid, struct msgbuf __user *msgp,
diff --git a/kernel/sys.c b/kernel/sys.c
index cf8ba54..9db526c 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1649,6 +1649,11 @@ SYSCALL_DEFINE1(umask, int, mask)
return mask;
}

+SYSCALL_DEFINE0(getumask)
+{
+ return current_umask();
+}
+
static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
{
struct fd exe;
--
2.7.4

2016-04-13 12:58:20

by Richard W.M. Jones

[permalink] [raw]
Subject: [PATCH v2 2/2] x86: Wire up new getumask system call on x86.

Signed-off-by: Richard W.M. Jones <[email protected]>
---
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
2 files changed, 2 insertions(+)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index b30dd81..af0a032 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -386,3 +386,4 @@
377 i386 copy_file_range sys_copy_file_range
378 i386 preadv2 sys_preadv2
379 i386 pwritev2 sys_pwritev2
+380 i386 getumask sys_getumask
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index cac6d17..47c1579 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -335,6 +335,7 @@
326 common copy_file_range sys_copy_file_range
327 64 preadv2 sys_preadv2
328 64 pwritev2 sys_pwritev2
+329 common getumask sys_getumask

#
# x32-specific system call numbers start at 512 to avoid cache impact
--
2.7.4

2016-04-13 13:20:39

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] vfs: Define new syscall getumask.

On Wed, Apr 13, 2016 at 01:57:51PM +0100, Richard W.M. Jones wrote:
> Define a system call for reading the current umask value.
>
> Signed-off-by: Richard W.M. Jones <[email protected]>

Btw don't we have to declare it in include/uapi/asm-generic/unistd.h as well?

2016-04-13 13:57:16

by Richard W.M. Jones

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] vfs: Define new syscall getumask.

On Wed, Apr 13, 2016 at 04:20:32PM +0300, Cyrill Gorcunov wrote:
> On Wed, Apr 13, 2016 at 01:57:51PM +0100, Richard W.M. Jones wrote:
> > Define a system call for reading the current umask value.
> >
> > Signed-off-by: Richard W.M. Jones <[email protected]>
>
> Btw don't we have to declare it in include/uapi/asm-generic/unistd.h as well?

Yes, I think I do. I was following pwritev2 which wasn't added
to this file, but other recent system calls (mlock2, copy_file_range)
were added.

TBH the documentation for this file is not very clear...

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine. Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/

2016-04-13 14:00:22

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] vfs: Define new syscall getumask.

On Wed, Apr 13, 2016 at 01:57:50PM +0100, Richard W.M. Jones wrote:
> v1 -> v2:
>
> - Use current_umask() instead of current->fs->umask.
>
> - Retested it.
>
> ----------------------------------------------------------------------
>
> It's not possible to read the process umask without also modifying it,
> which is what umask(2) does. A library cannot read umask safely,
> especially if the main program might be multithreaded.
>
> This patch series adds a trivial system call "getumask" which returns
> the umask of the current process.
>
> Another approach to this has been attempted before, adding something
> to /proc, although it didn't go anywhere. See:
>
> http://comments.gmane.org/gmane.linux.kernel/1292109
>
> Another way to solve this would be to add a thread-safe getumask to
> glibc. Since glibc could own the mutex, this would permit libraries
> linked to this glibc to read umask safely.
>
> I should also note that man-pages documents getumask(3), but no
> version of glibc has ever implemented it.
>
> Typical test script:
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <linux/unistd.h>
> #include <sys/syscall.h>
>
> int main(int argc, char *argv[])
> {
> int r = syscall(329);
> if (r == -1) {
> perror("getumask");
> exit(1);
> }
> printf("umask = %o\n", r);
> exit(0);
> }

Why not add this to the ktest infrastructure, we strongly encourage that
for new syscalls, along with a man page patch.

thanks,

greg k-h

2016-04-13 14:02:10

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] vfs: Define new syscall getumask.

On Wed, Apr 13, 2016 at 02:57:08PM +0100, Richard W.M. Jones wrote:
> On Wed, Apr 13, 2016 at 04:20:32PM +0300, Cyrill Gorcunov wrote:
> > On Wed, Apr 13, 2016 at 01:57:51PM +0100, Richard W.M. Jones wrote:
> > > Define a system call for reading the current umask value.
> > >
> > > Signed-off-by: Richard W.M. Jones <[email protected]>
> >
> > Btw don't we have to declare it in include/uapi/asm-generic/unistd.h as well?
>
> Yes, I think I do. I was following pwritev2 which wasn't added
> to this file, but other recent system calls (mlock2, copy_file_range)
> were added.

Indeed. I'll send a patch to wire up pread/writev2, sorry for causing
your conflicts, but adding syscalls is a bit of a mess.

2016-04-13 15:27:20

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] vfs: Define new syscall getumask.

----- On Apr 13, 2016, at 9:57 AM, Richard W.M. Jones [email protected] wrote:

> On Wed, Apr 13, 2016 at 04:20:32PM +0300, Cyrill Gorcunov wrote:
>> On Wed, Apr 13, 2016 at 01:57:51PM +0100, Richard W.M. Jones wrote:
>> > Define a system call for reading the current umask value.
>> >
>> > Signed-off-by: Richard W.M. Jones <[email protected]>
>>
>> Btw don't we have to declare it in include/uapi/asm-generic/unistd.h as well?
>
> Yes, I think I do. I was following pwritev2 which wasn't added
> to this file, but other recent system calls (mlock2, copy_file_range)
> were added.
>
> TBH the documentation for this file is not very clear...

asm-generic/unistd.h defines the system call for a few
architectures.

grep -r asm-generic/unistd.h arch/*/include/
arch/arc/include/uapi/asm/unistd.h:#include <asm-generic/unistd.h>
arch/arc/include/uapi/asm/unistd.h:/* Generic syscall (fs/filesystems.c - lost in asm-generic/unistd.h */
arch/arm64/include/uapi/asm/unistd.h:#include <asm-generic/unistd.h>
arch/c6x/include/uapi/asm/unistd.h:#include <asm-generic/unistd.h>
arch/h8300/include/uapi/asm/unistd.h:#include <asm-generic/unistd.h>
arch/hexagon/include/uapi/asm/unistd.h:#include <asm-generic/unistd.h>
arch/metag/include/uapi/asm/unistd.h:#include <asm-generic/unistd.h>
arch/nios2/include/uapi/asm/unistd.h:#include <asm-generic/unistd.h>
arch/openrisc/include/uapi/asm/unistd.h:#include <asm-generic/unistd.h>
arch/score/include/uapi/asm/unistd.h:#include <asm-generic/unistd.h>
arch/tile/include/uapi/asm/unistd.h:#include <asm-generic/unistd.h>
arch/unicore32/include/uapi/asm/unistd.h:#include <asm-generic/unistd.h>

Wiring up the system call in this header means adding
support for this system call on all those architectures.

Thanks,

Mathieu

>
> Rich.
>
> --
> Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
> Read my programming and virtualization blog: http://rwmj.wordpress.com
> virt-df lists disk usage of guests without needing to install any
> software inside the virtual machine. Supports Linux and Windows.
> http://people.redhat.com/~rjones/virt-df/

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2016-04-13 15:39:35

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] vfs: Define new syscall getumask.

----- On Apr 13, 2016, at 8:57 AM, Richard W.M. Jones [email protected] wrote:

> v1 -> v2:
>
> - Use current_umask() instead of current->fs->umask.
>
> - Retested it.
>
> ----------------------------------------------------------------------
>
> It's not possible to read the process umask without also modifying it,
> which is what umask(2) does. A library cannot read umask safely,
> especially if the main program might be multithreaded.
>
> This patch series adds a trivial system call "getumask" which returns
> the umask of the current process.

In addition to this system call, we could extend a variation of my
thread_local_abi system call (https://lkml.org/lkml/2016/4/4/455)
(could be without features flags, or an entirely new system call
specifically for a umask cache) to register a "current umask" cache
located in a TLS area.

Basically, reading the current umask value would be a simple load from
a TLS variable. This could also allow quickly blocking and unblocking
signal delivery from user-space by storing a mask to this TLS area.

The kernel could then look into the signal mask in this TLS area whenever
it needs to deliver a signal (assuming this code path can take
user-space faults), in addition to the mask kept within the
task struct.

This "tls cache" idea could also apply to setting a CPU affinity to the
currently running CPU for short user-space critical sections.

The benefit here is to get _very_ fast operations on the thread umask
and cpu affinity.

Are those ideas too far-fetched ?

Thanks,

Mathieu

>
> Another approach to this has been attempted before, adding something
> to /proc, although it didn't go anywhere. See:
>
> http://comments.gmane.org/gmane.linux.kernel/1292109
>
> Another way to solve this would be to add a thread-safe getumask to
> glibc. Since glibc could own the mutex, this would permit libraries
> linked to this glibc to read umask safely.
>
> I should also note that man-pages documents getumask(3), but no
> version of glibc has ever implemented it.
>
> Typical test script:
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <linux/unistd.h>
> #include <sys/syscall.h>
>
> int main(int argc, char *argv[])
> {
> int r = syscall(329);
> if (r == -1) {
> perror("getumask");
> exit(1);
> }
> printf("umask = %o\n", r);
> exit(0);
> }
>
> $ ./getumask
> umask = 22
>
> Rich.

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2016-04-13 15:41:48

by Colin Walters

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] vfs: Define new syscall getumask.

On Wed, Apr 13, 2016, at 08:57 AM, Richard W.M. Jones wrote:

> It's not possible to read the process umask without also modifying it,
> which is what umask(2) does. A library cannot read umask safely,
> especially if the main program might be multithreaded.

I assume you just want to do this from a shared library so you can
determine whether or not you need to call fchown() after making files
and the like? If that's the case it'd be good to note it in the commit
message.

BTW...it might be a good idea to add a flags argument:
https://lwn.net/Articles/585415/

Did you consider calling this `umask2`, having the initial version only support
retrieving it via a UMASK_GET flag, and lay the groundwork to support
setting a threadsafe umask with a UMASK_SET_THREAD flag?

2016-04-13 16:03:27

by Richard W.M. Jones

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] vfs: Define new syscall getumask.

On Wed, Apr 13, 2016 at 11:41:45AM -0400, Colin Walters wrote:
> On Wed, Apr 13, 2016, at 08:57 AM, Richard W.M. Jones wrote:
>
> > It's not possible to read the process umask without also modifying it,
> > which is what umask(2) does. A library cannot read umask safely,
> > especially if the main program might be multithreaded.
>
> I assume you just want to do this from a shared library so you can
> determine whether or not you need to call fchown() after making files
> and the like? If that's the case it'd be good to note it in the commit
> message.

Yes, the use case is something like that. I write a shared library
(libguestfs) and we get bug reports that turn out to be caused by odd
umask settings. Of course we fix these on a case-by-case basis, but
we also want to include the current umask in debug output so that we
can identify the problem quickly in future reports.

Actually I wrote a rather involved getumask substitute:

https://github.com/libguestfs/libguestfs/blob/master/src/launch.c#L477

It works by creating a temporary directory, writing a file inside that
directory with mode 0777, then calling fstat(2) to work out what mode
the kernel gave it.

It turns out this code is not even correct. It was pointed out to me
that there is a filesystem umask mount option (and fmask, dmask too)
which stops this from working properly.

So it's a lot of work to read umask safely inside a shared library.

I will update the commit comment with a brief summary of the above.

> BTW...it might be a good idea to add a flags argument:
> https://lwn.net/Articles/585415/
>
> Did you consider calling this `umask2`, having the initial version
> only support retrieving it via a UMASK_GET flag, and lay the
> groundwork to support setting a threadsafe umask with a
> UMASK_SET_THREAD flag?

Can certainly do it like this if that is preferable.

For my needs, getumask as implemented now is sufficient.

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine. Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/

2016-04-13 21:01:34

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] vfs: Define new syscall getumask.

----- On Apr 13, 2016, at 11:39 AM, Mathieu Desnoyers [email protected] wrote:

> ----- On Apr 13, 2016, at 8:57 AM, Richard W.M. Jones [email protected] wrote:
>
>> v1 -> v2:
>>
>> - Use current_umask() instead of current->fs->umask.
>>
>> - Retested it.
>>
>> ----------------------------------------------------------------------
>>
>> It's not possible to read the process umask without also modifying it,
>> which is what umask(2) does. A library cannot read umask safely,
>> especially if the main program might be multithreaded.
>>
>> This patch series adds a trivial system call "getumask" which returns
>> the umask of the current process.
>
> In addition to this system call, we could extend a variation of my
> thread_local_abi system call (https://lkml.org/lkml/2016/4/4/455)
> (could be without features flags, or an entirely new system call
> specifically for a umask cache) to register a "current umask" cache
> located in a TLS area.
>
> Basically, reading the current umask value would be a simple load from
> a TLS variable.

I'm actually discussing 3 separate things here: the umask, sigmask, and
cpu affinity mask.

Not sure if caching the umask in a TLS would be that useful, though.
The caching idea seems to make more sense for signal mask and cpu
affinity mask.

Thanks,

Mathieu

> This could also allow quickly blocking and unblocking
> signal delivery from user-space by storing a mask to this TLS area.
>
> The kernel could then look into the signal mask in this TLS area whenever
> it needs to deliver a signal (assuming this code path can take
> user-space faults), in addition to the mask kept within the
> task struct.
>
> This "tls cache" idea could also apply to setting a CPU affinity to the
> currently running CPU for short user-space critical sections.
>
> The benefit here is to get _very_ fast operations on the thread umask
> and cpu affinity.
>
> Are those ideas too far-fetched ?
>
> Thanks,
>
> Mathieu
>
>>
>> Another approach to this has been attempted before, adding something
>> to /proc, although it didn't go anywhere. See:
>>
>> http://comments.gmane.org/gmane.linux.kernel/1292109
>>
>> Another way to solve this would be to add a thread-safe getumask to
>> glibc. Since glibc could own the mutex, this would permit libraries
>> linked to this glibc to read umask safely.
>>
>> I should also note that man-pages documents getumask(3), but no
>> version of glibc has ever implemented it.
>>
>> Typical test script:
>>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <linux/unistd.h>
>> #include <sys/syscall.h>
>>
>> int main(int argc, char *argv[])
>> {
>> int r = syscall(329);
>> if (r == -1) {
>> perror("getumask");
>> exit(1);
>> }
>> printf("umask = %o\n", r);
>> exit(0);
>> }
>>
>> $ ./getumask
>> umask = 22
>>
>> Rich.
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2016-04-14 02:14:20

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] vfs: Define new syscall getumask.

On Wed, Apr 13, 2016 at 09:01:25PM +0000, Mathieu Desnoyers wrote:
> I'm actually discussing 3 separate things here: the umask, sigmask, and
> cpu affinity mask.

The last two are available in /proc/<pid>/status --- which brings up
the question why not just add umask to /proc/<pid>/status?

That way the shared library can read it via /proc/self/status, but
this way it would be possible to look at other process's umask values
this as well.

> >> Another approach to this has been attempted before, adding something
> >> to /proc, although it didn't go anywhere. See:
> >>
> >> http://comments.gmane.org/gmane.linux.kernel/1292109

... and indeed that's what I suggested. It looks like from the thread
that it petered out due to apathy instead of people not liking the
idea.

One other reason to suggest using a /proc file is that you're not at
the mercy of the glibc folks to wire up a new system call. (Glibc has
been refusing to wire up getrandom(2), for example. Grrrr.....)

- Ted

2016-04-14 03:47:57

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] vfs: Define new syscall getumask.

On Wed, 13 Apr 2016 06:59:50 -0700
Greg KH <[email protected]> wrote:


> Why not add this to the ktest infrastructure, we strongly encourage that
> for new syscalls, along with a man page patch.

Do you mean the "selftest infrastructure"? As I don't see how this
could be used with ktest.

See tools/testing/selftests/

-- Steve

2016-04-14 17:57:01

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] vfs: Define new syscall getumask.

On Apr 13, 2016 2:01 PM, "Mathieu Desnoyers"
<[email protected]> wrote:
>
> ----- On Apr 13, 2016, at 11:39 AM, Mathieu Desnoyers [email protected] wrote:
>
> > ----- On Apr 13, 2016, at 8:57 AM, Richard W.M. Jones [email protected] wrote:
> >
> >> v1 -> v2:
> >>
> >> - Use current_umask() instead of current->fs->umask.
> >>
> >> - Retested it.
> >>
> >> ----------------------------------------------------------------------
> >>
> >> It's not possible to read the process umask without also modifying it,
> >> which is what umask(2) does. A library cannot read umask safely,
> >> especially if the main program might be multithreaded.
> >>
> >> This patch series adds a trivial system call "getumask" which returns
> >> the umask of the current process.
> >
> > In addition to this system call, we could extend a variation of my
> > thread_local_abi system call (https://lkml.org/lkml/2016/4/4/455)
> > (could be without features flags, or an entirely new system call
> > specifically for a umask cache) to register a "current umask" cache
> > located in a TLS area.
> >
> > Basically, reading the current umask value would be a simple load from
> > a TLS variable.
>
> I'm actually discussing 3 separate things here: the umask, sigmask, and
> cpu affinity mask.
>
> Not sure if caching the umask in a TLS would be that useful, though.
> The caching idea seems to make more sense for signal mask and cpu
> affinity mask.
>

I think this is of questionable value.

Keep in mind that every feature like this adds overhead to lots of
code paths as well as additional complexity. Such features should be
justified by big performance benefits.

Given that processes can easily track their own umasks and signal
masks if they care, I don't see why the kernel would want to help.

--Andy

2016-04-14 19:26:21

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] vfs: Define new syscall getumask.

On Wed, Apr 13, 2016 at 11:47:03PM -0400, Steven Rostedt wrote:
> On Wed, 13 Apr 2016 06:59:50 -0700
> Greg KH <[email protected]> wrote:
>
>
> > Why not add this to the ktest infrastructure, we strongly encourage that
> > for new syscalls, along with a man page patch.
>
> Do you mean the "selftest infrastructure"? As I don't see how this
> could be used with ktest.
>
> See tools/testing/selftests/

Sorry, yes, I meant selftest, not ktest, sorry for the confusion.

greg k-h

2016-04-18 00:39:44

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] vfs: Define new syscall getumask.

On 04/13/16 19:13, Theodore Ts'o wrote:
>
> One other reason to suggest using a /proc file is that you're not at
> the mercy of the glibc folks to wire up a new system call. (Glibc has
> been refusing to wire up getrandom(2), for example. Grrrr.....)
>

This brings right back up the libinux idea. There are continued
concerns about type compatibility, but saying "oh, use syscall(3)
instead" has worse properties than a Linux-kernel-team maintained
libinux. Last I heard the glibc team had (reluctantly?) agreed to do
something to deal with linux-specific system calls, but last I heard
nothing had happened. The last discussion I see on the glibc mailing
list dates back to November, and that thread seems to have died from
bikeshedding, again.

There aren't a *lot* of such system calls, but even in that thread the
"oh, only two applications need this, let them use syscall(3)" seems to
remain.

-hpa


2016-04-18 01:09:30

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] vfs: Define new syscall getumask.

On Sun, Apr 17, 2016 at 05:38:24PM -0700, H. Peter Anvin wrote:
> On 04/13/16 19:13, Theodore Ts'o wrote:
> >
> > One other reason to suggest using a /proc file is that you're not at
> > the mercy of the glibc folks to wire up a new system call. (Glibc has
> > been refusing to wire up getrandom(2), for example. Grrrr.....)
> >
>
> This brings right back up the libinux idea. There are continued
> concerns about type compatibility, but saying "oh, use syscall(3)
> instead" has worse properties than a Linux-kernel-team maintained
> libinux. Last I heard the glibc team had (reluctantly?) agreed to do
> something to deal with linux-specific system calls, but last I heard
> nothing had happened. The last discussion I see on the glibc mailing
> list dates back to November, and that thread seems to have died from
> bikeshedding, again.
>
> There aren't a *lot* of such system calls, but even in that thread the
> "oh, only two applications need this, let them use syscall(3)" seems to
> remain.

And only 2 applications will continue to use it because no one wants to
write syscall() wrappers for their individual applications, so it's a
vicious cycle :(

I really like the 'libinux' idea, did anyone every hack up a first-pass
at this? And I'm guessing we have more syscalls now that would need to
be added (like getrandom(), but that shouldn't be too difficult.

thanks,

greg k-h

2016-04-18 01:43:08

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] vfs: Define new syscall getumask.

On 04/13/16 08:41, Colin Walters wrote:
> On Wed, Apr 13, 2016, at 08:57 AM, Richard W.M. Jones wrote:
>
>> It's not possible to read the process umask without also modifying it,
>> which is what umask(2) does. A library cannot read umask safely,
>> especially if the main program might be multithreaded.
>
> I assume you just want to do this from a shared library so you can
> determine whether or not you need to call fchown() after making files
> and the like? If that's the case it'd be good to note it in the commit
> message.
>
> BTW...it might be a good idea to add a flags argument:
> https://lwn.net/Articles/585415/
>
> Did you consider calling this `umask2`, having the initial version only support
> retrieving it via a UMASK_GET flag, and lay the groundwork to support
> setting a threadsafe umask with a UMASK_SET_THREAD flag?
>

The comments on that article also list a number of problems with this
approach, related to how undefined flags are handled.

In fact, if it wasn't for this exact problem then umask(-1) would have
been the logical way to deal with this, but because umask(2) is defined
to have an internal & 07777 it becomes infeasible at least in theory.
In practice it might work...

However, see previous discussions about making this available in /proc.
Also, I really think there is something to be said for a O_NOUMASK
option...

-hpa

2016-04-18 01:58:03

by Josh Triplett

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] vfs: Define new syscall getumask.

On Sun, Apr 17, 2016 at 06:42:12PM -0700, H. Peter Anvin wrote:
> On 04/13/16 08:41, Colin Walters wrote:
> > On Wed, Apr 13, 2016, at 08:57 AM, Richard W.M. Jones wrote:
> >
> >> It's not possible to read the process umask without also modifying it,
> >> which is what umask(2) does. A library cannot read umask safely,
> >> especially if the main program might be multithreaded.
> >
> > I assume you just want to do this from a shared library so you can
> > determine whether or not you need to call fchown() after making files
> > and the like? If that's the case it'd be good to note it in the commit
> > message.
> >
> > BTW...it might be a good idea to add a flags argument:
> > https://lwn.net/Articles/585415/
> >
> > Did you consider calling this `umask2`, having the initial version only support
> > retrieving it via a UMASK_GET flag, and lay the groundwork to support
> > setting a threadsafe umask with a UMASK_SET_THREAD flag?
> >
>
> The comments on that article also list a number of problems with this
> approach, related to how undefined flags are handled.
>
> In fact, if it wasn't for this exact problem then umask(-1) would have
> been the logical way to deal with this, but because umask(2) is defined
> to have an internal & 07777 it becomes infeasible at least in theory.
> In practice it might work...
>
> However, see previous discussions about making this available in /proc.
> Also, I really think there is something to be said for a O_NOUMASK
> option...

O_NOUMASK seems potentially useful to support implementation of umask
entirely in userspace, which also addresses thread-safety. A program
could read its process umask out at startup, handle umask entirely in
userspace (including for threads), and only interact with the system
umask after fork and before exec.

- Josh Triplett

2016-04-18 02:03:32

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] vfs: Define new syscall getumask.

On 04/17/16 18:09, Greg KH wrote:
>>
>> There aren't a *lot* of such system calls, but even in that thread the
>> "oh, only two applications need this, let them use syscall(3)" seems to
>> remain.
>
> And only 2 applications will continue to use it because no one wants to
> write syscall() wrappers for their individual applications, so it's a
> vicious cycle :(
>
> I really like the 'libinux' idea, did anyone every hack up a first-pass
> at this? And I'm guessing we have more syscalls now that would need to
> be added (like getrandom(), but that shouldn't be too difficult.
>

I haven't hacked anything up yet, but I think it would be easy to simply
take the klibc machinery, remove the stuff that isn't needed, and adjust
the handling of errno(3). I think it is time I admit that it needs to
be put on my TODO list.

-hpa


2016-04-18 02:13:19

by Josh Triplett

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] vfs: Define new syscall getumask.

On Mon, Apr 18, 2016 at 10:09:25AM +0900, Greg KH wrote:
> On Sun, Apr 17, 2016 at 05:38:24PM -0700, H. Peter Anvin wrote:
> > On 04/13/16 19:13, Theodore Ts'o wrote:
> > >
> > > One other reason to suggest using a /proc file is that you're not at
> > > the mercy of the glibc folks to wire up a new system call. (Glibc has
> > > been refusing to wire up getrandom(2), for example. Grrrr.....)
> > >
> >
> > This brings right back up the libinux idea. There are continued
> > concerns about type compatibility, but saying "oh, use syscall(3)
> > instead" has worse properties than a Linux-kernel-team maintained
> > libinux. Last I heard the glibc team had (reluctantly?) agreed to do
> > something to deal with linux-specific system calls, but last I heard
> > nothing had happened. The last discussion I see on the glibc mailing
> > list dates back to November, and that thread seems to have died from
> > bikeshedding, again.
> >
> > There aren't a *lot* of such system calls, but even in that thread the
> > "oh, only two applications need this, let them use syscall(3)" seems to
> > remain.
>
> And only 2 applications will continue to use it because no one wants to
> write syscall() wrappers for their individual applications, so it's a
> vicious cycle :(
>
> I really like the 'libinux' idea, did anyone every hack up a first-pass
> at this? And I'm guessing we have more syscalls now that would need to
> be added (like getrandom(), but that shouldn't be too difficult.

Personally, I'd suggest that libinux should wire up *all* (non-obsolete)
syscalls, not just those that haven't already been exposed via any
particular libc implementation. Each such syscall function would have
minimal overhead, since unlike libc these wrappers would not have any
special handling (other than use of the vdso) and would directly map to
the kernel syscall signature. Given a standard prefix like sys_ or
linux_, that would provide a clear distinction between direct-syscall
functions and libc functions, and avoid any future conflict if libc adds
a function named the same as the syscall.

As a random example, sys_getpid() would *always* call the getpid
syscall, rather than reading a cache within the library. (And
sys_gettid would call the gettid syscall, rather than failing to exist.)

2016-04-18 02:16:14

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] vfs: Define new syscall getumask.

On 04/17/16 19:12, Josh Triplett wrote:
>>
>> I really like the 'libinux' idea, did anyone every hack up a first-pass
>> at this? And I'm guessing we have more syscalls now that would need to
>> be added (like getrandom(), but that shouldn't be too difficult.
>
> Personally, I'd suggest that libinux should wire up *all* (non-obsolete)
> syscalls, not just those that haven't already been exposed via any
> particular libc implementation. Each such syscall function would have
> minimal overhead, since unlike libc these wrappers would not have any
> special handling (other than use of the vdso) and would directly map to
> the kernel syscall signature. Given a standard prefix like sys_ or
> linux_, that would provide a clear distinction between direct-syscall
> functions and libc functions, and avoid any future conflict if libc adds
> a function named the same as the syscall.
>
> As a random example, sys_getpid() would *always* call the getpid
> syscall, rather than reading a cache within the library. (And
> sys_gettid would call the gettid syscall, rather than failing to exist.)
>

I'm not so sure this is a good idea. It has definite pros and cons. In
some ways it pushes it more to be like syscall(3).

-hpa

2016-04-18 03:00:49

by Josh Triplett

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] vfs: Define new syscall getumask.

On Sun, Apr 17, 2016 at 07:15:31PM -0700, H. Peter Anvin wrote:
> On 04/17/16 19:12, Josh Triplett wrote:
> >>
> >> I really like the 'libinux' idea, did anyone every hack up a first-pass
> >> at this? And I'm guessing we have more syscalls now that would need to
> >> be added (like getrandom(), but that shouldn't be too difficult.
> >
> > Personally, I'd suggest that libinux should wire up *all* (non-obsolete)
> > syscalls, not just those that haven't already been exposed via any
> > particular libc implementation. Each such syscall function would have
> > minimal overhead, since unlike libc these wrappers would not have any
> > special handling (other than use of the vdso) and would directly map to
> > the kernel syscall signature. Given a standard prefix like sys_ or
> > linux_, that would provide a clear distinction between direct-syscall
> > functions and libc functions, and avoid any future conflict if libc adds
> > a function named the same as the syscall.
> >
> > As a random example, sys_getpid() would *always* call the getpid
> > syscall, rather than reading a cache within the library. (And
> > sys_gettid would call the gettid syscall, rather than failing to exist.)
>
> I'm not so sure this is a good idea. It has definite pros and cons. In
> some ways it pushes it more to be like syscall(3).

It seems like one of the main problems with syscall() is that it forces
userspace to handle weird ABI issues, such as syscall numbers varying by
architecture, encoding of 64-bit arguments on 32-bit platforms (see the
example in the syscall manpage), and other subtleties that will break on
architectures other than the one the developer is most likely to be
running. libinux bindings would eliminate those issues.

What cases do you have in mind where the libinux binding should look
different than the C API of the SYSCALL_DEFINE'd function in the kernel?

Users can still call the libc syscall when they want libc's behavior;
for syscalls that have a libc binding, most users will want that
version. But I've often needed to call the underlying syscall even for
syscalls that *do* have a libc binding, for various purposes, and having
a standard way to do that while still having safe type signatures seems
helpful.

This would also make it much easier to write an alternative libc, or a
language standard library that doesn't want to depend on libc.

- Josh Triplett

2016-04-18 03:01:59

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] vfs: Define new syscall getumask.

On 04/17/16 19:37, Josh Triplett wrote:
>
> It seems like one of the main problems with syscall() is that it forces
> userspace to handle weird ABI issues, such as syscall numbers varying by
> architecture, encoding of 64-bit arguments on 32-bit platforms (see the
> example in the syscall manpage), and other subtleties that will break on
> architectures other than the one the developer is most likely to be
> running. libinux bindings would eliminate those issues.
>
> What cases do you have in mind where the libinux binding should look
> different than the C API of the SYSCALL_DEFINE'd function in the kernel?
>
> Users can still call the libc syscall when they want libc's behavior;
> for syscalls that have a libc binding, most users will want that
> version. But I've often needed to call the underlying syscall even for
> syscalls that *do* have a libc binding, for various purposes, and having
> a standard way to do that while still having safe type signatures seems
> helpful.
>
> This would also make it much easier to write an alternative libc, or a
> language standard library that doesn't want to depend on libc.
>

The main problem has to do with types, and the fact that the C library
may want to intersperse itself around system calls. If people start
writing programs that call, say, __linux_umask() then it would make it
hard for libc to do something special with umask().

There are other things like it, e.g. where dev_t and __kernel_dev_t are
concerned.

Now, we could of course have __linux_getrandom() and make a weak alias
for getrandom(), but I really don't understand the use case for
exporting all the system calls.

-hpa


2016-04-18 09:14:19

by Richard W.M. Jones

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] vfs: Define new syscall getumask.

On Sun, Apr 17, 2016 at 06:57:36PM -0700, Josh Triplett wrote:
> O_NOUMASK seems potentially useful to support implementation of umask
> entirely in userspace, which also addresses thread-safety. A program
> could read its process umask out at startup, handle umask entirely in
> userspace (including for threads), and only interact with the system
> umask after fork and before exec.

I had a look at O_NOUMASK and there are a few problems:

It's relatively easy to implement for open(2). A few filesystems
implement their own open so I had to go into those filesystems and
modify how they handle current_umask too. And FUSE support is tricky
so I passed on that.

The real problem is that mkdir/mkdirat/mknod/mknodat are affected by
umask, but there is no convenient flags parameter to pass the
O_NOUMASK flag. So I think the patch only half-solves the problem.

I have a patch which needs a bit more testing, which I can post if you
think that's helpful, but I don't think it would be acceptable in its
current state.

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines. Supports shell scripting,
bindings from many languages. http://libguestfs.org

2016-04-18 10:05:00

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] vfs: Define new syscall getumask.

On April 18, 2016 2:14:12 AM PDT, "Richard W.M. Jones" <[email protected]> wrote:
>On Sun, Apr 17, 2016 at 06:57:36PM -0700, Josh Triplett wrote:
>> O_NOUMASK seems potentially useful to support implementation of umask
>> entirely in userspace, which also addresses thread-safety. A program
>> could read its process umask out at startup, handle umask entirely in
>> userspace (including for threads), and only interact with the system
>> umask after fork and before exec.
>
>I had a look at O_NOUMASK and there are a few problems:
>
>It's relatively easy to implement for open(2). A few filesystems
>implement their own open so I had to go into those filesystems and
>modify how they handle current_umask too. And FUSE support is tricky
>so I passed on that.
>
>The real problem is that mkdir/mkdirat/mknod/mknodat are affected by
>umask, but there is no convenient flags parameter to pass the
>O_NOUMASK flag. So I think the patch only half-solves the problem.
>
>I have a patch which needs a bit more testing, which I can post if you
>think that's helpful, but I don't think it would be acceptable in its
>current state.
>
>Rich.

Ironically this illustrates one of the limitations with flags arguments: this really belongs in the S_-flags, but we can't assume userspace is clean there... anymore than we can repurpose umask(-1).
--
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.

2016-04-18 11:40:28

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] vfs: Define new syscall getumask.

On Sun, Apr 17, 2016 at 08:00:54PM -0700, H. Peter Anvin wrote:
> Now, we could of course have __linux_getrandom() and make a weak alias
> for getrandom(), but I really don't understand the use case for
> exporting all the system calls.

If we do create a libinux library, I'd definitely want getrandom(2) to
be defined as getrandom(), and not as sys_getrandom() or
linux_getrandom().

For bonus points we could also define a OpenBSD compatible
getentropy() interface so that programs that were already written to
use the OpenBSD interface (and perhaps have a configure test for it
already) would get the benefits of the getrandom(2) system call
immediately.

- Ted