Subject: plan9 semantics on Linux - mount namespaces

Hi folks,


I'm currently trying to implement plan9 semantics on Linux and
am still sorting out how to do the mount namespace handling.

On plan9, any unprivileged process can create its own namespace
and mount/bind at will, while on Linux this requires CAP_SYS_ADMIN.

What is the reason for not allowing arbitrary users to create their
own private mount namespace ? What could go wrong here ?

IMHO, we could allow mount/bind under the following conditions:

* the process is in a private mount namespace
* no suid-flag is honored (either force all mounts to nosuid or
completely mask it out)
* only certain whitelisted filesystems allowed (eg. 9P and FUSE)

Maybe that all could be enabled by a new capability.
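
The three conditions listed above could be expressed as a single predicate. A minimal sketch in Python, purely to make the proposal concrete: this is not kernel code, and `MountRequest`, `ALLOWED_FSTYPES` and `unprivileged_mount_allowed` are hypothetical names invented for the illustration.

```python
from dataclasses import dataclass

# Hypothetical whitelist, taken from the two examples named in the proposal.
ALLOWED_FSTYPES = frozenset({"9p", "fuse"})


@dataclass(frozen=True)
class MountRequest:
    fstype: str              # filesystem type being mounted, e.g. "9p"
    nosuid: bool             # True if the mount would carry the nosuid flag
    in_private_mnt_ns: bool  # True if the caller sits in its own mount namespace


def unprivileged_mount_allowed(req: MountRequest) -> bool:
    """Return True only if all three proposed conditions hold."""
    return (
        req.in_private_mnt_ns
        and req.nosuid
        and req.fstype in ALLOWED_FSTYPES
    )
```

With these assumptions, a nosuid 9p mount in a private namespace passes, while an ext4 mount or any mount without nosuid is refused.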


any suggestions ?


--mtx

--
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
[email protected] -- +49-151-27565287


Subject: Re: plan9 semantics on Linux - mount namespaces

On 13.02.2018 22:12, Enrico Weigelt wrote:

CC @[email protected]

> Hi folks,
>
>
> I'm currently trying to implement plan9 semantics on Linux and
> yet sorting out how to do the mount namespace handling.
>
> On plan9, any unprivileged process can create its own namespace
> and mount/bind at will, while on Linux this requires CAP_SYS_ADMIN.
>
> What is the reason for not allowing arbitrary users to create their
> own private mount namespace ? What could go wrong here ?
>
> IMHO, we could allow mount/bind under the following conditions:
>
> * the process is in a private mount namespace
> * no suid-flag is honored (either force all mounts to nosuid or
>   completely mask it out)
> * only certain whitelisted filesystems allowed (eg. 9P and FUSE)
>
> Maybe that all could be enabled by a new capability.
>
>
> any suggestions ?
>
>
> --mtx
>


--
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
[email protected] -- +49-151-27565287

2018-02-13 22:29:03

by Aleksa Sarai

Subject: Re: plan9 semantics on Linux - mount namespaces

On 2018-02-13, Enrico Weigelt <[email protected]> wrote:
> On 13.02.2018 22:12, Enrico Weigelt wrote:
> > I'm currently trying to implement plan9 semantics on Linux and
> > yet sorting out how to do the mount namespace handling.
> >
> > On plan9, any unprivileged process can create its own namespace
> > and mount/bind at will, while on Linux this requires CAP_SYS_ADMIN.
> >
> > What is the reason for not allowing arbitrary users to create their
> > own private mount namespace ? What could go wrong here ?

You can do this by creating a new user namespace (CLONE_NEWUSER), which
then gives you the required permissions to create other namespaces
(CLONE_NEWNS). This is how "rootless containers" or unprivileged
containers operate.

--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>


Subject: Re: plan9 semantics on Linux - mount namespaces

On 13.02.2018 22:27, Aleksa Sarai wrote:

> You can do this by creating a new user namespace (CLONE_NEWUSER), which
> then gives you the required permissions to create other namespaces
> (CLONE_NEWNS). This is how "rootless containers" or unprivileged
> containers operate.

hmm, unshare -U doesn't work for me (even as root). But docker works,
so user namespaces should be working. Any idea what could be wrong ?


--mtx


--
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
[email protected] -- +49-151-27565287

2018-02-14 04:55:43

by Aleksa Sarai

Subject: Re: plan9 semantics on Linux - mount namespaces

On 2018-02-14, Enrico Weigelt <[email protected]> wrote:
> On 13.02.2018 22:27, Aleksa Sarai wrote:
>
> > You can do this by creating a new user namespace (CLONE_NEWUSER), which
> > then gives you the required permissions to create other namespaces
> > (CLONE_NEWNS). This is how "rootless containers" or unprivileged
> > containers operate.
>
> hmm, unshare -U doesn't work for me (even as root). But docker works,
> so user namespaces should be working. Any idea what could be wrong ?

It depends how old your kernel is and what distro you use. Arch Linux
disables user namespaces entirely, Debian requires that you set a sysctl
to enable unprivileged user namespaces, and RHEL requires you to set
both a sysctl and a kernel boot-flag. Also check how old your kernel is
(unprivileged user namespace support was added in 3.8).

Also Docker doesn't use user namespaces by default (you need to manually
enable it with --userns-remap, check the docs for more details). You
probably also want to be using "unshare -r" in your testing (as "unshare
-U" will leave you without mapped users).
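
The checks mentioned above can be gathered in one place. A minimal sketch, assuming the usual procfs locations: `/proc/sys/user/max_user_namespaces` is the mainline limit, `/proc/sys/kernel/unprivileged_userns_clone` is the Debian-style sysctl (absent on mainline kernels), and `/proc/self/ns/user` only exists when the kernel was built with user namespace support. The function names and the `proc_root` parameter are made up for the example.

```python
import os
from typing import Dict, Optional, Union


def read_int(path: str) -> Optional[int]:
    """Return the integer stored in a sysctl file, or None if unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def userns_report(proc_root: str = "/proc") -> Dict[str, Union[bool, Optional[int]]]:
    """Collect the settings that commonly decide whether unshare -U can work."""
    return {
        # The ns/user link is only present with user namespace support built in.
        "kernel_support": os.path.lexists(os.path.join(proc_root, "self/ns/user")),
        # Mainline per-user limit; a value of 0 disables creation.
        "max_user_namespaces": read_int(
            os.path.join(proc_root, "sys/user/max_user_namespaces")),
        # Distro-specific toggle; None means the sysctl does not exist here.
        "unprivileged_userns_clone": read_int(
            os.path.join(proc_root, "sys/kernel/unprivileged_userns_clone")),
    }
```

Calling `userns_report()` with no argument inspects the running system. A `None` for the distro toggle is normal on mainline kernels and does not by itself indicate a problem.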

--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>


Subject: Re: plan9 semantics on Linux - mount namespaces

On 14.02.2018 04:54, Aleksa Sarai wrote:

> It depends how old your kernel is and what distro you use. Arch Linux
> disables user namespaces entirely, Debian requires that you set a sysctl
> to enable unprivileged user namespaces, and RHEL requires you to set
> both a sysctl and a kernel boot-flag. Also check how old your kernel is
> (unprivileged user namespace support was added in 3.8).

Just tried on a mainline kernel (4.15). Same problem:

root@alphabox:~ unshare -U -r
unshare: unshare(0x14000000): Invalid argument


root@alphabox:/proc/sys/user cat max_user_namespaces
5922
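
The mask in the error message above can be decoded into flag names, which shows exactly what the tool asked the kernel for. A small sketch; the values are the namespace flags from `<linux/sched.h>`, and `decode_unshare_flags` is just an illustrative helper name.

```python
from typing import List

# Namespace-related clone/unshare flags as defined in <linux/sched.h>.
CLONE_FLAGS = {
    0x00020000: "CLONE_NEWNS",
    0x02000000: "CLONE_NEWCGROUP",
    0x04000000: "CLONE_NEWUTS",
    0x08000000: "CLONE_NEWIPC",
    0x10000000: "CLONE_NEWUSER",
    0x20000000: "CLONE_NEWPID",
    0x40000000: "CLONE_NEWNET",
}


def decode_unshare_flags(mask: int) -> List[str]:
    """Split a flag mask into known flag names, lowest bit first."""
    names = []
    remaining = mask
    for bit in sorted(CLONE_FLAGS):
        if mask & bit:
            names.append(CLONE_FLAGS[bit])
            remaining &= ~bit
    if remaining:
        # Bits this table does not know about are reported in hex.
        names.append(hex(remaining))
    return names


print(decode_unshare_flags(0x14000000))
# prints ['CLONE_NEWUTS', 'CLONE_NEWUSER']
```

So the call requested a new UTS and a new user namespace, and no mount namespace. EINVAL is what unshare(2) reports for flags the running kernel does not accept.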


Am I missing something ?


--mtx

--
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
[email protected] -- +49-151-27565287

2018-02-14 10:26:48

by Aleksa Sarai

Subject: Re: plan9 semantics on Linux - mount namespaces

On 2018-02-14, Enrico Weigelt <[email protected]> wrote:
> On 14.02.2018 04:54, Aleksa Sarai wrote:
>
> > It depends how old your kernel is and what distro you use. Arch Linux
> > disables user namespaces entirely, Debian requires that you set a sysctl
> > to enable unprivileged user namespaces, and RHEL requires you to set
> > both a sysctl and a kernel boot-flag. Also check how old your kernel is
> > (unprivileged user namespace support was added in 3.8).
>
> Just tried on a mainline kernel (4.15). Same problem:
>
> root@alphabox:~ unshare -U -r
> unshare: unshare(0x14000000): Invalid argument
> root@alphabox:/proc/sys/user cat max_user_namespaces
> 5922

What distribution are you using and which release? Also, are you trying
to do this inside a Docker container or something similar? (Docker has
seccomp filters that block CLONE_NEWUSER by default, for instance.)

--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>



2018-02-14 11:32:48

by Richard Weinberger

Subject: Re: plan9 semantics on Linux - mount namespaces

On Wed, Feb 14, 2018 at 12:27 PM, Enrico Weigelt <[email protected]> wrote:
> On 14.02.2018 11:24, Aleksa Sarai wrote:
>
>> What distribution are you using and which release?
>
>
> On a self-compiled system.
>
> Forgot to enable namespaces in the kernel. Now it seems to work
> as root, but not as an unprivileged user:
>
>
> daemon@alphabox:~ unshare -r -U
> unshare: can't open '/proc/self/setgroups': Permission denied
> daemon@alphabox:~ unshare -f -r -U
> unshare: can't open '/proc/self/setgroups': Permission denied
>

Please read http://man7.org/linux/man-pages/man7/user_namespaces.7.html
setgroups is a corner case and needs special care.

--
Thanks,
//richard

Subject: Re: plan9 semantics on Linux - mount namespaces

On 14.02.2018 11:24, Aleksa Sarai wrote:

> What distribution are you using and which release?

On a self-compiled system.

Forgot to enable namespaces in the kernel. Now it seems to work
as root, but not as an unprivileged user:


daemon@alphabox:~ unshare -r -U
unshare: can't open '/proc/self/setgroups': Permission denied
daemon@alphabox:~ unshare -f -r -U
unshare: can't open '/proc/self/setgroups': Permission denied


--mtx

--
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
[email protected] -- +49-151-27565287

Subject: Re: plan9 semantics on Linux - mount namespaces

On 14.02.2018 12:30, Richard Weinberger wrote:
> On Wed, Feb 14, 2018 at 12:27 PM, Enrico Weigelt <[email protected]> wrote:
>> On 14.02.2018 11:24, Aleksa Sarai wrote:
>>
>>> What distribution are you using and which release?
>>
>>
>> On a self-compiled system.
>>
>> Forgot to enable namespaces in the kernel. Now it seems to work
>> as root, but not as an unprivileged user:
>>
>>
>> daemon@alphabox:~ unshare -r -U
>> unshare: can't open '/proc/self/setgroups': Permission denied
>> daemon@alphabox:~ unshare -f -r -U
>> unshare: can't open '/proc/self/setgroups': Permission denied
>>
>
> Please read http://man7.org/linux/man-pages/man7/user_namespaces.7.html
> setgroups is a corner case and needs special care.

I'm still confused. Does the unshare program do something wrong here ?

Anyways, I doubt that user namespaces help solve my problem.

What I'd like to achieve is that processes can manipulate their private
namespace at will and mount other filesystems (primarily 9p and fuse).

For that, I need to get rid of setuid (and per-file caps) for these
private namespaces.


--mtx

--
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
[email protected] -- +49-151-27565287

2018-02-14 12:53:25

by Richard Weinberger

Subject: Re: plan9 semantics on Linux - mount namespaces

Enrico,

On Wednesday, 14 February 2018, 13:38:48 CET, Enrico Weigelt wrote:
> On 14.02.2018 12:30, Richard Weinberger wrote:
> > On Wed, Feb 14, 2018 at 12:27 PM, Enrico Weigelt <[email protected]> wrote:
> >> On 14.02.2018 11:24, Aleksa Sarai wrote:
> >>> What distribution are you using and which release?
> >>
> >> On a self-compiled system.
> >>
> >> Forgot to enable namespaces in the kernel. Now it seems to work
> >> as root, but not as an unprivileged user:
> >>
> >>
> >> daemon@alphabox:~ unshare -r -U
> >> unshare: can't open '/proc/self/setgroups': Permission denied
> >> daemon@alphabox:~ unshare -f -r -U
> >> unshare: can't open '/proc/self/setgroups': Permission denied
> >
> > Please read http://man7.org/linux/man-pages/man7/user_namespaces.7.html
> > setgroups is a corner case and needs special care.
>
> I'm still confused. Does the unshare program do something wrong here ?

It does what you ask of it.
Also see the --setgroups switch.
AFAICT --setgroups=deny is the new default, so your command line should just
work. Maybe your unshare tool is too old.

> Anyways, I doubt that user namespaces help solving my problem.
>
> What I'd like to achieve is that processes can manipulate their private
> namespace at will and mount other filesystems (primarily 9p and fuse).
>
> For that, I need to get rid of setuid (and per-file caps) for these
> private namespaces.

This is exactly why we have the user namespace.
In the user namespace you can create your own mount namespace and do (almost)
whatever you want.
Please note that you cannot mount just any kind of filesystem.
For FUSE, see https://lwn.net/Articles/684774/

Thanks,
//richard

--
sigma star gmbh - Eduard-Bodem-Gasse 6 - 6020 Innsbruck - Austria
ATU66964118 - FN 374287y

Subject: Re: plan9 semantics on Linux - mount namespaces

On 14.02.2018 13:53, Richard Weinberger wrote:

> It does what you ask it for.
> Also see the --setgroups switch.
> AFAICT --setgroups=deny is the new default, then your command line should just
> work. Maybe your unshare tool is too old.

Also doesn't help:

daemon@alphabox:~ unshare -U -r --setgroups=deny
unshare: can't open '/proc/self/setgroups': Permission denied

>> What I'd like to achieve is that processes can manipulate their private
>> namespace at will and mount other filesystems (primarily 9p and fuse).
>>
>> For that, I need to get rid of setuid (and per-file caps) for these
>> private namespaces.
>
> This is exactly why we have the user namespace.
> In the user namespace you can create your own mount namespace and do (almost)
> whatever you want.

What's the exact relation between user and mnt namespace ?
Why do I need an own user ns for private mnt ns ? (except for the suid
bit, which I wanna get rid of anyways).


--mtx

--
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
[email protected] -- +49-151-27565287

Subject: Re: plan9 semantics on Linux - mount namespaces

On 14.02.2018 15:19, Richard Weinberger wrote:

> Works here(tm).
> Can you debug it? Maybe we miss something obvious.

daemon@alphabox:~ strace unshare -U -r --setgroups=deny
execve("/bin/unshare", ["unshare", "-U", "-r", "--setgroups=deny"],
0x7ee51e0c /* 11 vars */) = 0
brk(NULL) = 0x58000
fcntl64(0, F_GETFD) = 0
fcntl64(1, F_GETFD) = 0
fcntl64(2, F_GETFD) = 0
access("/etc/suid-debug", F_OK) = -1 ENOENT (No such file or
directory)
uname({sysname="Linux", nodename="alphabox", ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0x76f90000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or
directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file
or directory)
open("/lib/tls/v7l/neon/vfp/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT
(No such file or directory)
stat64("/lib/tls/v7l/neon/vfp", 0x7eae8710) = -1 ENOENT (No such file or
directory)
open("/lib/tls/v7l/neon/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No
such file or directory)
stat64("/lib/tls/v7l/neon", 0x7eae8710) = -1 ENOENT (No such file or
directory)
open("/lib/tls/v7l/vfp/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No
such file or directory)
stat64("/lib/tls/v7l/vfp", 0x7eae8710) = -1 ENOENT (No such file or
directory)
open("/lib/tls/v7l/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such
file or directory)
stat64("/lib/tls/v7l", 0x7eae8710) = -1 ENOENT (No such file or
directory)
open("/lib/tls/neon/vfp/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No
such file or directory)
stat64("/lib/tls/neon/vfp", 0x7eae8710) = -1 ENOENT (No such file or
directory)
open("/lib/tls/neon/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such
file or directory)
stat64("/lib/tls/neon", 0x7eae8710) = -1 ENOENT (No such file or
directory)
open("/lib/tls/vfp/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such
file or directory)
stat64("/lib/tls/vfp", 0x7eae8710) = -1 ENOENT (No such file or
directory)
open("/lib/tls/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file
or directory)
stat64("/lib/tls", 0x7eae8710) = -1 ENOENT (No such file or
directory)
open("/lib/v7l/neon/vfp/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No
such file or directory)
stat64("/lib/v7l/neon/vfp", 0x7eae8710) = -1 ENOENT (No such file or
directory)
open("/lib/v7l/neon/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such
file or directory)
stat64("/lib/v7l/neon", 0x7eae8710) = -1 ENOENT (No such file or
directory)
open("/lib/v7l/vfp/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such
file or directory)
stat64("/lib/v7l/vfp", 0x7eae8710) = -1 ENOENT (No such file or
directory)
open("/lib/v7l/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file
or directory)
stat64("/lib/v7l", 0x7eae8710) = -1 ENOENT (No such file or
directory)
open("/lib/neon/vfp/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such
file or directory)
stat64("/lib/neon/vfp", 0x7eae8710) = -1 ENOENT (No such file or
directory)
open("/lib/neon/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such
file or directory)
stat64("/lib/neon", 0x7eae8710) = -1 ENOENT (No such file or
directory)
open("/lib/vfp/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file
or directory)
stat64("/lib/vfp", 0x7eae8710) = -1 ENOENT (No such file or
directory)
open("/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3,
"\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0Yi\1\0004\0\0\0"..., 512)
= 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=878136, ...}) = 0
mmap2(NULL, 947496, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3,
0) = 0x76e82000
mprotect(0x76f55000, 61440, PROT_NONE) = 0
mmap2(0x76f64000, 12288, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xd2000) = 0x76f64000
mmap2(0x76f67000, 9512, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x76f67000
close(3) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0x76f8f000
set_tls(0x76f8f4c0, 0x76f8fb98, 0x76f92050, 0x76f8f4c0, 0x76f92050) = 0
mprotect(0x76f64000, 8192, PROT_READ) = 0
mprotect(0x76f91000, 4096, PROT_READ) = 0
getuid32() = 1
stat64("/etc/busybox.conf", {st_mode=S_IFREG|0644, st_size=198, ...}) = 0
brk(NULL) = 0x58000
brk(0x79000) = 0x79000
open("/etc/busybox.conf", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=198, ...}) = 0
read(3, "[SUID]\n#lines starting with # ar"..., 1024) = 198
read(3, "", 1024) = 0
close(3) = 0
getgid32() = 1
setgid32(1) = 0
setuid32(1) = 0
geteuid32() = 1
getegid32() = 1
unshare(CLONE_NEWUTS|CLONE_NEWUSER) = 0
open("/proc/self/setgroups", O_WRONLY|O_LARGEFILE) = 3
write(3, "deny", 4) = 4
close(3) = 0
open("/proc/self/uid_map", O_WRONLY|O_LARGEFILE) = 3
write(3, "1 0 1", 5) = -1 EPERM (Operation not permitted)
write(2, "unshare: write error: Operation "..., 46unshare: write error:
Operation not permitted
) = 46
exit_group(1) = ?
+++ exited with 1 +++

Seems it fails to write the uid map.
Is the order of setgroups vs uid_map correct ?

>> What's the exact relation between user and mnt namespace ?
>> Why do I need an own user ns for private mnt ns ? (except for the suid
>> bit, which I wanna get rid of anyways).
>
> mount related system calls are root-only. Therefore you need the user
> namespace to become a root in your own little world. :)

I'm looking for a way to do that w/o being root (or something similar).
Actually, I don't like to change the user namespace, as it would cause
a lot of trouble w/ the /dev/cap[hash|use] devices, which I'm using for
user switching (as said: I'm going to get rid of suid completely).

--mtx

--
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
[email protected] -- +49-151-27565287

2018-02-14 15:17:35

by Richard Weinberger

Subject: Re: plan9 semantics on Linux - mount namespaces

Enrico,

On Wednesday, 14 February 2018, 16:02:18 CET, Enrico Weigelt wrote:
> stat64("/etc/busybox.conf", {st_mode=S_IFREG|0644, st_size=198, ...}) = 0

busybox...

> brk(NULL) = 0x58000
> brk(0x79000) = 0x79000
> open("/etc/busybox.conf", O_RDONLY|O_LARGEFILE) = 3
> fstat64(3, {st_mode=S_IFREG|0644, st_size=198, ...}) = 0
> read(3, "[SUID]\n#lines starting with # ar"..., 1024) = 198
> read(3, "", 1024) = 0
> close(3) = 0
> getgid32() = 1
> setgid32(1) = 0
> setuid32(1) = 0
> geteuid32() = 1
> getegid32() = 1
> unshare(CLONE_NEWUTS|CLONE_NEWUSER) = 0
> open("/proc/self/setgroups", O_WRONLY|O_LARGEFILE) = 3
> write(3, "deny", 4) = 4
> close(3) = 0
> open("/proc/self/uid_map", O_WRONLY|O_LARGEFILE) = 3
> write(3, "1 0 1", 5) = -1 EPERM (Operation not permitted)

This mapping looks broken.
Please report to busybox folks.

From taking a *very* quick look into busybox source, I suspect this should fix
it:

diff --git a/util-linux/unshare.c b/util-linux/unshare.c
index 875e3f86e304..3f59cf4d27c2 100644
--- a/util-linux/unshare.c
+++ b/util-linux/unshare.c
@@ -350,9 +350,9 @@ int unshare_main(int argc UNUSED_PARAM, char **argv)
* in that user namespace.
*/
xopen_xwrite_close(PATH_PROC_SETGROUPS, "deny");
- sprintf(uidmap_buf, "%u 0 1", (unsigned)reuid);
+ sprintf(uidmap_buf, "0 %u 1", (unsigned)reuid);
xopen_xwrite_close(PATH_PROC_UIDMAP, uidmap_buf);
- sprintf(uidmap_buf, "%u 0 1", (unsigned)regid);
+ sprintf(uidmap_buf, "0 %u 1", (unsigned)regid);
xopen_xwrite_close(PATH_PROC_GIDMAP, uidmap_buf);
} else
if (setgrp_str) {
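
For reference, each line of uid_map and gid_map has the form "ID-inside-ns ID-outside-ns length" (see user_namespaces(7)). The strace above shows "1 0 1" being written, i.e. ID 1 inside mapped to ID 0 outside, which is the reverse of what -r is meant to do. A minimal sketch of the intended formatting; the helper names are made up for the illustration and are not busybox functions.

```python
def id_map_line(inside_id: int, outside_id: int, count: int = 1) -> str:
    """Format one line for /proc/<pid>/uid_map or /proc/<pid>/gid_map."""
    if inside_id < 0 or outside_id < 0 or count < 1:
        raise ValueError("IDs must be non-negative and count at least 1")
    return f"{inside_id} {outside_id} {count}"


def root_map_for(real_id: int) -> str:
    """Map root inside the namespace to the caller's real ID outside it."""
    return id_map_line(0, real_id)


print(root_map_for(1))    # prints "0 1 1", the line the patched code writes for uid 1
print(id_map_line(1, 0))  # prints "1 0 1", the line seen in the strace
```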

Thanks,
//richard

--
sigma star gmbh - Eduard-Bodem-Gasse 6 - 6020 Innsbruck - Austria
ATU66964118 - FN 374287y

2018-02-14 17:50:32

by Richard Weinberger

Subject: Re: plan9 semantics on Linux - mount namespaces

On Wednesday, 14 February 2018, 18:21:12 CET, Enrico Weigelt wrote:
> On 14.02.2018 16:17, Richard Weinberger wrote:
> > From taking a *very* quick look into busybox source, I suspect this
> > should fix it:
> >
> > diff --git a/util-linux/unshare.c b/util-linux/unshare.c
> > index 875e3f86e304..3f59cf4d27c2 100644
> > --- a/util-linux/unshare.c
> > +++ b/util-linux/unshare.c
> > @@ -350,9 +350,9 @@ int unshare_main(int argc UNUSED_PARAM, char **argv)
> > * in that user namespace.
> > */
> > xopen_xwrite_close(PATH_PROC_SETGROUPS, "deny");
> > - sprintf(uidmap_buf, "%u 0 1", (unsigned)reuid);
> > + sprintf(uidmap_buf, "0 %u 1", (unsigned)reuid);
> > xopen_xwrite_close(PATH_PROC_UIDMAP, uidmap_buf);
> > - sprintf(uidmap_buf, "%u 0 1", (unsigned)regid);
> > + sprintf(uidmap_buf, "0 %u 1", (unsigned)regid);
> > xopen_xwrite_close(PATH_PROC_GIDMAP, uidmap_buf);
> > } else
> > if (setgrp_str) {
>
> hmm, now it works, but only when strace'ing it.
> that's really strange.

On my box, with my patch applied, busybox works now as well.

> But still I wonder whether user_ns really solves my problem, as I don't
> want to create sandboxed users, but only private namespaces just like
> on Plan9.

Well, I'd be surprised if that works out of the box.
Since you're posting on LKML I assumed you're hacking the kernel to support
plan9-alike namespaces...

Thanks,
//richard

--
sigma star gmbh - Eduard-Bodem-Gasse 6 - 6020 Innsbruck - Austria
ATU66964118 - FN 374287y

Subject: Re: plan9 semantics on Linux - mount namespaces

On 14.02.2018 18:50, Richard Weinberger wrote:

>> hmm, now it works, but only when strace'ing it.
>> that's really strange.
>
> On my box, with my patch applied, also busybox works now.

hmm, w/o strace, too ?
Which version are you using ? I've got 1.27.2

>> But still I wonder whether user_ns really solves my problem, as I don't
>> want to create sandboxed users, but only private namespaces just like
>> on Plan9.
>
> Well, I'd be surprised if that works out of the box.
> Since you're posting on LKML I assumed you're hacking the kernel to support
> plan9-alike namespaces...

Yes, that's the plan :)


--mtx

--
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
[email protected] -- +49-151-27565287

Subject: Re: plan9 semantics on Linux - mount namespaces

On 14.02.2018 19:12, Richard Weinberger wrote:

> BTW: Your issue is fixed/known. Just checked.

aha, on 1.2.28 ... I'll have to upgrade.


--mtx


--
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
[email protected] -- +49-151-27565287

2018-02-14 19:55:47

by Richard Weinberger

Subject: Re: plan9 semantics on Linux - mount namespaces

On Wednesday, 14 February 2018, 15:03:55 CET, Enrico Weigelt wrote:
> On 14.02.2018 13:53, Richard Weinberger wrote:
> > It does what you ask it for.
> > Also see the --setgroups switch.
> > AFAICT --setgroups=deny is the new default, then your command line should
> > just work. Maybe your unshare tool is too old.
>
> Also doesn't help:
>
> daemon@alphabox:~ unshare -U -r --setgroups=deny
> unshare: can't open '/proc/self/setgroups': Permission denied

Works here(tm).
Can you debug it? Maybe we miss something obvious.

> >> What I'd like to achieve is that processes can manipulate their private
> >> namespace at will and mount other filesystems (primarily 9p and fuse).
> >>
> >> For that, I need to get rid of setuid (and per-file caps) for these
> >> private namespaces.
>
> > This is exactly why we have the user namespace.
> > In the user namespace you can create your own mount namespace and do
> > (almost) whatever you want.
>
> What's the exact relation between user and mnt namespace ?
> Why do I need an own user ns for private mnt ns ? (except for the suid
> bit, which I wanna get rid of anyways).

mount related system calls are root-only. Therefore you need the user
namespace to become a root in your own little world. :)

Thanks,
//richard

--
sigma star gmbh - Eduard-Bodem-Gasse 6 - 6020 Innsbruck - Austria
ATU66964118 - FN 374287y

Subject: Re: plan9 semantics on Linux - mount namespaces

On 14.02.2018 16:17, Richard Weinberger wrote:

> From taking a *very* quick look into busybox source, I suspect this should fix
> it:
>
> diff --git a/util-linux/unshare.c b/util-linux/unshare.c
> index 875e3f86e304..3f59cf4d27c2 100644
> --- a/util-linux/unshare.c
> +++ b/util-linux/unshare.c
> @@ -350,9 +350,9 @@ int unshare_main(int argc UNUSED_PARAM, char **argv)
> * in that user namespace.
> */
> xopen_xwrite_close(PATH_PROC_SETGROUPS, "deny");
> - sprintf(uidmap_buf, "%u 0 1", (unsigned)reuid);
> + sprintf(uidmap_buf, "0 %u 1", (unsigned)reuid);
> xopen_xwrite_close(PATH_PROC_UIDMAP, uidmap_buf);
> - sprintf(uidmap_buf, "%u 0 1", (unsigned)regid);
> + sprintf(uidmap_buf, "0 %u 1", (unsigned)regid);
> xopen_xwrite_close(PATH_PROC_GIDMAP, uidmap_buf);
> } else
> if (setgrp_str) {
>

hmm, now it works, but only when strace'ing it.
that's really strange.

But still I wonder whether user_ns really solves my problem, as I don't
want to create sandboxed users, but only private namespaces just like
on Plan9.


--mtx

--
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
[email protected] -- +49-151-27565287

2018-02-14 20:11:27

by Richard Weinberger

Subject: Re: plan9 semantics on Linux - mount namespaces

On Wednesday, 14 February 2018, 19:01:52 CET, Enrico Weigelt wrote:
> On 14.02.2018 18:50, Richard Weinberger wrote:
> >> hmm, now it works, but only when strace'ing it.
> >> that's really strange.
> >
> > On my box, with my patch applied, also busybox works now.
>
> hmm, w/o strace, too ?

Sure.

> Which version are you using ? I've got 1.27.2

Both master and 1.12.x

BTW: Your issue is fixed/known. Just checked.

commit 1b510900e24459353922a1bc83c0b58bc8bafe1c
Author: Denys Vlasenko <[email protected]>
Date: Thu Nov 9 16:06:33 2017 +0100

unshare: -r should map root to user, not the other way around

Signed-off-by: Denys Vlasenko <[email protected]>

Thanks,
//richard

--
sigma star gmbh - Eduard-Bodem-Gasse 6 - 6020 Innsbruck - Austria
ATU66964118 - FN 374287y

2018-02-14 20:40:37

by Aleksa Sarai

Subject: Re: plan9 semantics on Linux - mount namespaces

On 2018-02-14, Enrico Weigelt <[email protected]> wrote:
> But still I wonder whether user_ns really solves my problem, as I don't
> want to create sandboxed users, but only private namespaces just like
> on Plan9.

On Linux you need to have CAP_SYS_ADMIN (in the user_ns that owns your
current mnt_ns) in order to mount anything, and to create any namespaces
(in your current user_ns). So, in order to use the functionality of
mnt_ns (the ability to create mounts only a subset of processes can
see) as an unprivileged user, you need to use user_ns.

(Note there is an additional restriction, namely that a mnt_ns that was
set up in the non-root user_ns cannot mount any filesystems that do not
have the FS_USERNS_MOUNT option set. This is also for security, as
exposing the kernel filesystem parser to arbitrary data by unprivileged
users wasn't deemed to be a safe thing to do. The unprivileged FUSE work
that Richard linked to will likely be useful for pushing FS_USERNS_MOUNT
into more filesystems -- like 9p.)
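
Put together, the unprivileged path described above looks roughly like the following. This is a sketch under stated assumptions, not a reference implementation: it calls unshare(2) through ctypes, assumes a single-threaded process on a kernel with user namespaces enabled, and `userns_setup_plan` and `enter_private_namespaces` are illustrative names.

```python
import ctypes
import os
from typing import List, Tuple

# Flag values from <linux/sched.h>.
CLONE_NEWNS = 0x00020000
CLONE_NEWUSER = 0x10000000


def userns_setup_plan(uid: int, gid: int) -> List[Tuple[str, str]]:
    """Files to write, in order, once the new user namespace exists."""
    return [
        # An unprivileged caller must deny setgroups before writing gid_map.
        ("/proc/self/setgroups", "deny"),
        # Map root inside the namespace to the caller's IDs outside it.
        ("/proc/self/uid_map", f"0 {uid} 1"),
        ("/proc/self/gid_map", f"0 {gid} 1"),
    ]


def enter_private_namespaces() -> None:
    """Create a user namespace plus a mount namespace owned by it."""
    # Read the IDs first: after unshare they show up as the overflow ID
    # until a mapping has been written.
    uid, gid = os.geteuid(), os.getegid()
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.unshare(CLONE_NEWUSER | CLONE_NEWNS) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    for path, data in userns_setup_plan(uid, gid):
        with open(path, "w") as f:
            f.write(data)
```

After `enter_private_namespaces()` returns, the process holds CAP_SYS_ADMIN in the user_ns that owns its new mnt_ns, so bind mounts and mounts of FS_USERNS_MOUNT filesystems are permitted there and stay invisible to processes outside it.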

--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>



2018-02-16 19:24:10

by Eric W. Biederman

Subject: Re: plan9 semantics on Linux - mount namespaces

Enrico Weigelt <[email protected]> writes:

> On 13.02.2018 22:12, Enrico Weigelt wrote:
>
> CC @[email protected]
>
>> Hi folks,
>>
>>
>> I'm currently trying to implement plan9 semantics on Linux and
>> yet sorting out how to do the mount namespace handling.
>>
>> On plan9, any unprivileged process can create its own namespace
>> and mount/bind at will, while on Linux this requires CAP_SYS_ADMIN.
>>
>> What is the reason for not allowing arbitrary users to create their
>> own private mount namespace ? What could go wrong here ?

suid root executables could be fooled. An easy case is fooling
/bin/su into reading a different copy of /etc/shadow, and allowing
arbitrary changes between users.

>> IMHO, we could allow mount/bind under the following conditions:
>>
>> * the process is in a private mount namespace
>> * no suid-flag is honored (either force all mounts to nosuid or
>>   completely mask it out)
>> * only certain whitelisted filesystems allowed (eg. 9P and FUSE)
>>
>> Maybe that all could be enabled by a new capability.
>>
>>
>> any suggestions ?

User namespaces limit the contained processes to not having any
permissions outside of the user namespace, while still allowing the
full unix permission model inside user namespaces.

I am in the final stages of getting the changes in the vfs and in fuse
to allow unprivileged users to mount that filesystem. plan9fs would
also be a candidate for that kind of treatment if it had a maintainer.

Eric