Subject: Documenting the ioctl interfaces to discover relationships between namespaces

[was: [PATCH 0/4 v3] Add an interface to discover relationships
between namespaces]

Hello Andrei

See below for my attempt to document the following.

On 6 September 2016 at 09:47, Andrei Vagin <[email protected]> wrote:
> From: Andrey Vagin <[email protected]>
>
> Each namespace has an owning user namespace, and currently there is no
> way to discover these relationships.
>
> PID and user namespaces are hierarchical. There is no way to discover
> parent-child relationships either.
>
> Why might we want to know the relationships between namespaces?
>
> One use would be visualization, in order to understand the running
> system. Another would be to answer the question: what capability does
> process X have to perform operations on a resource governed by namespace
> Y?
>
> One more use case (which is usually called abnormal) is checkpoint/restart.
> In CRIU we are going to dump and restore nested namespaces.
>
> There was a discussion [1] about which interface to choose to determine
> relationships between namespaces.
>
> Eric suggested adding two ioctls [2]:
>> Grumble, Grumble. I think this may actually be a case for creating ioctls
>> for these two cases. Now that random nsfs file descriptors are bind
>> mountable the original reason for using proc files is not as pressing.
>>
>> One ioctl for the user namespace that owns a file descriptor.
>> One ioctl for the parent namespace of a namespace file descriptor.
>
> Here is an implementation of these ioctls.
>
> $ man man7/namespaces.7
> ...
> Since Linux 4.X, the following ioctl(2) calls are supported for
> namespace file descriptors. The correct syntax is:
>
> fd = ioctl(ns_fd, ioctl_type);
>
> where ioctl_type is one of the following:
>
> NS_GET_USERNS
> Returns a file descriptor that refers to an owning user
> namespace.
>
> NS_GET_PARENT
> Returns a file descriptor that refers to a parent namespace.
> This ioctl(2) can be used for pid and user namespaces. For
> user namespaces, NS_GET_PARENT and NS_GET_USERNS have the same
> meaning.
>
> In addition to generic ioctl(2) errors, the following specific ones
> can occur:
>
> EINVAL NS_GET_PARENT was called for a nonhierarchical namespace.
>
> EPERM The requested namespace is outside of the current namespace
> scope.
>
> [1] https://lkml.org/lkml/2016/7/6/158
> [2] https://lkml.org/lkml/2016/7/9/101

The following is the text I propose to add to the namespaces(7) page.
Could you please review it and let me know of any corrections and
improvements?

Thanks,

Michael


   Introspecting namespace relationships
       Since Linux 4.9, two ioctl(2) operations are provided to allow
       introspection of namespace relationships (see user_namespaces(7)
       and pid_namespaces(7)). The form of the calls is:

           ioctl(fd, request);

       In each case, fd refers to a /proc/[pid]/ns/* file.

       NS_GET_USERNS
              Returns a file descriptor that refers to the owning user
              namespace for the namespace referred to by fd.

       NS_GET_PARENT
              Returns a file descriptor that refers to the parent
              namespace of the namespace referred to by fd. This
              operation is valid only for hierarchical namespaces
              (i.e., PID and user namespaces). For user namespaces,
              NS_GET_PARENT is synonymous with NS_GET_USERNS.

       In each case, the returned file descriptor is opened with O_RDONLY
       and O_CLOEXEC (close-on-exec).

       By applying fstat(2) to the returned file descriptor, one obtains
       a stat structure whose st_ino (inode number) field identifies the
       owning/parent namespace. This inode number can be matched with
       the inode number of another /proc/[pid]/ns/{pid,user} file to
       determine whether that is the owning/parent namespace.

       Either of these ioctl(2) operations can fail with the following
       error:

       EPERM  The requested namespace is outside of the caller's
              namespace scope. This error can occur if, for example,
              the owning user namespace is an ancestor of the caller's
              current user namespace. It can also occur on attempts to
              obtain the parent of the initial user or PID namespace.

       Additionally, the NS_GET_PARENT operation can fail with the
       following error:

       EINVAL fd refers to a nonhierarchical namespace.

       See the EXAMPLE section for an example of the use of these
       operations.

[...]

EXAMPLE
       The example shown below uses the ioctl(2) operations described
       above to perform simple introspection of namespace relationships.
       The following shell sessions show various examples of the use of
       this program.

       Trying to get the parent of the initial user namespace fails, for
       the reasons explained earlier:

           $ ./ns_introspect /proc/self/ns/user p
           The parent namespace is outside your namespace scope

       Create a process running sleep(1) that resides in new user and UTS
       namespaces, and show that the new UTS namespace is associated with
       the new user namespace:

           $ unshare -Uu sleep 1000 &
           [1] 23235
           $ ./ns_introspect /proc/23235/ns/uts
           Inode number of owning user namespace is: 4026532448
           $ readlink /proc/23235/ns/user
           user:[4026532448]

       Then show that the parent of the new user namespace in the
       preceding example is the initial user namespace:

           $ readlink /proc/self/ns/user
           user:[4026531837]
           $ ./ns_introspect /proc/23235/ns/user
           Inode number of owning user namespace is: 4026531837

       Start a shell in a new user namespace, and show that from within
       this shell, the parent user namespace can't be discovered.
       Similarly, the UTS namespace (which is associated with the initial
       user namespace) can't be discovered.

           $ PS1="sh2$ " unshare -U bash
           sh2$ ./ns_introspect /proc/self/ns/user p
           The parent namespace is outside your namespace scope
           sh2$ ./ns_introspect /proc/self/ns/uts u
           The owning user namespace is outside your namespace scope

   Program source

/* ns_introspect.c

   Licensed under GNU General Public License v2 or later
*/
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <string.h>
#include <errno.h>

#ifndef NS_GET_USERNS
#define NSIO    0xb7
#define NS_GET_USERNS   _IO(NSIO, 0x1)
#define NS_GET_PARENT   _IO(NSIO, 0x2)
#endif

int
main(int argc, char *argv[])
{
    int fd, userns_fd, parent_fd;
    struct stat sb;

    if (argc < 2) {
        fprintf(stderr, "Usage: %s /proc/[pid]/ns/[file] [p|u]\n",
                argv[0]);
        fprintf(stderr, "\nDisplay the result of one or both "
                "of NS_GET_USERNS (u) or NS_GET_PARENT (p)\n"
                "for the specified /proc/[pid]/ns/[file]. If neither "
                "'p' nor 'u' is specified,\n"
                "NS_GET_USERNS is the default.\n");
        exit(EXIT_FAILURE);
    }

    /* Obtain a file descriptor for the 'ns' file specified
       in argv[1] */

    fd = open(argv[1], O_RDONLY);
    if (fd == -1) {
        perror("open");
        exit(EXIT_FAILURE);
    }

    /* Obtain a file descriptor for the owning user namespace and
       then obtain and display the inode number of that namespace */

    if (argc < 3 || strchr(argv[2], 'u')) {
        userns_fd = ioctl(fd, NS_GET_USERNS);

        if (userns_fd == -1) {
            if (errno == EPERM)
                printf("The owning user namespace is outside "
                        "your namespace scope\n");
            else
                perror("ioctl-NS_GET_USERNS");
            exit(EXIT_FAILURE);
        }

        if (fstat(userns_fd, &sb) == -1) {
            perror("fstat-userns");
            exit(EXIT_FAILURE);
        }
        printf("Inode number of owning user namespace is: %ld\n",
                (long) sb.st_ino);

        close(userns_fd);
    }

    /* Obtain a file descriptor for the parent namespace and
       then obtain and display the inode number of that namespace */

    if (argc > 2 && strchr(argv[2], 'p')) {
        parent_fd = ioctl(fd, NS_GET_PARENT);

        if (parent_fd == -1) {
            if (errno == EINVAL)
                printf("Can't get parent namespace of a "
                        "nonhierarchical namespace\n");
            else if (errno == EPERM)
                printf("The parent namespace is outside "
                        "your namespace scope\n");
            else
                perror("ioctl-NS_GET_PARENT");
            exit(EXIT_FAILURE);
        }

        if (fstat(parent_fd, &sb) == -1) {
            perror("fstat-parentns");
            exit(EXIT_FAILURE);
        }
        printf("Inode number of parent namespace is: %ld\n",
                (long) sb.st_ino);

        close(parent_fd);
    }

    exit(EXIT_SUCCESS);
}


--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


2016-12-11 22:33:45

by Eric W. Biederman

Subject: Re: Documenting the ioctl interfaces to discover relationships between namespaces

"Michael Kerrisk (man-pages)" <[email protected]> writes:

> [was: [PATCH 0/4 v3] Add an interface to discover relationships
> between namespaces]

One small comment below.

>
> Introspecting namespace relationships
> Since Linux 4.9, two ioctl(2) operations are provided to allow
> introspection of namespace relationships (see user_namespaces(7)
> and pid_namespaces(7)). The form of the calls is:
>
> ioctl(fd, request);
>
> In each case, fd refers to a /proc/[pid]/ns/* file.
>
> NS_GET_USERNS
> Returns a file descriptor that refers to the owning user
> namespace for the namespace referred to by fd.
>
> NS_GET_PARENT
> Returns a file descriptor that refers to the parent names‐
> pace of the namespace referred to by fd. This operation is
> valid only for hierarchical namespaces (i.e., PID and user
> namespaces). For user namespaces, NS_GET_PARENT is synony‐
> mous with NS_GET_USERNS.
>
> In each case, the returned file descriptor is opened with O_RDONLY
> and O_CLOEXEC (close-on-exec).
>
> By applying fstat(2) to the returned file descriptor, one obtains
> a stat structure whose st_ino (inode number) field identifies the
> owning/parent namespace. This inode number can be matched with
> the inode number of another /proc/[pid]/ns/{pid,user} file to
> determine whether that is the owning/parent namespace.

Like all fstat inode comparisons to be fully accurate you need to
compare both the st_ino and st_dev. I reserve the right for st_dev to
be significant when comparing namespaces. Otherwise I might have to
create a namespace of namespaces someday and that is ugly.

> Either of these ioctl(2) operations can fail with the following
> error:
>
> EPERM The requested namespace is outside of the caller's names‐
> pace scope. This error can occur if, for example, the own‐
> ing user namespace is an ancestor of the caller's current
> user namespace. It can also occur on attempts to obtain
> the parent of the initial user or PID namespace.
>
> Additionally, the NS_GET_PARENT operation can fail with the fol‐
> lowing error:
>
> EINVAL fd refers to a nonhierarchical namespace.
>
> See the EXAMPLE section for an example of the use of these opera‐
> tions.
>
> [...]

Eric

Subject: Re: Documenting the ioctl interfaces to discover relationships between namespaces

[Fixing Serge's address in my original CC]

On 12/11/2016 11:30 PM, Eric W. Biederman wrote:
> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>
>> [was: [PATCH 0/4 v3] Add an interface to discover relationships
>> between namespaces]
>
> One small comment below.
>
>>
>> Introspecting namespace relationships
>> Since Linux 4.9, two ioctl(2) operations are provided to allow
>> introspection of namespace relationships (see user_namespaces(7)
>> and pid_namespaces(7)). The form of the calls is:
>>
>> ioctl(fd, request);
>>
>> In each case, fd refers to a /proc/[pid]/ns/* file.
>>
>> NS_GET_USERNS
>> Returns a file descriptor that refers to the owning user
>> namespace for the namespace referred to by fd.
>>
>> NS_GET_PARENT
>> Returns a file descriptor that refers to the parent names‐
>> pace of the namespace referred to by fd. This operation is
>> valid only for hierarchical namespaces (i.e., PID and user
>> namespaces). For user namespaces, NS_GET_PARENT is synony‐
>> mous with NS_GET_USERNS.
>>
>> In each case, the returned file descriptor is opened with O_RDONLY
>> and O_CLOEXEC (close-on-exec).
>>
>> By applying fstat(2) to the returned file descriptor, one obtains
>> a stat structure whose st_ino (inode number) field identifies the
>> owning/parent namespace. This inode number can be matched with
>> the inode number of another /proc/[pid]/ns/{pid,user} file to
>> determine whether that is the owning/parent namespace.
>
> Like all fstat inode comparisons to be fully accurate you need to
> compare both the st_ino and st_dev. I reserve the right for st_dev to
> be significant when comparing namespaces. Otherwise I might have to
> create a namespace of namespaces someday and that is ugly.

Ah yes. Thanks for catching that. I've adjusted the text,
and the example program.

Cheers,

Michael

>> Either of these ioctl(2) operations can fail with the following
>> error:
>>
>> EPERM The requested namespace is outside of the caller's names‐
>> pace scope. This error can occur if, for example, the own‐
>> ing user namespace is an ancestor of the caller's current
>> user namespace. It can also occur on attempts to obtain
>> the parent of the initial user or PID namespace.
>>
>> Additionally, the NS_GET_PARENT operation can fail with the fol‐
>> lowing error:
>>
>> EINVAL fd refers to a nonhierarchical namespace.
>>
>> See the EXAMPLE section for an example of the use of these opera‐
>> tions.
>>
>> [...]
>
> Eric
>



Subject: Re: Documenting the ioctl interfaces to discover relationships between namespaces

On 12/11/2016 11:30 PM, Eric W. Biederman wrote:
> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>
>> [was: [PATCH 0/4 v3] Add an interface to discover relationships
>> between namespaces]
>
> One small comment below.
>
>>
>> Introspecting namespace relationships
>> Since Linux 4.9, two ioctl(2) operations are provided to allow
>> introspection of namespace relationships (see user_namespaces(7)
>> and pid_namespaces(7)). The form of the calls is:
>>
>> ioctl(fd, request);
>>
>> In each case, fd refers to a /proc/[pid]/ns/* file.
>>
>> NS_GET_USERNS
>> Returns a file descriptor that refers to the owning user
>> namespace for the namespace referred to by fd.
>>
>> NS_GET_PARENT
>> Returns a file descriptor that refers to the parent names‐
>> pace of the namespace referred to by fd. This operation is
>> valid only for hierarchical namespaces (i.e., PID and user
>> namespaces). For user namespaces, NS_GET_PARENT is synony‐
>> mous with NS_GET_USERNS.
>>
>> In each case, the returned file descriptor is opened with O_RDONLY
>> and O_CLOEXEC (close-on-exec).
>>
>> By applying fstat(2) to the returned file descriptor, one obtains
>> a stat structure whose st_ino (inode number) field identifies the
>> owning/parent namespace. This inode number can be matched with
>> the inode number of another /proc/[pid]/ns/{pid,user} file to
>> determine whether that is the owning/parent namespace.
>
> Like all fstat inode comparisons to be fully accurate you need to
> compare both the st_ino and st_dev. I reserve the right for st_dev to
> be significant when comparing namespaces. Otherwise I might have to
> create a namespace of namespaces someday and that is ugly.
>
>> Either of these ioctl(2) operations can fail with the following
>> error:
>>
>> EPERM The requested namespace is outside of the caller's names‐
>> pace scope. This error can occur if, for example, the own‐
>> ing user namespace is an ancestor of the caller's current
>> user namespace. It can also occur on attempts to obtain
>> the parent of the initial user or PID namespace.
>>
>> Additionally, the NS_GET_PARENT operation can fail with the fol‐
>> lowing error:
>>
>> EINVAL fd refers to a nonhierarchical namespace.
>>
>> See the EXAMPLE section for an example of the use of these opera‐
>> tions.

So, after playing with this a bit, I have a question.

I gather that in order to, for example, elaborate the tree of user
namespaces on the system, one would use NS_GET_PARENT on each of
the /proc/*/ns/user files and match up the results. Right?

What happens if one of the parent user namespaces contains no
processes? That is, the parent namespace exists by virtue of being
pinned because a /proc/PID/ns/user file is open or bind mounted.
(Chrome seems to do this sort of dance with user namespaces, for
example.) How do we find the ancestor of *that* user namespace?

Cheers,

Michael



2016-12-12 18:21:41

by Eric W. Biederman

Subject: Re: Documenting the ioctl interfaces to discover relationships between namespaces

"Michael Kerrisk (man-pages)" <[email protected]> writes:

> On 12/11/2016 11:30 PM, Eric W. Biederman wrote:
>> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>>
>>> [was: [PATCH 0/4 v3] Add an interface to discover relationships
>>> between namespaces]
>>
>> One small comment below.
>>
>>>
>>> Introspecting namespace relationships
>>> Since Linux 4.9, two ioctl(2) operations are provided to allow
>>> introspection of namespace relationships (see user_namespaces(7)
>>> and pid_namespaces(7)). The form of the calls is:
>>>
>>> ioctl(fd, request);
>>>
>>> In each case, fd refers to a /proc/[pid]/ns/* file.
>>>
>>> NS_GET_USERNS
>>> Returns a file descriptor that refers to the owning user
>>> namespace for the namespace referred to by fd.
>>>
>>> NS_GET_PARENT
>>> Returns a file descriptor that refers to the parent names‐
>>> pace of the namespace referred to by fd. This operation is
>>> valid only for hierarchical namespaces (i.e., PID and user
>>> namespaces). For user namespaces, NS_GET_PARENT is synony‐
>>> mous with NS_GET_USERNS.
>>>
>>> In each case, the returned file descriptor is opened with O_RDONLY
>>> and O_CLOEXEC (close-on-exec).
>>>
>>> By applying fstat(2) to the returned file descriptor, one obtains
>>> a stat structure whose st_ino (inode number) field identifies the
>>> owning/parent namespace. This inode number can be matched with
>>> the inode number of another /proc/[pid]/ns/{pid,user} file to
>>> determine whether that is the owning/parent namespace.
>>
>> Like all fstat inode comparisons to be fully accurate you need to
>> compare both the st_ino and st_dev. I reserve the right for st_dev to
>> be significant when comparing namespaces. Otherwise I might have to
>> create a namespace of namespaces someday and that is ugly.
>>
>>> Either of these ioctl(2) operations can fail with the following
>>> error:
>>>
>>> EPERM The requested namespace is outside of the caller's names‐
>>> pace scope. This error can occur if, for example, the own‐
>>> ing user namespace is an ancestor of the caller's current
>>> user namespace. It can also occur on attempts to obtain
>>> the parent of the initial user or PID namespace.
>>>
>>> Additionally, the NS_GET_PARENT operation can fail with the fol‐
>>> lowing error:
>>>
>>> EINVAL fd refers to a nonhierarchical namespace.
>>>
>>> See the EXAMPLE section for an example of the use of these opera‐
>>> tions.
>
> So, after playing with this a bit, I have a question.
>
> I gather that in order to, for example, elaborate the tree of user
> namespaces on the system, one would use NS_GET_PARENT on each of
> the /proc/*/ns/user files and match up the results. Right?
>
> What happens if one of the parent user namespaces contains no
> processes? That is, the parent namespace exists by virtue of being
> pinned because a /proc/PID/ns/user file is open or bind mounted.
> (Chrome seems to do this sort of dance with user namespaces, for
> example.) How do we find the ancestor of *that* user namespace?

What is returned from NS_GET_USERNS and NS_GET_PARENT is a file
descriptor, that you can call NS_GET_PARENT on.
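
As an illustration of that point, the chain of ancestors can be walked by applying NS_GET_PARENT repeatedly to each returned descriptor, so no process needs to reside in the intermediate namespaces. The following is a minimal sketch; the function name print_ancestors() is hypothetical, and it assumes (as described in the proposed man-page text) that EPERM marks the point where the walk leaves the caller's scope or reaches the initial namespace.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>

#ifndef NS_GET_PARENT
#define NSIO    0xb7
#define NS_GET_PARENT   _IO(NSIO, 0x2)
#endif

/* Hypothetical helper: print (st_dev, st_ino) for each ancestor of
   the PID or user namespace referred to by fd. Returns the number of
   ancestors found when the walk ends with EPERM, or -1 on any other
   error. The caller's fd is left open. */
int
print_ancestors(int fd)
{
    int cur = fd;
    int depth = 0;

    for (;;) {
        struct stat sb;
        int parent = ioctl(cur, NS_GET_PARENT);
        int saved_errno = errno;

        if (cur != fd)
            close(cur);             /* Finished with intermediate fd */

        if (parent == -1) {
            errno = saved_errno;
            return (saved_errno == EPERM) ? depth : -1;
        }

        if (fstat(parent, &sb) == -1) {
            saved_errno = errno;
            close(parent);
            errno = saved_errno;
            return -1;
        }

        depth++;
        printf("ancestor %d: dev=%ju ino=%ju\n", depth,
                (uintmax_t) sb.st_dev, (uintmax_t) sb.st_ino);

        cur = parent;               /* Continue from the parent */
    }
}
```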

Eric

Subject: Re: Documenting the ioctl interfaces to discover relationships between namespaces

On 12/12/2016 07:18 PM, Eric W. Biederman wrote:
> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>
>> On 12/11/2016 11:30 PM, Eric W. Biederman wrote:
>>> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>>>
>>>> [was: [PATCH 0/4 v3] Add an interface to discover relationships
>>>> between namespaces]
>>>
>>> One small comment below.
>>>
>>>>
>>>> Introspecting namespace relationships
>>>> Since Linux 4.9, two ioctl(2) operations are provided to allow
>>>> introspection of namespace relationships (see user_namespaces(7)
>>>> and pid_namespaces(7)). The form of the calls is:
>>>>
>>>> ioctl(fd, request);
>>>>
>>>> In each case, fd refers to a /proc/[pid]/ns/* file.
>>>>
>>>> NS_GET_USERNS
>>>> Returns a file descriptor that refers to the owning user
>>>> namespace for the namespace referred to by fd.
>>>>
>>>> NS_GET_PARENT
>>>> Returns a file descriptor that refers to the parent names‐
>>>> pace of the namespace referred to by fd. This operation is
>>>> valid only for hierarchical namespaces (i.e., PID and user
>>>> namespaces). For user namespaces, NS_GET_PARENT is synony‐
>>>> mous with NS_GET_USERNS.
>>>>
>>>> In each case, the returned file descriptor is opened with O_RDONLY
>>>> and O_CLOEXEC (close-on-exec).
>>>>
>>>> By applying fstat(2) to the returned file descriptor, one obtains
>>>> a stat structure whose st_ino (inode number) field identifies the
>>>> owning/parent namespace. This inode number can be matched with
>>>> the inode number of another /proc/[pid]/ns/{pid,user} file to
>>>> determine whether that is the owning/parent namespace.
>>>
>>> Like all fstat inode comparisons to be fully accurate you need to
>>> compare both the st_ino and st_dev. I reserve the right for st_dev to
>>> be significant when comparing namespaces. Otherwise I might have to
>>> create a namespace of namespaces someday and that is ugly.
>>>
>>>> Either of these ioctl(2) operations can fail with the following
>>>> error:
>>>>
>>>> EPERM The requested namespace is outside of the caller's names‐
>>>> pace scope. This error can occur if, for example, the own‐
>>>> ing user namespace is an ancestor of the caller's current
>>>> user namespace. It can also occur on attempts to obtain
>>>> the parent of the initial user or PID namespace.
>>>>
>>>> Additionally, the NS_GET_PARENT operation can fail with the fol‐
>>>> lowing error:
>>>>
>>>> EINVAL fd refers to a nonhierarchical namespace.
>>>>
>>>> See the EXAMPLE section for an example of the use of these opera‐
>>>> tions.
>>
>> So, after playing with this a bit, I have a question.
>>
>> I gather that in order to, for example, elaborate the tree of user
>> namespaces on the system, one would use NS_GET_PARENT on each of
>> the /proc/*/ns/user files and match up the results. Right?
>>
>> What happens if one of the parent user namespaces contains no
>> processes? That is, the parent namespace exists by virtue of being
>> pinned because a /proc/PID/ns/user file is open or bind mounted.
>> (Chrome seems to do this sort of dance with user namespaces, for
>> example.) How do we find the ancestor of *that* user namespace?
>
> What is returned from NS_GET_USERNS and NS_GET_PARENT is a file
> descriptor, that you can call NS_GET_PARENT on.

Thanks, Eric. While trying to solve the small task I set myself,
and probably confused by past discussions[1], I was overlooking
the obvious.

Cheers,

Michael

[1] https://lkml.org/lkml/2016/7/28/365


Subject: Re: Documenting the ioctl interfaces to discover relationships between namespaces

On 12/15/2016 01:46 AM, Andrei Vagin wrote:
> On Sun, Dec 11, 2016 at 12:54:56PM +0100, Michael Kerrisk (man-pages) wrote:
>> [was: [PATCH 0/4 v3] Add an interface to discover relationships
>> between namespaces]
>>
>> Hello Andrei
>>
>> See below for my attempt to document the following.
>
> Hi Michael,
>
> Eric already did my work :). I have read this documentation and it looks
> good to me. I have nothing to add to Eric's comments.

Thanks, Andrei!

Cheers,

Michael



--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

2016-12-15 12:26:50

by Andrei Vagin

Subject: Re: Documenting the ioctl interfaces to discover relationships between namespaces

On Sun, Dec 11, 2016 at 12:54:56PM +0100, Michael Kerrisk (man-pages) wrote:
> [was: [PATCH 0/4 v3] Add an interface to discover relationships
> between namespaces]
>
> Hello Andrei
>
> See below for my attempt to document the following.

Hi Michael,

Eric already did my work :). I have read this documentation and it looks
good to me. I have nothing to add to Eric's comments.

Thanks,
Andrei

>
> On 6 September 2016 at 09:47, Andrei Vagin <[email protected]> wrote:
> > From: Andrey Vagin <[email protected]>
> >
> > Each namespace has an owning user namespace, and currently there is no
> > way to discover these relationships.
> >
> > Pid and user namespaces are hierarchical. There is no way to discover
> > parent-child relationships either.
> >
> > Why might we want to know the relationships between namespaces?
> >
> > One use would be visualization, in order to understand the running
> > system. Another would be to answer the question: what capability does
> > process X have to perform operations on a resource governed by namespace
> > Y?
> >
> > One more use-case (which is usually called abnormal) is checkpoint/restart.
> > In CRIU we are going to dump and restore nested namespaces.
> >
> > There [1] was a discussion about which interface to choose to determine
> > relationships between namespaces.
> >
> > Eric suggested to add two ioctl-s [2]:
> >> Grumble, Grumble. I think this may actually be a case for creating ioctls
> >> for these two cases. Now that random nsfs file descriptors are bind
> >> mountable the original reason for using proc files is not as pressing.
> >>
> >> One ioctl for the user namespace that owns a file descriptor.
> >> One ioctl for the parent namespace of a namespace file descriptor.
> >
> > Here is an implementation of these ioctl-s.
> >
> > $ man man7/namespaces.7
> > ...
> > Since Linux 4.X, the following ioctl(2) calls are supported for
> > namespace file descriptors. The correct syntax is:
> >
> > fd = ioctl(ns_fd, ioctl_type);
> >
> > where ioctl_type is one of the following:
> >
> > NS_GET_USERNS
> > Returns a file descriptor that refers to an owning user names‐
> > pace.
> >
> > NS_GET_PARENT
> > Returns a file descriptor that refers to a parent namespace.
> > This ioctl(2) can be used for pid and user namespaces. For
> > user namespaces, NS_GET_PARENT and NS_GET_USERNS have the same
> > meaning.
> >
> > In addition to generic ioctl(2) errors, the following specific ones
> > can occur:
> >
> > EINVAL NS_GET_PARENT was called for a nonhierarchical namespace.
> >
> > EPERM The requested namespace is outside of the current namespace
> > scope.
> >
> > [1] https://lkml.org/lkml/2016/7/6/158
> > [2] https://lkml.org/lkml/2016/7/9/101
>
> The following is the text I propose to add to the namespaces(7) page.
> Could you please review and let me know of corrections and
> improvements.
>
> Thanks,
>
> Michael
>
>
> Introspecting namespace relationships
> Since Linux 4.9, two ioctl(2) operations are provided to allow
> introspection of namespace relationships (see user_namespaces(7)
> and pid_namespaces(7)). The form of the calls is:
>
> ioctl(fd, request);
>
> In each case, fd refers to a /proc/[pid]/ns/* file.
>
> NS_GET_USERNS
> Returns a file descriptor that refers to the owning user
> namespace for the namespace referred to by fd.
>
> NS_GET_PARENT
> Returns a file descriptor that refers to the parent names‐
> pace of the namespace referred to by fd. This operation is
> valid only for hierarchical namespaces (i.e., PID and user
> namespaces). For user namespaces, NS_GET_PARENT is synony‐
> mous with NS_GET_USERNS.
>
> In each case, the returned file descriptor is opened with O_RDONLY
> and O_CLOEXEC (close-on-exec).
>
> By applying fstat(2) to the returned file descriptor, one obtains
> a stat structure whose st_ino (inode number) field identifies the
> owning/parent namespace. This inode number can be matched with
> the inode number of another /proc/[pid]/ns/{pid,user} file to
> determine whether that is the owning/parent namespace.
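
The fstat(2) comparison described in the paragraph above could be
sketched roughly as follows. This is only an illustration: same_ns()
and owned_by() are hypothetical helper names that do not appear in the
proposed page, and comparing st_dev in addition to st_ino is an extra
precaution assumed here, since the text above mentions only the inode
number.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>

#ifndef NS_GET_USERNS
#define NSIO 0xb7
#define NS_GET_USERNS _IO(NSIO, 0x1)
#endif

/* Hypothetical helper: return 1 if both descriptors refer to the same
   file (and hence the same namespace), 0 if not, -1 if fstat() fails.
   Checking st_dev as well as st_ino is an assumption of this sketch. */
static int
same_ns(int fd1, int fd2)
{
    struct stat sb1, sb2;

    if (fstat(fd1, &sb1) == -1 || fstat(fd2, &sb2) == -1)
        return -1;

    return sb1.st_dev == sb2.st_dev && sb1.st_ino == sb2.st_ino;
}

/* Hypothetical helper: return 1 if the user namespace at 'user_path'
   owns the namespace at 'ns_path', 0 if it does not, -1 on error
   (including EPERM from NS_GET_USERNS). */
static int
owned_by(const char *ns_path, const char *user_path)
{
    int ns_fd, user_fd, owner_fd, res = -1;

    ns_fd = open(ns_path, O_RDONLY);
    if (ns_fd == -1)
        return -1;

    user_fd = open(user_path, O_RDONLY);
    if (user_fd == -1) {
        close(ns_fd);
        return -1;
    }

    /* Ask the kernel for the owning user namespace, then compare it
       with the candidate user namespace */
    owner_fd = ioctl(ns_fd, NS_GET_USERNS);
    if (owner_fd != -1) {
        res = same_ns(owner_fd, user_fd);
        close(owner_fd);
    }

    close(user_fd);
    close(ns_fd);
    return res;
}
```

With the shell session from the EXAMPLE section, a call such as
owned_by("/proc/23235/ns/uts", "/proc/23235/ns/user") would be expected
to report a match.
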
>
> Either of these ioctl(2) operations can fail with the following
> error:
>
> EPERM The requested namespace is outside of the caller's names‐
> pace scope. This error can occur if, for example, the own‐
> ing user namespace is an ancestor of the caller's current
> user namespace. It can also occur on attempts to obtain
> the parent of the initial user or PID namespace.
>
> Additionally, the NS_GET_PARENT operation can fail with the fol‐
> lowing error:
>
> EINVAL fd refers to a nonhierarchical namespace.
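
To illustrate how the two errors above might be treated differently, here
is a minimal sketch that walks up a PID or user namespace hierarchy with
NS_GET_PARENT until EPERM ends the walk. The helper name
count_visible_ancestors() is hypothetical, and treating EPERM as the
normal end of the walk (rather than as a failure) is a design choice of
this sketch, not something the proposed text prescribes.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

#ifndef NS_GET_PARENT
#define NSIO 0xb7
#define NS_GET_PARENT _IO(NSIO, 0x2)
#endif

/* Hypothetical helper: count the ancestors of the PID or user namespace
   at 'path' that are visible to the caller. Returns -1 (with errno set)
   on any error other than the EPERM that ends the walk. */
static int
count_visible_ancestors(const char *path)
{
    int fd, parent_fd, saved_errno, depth = 0;

    fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    for (;;) {
        parent_fd = ioctl(fd, NS_GET_PARENT);
        saved_errno = errno;        /* close() might change errno */
        close(fd);

        if (parent_fd == -1) {
            errno = saved_errno;
            /* EPERM: the parent is outside the caller's namespace
               scope, or fd already refers to the initial namespace.
               EINVAL (nonhierarchical namespace) and all other errors
               are reported to the caller as failures. */
            return (saved_errno == EPERM) ? depth : -1;
        }

        depth++;
        fd = parent_fd;
    }
}
```

Called on /proc/self/ns/user from a shell in the initial user namespace,
this would be expected to return 0, since the very first NS_GET_PARENT
fails with EPERM.
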
>
> See the EXAMPLE section for an example of the use of these opera‐
> tions.
>
> [...]
>
> EXAMPLE
> The example shown below uses the ioctl(2) operations described
> above to perform simple introspection of namespace relationships.
> The following shell sessions show various examples of the use of
> this program.
>
> Trying to get the parent of the initial user namespace fails, for
> the reasons explained earlier:
>
> $ ./ns_introspect /proc/self/ns/user p
> The parent namespace is outside your namespace scope
>
> Create a process running sleep(1) that resides in new user and UTS
> namespaces, and show that the new UTS namespace is associated with
> the new user namespace:
>
> $ unshare -Uu sleep 1000 &
> [1] 23235
> $ ./ns_introspect /proc/23235/ns/uts
> Inode number of owning user namespace is: 4026532448
> $ readlink /proc/23235/ns/user
> user:[4026532448]
>
> Then show that the parent of the new user namespace in the preced‐
> ing example is the initial user namespace:
>
> $ readlink /proc/self/ns/user
> user:[4026531837]
> $ ./ns_introspect /proc/23235/ns/user
> Inode number of owning user namespace is: 4026531837
>
> Start a shell in a new user namespace, and show that from within
> this shell, the parent user namespace can't be discovered. Simi‐
> larly, the user namespace that owns the UTS namespace (here, the
> initial user namespace) can't be discovered.
>
> $ PS1="sh2$ " unshare -U bash
> sh2$ ./ns_introspect /proc/self/ns/user p
> The parent namespace is outside your namespace scope
> sh2$ ./ns_introspect /proc/self/ns/uts u
> The owning user namespace is outside your namespace scope
>
> Program source
>
> /* ns_introspect.c
>
> Licensed under GNU General Public License v2 or later
> */
> #include <stdlib.h>
> #include <unistd.h>
> #include <stdio.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <sys/ioctl.h>
> #include <string.h>
> #include <errno.h>
>
> #ifndef NS_GET_USERNS
> #define NSIO 0xb7
> #define NS_GET_USERNS _IO(NSIO, 0x1)
> #define NS_GET_PARENT _IO(NSIO, 0x2)
> #endif
>
> int
> main(int argc, char *argv[])
> {
>     int fd, userns_fd, parent_fd;
>     struct stat sb;
>
>     if (argc < 2) {
>         fprintf(stderr, "Usage: %s /proc/[pid]/ns/[file] [p|u]\n",
>                 argv[0]);
>         fprintf(stderr, "\nDisplay the result of one or both "
>                 "of NS_GET_USERNS (u) or NS_GET_PARENT (p)\n"
>                 "for the specified /proc/[pid]/ns/[file]. If neither "
>                 "'p' nor 'u' is specified,\n"
>                 "NS_GET_USERNS is the default.\n");
>         exit(EXIT_FAILURE);
>     }
>
>     /* Obtain a file descriptor for the 'ns' file specified
>        in argv[1] */
>
>     fd = open(argv[1], O_RDONLY);
>     if (fd == -1) {
>         perror("open");
>         exit(EXIT_FAILURE);
>     }
>
>     /* Obtain a file descriptor for the owning user namespace and
>        then obtain and display the inode number of that namespace */
>
>     if (argc < 3 || strchr(argv[2], 'u')) {
>         userns_fd = ioctl(fd, NS_GET_USERNS);
>
>         if (userns_fd == -1) {
>             if (errno == EPERM)
>                 printf("The owning user namespace is outside "
>                         "your namespace scope\n");
>             else
>                 perror("ioctl-NS_GET_USERNS");
>             exit(EXIT_FAILURE);
>         }
>
>         if (fstat(userns_fd, &sb) == -1) {
>             perror("fstat-userns");
>             exit(EXIT_FAILURE);
>         }
>         printf("Inode number of owning user namespace is: %ld\n",
>                 (long) sb.st_ino);
>
>         close(userns_fd);
>     }
>
>     /* Obtain a file descriptor for the parent namespace and
>        then obtain and display the inode number of that namespace */
>
>     if (argc > 2 && strchr(argv[2], 'p')) {
>         parent_fd = ioctl(fd, NS_GET_PARENT);
>
>         if (parent_fd == -1) {
>             if (errno == EINVAL)
>                 printf("Can't get parent namespace of a "
>                         "nonhierarchical namespace\n");
>             else if (errno == EPERM)
>                 printf("The parent namespace is outside "
>                         "your namespace scope\n");
>             else
>                 perror("ioctl-NS_GET_PARENT");
>             exit(EXIT_FAILURE);
>         }
>
>         if (fstat(parent_fd, &sb) == -1) {
>             perror("fstat-parentns");
>             exit(EXIT_FAILURE);
>         }
>         printf("Inode number of parent namespace is: %ld\n",
>                 (long) sb.st_ino);
>
>         close(parent_fd);
>     }
>
>     exit(EXIT_SUCCESS);
> }
>
>
> --
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/