From: "Michael Kerrisk (man-pages)"
Reply-To: mtk.manpages@gmail.com
Date: Sun, 11 Dec 2016 12:54:56 +0100
Subject: Documenting the ioctl interfaces to discover relationships between namespaces
To: Andrei Vagin
Cc: "Eric W. Biederman", Containers, Linux API, lkml, "linux-fsdevel@vger.kernel.org", James Bottomley, "W. Trevor King", Alexander Viro, Serge Hallyn, Michael Kerrisk

[was: [PATCH 0/4 v3] Add an interface to discover relationships between namespaces]

Hello Andrei

See below for my attempt to document the following.

On 6 September 2016 at 09:47, Andrei Vagin wrote:
> From: Andrey Vagin
>
> Each namespace has an owning user namespace and now there is no way
> to discover these relationships.
>
> Pid and user namespaces are hierarchical. There is no way to discover
> parent-child relationships too.
>
> Why we may want to know relationships between namespaces?
>
> One use would be visualization, in order to understand the running
> system.  Another would be to answer the question: what capability does
> process X have to perform operations on a resource governed by namespace
> Y?
>
> One more use-case (which usually called abnormal) is checkpoint/restart.
> In CRIU we are going to dump and restore nested namespaces.
>
> There [1] was a discussion about which interface to choose to determine
> relationships between namespaces.
>
> Eric suggested to add two ioctl-s [2]:
>> Grumble, Grumble.  I think this may actually a case for creating ioctls
>> for these two cases.  Now that random nsfs file descriptors are bind
>> mountable the original reason for using proc files is not as pressing.
>>
>> One ioctl for the user namespace that owns a file descriptor.
>> One ioctl for the parent namespace of a namespace file descriptor.
>
> Here is an implementation of these ioctl-s.
>
> $ man man7/namespaces.7
> ...
> Since Linux 4.X, the following ioctl(2) calls are supported for
> namespace file descriptors.  The correct syntax is:
>
>       fd = ioctl(ns_fd, ioctl_type);
>
> where ioctl_type is one of the following:
>
> NS_GET_USERNS
>       Returns a file descriptor that refers to an owning user
>       namespace.
>
> NS_GET_PARENT
>       Returns a file descriptor that refers to a parent namespace.
>       This ioctl(2) can be used for pid and user namespaces. For
>       user namespaces, NS_GET_PARENT and NS_GET_USERNS have the same
>       meaning.
>
> In addition to generic ioctl(2) errors, the following specific ones
> can occur:
>
> EINVAL NS_GET_PARENT was called for a nonhierarchical namespace.
>
> EPERM  The requested namespace is outside of the current namespace
>        scope.
>
> [1] https://lkml.org/lkml/2016/7/6/158
> [2] https://lkml.org/lkml/2016/7/9/101

The following is the text I propose to add to the namespaces(7) page.
Could you please review and let me know of corrections and
improvements.

Thanks,

Michael


   Introspecting namespace relationships
       Since Linux 4.9, two ioctl(2) operations are provided to allow
       introspection of namespace relationships (see user_namespaces(7)
       and pid_namespaces(7)).  The form of the calls is:

           ioctl(fd, request);

       In each case, fd refers to a /proc/[pid]/ns/* file.

       NS_GET_USERNS
              Returns a file descriptor that refers to the owning user
              namespace for the namespace referred to by fd.

       NS_GET_PARENT
              Returns a file descriptor that refers to the parent
              namespace of the namespace referred to by fd.  This
              operation is valid only for hierarchical namespaces
              (i.e., PID and user namespaces).  For user namespaces,
              NS_GET_PARENT is synonymous with NS_GET_USERNS.

       In each case, the returned file descriptor is opened with
       O_RDONLY and O_CLOEXEC (close-on-exec).

       By applying fstat(2) to the returned file descriptor, one
       obtains a stat structure whose st_ino (inode number) field
       identifies the owning/parent namespace.  This inode number can
       be matched with the inode number of another
       /proc/[pid]/ns/{pid,user} file to determine whether that is the
       owning/parent namespace.

       Either of these ioctl(2) operations can fail with the following
       error:

       EPERM  The requested namespace is outside of the caller's
              namespace scope.  This error can occur if, for example,
              the owning user namespace is an ancestor of the caller's
              current user namespace.  It can also occur on attempts to
              obtain the parent of the initial user or PID namespace.

       Additionally, the NS_GET_PARENT operation can fail with the
       following error:

       EINVAL fd refers to a nonhierarchical namespace.

       See the EXAMPLE section for an example of the use of these
       operations.

[...]

EXAMPLE
       The example shown below uses the ioctl(2) operations described
       above to perform simple introspection of namespace
       relationships.  The following shell sessions show various
       examples of the use of this program.

       Trying to get the parent of the initial user namespace fails,
       for the reasons explained earlier:

           $ ./ns_introspect /proc/self/ns/user p
           The parent namespace is outside your namespace scope

       Create a process running sleep(1) that resides in new user and
       UTS namespaces, and show that the new UTS namespace is
       associated with the new user namespace:

           $ unshare -Uu sleep 1000 &
           [1] 23235
           $ ./ns_introspect /proc/23235/ns/uts
           Inode number of owning user namespace is: 4026532448
           $ readlink /proc/23235/ns/user
           user:[4026532448]

       Then show that the parent of the new user namespace in the
       preceding example is the initial user namespace:

           $ readlink /proc/self/ns/user
           user:[4026531837]
           $ ./ns_introspect /proc/23235/ns/user
           Inode number of owning user namespace is: 4026531837

       Start a shell in a new user namespace, and show that from
       within this shell, the parent user namespace can't be
       discovered.  Similarly, the UTS namespace (which is associated
       with the initial user namespace) can't be discovered.

           $ PS1="sh2$ " unshare -U bash
           sh2$ ./ns_introspect /proc/self/ns/user p
           The parent namespace is outside your namespace scope
           sh2$ ./ns_introspect /proc/self/ns/uts u
           The owning user namespace is outside your namespace scope

   Program source

       /* ns_introspect.c

          Licensed under GNU General Public License v2 or later
       */
       #include <stdlib.h>
       #include <unistd.h>
       #include <stdio.h>
       #include <sys/stat.h>
       #include <fcntl.h>
       #include <sys/ioctl.h>
       #include <string.h>
       #include <errno.h>

       #ifndef NS_GET_USERNS
       #define NSIO    0xb7
       #define NS_GET_USERNS   _IO(NSIO, 0x1)
       #define NS_GET_PARENT   _IO(NSIO, 0x2)
       #endif

       int
       main(int argc, char *argv[])
       {
           int fd, userns_fd, parent_fd;
           struct stat sb;

           if (argc < 2) {
               fprintf(stderr, "Usage: %s /proc/[pid]/ns/[file] [p|u]\n",
                       argv[0]);
               fprintf(stderr, "\nDisplay the result of one or both "
                       "of NS_GET_USERNS (u) or NS_GET_PARENT (p)\n"
                       "for the specified /proc/[pid]/ns/[file]. If neither "
                       "'p' nor 'u' is specified,\n"
                       "NS_GET_USERNS is the default.\n");
               exit(EXIT_FAILURE);
           }

           /* Obtain a file descriptor for the 'ns' file specified
              in argv[1] */

           fd = open(argv[1], O_RDONLY);
           if (fd == -1) {
               perror("open");
               exit(EXIT_FAILURE);
           }

           /* Obtain a file descriptor for the owning user namespace and
              then obtain and display the inode number of that namespace */

           if (argc < 3 || strchr(argv[2], 'u')) {
               userns_fd = ioctl(fd, NS_GET_USERNS);

               if (userns_fd == -1) {
                   if (errno == EPERM)
                       printf("The owning user namespace is outside "
                               "your namespace scope\n");
                   else
                       perror("ioctl-NS_GET_USERNS");
                   exit(EXIT_FAILURE);
               }

               if (fstat(userns_fd, &sb) == -1) {
                   perror("fstat-userns");
                   exit(EXIT_FAILURE);
               }
               printf("Inode number of owning user namespace is: %ld\n",
                       (long) sb.st_ino);

               close(userns_fd);
           }

           /* Obtain a file descriptor for the parent namespace and
              then obtain and display the inode number of that namespace */

           if (argc > 2 && strchr(argv[2], 'p')) {
               parent_fd = ioctl(fd, NS_GET_PARENT);

               if (parent_fd == -1) {
                   if (errno == EINVAL)
                       printf("Can't get parent namespace of a "
                               "nonhierarchical namespace\n");
                   else if (errno == EPERM)
                       printf("The parent namespace is outside "
                               "your namespace scope\n");
                   else
                       perror("ioctl-NS_GET_PARENT");
                   exit(EXIT_FAILURE);
               }

               if (fstat(parent_fd, &sb) == -1) {
                   perror("fstat-parentns");
                   exit(EXIT_FAILURE);
               }
               printf("Inode number of parent namespace is: %ld\n",
                       (long) sb.st_ino);

               close(parent_fd);
           }

           exit(EXIT_SUCCESS);
       }

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/