Subject: Re: Documenting the ioctl interfaces to discover relationships between namespaces
To: Andrei Vagin
Cc: mtk.manpages@gmail.com, Andrei Vagin, "Eric W. Biederman", Containers, Linux API, lkml, "linux-fsdevel@vger.kernel.org", James Bottomley, "W. Trevor King", Alexander Viro, Serge Hallyn
From: "Michael Kerrisk (man-pages)"
Date: Thu, 15 Dec 2016 10:53:02 +0100
Message-ID: <9690af90-f696-e867-bbbd-a511b46b9fe2@gmail.com>
In-Reply-To: <20161215004636.GB31670@outlook.office365.com>
References: <20161215004636.GB31670@outlook.office365.com>

On 12/15/2016 01:46 AM, Andrei Vagin wrote:
> On Sun, Dec 11, 2016 at 12:54:56PM +0100, Michael Kerrisk (man-pages) wrote:
>> [was: [PATCH 0/4 v3] Add an interface to discover relationships
>> between namespaces]
>>
>> Hello Andrei
>>
>> See below for my attempt to document the following.
>
> Hi Michael,
>
> Eric already did my work:). I have read this documentation and it looks
> good to me. I have nothing to add to Eric's comments.

Thanks, Andrei!

Cheers,

Michael

>>
>> On 6 September 2016 at 09:47, Andrei Vagin wrote:
>>> From: Andrey Vagin
>>>
>>> Each namespace has an owning user namespace and now there is no way
>>> to discover these relationships.
>>>
>>> Pid and user namespaces are hierarchical.
>>> There is no way to discover
>>> parent-child relationships either.
>>>
>>> Why might we want to know the relationships between namespaces?
>>>
>>> One use would be visualization, in order to understand the running
>>> system. Another would be to answer the question: what capability does
>>> process X have to perform operations on a resource governed by namespace
>>> Y?
>>>
>>> One more use case (which is usually called abnormal) is checkpoint/restart.
>>> In CRIU we are going to dump and restore nested namespaces.
>>>
>>> There [1] was a discussion about which interface to choose to determine
>>> relationships between namespaces.
>>>
>>> Eric suggested adding two ioctls [2]:
>>>> Grumble, Grumble. I think this may actually be a case for creating ioctls
>>>> for these two cases. Now that random nsfs file descriptors are bind
>>>> mountable the original reason for using proc files is not as pressing.
>>>>
>>>> One ioctl for the user namespace that owns a file descriptor.
>>>> One ioctl for the parent namespace of a namespace file descriptor.
>>>
>>> Here is an implementation of these ioctls.
>>>
>>> $ man man7/namespaces.7
>>> ...
>>> Since Linux 4.X, the following ioctl(2) calls are supported for
>>> namespace file descriptors. The correct syntax is:
>>>
>>>       fd = ioctl(ns_fd, ioctl_type);
>>>
>>> where ioctl_type is one of the following:
>>>
>>> NS_GET_USERNS
>>>       Returns a file descriptor that refers to an owning user
>>>       namespace.
>>>
>>> NS_GET_PARENT
>>>       Returns a file descriptor that refers to a parent namespace.
>>>       This ioctl(2) can be used for pid and user namespaces. For
>>>       user namespaces, NS_GET_PARENT and NS_GET_USERNS have the same
>>>       meaning.
>>>
>>> In addition to generic ioctl(2) errors, the following specific ones
>>> can occur:
>>>
>>> EINVAL NS_GET_PARENT was called for a nonhierarchical namespace.
>>>
>>> EPERM  The requested namespace is outside of the current namespace
>>>        scope.
>>>
>>> [1] https://lkml.org/lkml/2016/7/6/158
>>> [2] https://lkml.org/lkml/2016/7/9/101
>>
>> The following is the text I propose to add to the namespaces(7) page.
>> Could you please review and let me know of corrections and
>> improvements.
>>
>> Thanks,
>>
>> Michael
>>
>>
>>    Introspecting namespace relationships
>>        Since Linux 4.9, two ioctl(2) operations are provided to allow
>>        introspection of namespace relationships (see user_namespaces(7)
>>        and pid_namespaces(7)). The form of the calls is:
>>
>>            ioctl(fd, request);
>>
>>        In each case, fd refers to a /proc/[pid]/ns/* file.
>>
>>        NS_GET_USERNS
>>               Returns a file descriptor that refers to the owning user
>>               namespace for the namespace referred to by fd.
>>
>>        NS_GET_PARENT
>>               Returns a file descriptor that refers to the parent
>>               namespace of the namespace referred to by fd. This
>>               operation is valid only for hierarchical namespaces (i.e.,
>>               PID and user namespaces). For user namespaces,
>>               NS_GET_PARENT is synonymous with NS_GET_USERNS.
>>
>>        In each case, the returned file descriptor is opened with O_RDONLY
>>        and O_CLOEXEC (close-on-exec).
>>
>>        By applying fstat(2) to the returned file descriptor, one obtains
>>        a stat structure whose st_ino (inode number) field identifies the
>>        owning/parent namespace. This inode number can be matched with
>>        the inode number of another /proc/[pid]/ns/{pid,user} file to
>>        determine whether that is the owning/parent namespace.
>>
>>        Either of these ioctl(2) operations can fail with the following
>>        error:
>>
>>        EPERM  The requested namespace is outside of the caller's
>>               namespace scope. This error can occur if, for example, the
>>               owning user namespace is an ancestor of the caller's
>>               current user namespace. It can also occur on attempts to
>>               obtain the parent of the initial user or PID namespace.
>>
>>        Additionally, the NS_GET_PARENT operation can fail with the
>>        following error:
>>
>>        EINVAL fd refers to a nonhierarchical namespace.
>>
>>        See the EXAMPLE section for an example of the use of these
>>        operations.
>>
>> [...]
>>
>> EXAMPLE
>>        The example shown below uses the ioctl(2) operations described
>>        above to perform simple introspection of namespace relationships.
>>        The following shell sessions show various examples of the use of
>>        this program.
>>
>>        Trying to get the parent of the initial user namespace fails, for
>>        the reasons explained earlier:
>>
>>            $ ./ns_introspect /proc/self/ns/user p
>>            The parent namespace is outside your namespace scope
>>
>>        Create a process running sleep(1) that resides in new user and UTS
>>        namespaces, and show that new UTS namespace is associated with the
>>        new user namespace:
>>
>>            $ unshare -Uu sleep 1000 &
>>            [1] 23235
>>            $ ./ns_introspect /proc/23235/ns/uts
>>            Inode number of owning user namespace is: 4026532448
>>            $ readlink /proc/23235/ns/user
>>            user:[4026532448]
>>
>>        Then show that the parent of the new user namespace in the
>>        preceding example is the initial user namespace:
>>
>>            $ readlink /proc/self/ns/user
>>            user:[4026531837]
>>            $ ./ns_introspect /proc/23235/ns/user
>>            Inode number of owning user namespace is: 4026531837
>>
>>        Start a shell in a new user namespace, and show that from within
>>        this shell, the parent user namespace can't be discovered.
>>        Similarly, the UTS namespace (which is associated with the initial
>>        user namespace) can't be discovered.
>>
>>            $ PS1="sh2$ " unshare -U bash
>>            sh2$ ./ns_introspect /proc/self/ns/user p
>>            The parent namespace is outside your namespace scope
>>            sh2$ ./ns_introspect /proc/self/ns/uts u
>>            The owning user namespace is outside your namespace scope
>>
>>    Program source
>>
>>        /* ns_introspect.c
>>
>>           Licensed under GNU General Public License v2 or later
>>        */
>>        #include <stdlib.h>
>>        #include <unistd.h>
>>        #include <stdio.h>
>>        #include <fcntl.h>
>>        #include <string.h>
>>        #include <sys/stat.h>
>>        #include <sys/ioctl.h>
>>        #include <errno.h>
>>
>>        #ifndef NS_GET_USERNS
>>        #define NSIO    0xb7
>>        #define NS_GET_USERNS   _IO(NSIO, 0x1)
>>        #define NS_GET_PARENT   _IO(NSIO, 0x2)
>>        #endif
>>
>>        int
>>        main(int argc, char *argv[])
>>        {
>>            int fd, userns_fd, parent_fd;
>>            struct stat sb;
>>
>>            if (argc < 2) {
>>                fprintf(stderr, "Usage: %s /proc/[pid]/ns/[file] [p|u]\n",
>>                        argv[0]);
>>                fprintf(stderr, "\nDisplay the result of one or both "
>>                        "of NS_GET_USERNS (u) or NS_GET_PARENT (p)\n"
>>                        "for the specified /proc/[pid]/ns/[file]. If neither "
>>                        "'p' nor 'u' is specified,\n"
>>                        "NS_GET_USERNS is the default.\n");
>>                exit(EXIT_FAILURE);
>>            }
>>
>>            /* Obtain a file descriptor for the 'ns' file specified
>>               in argv[1] */
>>
>>            fd = open(argv[1], O_RDONLY);
>>            if (fd == -1) {
>>                perror("open");
>>                exit(EXIT_FAILURE);
>>            }
>>
>>            /* Obtain a file descriptor for the owning user namespace and
>>               then obtain and display the inode number of that namespace */
>>
>>            if (argc < 3 || strchr(argv[2], 'u')) {
>>                userns_fd = ioctl(fd, NS_GET_USERNS);
>>
>>                if (userns_fd == -1) {
>>                    if (errno == EPERM)
>>                        printf("The owning user namespace is outside "
>>                                "your namespace scope\n");
>>                    else
>>                        perror("ioctl-NS_GET_USERNS");
>>                    exit(EXIT_FAILURE);
>>                }
>>
>>                if (fstat(userns_fd, &sb) == -1) {
>>                    perror("fstat-userns");
>>                    exit(EXIT_FAILURE);
>>                }
>>                printf("Inode number of owning user namespace is: %ld\n",
>>                        (long) sb.st_ino);
>>
>>                close(userns_fd);
>>            }
>>
>>            /* Obtain a file descriptor for the parent namespace and
>>               then obtain and display the inode number of that namespace */
>>
>>            if (argc > 2 && strchr(argv[2], 'p')) {
>>                parent_fd = ioctl(fd, NS_GET_PARENT);
>>
>>                if (parent_fd == -1) {
>>                    if (errno == EINVAL)
>>                        printf("Can't get parent namespace of a "
>>                                "nonhierarchical namespace\n");
>>                    else if (errno == EPERM)
>>                        printf("The parent namespace is outside "
>>                                "your namespace scope\n");
>>                    else
>>                        perror("ioctl-NS_GET_PARENT");
>>                    exit(EXIT_FAILURE);
>>                }
>>
>>                if (fstat(parent_fd, &sb) == -1) {
>>                    perror("fstat-parentns");
>>                    exit(EXIT_FAILURE);
>>                }
>>                printf("Inode number of parent namespace is: %ld\n",
>>                        (long) sb.st_ino);
>>
>>                close(parent_fd);
>>            }
>>
>>            exit(EXIT_SUCCESS);
>>        }
>>
>>
>> --
>> Michael Kerrisk
>> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
>> Linux/UNIX System Programming Training: http://man7.org/training/
>

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/