Subject: [PATCH 0/2] Add further ioctl() operations for namespace discovery

Eric,

The code proposed in this patch series is pretty small. Is there any
chance we could make the 4.10 merge window, if the changes seem
acceptable to you?

I would like to write code that can answer the question: "what
capabilities does process X have in namespace Y"? (where Y is defined by
a file descriptor referring to one of the /proc/PID/ns/xxxx files). The
rules that determine the answer to this question are described in the
capabilities(7) manual page:

The rules for determining whether or not a process has a capabil‐
ity in a particular user namespace are as follows:

1. A process has a capability inside a user namespace if it is a
member of that namespace and it has the capability in its
effective capability set. A process can gain capabilities in
its effective capability set in various ways. For example, it
may execute a set-user-ID program or an executable with associ‐
ated file capabilities. In addition, a process may gain capa‐
bilities via the effect of clone(2), unshare(2), or setns(2),
as already described.

2. If a process has a capability in a user namespace, then it has
that capability in all child (and further removed descendant)
namespaces as well.

3. When a user namespace is created, the kernel records the effec‐
tive user ID of the creating process as being the "owner" of
the namespace. A process that resides in the parent of the
user namespace and whose effective user ID matches the owner of
the namespace has all capabilities in the namespace. By virtue
of the previous rule, this means that the process has all capa‐
bilities in all further removed descendant user namespaces as
well.

The NS_GET_PARENT and NS_GET_USERNS ioctl() operations added in Linux
4.9 provide much of what is needed, but AFAICT there are still a couple
of small pieces missing. Those pieces are added with this patch series.

Here's an example program that makes use of the new ioctl() operations.

8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---
/* ns_capable.c

(C) 2016 Michael Kerrisk, <[email protected]>

Licensed under the GNU General Public License v2 or later.
*/
#define _GNU_SOURCE
#include <sched.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <limits.h>
#include <sys/capability.h>

#ifndef NS_GET_USERNS
#define NSIO 0xb7
#define NS_GET_USERNS _IO(NSIO, 0x1)
#define NS_GET_PARENT _IO(NSIO, 0x2)
#define NS_GET_NSTYPE _IO(NSIO, 0x3)
#define NS_GET_CREATOR_UID _IO(NSIO, 0x4)
#endif

#define errExit(msg) do { perror(msg); exit(EXIT_FAILURE); \
} while (0)

#define fatal(msg) do { fprintf(stderr, "%s\n", msg); \
exit(EXIT_FAILURE); } while (0)

/* Display capabilities sets of process with specified PID */

static void
show_cap(pid_t pid)
{
cap_t caps;
char *cap_string;

caps = cap_get_pid(pid);
if (caps == NULL)
errExit("cap_get_proc");

cap_string = cap_to_text(caps, NULL);
if (cap_string == NULL)
errExit("cap_to_text");

printf("Capabilities: %s\n", cap_string);
}

/* Obtain the effective UID pf the process 'pid' by
scanning its /proc/PID/file */

static uid_t
get_euid_of_process(pid_t pid)
{
char path[PATH_MAX];
char line[1024];
int uid;

snprintf(path, sizeof(path), "/proc/%ld/status", (long) pid);

FILE *fp;
fp = fopen(path, "r");
if (fp == NULL)
errExit("fopen-/proc/PID/status");

for (;;) {
if (fgets(line, sizeof(line), fp) == NULL) {

/* Should never happen... */

fprintf(stderr, "Failure scanning %s\n", path);
exit(EXIT_FAILURE);
}

if (strstr(line, "Uid:") == line) {
sscanf(line, "Uid: %*d %d %*d %*d", &uid);
return uid;
}
}
}

int
main(int argc, char *argv[])
{
int ns_fd, userns_fd, pid_userns_fd;
int nstype;
int next_fd;
struct stat pid_stat;
struct stat target_stat;
char *pid_str;
pid_t pid;
char path[PATH_MAX];

if (argc < 2) {
fprintf(stderr, "Usage: %s PID [ns-file]\n", argv[0]);
fprintf(stderr, "\t'ns-file' is a /proc/PID/ns/xxxx file; if omitted, "
"use the namespace\n"
"\treferred to by standard input "
"(file descriptor 0)\n");
exit(EXIT_FAILURE);
}

pid_str = argv[1];
pid = atoi(pid_str);

if (argc <= 2) {
ns_fd = STDIN_FILENO;
} else {
ns_fd = open(argv[2], O_RDONLY);
if (ns_fd == -1)
errExit("open-ns-file");
}

/* Get the relevant user namespace FD, which is 'ns_fd' if 'ns_fd' refers
to a user namespace, otherwise the user namespace that owns 'ns_fd' */

nstype = ioctl(ns_fd, NS_GET_NSTYPE);
if (nstype == -1)
errExit("ioctl-NS_GET_NSTYPE");

if (nstype == CLONE_NEWUSER) {
userns_fd = ns_fd;
} else {
userns_fd = ioctl(ns_fd, NS_GET_USERNS);
if (userns_fd == -1)
errExit("ioctl-NS_GET_USERNS");
}

/* Obtain 'stat' info for the user namespace of the specified PID */

snprintf(path, sizeof(path), "/proc/%s/ns/user", pid_str);

pid_userns_fd = open(path, O_RDONLY);
if (pid_userns_fd == -1)
errExit("open-PID");

if (fstat(pid_userns_fd, &pid_stat) == -1)
errExit("fstat-PID");

/* Get 'stat' info for the target user namesapce */

if (fstat(userns_fd, &target_stat) == -1)
errExit("fstat-PID");

/* If the PID is in the target user namespace, then it has
whatever capabilities are in its sets. */

if (pid_stat.st_dev == target_stat.st_dev &&
pid_stat.st_ino == target_stat.st_ino) {
printf("PID is in target namespace\n");

show_cap(pid);

exit(EXIT_SUCCESS);
}

/* Otherwise, we need to walk through the ancestors of the target
user namespace to see if PID is in an ancestor namespace */

for (;;) {
int f;

next_fd = ioctl(userns_fd, NS_GET_PARENT);

if (next_fd == -1) {

/* The error here should be EPERM... */

if (errno != EPERM)
errExit("ioctl-NS_GET_PARENT");

printf("PID is not in an ancestor namespace\n");
printf("It has no capabilities in the target namespace\n");

exit(EXIT_SUCCESS);
}

if (fstat(next_fd, &target_stat) == -1)
errExit("fstat-PID");

/* If the 'stat' info for this user namespace matches the 'stat'
* info for 'next_fd', then the PID is in an ancestor namespace */

if (pid_stat.st_dev == target_stat.st_dev &&
pid_stat.st_ino == target_stat.st_ino)
break;

/* Next time round, get the next parent */

f = userns_fd;
userns_fd = next_fd;
close(f);
}

/* At this point, we found that PID is in an ancestor of the target
user namespace, and 'userns_fd' refers to the immediate descendant
user namespace of PID in the chain of user namespaces from PID to
the target user namespace. If the effective UID of PID matches the
creator UID of descendant user namespace, then PID has all
capabilities in the descendant namespace(s); otherwise, it just has
the capabilities that are in its sets. */

uid_t creator_uid, uid;

creator_uid = ioctl(userns_fd, NS_GET_CREATOR_UID);
if (creator_uid == -1)
errExit("ioctl-NS_GET_CREATOR_UID");

uid = get_euid_of_process(pid);

printf("PID is in an ancestor namespace\n");
if (creator_uid == uid) {
printf("And its effective UID matches the creator "
"of the namespace\n");
printf("PID has all capabilities in that namespace!\n");
} else {
printf("But its effective UID does not match the creator "
"of the namespace\n");
show_cap(pid);
}

exit(EXIT_SUCCESS);
}
8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---

Michael Kerrisk (2):
nsfs: Add an ioctl() to return the namespace type
nsfs: Add an ioctl() to return creator UID of a user namespace

fs/nsfs.c | 8 ++++++++
include/uapi/linux/nsfs.h | 9 +++++++--
2 files changed, 15 insertions(+), 2 deletions(-)

--
2.5.5

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


2016-12-19 22:56:54

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 0/2] Add further ioctl() operations for namespace discovery

"Michael Kerrisk (man-pages)" <[email protected]> writes:

> Eric,
>
> The code proposed in this patch series is pretty small. Is there any
> chance we could make the 4.10 merge window, if the changes seem
> acceptable to you?

I see why you are asking but I am not comfortable with aiming for
the merge window that is on-going and could close at any moment.
I have seen recenly too many patches that should work fine have
some odd minor issue. Like an extra _ in a label used in an ifdef
that resulted in memory stomps. Linus might be more brave but i would
rather wait until the next merge window, so I don't need to worry about
spoiling anyone's holidays with a typo someone over looked.

At first glance these patches seem reasonable. I don't see any problem
with the ioctls you have added.

That said I have a question. Should we provide a more direct way to
find the answer to your question? Something like the access system
call?

I think a more direct answer would be more maintainable in the long run
as it does not bind tools to specific implementation details in the
future. Which could allow us to account for LSM policies and the like.

Eric

Subject: Re: [PATCH 0/2] Add further ioctl() operations for namespace discovery

Hello Eric,

On 12/19/2016 11:53 PM, Eric W. Biederman wrote:
> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>
>> Eric,
>>
>> The code proposed in this patch series is pretty small. Is there any
>> chance we could make the 4.10 merge window, if the changes seem
>> acceptable to you?
>
> I see why you are asking but I am not comfortable with aiming for
> the merge window that is on-going and could close at any moment.
> I have seen recenly too many patches that should work fine have
> some odd minor issue. Like an extra _ in a label used in an ifdef
> that resulted in memory stomps. Linus might be more brave but i would
> rather wait until the next merge window, so I don't need to worry about
> spoiling anyone's holidays with a typo someone over looked.

I'll just gently ask if you'll reconsider and take another look at the
patches. They patches are very small, and don't change any existing
behavior. And if we see a problem in the next weeks they could be pulled.
In the meantime, I'd be aiming to publicize this API somewhat, so that we
might get some eyeballs to spot design bugs. But, I do understand your
position, if the answer is still "not for this merge window".

> At first glance these patches seem reasonable. I don't see any problem
> with the ioctls you have added.
>
> That said I have a question. Should we provide a more direct way to
> find the answer to your question? Something like the access system
> call?
>
> I think a more direct answer would be more maintainable in the long run
> as it does not bind tools to specific implementation details in the
> future. Which could allow us to account for LSM policies and the like.

My thoughts:

1. Regarding NS_GET_NSTYPE... It always struck me as a little odd
that you could ask setns() to check if the supplied FD referred
to a certain type of NS (and thus, in a round about way, setns()
gives us the same information as NS_GET_NSTYPE), but you can't
directly ask what the NS type is. The fact that setns() has this
facility suggests that there could be other uses for the operation
"tell me what type of NS this FD refers to".

2. Regarding NS_GET_CREATOR_UID... There are defined rules about what
this UID means with respect to capabilities in a namespace. It's
not an implementation detail, as I understand it. Also in terms of
introspecting to try to understand the structure of namespaces on
a running system, knowing this UID is useful in and of itself.

3. NS_GET_NSTYPE and NS_GET_CREATOR_UID solve my problem, but
obviously your idea would make life simpler for user space.
Am I correct to understand that you mean an API that takes
three pieces of info: a PID, a capability, and an fd referring
to a /proc/PID/ns/xxx, and tells us whether PID has the specified
capability for operations in the specified namespace?

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

2016-12-20 20:25:37

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 0/2] Add further ioctl() operations for namespace discovery

"Michael Kerrisk (man-pages)" <[email protected]> writes:

> Hello Eric,
>
> On 12/19/2016 11:53 PM, Eric W. Biederman wrote:
>> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>>
>>> Eric,
>>>
>>> The code proposed in this patch series is pretty small. Is there any
>>> chance we could make the 4.10 merge window, if the changes seem
>>> acceptable to you?
>>
>> I see why you are asking but I am not comfortable with aiming for
>> the merge window that is on-going and could close at any moment.
>> I have seen recenly too many patches that should work fine have
>> some odd minor issue. Like an extra _ in a label used in an ifdef
>> that resulted in memory stomps. Linus might be more brave but i would
>> rather wait until the next merge window, so I don't need to worry about
>> spoiling anyone's holidays with a typo someone over looked.
>
> I'll just gently ask if you'll reconsider and take another look at the
> patches. They patches are very small, and don't change any existing
> behavior. And if we see a problem in the next weeks they could be pulled.
> In the meantime, I'd be aiming to publicize this API somewhat, so that we
> might get some eyeballs to spot design bugs. But, I do understand your
> position, if the answer is still "not for this merge window".

My position is still not this merge window. I am more than happy to
queue up the changes for the next one. Even on the best of days there
is a reasonable chance Linus would not be happy to receive code
development done in the merge window.

I think there is also just a little bit of discussion that needs
to happen with these new userspace APIs (below). And I have seen way
too many times user space APIs added too quickly and having to be
repaired afterwards.

>> At first glance these patches seem reasonable. I don't see any problem
>> with the ioctls you have added.
>>
>> That said I have a question. Should we provide a more direct way to
>> find the answer to your question? Something like the access system
>> call?
>>
>> I think a more direct answer would be more maintainable in the long run
>> as it does not bind tools to specific implementation details in the
>> future. Which could allow us to account for LSM policies and the like.
>
> My thoughts:
>
> 1. Regarding NS_GET_NSTYPE... It always struck me as a little odd
> that you could ask setns() to check if the supplied FD referred
> to a certain type of NS (and thus, in a round about way, setns()
> gives us the same information as NS_GET_NSTYPE), but you can't
> directly ask what the NS type is. The fact that setns() has this
> facility suggests that there could be other uses for the operation
> "tell me what type of NS this FD refers to".

Yes. I have no problem with that one.

> 2. Regarding NS_GET_CREATOR_UID... There are defined rules about what
> this UID means with respect to capabilities in a namespace. It's
> not an implementation detail, as I understand it. Also in terms of
> introspecting to try to understand the structure of namespaces on
> a running system, knowing this UID is useful in and of itself.

I am not quite sold on the name NS_GET_CREATOR_UID. NS_GET_OWNER_UID
seems to match the code better. The owner is the creator but
the important part seems to be the ownership not the act of creation.

> 3. NS_GET_NSTYPE and NS_GET_CREATOR_UID solve my problem, but
> obviously your idea would make life simpler for user space.
> Am I correct to understand that you mean an API that takes
> three pieces of info: a PID, a capability, and an fd referring
> to a /proc/PID/ns/xxx, and tells us whether PID has the specified
> capability for operations in the specified namespace?

Something like that. But yes something we can wire up to
ns_capable_noaudit and be told the result. That will let the
LSMs and any future kerel changes have their say, without any extra
maintenance burden in the kernel.

What I really don't want is for userspace to start depending on the
current formula being the only factors that say if it has a capabliltiy
in a certain situation because in practice that just isn't true.
Permission checks just keep evoloving in the kernel.

Eric

Subject: Re: [PATCH 0/2] Add further ioctl() operations for namespace discovery

Hi Eric,

On 12/20/2016 09:22 PM, Eric W. Biederman wrote:
> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>
>> Hello Eric,
>>
>> On 12/19/2016 11:53 PM, Eric W. Biederman wrote:
>>> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>>>
>>>> Eric,
>>>>
>>>> The code proposed in this patch series is pretty small. Is there any
>>>> chance we could make the 4.10 merge window, if the changes seem
>>>> acceptable to you?
>>>
>>> I see why you are asking but I am not comfortable with aiming for
>>> the merge window that is on-going and could close at any moment.
>>> I have seen recenly too many patches that should work fine have
>>> some odd minor issue. Like an extra _ in a label used in an ifdef
>>> that resulted in memory stomps. Linus might be more brave but i would
>>> rather wait until the next merge window, so I don't need to worry about
>>> spoiling anyone's holidays with a typo someone over looked.
>>
>> I'll just gently ask if you'll reconsider and take another look at the
>> patches. They patches are very small, and don't change any existing
>> behavior. And if we see a problem in the next weeks they could be pulled.
>> In the meantime, I'd be aiming to publicize this API somewhat, so that we
>> might get some eyeballs to spot design bugs. But, I do understand your
>> position, if the answer is still "not for this merge window".
>
> My position is still not this merge window. I am more than happy to
> queue up the changes for the next one. Even on the best of days there
> is a reasonable chance Linus would not be happy to receive code
> development done in the merge window.

Okay. So, I can at least think about this at leisure! (Actually,
I think I really do mean: thanks for saying "no" again.)

> I think there is also just a little bit of discussion that needs
> to happen with these new userspace APIs (below). And I have seen way
> too many times user space APIs added too quickly and having to be
> repaired afterwards.

Yes, I certainly understand that.

>>> At first glance these patches seem reasonable. I don't see any problem
>>> with the ioctls you have added.
>>>
>>> That said I have a question. Should we provide a more direct way to
>>> find the answer to your question? Something like the access system
>>> call?
>>>
>>> I think a more direct answer would be more maintainable in the long run
>>> as it does not bind tools to specific implementation details in the
>>> future. Which could allow us to account for LSM policies and the like.
>>
>> My thoughts:
>>
>> 1. Regarding NS_GET_NSTYPE... It always struck me as a little odd
>> that you could ask setns() to check if the supplied FD referred
>> to a certain type of NS (and thus, in a round about way, setns()
>> gives us the same information as NS_GET_NSTYPE), but you can't
>> directly ask what the NS type is. The fact that setns() has this
>> facility suggests that there could be other uses for the operation
>> "tell me what type of NS this FD refers to".
>
> Yes. I have no problem with that one.
>
>> 2. Regarding NS_GET_CREATOR_UID... There are defined rules about what
>> this UID means with respect to capabilities in a namespace. It's
>> not an implementation detail, as I understand it. Also in terms of
>> introspecting to try to understand the structure of namespaces on
>> a running system, knowing this UID is useful in and of itself.
>
> I am not quite sold on the name NS_GET_CREATOR_UID. NS_GET_OWNER_UID
> seems to match the code better. The owner is the creator but
> the important part seems to be the ownership not the act of creation.

I actually thought about NS_GET_OWNER, but shied away from it
because it had echoes of "get owning userns". NS_GET_OWNER_UID
is better than NS_GET_OWNER, and certainly not worse than
NS_GET_CREATOR_UID.

>> 3. NS_GET_NSTYPE and NS_GET_CREATOR_UID solve my problem, but
>> obviously your idea would make life simpler for user space.
>> Am I correct to understand that you mean an API that takes
>> three pieces of info: a PID, a capability, and an fd referring
>> to a /proc/PID/ns/xxx, and tells us whether PID has the specified
>> capability for operations in the specified namespace?
>
> Something like that. But yes something we can wire up to
> ns_capable_noaudit and be told the result.

Yes, that was my line of thinking also. It seems to me that to
prevent information leaks, we also should check that the caller
has some suitable capability in the target namespace, right?
(I presume a ptrace_may_access() check.)

> That will let the
> LSMs and any future kerel changes have their say, without any extra
> maintenance burden in the kernel.

Yes.

> What I really don't want is for userspace to start depending on the
> current formula being the only factors that say if it has a capabliltiy
> in a certain situation because in practice that just isn't true.
> Permission checks just keep evoloving in the kernel.

This was the bit I hadn't really considered when I first started down
this path, but I started to see the light a bit already today, but
didn't have it so crisply in my mind as you just said it there.

So, we have two ioctls already in 4.9, I proposed two more. And
then we have this fifth operation. Should we have an nsctl(2)?

In the meantime, here's something I hacked together. I know it
needs work, but I just want to check whether it's the direction
that you were meaning in terms of the checks. It's done as an
ioctl() (structure hard coded in place while I play about, and
some names and types should certainly be better). Leaving aside
the messy bits, is the below roughly the kind of checking you
expect to be embodied in this operation?

Cheers,

Michael

diff --git a/include/uapi/linux/nsfs.h b/include/uapi/linux/nsfs.h
index b3c6c78..88b7d78 100644
--- a/include/uapi/linux/nsfs.h
+++ b/include/uapi/linux/nsfs.h
@@ -14,5 +14,7 @@
#define NS_GET_NSTYPE _IO(NSIO, 0x3)
/* Get creator UID for a user namespace */
#define NS_GET_CREATOR_UID _IO(NSIO, 0x4)
+/* Test whether a process has a capability in a namespace */
+#define NS_CAPABLE _IO(NSIO, 0x5)

#endif /* __LINUX_NSFS_H */
--- a/fs/nsfs.c
+++ b/fs/nsfs.c
@@ -4,6 +4,8 @@
#include <linux/proc_ns.h>
#include <linux/magic.h>
#include <linux/ktime.h>
+#include <linux/ptrace.h>
+#include <asm/uaccess.h>
#include <linux/seq_file.h>
#include <linux/user_namespace.h>
#include <linux/nsfs.h>
@@ -160,6 +162,38 @@ static int open_related_ns(struct ns_common *ns,
return fd;
}

+static long check_capable(struct user_namespace *user_ns, unsigned long arg)
+{
+ struct cc_param {
+ long pid;
+ long capability;
+ };
+ struct cc_param __user *cc_par_p = (void __user *)arg;
+ struct cc_param cc_par;
+ struct task_struct *p;
+ int retval;
+
+ retval = copy_from_user(&cc_par, cc_par_p, sizeof(cc_par));
+ if (retval)
+ return -EFAULT;
+
+ rcu_read_lock();
+
+ retval = -ESRCH;
+ p = find_task_by_vpid(cc_par.pid);
+ if (!p)
+ goto out;
+
+ retval = -EPERM;
+ if (!ptrace_may_access(p, PTRACE_MODE_READ_REALCREDS))
+ goto out;
+
+ retval = has_ns_capability_noaudit(p, user_ns, cc_par.capability);
+out:
+ rcu_read_unlock();
+ return retval;
+}
+
static long ns_ioctl(struct file *filp, unsigned int ioctl,
unsigned long arg)
{
@@ -180,6 +214,12 @@ static long ns_ioctl(struct file *filp, unsigned int ioctl,
return -EINVAL;
user_ns = container_of(ns, struct user_namespace, ns);
return from_kuid_munged(current_user_ns(), user_ns->owner);
+ case NS_CAPABLE:
+ if (ns->ops->type == CLONE_NEWUSER)
+ user_ns = container_of(ns, struct user_namespace, ns);
+ else
+ user_ns = (*ns->ops->owner)(ns);
+ return check_capable(user_ns, arg);
default:
return -ENOTTY;
}


--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

2016-12-21 00:20:24

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 0/2] Add further ioctl() operations for namespace discovery

"Michael Kerrisk (man-pages)" <[email protected]> writes:

> Hi Eric,
>
> On 12/20/2016 09:22 PM, Eric W. Biederman wrote:
>> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>>
>>> Hello Eric,
>>>
>>> On 12/19/2016 11:53 PM, Eric W. Biederman wrote:
>>>> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>>>>
>>>>> Eric,
>>>>>
>>>>> The code proposed in this patch series is pretty small. Is there any
>>>>> chance we could make the 4.10 merge window, if the changes seem
>>>>> acceptable to you?
>>>>
>>>> I see why you are asking but I am not comfortable with aiming for
>>>> the merge window that is on-going and could close at any moment.
>>>> I have seen recenly too many patches that should work fine have
>>>> some odd minor issue. Like an extra _ in a label used in an ifdef
>>>> that resulted in memory stomps. Linus might be more brave but i would
>>>> rather wait until the next merge window, so I don't need to worry about
>>>> spoiling anyone's holidays with a typo someone over looked.
>>>
>>> I'll just gently ask if you'll reconsider and take another look at the
>>> patches. They patches are very small, and don't change any existing
>>> behavior. And if we see a problem in the next weeks they could be pulled.
>>> In the meantime, I'd be aiming to publicize this API somewhat, so that we
>>> might get some eyeballs to spot design bugs. But, I do understand your
>>> position, if the answer is still "not for this merge window".
>>
>> My position is still not this merge window. I am more than happy to
>> queue up the changes for the next one. Even on the best of days there
>> is a reasonable chance Linus would not be happy to receive code
>> development done in the merge window.
>
> Okay. So, I can at least think about this at leisure! (Actually,
> I think I really do mean: thanks for saying "no" again.)
>
>> I think there is also just a little bit of discussion that needs
>> to happen with these new userspace APIs (below). And I have seen way
>> too many times user space APIs added too quickly and having to be
>> repaired afterwards.
>
> Yes, I certainly understand that.
>
>>>> At first glance these patches seem reasonable. I don't see any problem
>>>> with the ioctls you have added.
>>>>
>>>> That said I have a question. Should we provide a more direct way to
>>>> find the answer to your question? Something like the access system
>>>> call?
>>>>
>>>> I think a more direct answer would be more maintainable in the long run
>>>> as it does not bind tools to specific implementation details in the
>>>> future. Which could allow us to account for LSM policies and the like.
>>>
>>> My thoughts:
>>>
>>> 1. Regarding NS_GET_NSTYPE... It always struck me as a little odd
>>> that you could ask setns() to check if the supplied FD referred
>>> to a certain type of NS (and thus, in a round about way, setns()
>>> gives us the same information as NS_GET_NSTYPE), but you can't
>>> directly ask what the NS type is. The fact that setns() has this
>>> facility suggests that there could be other uses for the operation
>>> "tell me what type of NS this FD refers to".
>>
>> Yes. I have no problem with that one.
>>
>>> 2. Regarding NS_GET_CREATOR_UID... There are defined rules about what
>>> this UID means with respect to capabilities in a namespace. It's
>>> not an implementation detail, as I understand it. Also in terms of
>>> introspecting to try to understand the structure of namespaces on
>>> a running system, knowing this UID is useful in and of itself.
>>
>> I am not quite sold on the name NS_GET_CREATOR_UID. NS_GET_OWNER_UID
>> seems to match the code better. The owner is the creator but
>> the important part seems to be the ownership not the act of creation.
>
> I actually thought about NS_GET_OWNER, but shied away from it
> because it had echoes of "get owning userns". NS_GET_OWNER_UID
> is better than NS_GET_OWNER, and certainly not worse than
> NS_GET_CREATOR_UID.
>
>>> 3. NS_GET_NSTYPE and NS_GET_CREATOR_UID solve my problem, but
>>> obviously your idea would make life simpler for user space.
>>> Am I correct to understand that you mean an API that takes
>>> three pieces of info: a PID, a capability, and an fd referring
>>> to a /proc/PID/ns/xxx, and tells us whether PID has the specified
>>> capability for operations in the specified namespace?
>>
>> Something like that. But yes something we can wire up to
>> ns_capable_noaudit and be told the result.
>
> Yes, that was my line of thinking also. It seems to me that to
> prevent information leaks, we also should check that the caller
> has some suitable capability in the target namespace, right?
> (I presume a ptrace_may_access() check.)

Well over the target process but yes.

>> That will let the
>> LSMs and any future kerel changes have their say, without any extra
>> maintenance burden in the kernel.
>
> Yes.
>
>> What I really don't want is for userspace to start depending on the
>> current formula being the only factors that say if it has a capabliltiy
>> in a certain situation because in practice that just isn't true.
>> Permission checks just keep evoloving in the kernel.
>
> This was the bit I hadn't really considered when I first started down
> this path, but I started to see the light a bit already today, but
> didn't have it so crisply in my mind as you just said it there.
>
> So, we have two ioctls already in 4.9, I proposed two more. And
> then we have this fifth operation. Should we have an nsctl(2)?

I would rather move the other direction and have a system call.

> In the meantime, here's something I hacked together. I know it
> needs work, but I just want to check whether it's the direction
> that you were meaning in terms of the checks. It's done as an
> ioctl() (structure hard coded in place while I play about, and
> some names and types should certainly be better). Leaving aside
> the messy bits, is the below roughly the kind of checking you
> expect to be embodied in this operation?

Yes. It probably nees u32 instead of long, or eles we need to have
a compat version for 32bit OS's.

Now the question becomes who are the users of this? Because it just
occurred to me that we now have an interesting complication. Userspace
extending the meaning of the capability bits, and using to protect
additional things. Ugh. That could be a maintenance problem of another
flavor. Definitely not my favorite.

The access system is limited to asking about yourself, and to
asking very specific questions. If our new operation did something
similar and only allowed asking about yourself, and a capablity I would
find it less scary, and I am wondering if it would be possible to ask
not about a capability but an operation that the capability guards
such as chroot.

Implementing target operations instead of target capabilities would be a
bit trickier to implement (as it requires factoring out the permission
checks) but it may be much more useful in the long run.

So why are we asking the questions about what permissions a process has?

Eric

Subject: Re: [PATCH 0/2] Add further ioctl() operations for namespace discovery

Hi Eric,

On 12/21/2016 01:17 AM, Eric W. Biederman wrote:
> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>
>> Hi Eric,
>>
>> On 12/20/2016 09:22 PM, Eric W. Biederman wrote:
>>> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>>>
>>>> Hello Eric,
>>>>
>>>> On 12/19/2016 11:53 PM, Eric W. Biederman wrote:
>>>>> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>>>>>
>>>>>> Eric,

[...]

>>>> 3. NS_GET_NSTYPE and NS_GET_CREATOR_UID solve my problem, but
>>>> obviously your idea would make life simpler for user space.
>>>> Am I correct to understand that you mean an API that takes
>>>> three pieces of info: a PID, a capability, and an fd referring
>>>> to a /proc/PID/ns/xxx, and tells us whether PID has the specified
>>>> capability for operations in the specified namespace?
>>>
>>> Something like that. But yes something we can wire up to
>>> ns_capable_noaudit and be told the result.
>>
>> Yes, that was my line of thinking also. It seems to me that to
>> prevent information leaks, we also should check that the caller
>> has some suitable capability in the target namespace, right?
>> (I presume a ptrace_may_access() check.)
>
> Well over the target process but yes.
>
>>> That will let the
>>> LSMs and any future kerel changes have their say, without any extra
>>> maintenance burden in the kernel.
>>
>> Yes.
>>
>>> What I really don't want is for userspace to start depending on the
>>> current formula being the only factors that say if it has a capabliltiy
>>> in a certain situation because in practice that just isn't true.
>>> Permission checks just keep evoloving in the kernel.
>>
>> This was the bit I hadn't really considered when I first started down
>> this path, but I started to see the light a bit already today, but
>> didn't have it so crisply in my mind as you just said it there.
>>
>> So, we have two ioctls already in 4.9, I proposed two more. And
>> then we have this fifth operation. Should we have an nsctl(2)?
>
> I would rather move the other direction and have a system call.

Okay -- I'll give that a some thought.

>> In the meantime, here's something I hacked together. I know it
>> needs work, but I just want to check whether it's the direction
>> that you were meaning in terms of the checks. It's done as an
>> ioctl() (structure hard coded in place while I play about, and
>> some names and types should certainly be better). Leaving aside
>> the messy bits, is the below roughly the kind of checking you
>> expect to be embodied in this operation?
>
> Yes. It probably nees u32 instead of long, or eles we need to have
> a compat version for 32bit OS's.

Yes.

> Now the question becomes who are the users of this? Because it just
> occurred to me that we now have an interesting complication. Userspace
> extending the meaning of the capability bits, and using to protect
> additional things. Ugh. That could be a maintenance problem of another
> flavor. Definitely not my favorite.

I don't follow you here. Could you say some more about what you mean?

> The access system is limited to asking about yourself, and to
> asking very specific questions. If our new operation did something
> similar and only allowed asking about yourself, and a capablity I would
> find it less scary,

Okay. But that's a less interesting operation from my point of view.
I mean: one way of knowing if we have permission to do an operation
is to try to do it.

> and I am wondering if it would be possible to ask
> not about a capability but an operation that the capability guards
> such as chroot.
>
> Implementing target operations instead of target capabilities would be a
> bit trickier to implement (as it requires factoring out the permission
> checks) but it may be much more useful in the long run.

But, would this not mean factoring out the permission checks on a per
operation basis? (There are of courses hundreds of such checks.)

> So why are we asking the questions about what permissions a process has?

My main interest here is monitoring/discovery/debugging on a running
system. NS_GET_PARENT, NS_GET_USERNS, NS_GET_CREATOR_UID, and
NS_GET_NSTYPE provide most of what I'd like to see. Being able to ask
"does this process have permissions in that namespace?" would be nice
to have in terms of understanding/debugging a system.

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

2016-12-22 01:16:35

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 0/2] Add further ioctl() operations for namespace discovery

"Michael Kerrisk (man-pages)" <[email protected]> writes:

> Hi Eric,
>
> On 12/21/2016 01:17 AM, Eric W. Biederman wrote:
>> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>>
>>> Hi Eric,
>>>
>>> On 12/20/2016 09:22 PM, Eric W. Biederman wrote:
>>>> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>>>>
>>>>> Hello Eric,
>>>>>
>>>>> On 12/19/2016 11:53 PM, Eric W. Biederman wrote:
>>>>>> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>>>>>>

>> Now the question becomes who are the users of this? Because it just
>> occurred to me that we now have an interesting complication. Userspace
>> extending the meaning of the capability bits, and using to protect
>> additional things. Ugh. That could be a maintenance problem of another
>> flavor. Definitely not my favorite.
>
> I don't follow you here. Could you say some more about what you mean?

I have seen user space userspace do thing such as extend CAP_SYS_REBOOT
to things such as permission to invoke "shutdown -r now". Which
depending on what a clean reboot entails could be greately increasing
the scope of CAP_SYS_REBOOT.

I am concerned for that and similar situations that userspace
applications could lead us into situation that one wrong decision could
wind up being an unfixable mistake because fixing the mistake would
break userspsace.

>> So why are we asking the questions about what permissions a process has?
>
> My main interest here is monitoring/discovery/debugging on a running
> system. NS_GET_PARENT, NS_GET_USERNS, NS_GET_CREATOR_UID, and
> NS_GET_NSTYPE provide most of what I'd like to see. Being able to ask
> "does this process have permissions in that namespace?" would be nice
> to have in terms of understanding/debugging a system.

If we are just looking at explanations then I seem to have been
over-engineering things. So let's just aim at the two ioctls.
Or at least the information in those ioctls.

With at least a comment on the ioctl returning the OWNER_UID that
describes why it is not a problem to if the owners uid is something like
((uid_t)-3). Which overlaps with the space for error return codes.

I don't know if we are fine or not, but that review comment definitely
deserves some consideration.

Eric

Subject: Re: [PATCH 0/2] Add further ioctl() operations for namespace discovery

Hi Eric,

On 12/22/2016 01:27 AM, Eric W. Biederman wrote:
> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>
>> Hi Eric,
>>
>> On 12/21/2016 01:17 AM, Eric W. Biederman wrote:
>>> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>>>
>>>> Hi Eric,
>>>>
>>>> On 12/20/2016 09:22 PM, Eric W. Biederman wrote:
>>>>> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>>>>>
>>>>>> Hello Eric,
>>>>>>
>>>>>> On 12/19/2016 11:53 PM, Eric W. Biederman wrote:
>>>>>>> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>>>>>>>
>
>>> Now the question becomes who are the users of this? Because it just
>>> occurred to me that we now have an interesting complication. Userspace
>>> extending the meaning of the capability bits, and using to protect
>>> additional things. Ugh. That could be a maintenance problem of another
>>> flavor. Definitely not my favorite.
>>
>> I don't follow you here. Could you say some more about what you mean?
>
> I have seen user space userspace do thing such as extend CAP_SYS_REBOOT
> to things such as permission to invoke "shutdown -r now". Which
> depending on what a clean reboot entails could be greately increasing
> the scope of CAP_SYS_REBOOT.
>
> I am concerned for that and similar situations that userspace
> applications could lead us into situation that one wrong decision could
> wind up being an unfixable mistake because fixing the mistake would
> break userspsace.

Okay.

>>> So why are we asking the questions about what permissions a process has?
>>
>> My main interest here is monitoring/discovery/debugging on a running
>> system. NS_GET_PARENT, NS_GET_USERNS, NS_GET_CREATOR_UID, and
>> NS_GET_NSTYPE provide most of what I'd like to see. Being able to ask
>> "does this process have permissions in that namespace?" would be nice
>> to have in terms of understanding/debugging a system.
>
> If we are just looking at explanations then I seem to have been
> over-engineering things. So let's just aim at the two ioctls.
> Or at least the information in those ioctls.

Okay.

> With at least a comment on the ioctl returning the OWNER_UID that
> describes why it is not a problem to if the owners uid is something like
> ((uid_t)-3). Which overlaps with the space for error return codes.
>
> I don't know if we are fine or not, but that review comment definitely
> deserves some consideration.


See my reply just sent to Andrei. We should instead then just return
the UID via a buffer pointed to by the ioctl() argument:

ioctl(fd, NS_GET_OWNER_UID, &uid);

Cheers,

Michael


--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

2016-12-22 10:33:27

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 0/2] Add further ioctl() operations for namespace discovery

"Michael Kerrisk (man-pages)" <[email protected]> writes:

> Hi Eric,
>
> On 12/22/2016 01:27 AM, Eric W. Biederman wrote:
>> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>>
>>> Hi Eric,
>>>
>>> On 12/21/2016 01:17 AM, Eric W. Biederman wrote:
>>>> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>>>>
>>>>> Hi Eric,
>>>>>
>>>>> On 12/20/2016 09:22 PM, Eric W. Biederman wrote:
>>>>>> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>>>>>>
>>>>>>> Hello Eric,
>>>>>>>
>>>>>>> On 12/19/2016 11:53 PM, Eric W. Biederman wrote:
>>>>>>>> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>>>>>>>>
>>
>>>> Now the question becomes who are the users of this? Because it just
>>>> occurred to me that we now have an interesting complication. Userspace
>>>> extending the meaning of the capability bits, and using to protect
>>>> additional things. Ugh. That could be a maintenance problem of another
>>>> flavor. Definitely not my favorite.
>>>
>>> I don't follow you here. Could you say some more about what you mean?
>>
>> I have seen user space userspace do thing such as extend CAP_SYS_REBOOT
>> to things such as permission to invoke "shutdown -r now". Which
>> depending on what a clean reboot entails could be greately increasing
>> the scope of CAP_SYS_REBOOT.
>>
>> I am concerned for that and similar situations that userspace
>> applications could lead us into situation that one wrong decision could
>> wind up being an unfixable mistake because fixing the mistake would
>> break userspsace.
>
> Okay.
>
>>>> So why are we asking the questions about what permissions a process has?
>>>
>>> My main interest here is monitoring/discovery/debugging on a running
>>> system. NS_GET_PARENT, NS_GET_USERNS, NS_GET_CREATOR_UID, and
>>> NS_GET_NSTYPE provide most of what I'd like to see. Being able to ask
>>> "does this process have permissions in that namespace?" would be nice
>>> to have in terms of understanding/debugging a system.
>>
>> If we are just looking at explanations then I seem to have been
>> over-engineering things. So let's just aim at the two ioctls.
>> Or at least the information in those ioctls.
>
> Okay.
>
>> With at least a comment on the ioctl returning the OWNER_UID that
>> describes why it is not a problem to if the owners uid is something like
>> ((uid_t)-3). Which overlaps with the space for error return codes.
>>
>> I don't know if we are fine or not, but that review comment definitely
>> deserves some consideration.
>
>
> See my reply just sent to Andrei. We should instead then just return
> the UID via a buffer pointed to by the ioctl() argument:
>
> ioctl(fd, NS_GET_OWNER_UID, &uid);

That will work without problem. Especially as unsigned int is the same
on both 32bit and 64bit so we won't need a compat ioctl.

Eric