2017-12-09 10:29:38

by Mickaël Salaün

Subject: Re: RFC(v2): Audit Kernel Container IDs


On 12/10/2017 18:33, Casey Schaufler wrote:
> On 10/12/2017 7:14 AM, Richard Guy Briggs wrote:
>> Containers are a userspace concept. The kernel knows nothing of them.
>>
>> The Linux audit system needs a way to be able to track the container
>> provenance of events and actions. Audit needs the kernel's help to do
>> this.
>>
>> Since the concept of a container is entirely a userspace concept, a
>> registration from the userspace container orchestration system initiates
>> this. This will define a point in time and a set of resources
>> associated with a particular container with an audit container ID.
>>
>> The registration is a pseudo filesystem (proc, since PID tree already
>> exists) write of a u8[16] UUID representing the container ID to a file
>> representing a process that will become the first process in a new
>> container. This write might place restrictions on mount namespaces
>> required to define a container, or at least careful checking of
>> namespaces in the kernel to verify permissions of the orchestrator so it
>> can't change its own container ID. A bind mount of nsfs may be
>> necessary in the container orchestrator's mntNS.
>> Note: Use a 128-bit scalar rather than a string to make compares faster
>> and simpler.
>>
>> Require a new CAP_CONTAINER_ADMIN to be able to carry out the
>> registration.
>
> Hang on. If containers are a user space concept, how can
> you want CAP_CONTAINER_ANYTHING? If there's no such thing as
> a container, how can you be asking for a capability to manage
> them?
>
>> At that time, record the target container's user-supplied
>> container identifier along with the target container's first process
>> (which may become the target container's "init" process) process ID
>> (referenced from the initial PID namespace), all namespace IDs (in the
>> form of a nsfs device number and inode number tuple) in a new auxiliary
>> record AUDIT_CONTAINER with a qualifying op=$action field.
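
For illustration, the RFC's note about holding the ID as a 128-bit scalar rather than a string might look like the following C sketch. The struct and function names are hypothetical and not taken from any posted patch; a compiler type such as GCC's unsigned __int128 would serve equally well.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-kernel form of the audit container ID: the u8[16]
 * UUID written by the orchestrator, kept as two 64-bit words. */
struct audit_cid {
	uint64_t hi;
	uint64_t lo;
};

/* Pack the 16 raw bytes most-significant first, so the numeric value
 * preserves the UUID's byte order. */
static struct audit_cid audit_cid_from_bytes(const uint8_t raw[16])
{
	struct audit_cid cid = { 0, 0 };

	for (int i = 0; i < 8; i++) {
		cid.hi = (cid.hi << 8) | raw[i];
		cid.lo = (cid.lo << 8) | raw[i + 8];
	}
	return cid;
}

/* Equality is two fixed-width integer compares; no string parsing,
 * case folding or length checks are involved. */
static bool audit_cid_equal(struct audit_cid a, struct audit_cid b)
{
	return a.hi == b.hi && a.lo == b.lo;
}
```

Comparing two such values costs two word compares, whereas comparing UUID strings means walking up to 36 characters and agreeing on one canonical textual form.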

Here is an idea to avoid privilege problems or the need for a new
capability: make it automatic. What makes a container a container seems
to be the use of at least a namespace. What about automatically creating
and assigning an ID to a process when it enters a namespace different
from that of its parent process? This delegates the (permission)
responsibility to the use of namespaces (e.g. /proc/sys/user/max_* limit).

One interesting side effect of this approach would be to be able to
identify which processes are in the same set of namespaces, even if not
spawned from the container but entered after its creation (i.e. using
setns), by creating container IDs as a (deterministic) checksum from the
/proc/self/ns/* IDs.
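
A minimal userspace sketch of the checksum idea above, assuming the namespace ID is the (st_dev, st_ino) pair of the nsfs inode behind each /proc/<pid>/ns/<type> link, and using 64-bit FNV-1a as the checksum. The namespace list, the hash choice and the function names are assumptions for illustration only; an in-kernel version would read the namespaces from the task rather than from /proc.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>

/* Assumed set of namespace types, hashed in a fixed order so the
 * result is deterministic. */
static const char *const ns_names[] = {
	"cgroup", "ipc", "mnt", "net", "pid", "user", "uts",
};
#define NS_COUNT (sizeof(ns_names) / sizeof(ns_names[0]))

/* 64-bit FNV-1a over a buffer, continuing from state h. */
static uint64_t fnv1a64(uint64_t h, const void *buf, size_t len)
{
	const unsigned char *p = buf;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 0x100000001b3ULL;	/* FNV 64-bit prime */
	}
	return h;
}

/* Checksum of an array of (device, inode) namespace identifiers. */
static uint64_t ns_set_id(const uint64_t dev[], const uint64_t ino[], size_t n)
{
	uint64_t h = 0xcbf29ce484222325ULL;	/* FNV-1a offset basis */

	for (size_t i = 0; i < n; i++) {
		h = fnv1a64(h, &dev[i], sizeof dev[i]);
		h = fnv1a64(h, &ino[i], sizeof ino[i]);
	}
	return h;
}

/* stat() follows each /proc/<pid>/ns/<type> link to its nsfs inode.
 * Entries that cannot be read are recorded as (0, 0). */
static uint64_t ns_set_id_for(const char *pid)
{
	uint64_t dev[NS_COUNT], ino[NS_COUNT];
	char path[64];
	struct stat st;

	for (size_t i = 0; i < NS_COUNT; i++) {
		snprintf(path, sizeof path, "/proc/%s/ns/%s", pid, ns_names[i]);
		if (stat(path, &st) == 0) {
			dev[i] = (uint64_t)st.st_dev;
			ino[i] = (uint64_t)st.st_ino;
		} else {
			dev[i] = ino[i] = 0;
		}
	}
	return ns_set_id(dev, ino, NS_COUNT);
}
```

Two processes would then share an ID exactly when every listed namespace matches, whether they were spawned inside the container or joined it later with setns. Because the raw bytes of each integer are hashed, the value depends on host byte order; it is deterministic on a given machine.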

Since the concern is to identify a container, I think the ability to
audit the switch from one container ID to another is enough. I don't
think we need nested IDs.

As a side note, you may want to take a look at the Linux-VServer's XID.

Regards,
Mickaël



2017-12-09 18:28:17

by Casey Schaufler

Subject: Re: RFC(v2): Audit Kernel Container IDs

On 12/9/2017 2:20 AM, Mickaël Salaün wrote:
> On 12/10/2017 18:33, Casey Schaufler wrote:
>> On 10/12/2017 7:14 AM, Richard Guy Briggs wrote:
>>> Containers are a userspace concept. The kernel knows nothing of them.
>>>
>>> The Linux audit system needs a way to be able to track the container
>>> provenance of events and actions. Audit needs the kernel's help to do
>>> this.
>>>
>>> Since the concept of a container is entirely a userspace concept, a
>>> registration from the userspace container orchestration system initiates
>>> this. This will define a point in time and a set of resources
>>> associated with a particular container with an audit container ID.
>>>
>>> The registration is a pseudo filesystem (proc, since PID tree already
>>> exists) write of a u8[16] UUID representing the container ID to a file
>>> representing a process that will become the first process in a new
>>> container. This write might place restrictions on mount namespaces
>>> required to define a container, or at least careful checking of
>>> namespaces in the kernel to verify permissions of the orchestrator so it
>>> can't change its own container ID. A bind mount of nsfs may be
>>> necessary in the container orchestrator's mntNS.
>>> Note: Use a 128-bit scalar rather than a string to make compares faster
>>> and simpler.
>>>
>>> Require a new CAP_CONTAINER_ADMIN to be able to carry out the
>>> registration.
>> Hang on. If containers are a user space concept, how can
>> you want CAP_CONTAINER_ANYTHING? If there's no such thing as
>> a container, how can you be asking for a capability to manage
>> them?
>>
>>> At that time, record the target container's user-supplied
>>> container identifier along with the target container's first process
>>> (which may become the target container's "init" process) process ID
>>> (referenced from the initial PID namespace), all namespace IDs (in the
>>> form of a nsfs device number and inode number tuple) in a new auxiliary
>>> record AUDIT_CONTAINER with a qualifying op=$action field.
> Here is an idea to avoid privilege problems or the need for a new
> capability: make it automatic. What makes a container a container seems
> to be the use of at least a namespace.

You might think so, but I am assured that you can have a container
without using namespaces. Intel's "Clear Containers", which use
virtualization technology, are one example. I have considered creating
"Smack Containers" using mandatory access control technology, more
to press the point that "containers" is a marketing concept, not
technology.

> What about automatically creating
> and assigning an ID to a process when it enters a namespace different
> from that of its parent process? This delegates the (permission)
> responsibility to the use of namespaces (e.g. /proc/sys/user/max_* limit).

That gets ugly when you have a container that uses user, filesystem,
network and whatever else namespaces. If all containers used the same
set of namespaces I think this would be a fine idea, but they don't.

> One interesting side effect of this approach would be to be able to
> identify which processes are in the same set of namespaces, even if not
> spawned from the container but entered after its creation (i.e. using
> setns), by creating container IDs as a (deterministic) checksum from the
> /proc/self/ns/* IDs.
>
> Since the concern is to identify a container, I think the ability to
> audit the switch from one container ID to another is enough. I don't
> think we need nested IDs.

Because a container doesn't have to use namespaces to be a container
you still need a mechanism for a process to declare that it is in fact
in a container, and to identify the container.

>
> As a side note, you may want to take a look at the Linux-VServer's XID.
>
> Regards,
> Mickaël
>

2017-12-11 15:12:57

by Richard Guy Briggs

Subject: Re: RFC(v2): Audit Kernel Container IDs

On 2017-12-09 11:20, Mickaël Salaün wrote:
>
> On 12/10/2017 18:33, Casey Schaufler wrote:
> > On 10/12/2017 7:14 AM, Richard Guy Briggs wrote:
> >> Containers are a userspace concept. The kernel knows nothing of them.
> >>
> >> The Linux audit system needs a way to be able to track the container
> >> provenance of events and actions. Audit needs the kernel's help to do
> >> this.
> >>
> >> Since the concept of a container is entirely a userspace concept, a
> >> registration from the userspace container orchestration system initiates
> >> this. This will define a point in time and a set of resources
> >> associated with a particular container with an audit container ID.
> >>
> >> The registration is a pseudo filesystem (proc, since PID tree already
> >> exists) write of a u8[16] UUID representing the container ID to a file
> >> representing a process that will become the first process in a new
> >> container. This write might place restrictions on mount namespaces
> >> required to define a container, or at least careful checking of
> >> namespaces in the kernel to verify permissions of the orchestrator so it
> >> can't change its own container ID. A bind mount of nsfs may be
> >> necessary in the container orchestrator's mntNS.
> >> Note: Use a 128-bit scalar rather than a string to make compares faster
> >> and simpler.
> >>
> >> Require a new CAP_CONTAINER_ADMIN to be able to carry out the
> >> registration.
> >
> > Hang on. If containers are a user space concept, how can
> > you want CAP_CONTAINER_ANYTHING? If there's no such thing as
> > a container, how can you be asking for a capability to manage
> > them?
> >
> >> At that time, record the target container's user-supplied
> >> container identifier along with the target container's first process
> >> (which may become the target container's "init" process) process ID
> >> (referenced from the initial PID namespace), all namespace IDs (in the
> >> form of a nsfs device number and inode number tuple) in a new auxiliary
> >> record AUDIT_CONTAINER with a qualifying op=$action field.
>
> Here is an idea to avoid privilege problems or the need for a new
> capability: make it automatic. What makes a container a container seems
> to be the use of at least a namespace. What about automatically creating
> and assigning an ID to a process when it enters a namespace different
> from that of its parent process? This delegates the (permission)
> responsibility to the use of namespaces (e.g. /proc/sys/user/max_* limit).

A container doesn't imply a namespace and vice versa.

> One interesting side effect of this approach would be to be able to
> identify which processes are in the same set of namespaces, even if not
> spawned from the container but entered after its creation (i.e. using
> setns), by creating container IDs as a (deterministic) checksum from the
> /proc/self/ns/* IDs.

This would be really helpful, but it isn't the case.

> Since the concern is to identify a container, I think the ability to
> audit the switch from one container ID to another is enough. I don't
> think we need nested IDs.

Since container namespace membership is arbitrary between container
orchestrators, this needs a registration process and a way for the
container orchestrator to know the ID.


I completely agree with Casey here.

> As a side note, you may want to take a look at the Linux-VServer's XID.
>
> Regards,
> Mickaël

- RGB

--
Richard Guy Briggs <[email protected]>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

2017-12-11 16:31:17

by Eric Paris

Subject: Re: RFC(v2): Audit Kernel Container IDs

On Sat, 2017-12-09 at 10:28 -0800, Casey Schaufler wrote:
> On 12/9/2017 2:20 AM, Mickaël Salaün wrote:

> > What about automatically creating and assigning an ID to a process
> > when it enters a namespace different from that of its parent
> > process? This delegates the (permission) responsibility to the use
> > of namespaces (e.g. /proc/sys/user/max_* limit).
>
> That gets ugly when you have a container that uses user, filesystem,
> network and whatever else namespaces. If all containers used the same
> set of namespaces I think this would be a fine idea, but they don't.
>
> > One interesting side effect of this approach would be to be able to
> > identify which processes are in the same set of namespaces, even if
> > not spawned from the container but entered after its creation (i.e.
> > using setns), by creating container IDs as a (deterministic)
> > checksum from the /proc/self/ns/* IDs.
> >
> > Since the concern is to identify a container, I think the ability to
> > audit the switch from one container ID to another is enough. I don't
> > think we need nested IDs.
>
> Because a container doesn't have to use namespaces to be a container
> you still need a mechanism for a process to declare that it is in fact
> in a container, and to identify the container.

I like the idea but I'm still tossing it around in my head (and
thinking about Casey's statement too). Let's say we have a 'docker-like'
container with pid=100 netns=X,userns=Y,mountns=Z. If I'm on the host
in all init namespaces and I run
nsenter -t 100 -n ip link set eth0 promisc on
How should this be logged? Did this command run in its own 'container'
unrelated to the 'docker-like' container?
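
Under the automatic, namespace-derived scheme proposed earlier in the thread, the question reduces to comparing namespace sets. A minimal sketch, with made-up inode numbers standing in for the host's init namespaces and for X, Y and Z; the struct and helpers are hypothetical and only model what "nsenter -t 100 -n" changes.

```c
#include <stdbool.h>

/* Hypothetical namespace identifiers (e.g. nsfs inode numbers) of a task. */
struct ns_set {
	unsigned long net, user, mnt, pid;
};

static bool ns_set_equal(const struct ns_set *a, const struct ns_set *b)
{
	return a->net == b->net && a->user == b->user &&
	       a->mnt == b->mnt && a->pid == b->pid;
}

/* "nsenter -t <pid> -n" takes only the target's network namespace;
 * every other namespace stays the caller's. */
static struct ns_set enter_net_only(const struct ns_set *self,
				    const struct ns_set *target)
{
	struct ns_set s = *self;

	s.net = target->net;
	return s;
}
```

The resulting set matches neither the host nor the container, so a purely namespace-derived ID would give the ip command a third, unrelated "container".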

-Eric

2017-12-11 16:52:50

by Casey Schaufler

Subject: Re: RFC(v2): Audit Kernel Container IDs

On 12/11/2017 8:30 AM, Eric Paris wrote:
> On Sat, 2017-12-09 at 10:28 -0800, Casey Schaufler wrote:
>> Because a container doesn't have to use namespaces to be a container
>> you still need a mechanism for a process to declare that it is in fact
>> in a container, and to identify the container.
> I like the idea but I'm still tossing it around in my head (and
> thinking about Casey's statement too). Let's say we have a 'docker-like'
> container with pid=100 netns=X,userns=Y,mountns=Z. If I'm on the host
> in all init namespaces and I run
> nsenter -t 100 -n ip link set eth0 promisc on
> How should this be logged? Did this command run in its own 'container'
> unrelated to the 'docker-like' container?

Jose Bollo's PTAGS ( https://gitlab.com/jobol/ptags ) would be
perfect. Any time you declare something to be a container or
enter a namespace you slap a tag on it. Identifying nested
containers would be easy, you'd have multiple tags.
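
A rough sketch of the tagging model described here, assuming each process carries a small set of string tags that its children inherit. The names, the fixed limit and the tag strings are invented for illustration and are not the PTAGS interface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_TAGS 8	/* arbitrary limit for the sketch */

/* Hypothetical per-process tag set. */
struct proc_tags {
	size_t n;
	const char *tag[MAX_TAGS];
};

static bool tag_has(const struct proc_tags *p, const char *tag)
{
	for (size_t i = 0; i < p->n; i++)
		if (strcmp(p->tag[i], tag) == 0)
			return true;
	return false;
}

/* Declaring a container (or entering a namespace) adds a tag. Existing
 * tags are kept, so nesting shows up as several tags. */
static bool tag_add(struct proc_tags *p, const char *tag)
{
	if (tag_has(p, tag))
		return true;
	if (p->n == MAX_TAGS)
		return false;
	p->tag[p->n++] = tag;
	return true;
}

/* Children start with a copy of their parent's tags. */
static struct proc_tags tags_inherit(const struct proc_tags *parent)
{
	return *parent;
}
```

A process in an inner container carries both the outer and the inner tag, while the outer container's processes carry only the outer one, so nesting falls out without a separate hierarchy.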

PTAGS unfortunately needs module stacking, but how hard could that be?


> -Eric

2017-12-11 19:37:18

by Steve Grubb

Subject: Re: RFC(v2): Audit Kernel Container IDs

On Monday, December 11, 2017 11:30:57 AM EST Eric Paris wrote:
> > Because a container doesn't have to use namespaces to be a container
> > you still need a mechanism for a process to declare that it is in fact
> > in a container, and to identify the container.
>
> I like the idea but I'm still tossing it around in my head (and
> thinking about Casey's statement too). Let's say we have a 'docker-like'
> container with pid=100 netns=X,userns=Y,mountns=Z. If I'm on the host
> in all init namespaces and I run
> nsenter -t 100 -n ip link set eth0 promisc on
> How should this be logged?

If it is a normal process, then everything would match the init namespace and
you wouldn't have entered a container. If it were a container, any generated
event should have the container ID from registration attached to it.

> Did this command run in its own 'container' unrelated to the 'docker-like'
> container?

That should be determined by what's in the task struct.
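
A minimal sketch of this answer, assuming the registered ID lives in a per-task field that fork() copies and setns() leaves alone. Everything here (the reduced struct, the once-only registration rule, the contid= field name) is hypothetical and only models the behaviour being described.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical audit container ID; 0 means "never registered". */
typedef uint64_t audit_cid_t;
#define AUDIT_CID_UNSET ((audit_cid_t)0)

/* Drastically reduced, hypothetical stand-in for struct task_struct. */
struct task {
	int pid;
	unsigned long netns;	/* e.g. nsfs inode number */
	audit_cid_t cid;	/* written only by registration */
};

/* Registration by the orchestrator. Assumed rules: the orchestrator
 * cannot set its own ID, and a task's ID is set at most once. */
static bool register_cid(const struct task *orch, struct task *t,
			 audit_cid_t cid)
{
	if (t == orch || cid == AUDIT_CID_UNSET || t->cid != AUDIT_CID_UNSET)
		return false;
	t->cid = cid;
	return true;
}

/* fork(): the child inherits the parent's container ID. */
static struct task fork_task(const struct task *parent, int child_pid)
{
	struct task child = *parent;

	child.pid = child_pid;
	return child;
}

/* setns() into a network namespace: membership changes, the ID does not. */
static void setns_net(struct task *t, unsigned long netns)
{
	t->netns = netns;
}

/* Format the fields an audit record might carry (field names invented). */
static int format_event(char *buf, size_t len, const struct task *t)
{
	return snprintf(buf, len, "pid=%d contid=%llu", t->pid,
			(unsigned long long)t->cid);
}
```

Under these assumptions the ip command in Eric's example is logged with the host's unset ID: it changed network namespace, not container, while every process forked inside the registered container carries the container's ID.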

-Steve