Date: Mon, 11 Dec 2017 10:10:57 -0500
From: Richard Guy Briggs <rgb@redhat.com>
To: =?iso-8859-1?Q?Micka=EBl_Sala=FCn?= <mic@digikod.net>
Cc: Casey Schaufler <casey@schaufler-ca.com>, cgroups@vger.kernel.org,
        Linux Containers <containers@lists.linux-foundation.org>,
        Linux API <linux-api@vger.kernel.org>,
        Linux Audit <linux-audit@redhat.com>,
        Linux FS Devel <linux-fsdevel@vger.kernel.org>,
        Linux Kernel <linux-kernel@vger.kernel.org>,
        Linux Network Development <netdev@vger.kernel.org>,
        mszeredi@redhat.com, "Eric W. Biederman" <ebiederm@xmission.com>,
        Simo Sorce <simo@redhat.com>, jlayton@redhat.com,
        "Carlos O'Donell" <carlos@redhat.com>,
        David Howells <dhowells@redhat.com>, Al Viro <viro@zeniv.linux.org.uk>,
        Andy Lutomirski <luto@kernel.org>, Eric Paris <eparis@parisplace.org>,
        trondmy@primarydata.com, Michael Kerrisk <mtk.manpages@gmail.com>
Subject: Re: RFC(v2): Audit Kernel Container IDs
Message-ID: <20171211151057.uncby5fykre2tdjn@madcap2.tricolour.ca>
References: <20171012141359.saqdtnodwmbz33b2@madcap2.tricolour.ca>
 <75b7d6a6-42ba-2dff-1836-1091c7c024e7@schaufler-ca.com>
 <7ebca85a-425c-2b95-9a5f-59d81707339e@digikod.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <7ebca85a-425c-2b95-9a5f-59d81707339e@digikod.net>
User-Agent: NeoMutt/20171027
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 3620
Lines: 82

On 2017-12-09 11:20, Mickaël Salaün wrote:
> 
> On 12/10/2017 18:33, Casey Schaufler wrote:
> > On 10/12/2017 7:14 AM, Richard Guy Briggs wrote:
> >> Containers are a userspace concept.  The kernel knows nothing of them.
> >>
> >> The Linux audit system needs a way to be able to track the container
> >> provenance of events and actions.  Audit needs the kernel's help to do
> >> this.
> >>
> >> Since the concept of a container is entirely a userspace concept, a
> >> registration from the userspace container orchestration system initiates
> >> this.  This will define a point in time and a set of resources
> >> associated with a particular container with an audit container ID.
> >>
> >> The registration is a pseudo filesystem (proc, since PID tree already
> >> exists) write of a u8[16] UUID representing the container ID to a file
> >> representing a process that will become the first process in a new
> >> container.  This write might place restrictions on mount namespaces
> >> required to define a container, or at least careful checking of
> >> namespaces in the kernel to verify permissions of the orchestrator so it
> >> can't change its own container ID.  A bind mount of nsfs may be
> >> necessary in the container orchestrator's mntNS.
> >> Note: Use a 128-bit scalar rather than a string to make compares faster
> >> and simpler.
> >>
> >> Require a new CAP_CONTAINER_ADMIN to be able to carry out the
> >> registration.
> > 
> > Hang on. If containers are a user space concept, how can
> > you want CAP_CONTAINER_ANYTHING? If there's no such thing as
> > a container, how can you be asking for a capability to manage
> > them?
> > 
> >>   At that time, record the target container's user-supplied
> >> container identifier along with the target container's first process
> >> (which may become the target container's "init" process) process ID
> >> (referenced from the initial PID namespace), all namespace IDs (in the
> >> form of a nsfs device number and inode number tuple) in a new auxiliary
> >> record AUDIT_CONTAINER with a qualifying op=$action field.
> 
> Here is an idea to avoid privilege problems or the need for a new
> capability: make it automatic. What makes a container a container seems
> to be the use of at least one namespace. What about automatically creating
> and assigning an ID to a process when it enters a namespace different from
> those of its parent process? This delegates the (permission)
> responsibility to the use of namespaces (e.g. /proc/sys/user/max_* limits).

A container doesn't imply a namespace and vice versa.

> One interesting side effect of this approach would be to be able to
> identify which processes are in the same set of namespaces, even if not
> spawned from the container but entered after its creation (i.e. using
> setns), by creating container IDs as a (deterministic) checksum from the
> /proc/self/ns/* IDs.

This would be really helpful, but it isn't the case.
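
For reference, the checksum being suggested would amount to something
like the userspace sketch below. The function names and the choice of
SHA-256 truncated to 128 bits are assumptions made for illustration
only; nothing in the thread specifies them. Note that the result
identifies a set of namespaces, which is not the same thing as
identifying a container.

```python
import hashlib
import os


def namespace_ids(pid="self"):
    """Return {ns_name: (st_dev, st_ino)} for each entry in /proc/<pid>/ns."""
    ns_dir = f"/proc/{pid}/ns"
    ids = {}
    for name in os.listdir(ns_dir):
        # os.stat() follows the symlink, so this is the nsfs dev/inode pair.
        st = os.stat(os.path.join(ns_dir, name))
        ids[name] = (st.st_dev, st.st_ino)
    return ids


def namespace_set_checksum(ids):
    """Hash the (name, dev, ino) tuples in sorted order into 16 bytes.

    Hypothetical scheme: SHA-256 truncated to 128 bits.
    """
    h = hashlib.sha256()
    for name in sorted(ids):
        dev, ino = ids[name]
        h.update(f"{name}:{dev}:{ino}\n".encode())
    return h.digest()[:16]
```

On a Linux host, namespace_set_checksum(namespace_ids()) gives the same
value for any two processes that share every namespace, regardless of
whether they were forked into them or joined later with setns.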

> Since the concern is to identify a container, I think the ability to
> audit the switch from one container ID to another is enough. I don't
> think we need nested IDs.

Since container namespace membership is arbitrary between container
orchestrators, this needs a registration process and a way for the
container orchestrator to know the ID.
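
As a rough sketch of what the registration in the RFC looks like from
the orchestrator's side: it picks the ID itself and writes the 16 raw
bytes to a per-process proc file. The file name "containerid" and both
function names below are hypothetical; the RFC does not fix them, and
the capability and namespace checks would happen in the kernel, not
here.

```python
import os
import uuid


def register_container(pid, container_id, proc_root="/proc",
                       filename="containerid"):
    """Write the 16 raw bytes of container_id to <proc_root>/<pid>/<filename>.

    The file name is a placeholder; no such proc file is defined yet.
    """
    if not isinstance(container_id, uuid.UUID):
        raise TypeError("container_id must be a uuid.UUID")
    path = os.path.join(proc_root, str(pid), filename)
    with open(path, "wb") as f:
        f.write(container_id.bytes)  # u8[16], not the 36-character string
    return path


def same_container(a, b):
    """Compare two IDs as 128-bit integers rather than as strings."""
    return a.int == b.int
```

Because the orchestrator generates the UUID before the write, it knows
the ID without having to read anything back from the kernel.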


I completely agree with Casey here.

> As a side note, you may want to take a look at the Linux-VServer's XID.
> 
> Regards,
>  Mickaël

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635