From: Paul Moore
To: Richard Guy Briggs
CC: Tycho Andersen, "Serge E. Hallyn", "Linux-Audit Mailing List", LKML,
    Eric Paris
Date: Mon, 08 Jul 2019 22:43:23 +0200
Message-ID: <16bd353a5f8.280e.85c95baa4474aabc7814e68940a78392@paul-moore.com>
In-Reply-To: <20190708181237.5poheliito7zpvmc@madcap2.tricolour.ca>
References: <20190529145742.GA8959@cisco> <20190529153427.GB8959@cisco>
    <20190529222835.GD8959@cisco> <20190530170913.GA16722@mail.hallyn.com>
    <20190530212900.GC5739@cisco>
    <20190708181237.5poheliito7zpvmc@madcap2.tricolour.ca>
User-Agent: AquaMail/1.20.0-1462 (build: 102100002)
Subject: Re: [PATCH ghak90 V6 02/10] audit: add container id
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

On July 8, 2019 8:12:56 PM Richard Guy Briggs wrote:

> On 2019-05-30 19:26, Paul Moore wrote:
>> On Thu, May 30, 2019 at 5:29 PM Tycho Andersen wrote:
>>> On Thu, May 30, 2019 at 03:29:32PM -0400, Paul Moore wrote:
>>>>
>>>>
>>>> [REMINDER: It is an "*audit* container ID" and not a general
>>>> "container ID" ;) Smiley aside, I'm not kidding about that part.]
>>>
>>> This sort of seems like a distinction without a difference; presumably
>>> audit is going to want to differentiate between everything that people
>>> in userspace call a container. So you'll have to support all this
>>> insanity anyway, even if it's "not a container ID".
>>
>> That's not quite right. Audit doesn't care about what a container is,
>> or is not, it also doesn't care if the "audit container ID" actually
>> matches the ID used by the container engine in userspace and I think
>> that is a very important line to draw. Audit is simply given a value
>> which it calls the "audit container ID", it ensures that the value is
>> inherited appropriately (e.g. children inherit their parent's audit
>> container ID), and it uses the value in audit records to provide some
>> additional context for log analysis. The distinction isn't limited to
>> the value itself, but also to how it is used; it is an "audit
>> container ID" and not a "container ID" because this value is
>> exclusively for use by the audit subsystem. We are very intentionally
>> not adding a generic container ID to the kernel. If the kernel does
>> ever grow a general purpose container ID we will be one of the first
>> ones in line to make use of it, but we are not going to be the ones to
>> generically add containers to the kernel. Enough people already hate
>> audit ;)
>>
>>>> I'm not interested in supporting/merging something that isn't useful;
>>>> if this doesn't work for your use case then we need to figure out what
>>>> would work. It sounds like nested containers are much more common in
>>>> the lxc world, can you elaborate a bit more on this?
>>>>
>>>>
>>>> As far as the possible solutions you mention above, I'm not sure I
>>>> like the per-userns audit container IDs, I'd much rather just emit the
>>>> necessary tracking information via the audit record stream and let the
>>>> log analysis tools figure it out.
>>>> However, the bigger question is how
>>>> to limit (re)setting the audit container ID when you are in a non-init
>>>> userns. For reasons already mentioned, using capable() is a
>>>> non-starter for everything but the initial userns, and using ns_capable()
>>>> is equally poor as it essentially allows any userns the ability to
>>>> munge its audit container ID (obviously not good). It appears we
>>>> need a different method for controlling access to the audit container
>>>> ID.
>>>
>>> One option would be to make it a string, and have it be append only.
>>> That should be safe with no checks.
>>>
>>> I know there was a long thread about what type to make this thing. I
>>> think you could accomplish the append-only-ness with a u64 if you had
>>> some rule about only allowing setting lower order bits than those that
>>> are already set. With 4 bits for simplicity:
>>>
>>>   1100           # initial container id
>>>   1100 -> 1011   # not allowed
>>>   1100 -> 1101   # allowed, but now 1101 is set in stone since there are
>>>                  # no lower order bits left
>>>
>>> There are probably fancier ways to do it if you actually understand
>>> math :)
>>
>> ;)
>>
>>> Since userns nesting is limited to 32 levels (right now, IIRC), and
>>> you have 64 bits, this might be reasonable. You could just teach
>>> container engines to use the first say N bits for themselves, with a 1
>>> bit for the barrier at the end.
>>
>> I like the creativity, but I worry that at some point these
>> limitations are going to be raised (limits have a funny way of doing
>> that over time) and we will be in trouble. I say "trouble" because I
>> want to be able to quickly do an audit container ID comparison and
>> we're going to pay a penalty for these larger values (we'll need this
>> when we add multiple auditd support and the requisite record routing).
>>
>> Thinking about this makes me also realize we probably need to think a
>> bit longer about audit container ID conflicts between orchestrators.
>> Right now we just take the value that is given to us by the
>> orchestrator, but if we want to allow multiple container orchestrators
>> to work without some form of cooperation in userspace (I think we have
>> to assume the orchestrators will not talk to each other) we likely
>> need to have some way to block reuse of an audit container ID. We
>> would either need to prevent the orchestrator from explicitly setting
>> an audit container ID to a currently in use value, or instead generate
>> the audit container ID in the kernel upon an event triggered by the
>> orchestrator (e.g. a write to a /proc file). I suspect we should
>> start looking at the idr code, I think we will need to make use of it.
>
> To address this, I'd suggest that it is enforced to only allow the
> setting of descendants and to maintain a master list of audit container
> identifiers (with a hash table if necessary later) that includes the
> container owner.
>
> This also allows the orchestrator/engine to inject processes into
> existing containers by checking that the audit container identifier is
> only used again by the same owner.
>
> I have working code for both.

Just a quick note that due to some holiday travel I'm not going to be able
to adequately respond to your latest messages on this thread for at least
another week, likely a bit more. I'm only checking mail to put out fires,
and the audit container ID work tends to be something that starts them ;)

--
paul moore
www.paul-moore.com
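
The append-only u64 rule from the quoted 4-bit example (no set bit may be cleared, and new bits may only be added below the lowest bit already set) could be sketched roughly as follows. This is only an illustration: the function name `may_update`, the `width` parameter, and treating 0 as "nothing set yet" are assumptions, not anything taken from the patch set or the kernel.

```python
def may_update(old: int, new: int, width: int = 64) -> bool:
    """Check a proposed ID change against the append-only bit rule.

    Hypothetical helper: a change is accepted only if every bit set in
    `old` is still set in `new`, and every newly added bit lies strictly
    below the lowest set bit of `old`.
    """
    mask = (1 << width) - 1
    if not (0 <= old <= mask and 0 <= new <= mask):
        raise ValueError("value does not fit in the given width")
    if old == 0:
        # Assumption: 0 means "nothing set yet", so any value is accepted.
        return True
    lowest = old & -old            # isolate the lowest set bit of old
    free = lowest - 1              # mask of the bits strictly below it
    added = new & ~old             # bits present in new but not in old
    kept = (new & old) == old      # no existing bit may be cleared
    # A no-op write (added == 0) passes trivially in this sketch.
    return kept and (added & ~free) == 0


# The three cases from the quoted example, using 4 bits:
print(may_update(0b1100, 0b1011, width=4))  # clears bit 2
print(may_update(0b1100, 0b1101, width=4))  # adds bit 0 only
print(may_update(0b1101, 0b1111, width=4))  # no lower bits left to add
```

Because each check is a few bitwise operations on a fixed-width integer, the comparison cost stays constant; the concern raised in the thread about nesting limits growing over time still applies, since the width bounds how many times the value can be extended.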