From: Paul Moore
Date: Thu, 30 May 2019 19:26:54 -0400
Subject: Re: [PATCH ghak90 V6 02/10] audit: add container id
To: Tycho Andersen
Cc: "Serge E. Hallyn", Richard Guy Briggs,
 containers@lists.linux-foundation.org, linux-api@vger.kernel.org,
 Linux-Audit Mailing List, linux-fsdevel@vger.kernel.org, LKML,
 netdev@vger.kernel.org, netfilter-devel@vger.kernel.org,
 sgrubb@redhat.com, omosnace@redhat.com, dhowells@redhat.com,
 simo@redhat.com, Eric Paris, ebiederm@xmission.com,
 nhorman@tuxdriver.com
In-Reply-To: <20190530212900.GC5739@cisco>
References: <9edad39c40671fb53f28d76862304cc2647029c6.1554732921.git.rgb@redhat.com>
 <20190529145742.GA8959@cisco> <20190529153427.GB8959@cisco>
 <20190529222835.GD8959@cisco> <20190530170913.GA16722@mail.hallyn.com>
 <20190530212900.GC5739@cisco>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, May 30, 2019 at 5:29 PM Tycho Andersen wrote:
> On Thu, May 30, 2019 at 03:29:32PM -0400, Paul Moore wrote:
> >
> > [REMINDER: It is an "*audit* container ID" and not a general
> > "container ID" ;)  Smiley aside, I'm not kidding about that part.]
>
> This sort of seems like a distinction without a difference; presumably
> audit is going to want to differentiate between everything that people
> in userspace call a container.
> So you'll have to support all this insanity anyway, even if it's "not
> a container ID".

That's not quite right.  Audit doesn't care what a container is or is
not; it also doesn't care whether the "audit container ID" actually
matches the ID used by the container engine in userspace, and I think
that is a very important line to draw.  Audit is simply given a value
which it calls the "audit container ID", it ensures that the value is
inherited appropriately (e.g. children inherit their parent's audit
container ID), and it uses the value in audit records to provide some
additional context for log analysis.  The distinction isn't limited to
the value itself, but also to how it is used; it is an "audit container
ID" and not a "container ID" because this value is exclusively for use
by the audit subsystem.  We are very intentionally not adding a generic
container ID to the kernel.  If the kernel does ever grow a general
purpose container ID we will be among the first in line to make use of
it, but we are not going to be the ones to generically add containers
to the kernel.  Enough people already hate audit ;)

> > I'm not interested in supporting/merging something that isn't useful;
> > if this doesn't work for your use case then we need to figure out what
> > would work.  It sounds like nested containers are much more common in
> > the lxc world, can you elaborate a bit more on this?
> >
> > As far as the possible solutions you mention above, I'm not sure I
> > like the per-userns audit container IDs, I'd much rather just emit the
> > necessary tracking information via the audit record stream and let the
> > log analysis tools figure it out.  However, the bigger question is how
> > to limit (re)setting the audit container ID when you are in a non-init
> > userns.
> > For reasons already mentioned, using capable() is a non
> > starter for everything but the initial userns, and using ns_capable()
> > is equally poor as it essentially allows any userns the ability to
> > munge it's audit container ID (obviously not good).  It appears we
> > need a different method for controlling access to the audit container
> > ID.
>
> One option would be to make it a string, and have it be append only.
> That should be safe with no checks.
>
> I know there was a long thread about what type to make this thing. I
> think you could accomplish the append-only-ness with a u64 if you had
> some rule about only allowing setting lower order bits than those that
> are already set. With 4 bits for simplicity:
>
> 1100         # initial container id
> 1100 -> 1011 # not allowed
> 1100 -> 1101 # allowed, but now 1101 is set in stone since there are
>              # no lower order bits left
>
> There are probably fancier ways to do it if you actually understand
> math :)

;)

> Since userns nesting is limited to 32 levels (right now, IIRC), and
> you have 64 bits, this might be reasonable. You could just teach
> container engines to use the first say N bits for themselves, with a 1
> bit for the barrier at the end.

I like the creativity, but I worry that at some point these limitations
are going to be raised (limits have a funny way of doing that over
time) and we will be in trouble.  I say "trouble" because I want to be
able to quickly do an audit container ID comparison and we're going to
pay a penalty for these larger values (we'll need this when we add
multiple auditd support and the requisite record routing).

Thinking about this also makes me realize we probably need to think a
bit longer about audit container ID conflicts between orchestrators.
Right now we just take the value that is given to us by the
orchestrator, but if we want to allow multiple container orchestrators
to work without some form of cooperation in userspace (I think we have
to assume the orchestrators will not talk to each other) we likely need
some way to block reuse of an audit container ID.  We would either need
to prevent the orchestrator from explicitly setting an audit container
ID to a value that is currently in use, or instead generate the audit
container ID in the kernel upon an event triggered by the orchestrator
(e.g. a write to a /proc file).  I suspect we should start looking at
the idr code; I think we will need to make use of it.

-- 
paul moore
www.paul-moore.com
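
For readers following the thread, here is a minimal userspace C sketch
of two ideas discussed above: the append-only u64 rule from the quoted
proposal, and the two options for avoiding audit container ID reuse
(refusing a value that is in use, or generating the value on the
orchestrator's behalf).  Every name in it (contid_may_extend, struct
contid_registry, contid_claim, contid_generate, contid_release,
CONTID_MAX_IN_USE) is hypothetical, and the fixed-size array only
stands in for whatever the kernel would actually use, such as the idr
code.  None of this is taken from the patch set.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Append-only rule (hypothetical helper): the next value must keep every
 * bit that is already set and may only add bits strictly below the lowest
 * bit that is currently set.  An unset ID (0) accepts any non-zero value.
 */
static bool contid_may_extend(uint64_t cur, uint64_t next)
{
	uint64_t lowest = cur & (~cur + 1);	/* lowest set bit, 0 if unset */
	uint64_t below = lowest - 1;		/* bits under it, all ones if unset */

	return next != cur && (next & ~below) == cur;
}

#define CONTID_MAX_IN_USE 64	/* arbitrary capacity for this sketch */

/* Stand-in for an in-kernel table of IDs that are currently in use. */
struct contid_registry {
	uint64_t ids[CONTID_MAX_IN_USE];
	size_t count;
	uint64_t next_id;	/* last value handed out by contid_generate() */
};

static bool contid_in_use(const struct contid_registry *r, uint64_t id)
{
	for (size_t i = 0; i < r->count; i++)
		if (r->ids[i] == id)
			return true;
	return false;
}

/* Option 1: the orchestrator picks the ID; refuse a value already in use. */
static bool contid_claim(struct contid_registry *r, uint64_t id)
{
	if (r->count == CONTID_MAX_IN_USE || contid_in_use(r, id))
		return false;
	r->ids[r->count++] = id;
	return true;
}

/* Option 2: the registry picks the ID; returns 0 if the table is full. */
static uint64_t contid_generate(struct contid_registry *r)
{
	if (r->count == CONTID_MAX_IN_USE)
		return 0;
	do {
		r->next_id++;	/* skip 0 and anything currently in use */
	} while (r->next_id == 0 || contid_in_use(r, r->next_id));
	r->ids[r->count++] = r->next_id;
	return r->next_id;
}

/* Drop an ID from the table so that it can be claimed again later. */
static void contid_release(struct contid_registry *r, uint64_t id)
{
	for (size_t i = 0; i < r->count; i++) {
		if (r->ids[i] == id) {
			r->ids[i] = r->ids[--r->count];
			return;
		}
	}
}
```

With the 4-bit example from the thread, contid_may_extend(0xC, 0xB) is
rejected and contid_may_extend(0xC, 0xD) is accepted, after which 0xD
can no longer be changed.  The linear scan in contid_in_use() is only
for brevity; a real implementation would want a lookup that stays cheap
as the number of live IDs grows, which is the motivation for looking at
idr.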