From: Paul Moore
Date: Thu, 4 Apr 2019 22:06:10 -0400
Subject: Re: [PATCH ghak90 V5 09/10] audit: add support for containerid to network namespaces
To: Richard Guy Briggs
Cc: Neil Horman, linux-api@vger.kernel.org, containers@lists.linux-foundation.org, LKML, dhowells@redhat.com, Linux-Audit Mailing List, netfilter-devel@vger.kernel.org, ebiederm@xmission.com, simo@redhat.com, netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, Eric Paris, Serge Hallyn
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Apr 4, 2019 at 5:40 PM Richard Guy Briggs wrote:
> On 2019-04-02 07:31, Neil Horman wrote:
> > On Mon, Apr 01, 2019 at 10:50:03AM -0400, Paul Moore wrote:
> > > On Fri, Mar 15, 2019 at 2:35 PM Richard Guy Briggs wrote:
> > > > Audit events could happen in a network namespace outside of a task
> > > > context due to packets received from the net that trigger an auditing
> > > > rule prior to being associated with a running task.
> > > > The network namespace could be in use by multiple containers by
> > > > association to the tasks in that network namespace.  We still want
> > > > a way to attribute these events to any potential containers.  Keep
> > > > a list per network namespace to track these audit container
> > > > identifiers.
> > > >
> > > > Add/increment the audit container identifier on:
> > > > - initial setting of the audit container identifier via /proc
> > > > - clone/fork call that inherits an audit container identifier
> > > > - unshare call that inherits an audit container identifier
> > > > - setns call that inherits an audit container identifier
> > > > Delete/decrement the audit container identifier on:
> > > > - an inherited audit container identifier dropped when child set
> > > > - process exit
> > > > - unshare call that drops a net namespace
> > > > - setns call that drops a net namespace
> > > >
> > > > See: https://github.com/linux-audit/audit-kernel/issues/92
> > > > See: https://github.com/linux-audit/audit-testsuite/issues/64
> > > > See: https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
> > > >
> > > > Signed-off-by: Richard Guy Briggs
> > > > ---
> > > >  include/linux/audit.h | 19 ++++++++++++
> > > >  kernel/audit.c        | 86 +++++++++++++++++++++++++++++++++++++++++++++++++--
> > > >  kernel/nsproxy.c      |  4 +++
> > > >  3 files changed, 106 insertions(+), 3 deletions(-)

...

> > > > +	list_for_each_entry_rcu(cont, contid_list, list)
> > > > +		if (cont->id == contid) {
> > > > +			refcount_inc(&cont->refcount);
> > > > +			goto out;
> > > > +		}
> > > > +	cont = kmalloc(sizeof(struct audit_contid), GFP_ATOMIC);
> > >
> > > If you had to guess, what do you think is going to be more common:
> > > bumping the refcount of an existing entry in the list, or adding a
> > > new entry?  I'm asking because I always get a little nervous when
> > > doing allocations while holding a spinlock.
> > > Yes, you are doing it with GFP_ATOMIC, but it still seems like
> > > something to try and avoid if this is going to approach 50%.
> > > However, if the new entry is rare then the extra work of always doing
> > > the allocation before taking the lock and then freeing it afterwards
> > > might be a bad tradeoff.
> >
> > I think this is another way of asking, will multiple processes exist
> > in the same network namespace?  That is to say, will we expect a
> > process to create a net namespace, and then have other processes enter
> > that namespace (thereby triggering multiple calls to
> > audit_netns_contid_add with the same net pointer and cont id).  Given
> > that the kubernetes notion of a pod is almost by definition multiple
> > containers sharing a network namespace, I think the answer is that we
> > will be strongly biased towards the refcount_inc case, rather than the
> > kmalloc case.  I could be wrong, but I think this is likely already in
> > the optimized order.
>
> I had a stab at doing a GFP_KERNEL alloc before the spinlock and
> releasing it after if it wasn't needed (in Patch #1 below).  I also
> went one step further and converted it to kmem_cache (in Patch #2
> below).
>
> > > My gut feeling says we might do about as many allocations as
> > > refcount bumps, but I could be thinking about this wrong.
> > >
> > > Moving the allocation outside the spinlock might also open the door
> > > to doing this as GFP_KERNEL, which is a good thing, but I haven't
> > > looked at the callers to see if that is possible (it may not be).
> > > That's an exercise left to the patch author (if he hasn't done that
> > > already).
>
> Both appear to work, but after successfully running both the contid
> test and audit_testsuite, once I start to push that test system a bit
> harder I end up with a deadlock warning.
>
> I am having trouble understanding why, since it happens both without
> and with the kmem_cache option, so it must be another part of the code
> that is
The task_lock is being held at this point in > audit_set_contid(). I haven't tried changing this alloc over to a GFP_ATOMIC > to see if that prevents it, just as a debugging check... > At this point, I'm inclined to leave it as it was without these two patches > since it works and there doesn't seem to be an obvious best way given the > uncertainty of the potential workloads. I'm definitely not a mm expert, but I wonder if you might be able to get this working using GFP_NOFS, but I see just now that they recommend you *not* use GFP_NOFS; even if this did solve the problem, it's probably not the right approach. I'm okay with leaving it as-is with GFP_ATOMIC. My original response to this was more of a question about optimizing for a given use case, having something that works is far more important. -- paul moore www.paul-moore.com