From: ebiederm@xmission.com (Eric W. Biederman)
To: Hari Bathini <hbathini@linux.vnet.ibm.com>
Cc: ast@fb.com, peterz@infradead.org, lkml <linux-kernel@vger.kernel.org>,
        acme@kernel.org, alexander.shishkin@linux.intel.com, mingo@redhat.com,
        daniel@iogearbox.net, rostedt@goodmis.org,
        Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>,
        sargun@sargun.me, Aravinda Prasad <aravinda@linux.vnet.ibm.com>,
        brendan.d.gregg@gmail.com
References: <147877784354.29988.8570048236764105701.stgit@hbathini.in.ibm.com>
        <87a8d7m805.fsf@xmission.com>
        <7f1d2f36-7bfc-dc97-0de8-f8a3203ca26e@linux.vnet.ibm.com>
Date: Wed, 16 Nov 2016 11:27:28 -0600
In-Reply-To: <7f1d2f36-7bfc-dc97-0de8-f8a3203ca26e@linux.vnet.ibm.com> (Hari
        Bathini's message of "Tue, 15 Nov 2016 17:51:09 +0530")
Message-ID: <87lgwjfi7z.fsf@xmission.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
Subject: Re: [PATCH 0/3] perf: add support for analyzing events for containers
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 3085
Lines: 66

Hari Bathini <hbathini@linux.vnet.ibm.com> writes:

> On Friday 11 November 2016 01:18 AM, Eric W. Biederman wrote:
>> Hari Bathini <hbathini@linux.vnet.ibm.com> writes:
>>
>>> Currently, there is no trivial mechanism to analyze events based on
>>> containers. perf -G can be used, but it will not filter events for the
>>> containers created after perf is invoked, making it difficult to assess/
>>> analyze performance issues of multiple containers at once.
>>>
>>> This patch-set overcomes this limitation by using cgroup identifier as
>>> container unique identifier. A new PERF_RECORD_NAMESPACES event that
>>> records namespaces related info is introduced, from which the cgroup
>>> namespace's inode number is used as cgroup identifier.  This is based
>>> on the assumption that each container is created with its own cgroup
>>> namespace allowing assessment/analysis of multiple containers using
>>> cgroup identifier.
>>>
>>> The first patch introduces PERF_RECORD_NAMESPACES in kernel while the
>>> second patch makes the corresponding changes in perf tool to read these
>>> PERF_RECORD_NAMESPACES events. The third patch adds a cgroup identifier
>>> column in perf report, which is nothing but the cgroup namespace's
>>> inode number. This approach is based on the suggestion from Peter
>>> Zijlstra here: https://patchwork.kernel.org/patch/9305655/
>> Where is the check that ensures that only someone with
>> capable(CAP_SYS_ADMIN) can use this interface?  This interface is not
>> namespace clean in multiple dimensions, so it cannot be used generally.
>
> Right. Will add the check..
>
>> You are not allowed to move struct mnt_namespace into
>> include/linux/mnt_namespace.h.  Al Viro will crucify you with cause.
>> Those are implementation details the rest of the kernel should not be
>> digging into.
>
> Ouch! How about adding an accessor function(s) in fs/namespace.c ..?

For reasonable things, of course.  I think the namespace operations
from ns_common already have a large set of accessors, so I don't know
what you are looking for.

>> Where are the device numbers that go with those inode numbers you are
>> exporting?  For now all of those inodes live on the filesystem but I am
>> not giving guarantees to userspace that do not work for ordinary
>> filesystems.
>
> Sorry! I didn't get this..
> Want to use these numbers as identity for namespace (like pid for process..)

Yes I understand you would like to have a global identifier like pids.
A global identifier would ultimately require the addition of a namespace
of namespaces so the global identifier would be relative to something.
I really don't want to go there.

Global identifiers are evil!

So you need to specify not only the inode number but also which filesystem
the inode number applies to, that is, the device number of the appropriate
filesystem as well.

Also please don't forget that modern inode numbers are 64-bit, not 32-bit.
I don't know if that freedom will be used with namespaces or not, but we
need the freedom in a userspace API to make that change without breaking
userspace.

Eric