2016-12-15 18:38:02

by Hari Bathini

Subject: [PATCH v4 0/3] perf: add support for analyzing events for containers

Currently, there is no simple mechanism to analyze events per container.
perf -G can be used, but it does not filter events for containers
created after perf is invoked, which makes it difficult to assess or
analyze performance issues of multiple containers at once.

This patch-set overcomes this limitation by using a cgroup identifier as
the unique identifier of a container. It introduces a new
PERF_RECORD_NAMESPACES event that records namespace-related info, from
which the cgroup namespace's device & inode numbers are used as the
cgroup identifier. This is based on the assumption that each container
is created with its own cgroup namespace, which allows assessment/analysis
of multiple containers using the cgroup identifier.
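
As a rough illustration of the identifier described above, the device
and inode numbers of a task's cgroup namespace can be read from
userspace by calling stat() on /proc/<pid>/ns/cgroup. This is only a
sketch: struct ns_id, read_cgroup_ns_id() and same_cgroup_ns() are
hypothetical names that are not part of this patch-set, and the code
assumes a kernel with cgroup namespace support and a mounted /proc.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical container identifier: device and inode of the cgroup namespace. */
struct ns_id {
	uint64_t dev;
	uint64_t ino;
};

/*
 * Read the identifier of the cgroup namespace of @pid.
 * Returns 0 on success, -1 if the /proc entry cannot be examined.
 */
static int read_cgroup_ns_id(pid_t pid, struct ns_id *id)
{
	char path[64];
	struct stat st;
	int len;

	len = snprintf(path, sizeof(path), "/proc/%d/ns/cgroup", (int)pid);
	if (len < 0 || (size_t)len >= sizeof(path))
		return -1;

	/* stat() follows the ns link, so st describes the namespace itself. */
	if (stat(path, &st) != 0)
		return -1;

	id->dev = (uint64_t)st.st_dev;
	id->ino = (uint64_t)st.st_ino;
	return 0;
}

/* Two tasks share a cgroup namespace when both numbers match. */
static int same_cgroup_ns(const struct ns_id *a, const struct ns_id *b)
{
	return a->dev == b->dev && a->ino == b->ino;
}
```

Under the assumption stated above (one cgroup namespace per container),
two tasks for which same_cgroup_ns() returns non-zero would be treated
as belonging to the same container.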

The first patch introduces PERF_RECORD_NAMESPACES in the kernel, while
the second patch makes the corresponding changes in the perf tool to read
PERF_RECORD_NAMESPACES events. The third patch adds a cgroup identifier
column to perf report, which contains the cgroup namespace's device and
inode numbers.

Changes from v3:
* Saving device number for each inode.
* cgroup identifier includes device number along with inode number.

---

Hari Bathini (3):
perf: add PERF_RECORD_NAMESPACES to include namespaces related info
perf tool: add PERF_RECORD_NAMESPACES to include namespaces related info
perf tool: add cgroup identifier entry in perf report


include/linux/perf_event.h | 2
include/uapi/linux/perf_event.h | 31 +++++++
kernel/events/core.c | 135 ++++++++++++++++++++++++++++++++
kernel/fork.c | 3 +
kernel/nsproxy.c | 5 +
tools/include/uapi/linux/perf_event.h | 31 +++++++
tools/perf/builtin-annotate.c | 1
tools/perf/builtin-diff.c | 1
tools/perf/builtin-inject.c | 14 +++
tools/perf/builtin-kmem.c | 1
tools/perf/builtin-kvm.c | 2
tools/perf/builtin-lock.c | 1
tools/perf/builtin-mem.c | 1
tools/perf/builtin-record.c | 33 +++++++-
tools/perf/builtin-report.c | 1
tools/perf/builtin-sched.c | 1
tools/perf/builtin-script.c | 41 ++++++++++
tools/perf/builtin-trace.c | 3 -
tools/perf/perf.h | 1
tools/perf/util/Build | 1
tools/perf/util/data-convert-bt.c | 2
tools/perf/util/event.c | 138 ++++++++++++++++++++++++++++++++-
tools/perf/util/event.h | 18 ++++
tools/perf/util/evsel.c | 3 +
tools/perf/util/hist.c | 7 ++
tools/perf/util/hist.h | 1
tools/perf/util/machine.c | 25 ++++++
tools/perf/util/machine.h | 3 +
tools/perf/util/namespaces.c | 27 ++++++
tools/perf/util/namespaces.h | 18 ++++
tools/perf/util/session.c | 7 ++
tools/perf/util/sort.c | 41 ++++++++++
tools/perf/util/sort.h | 7 ++
tools/perf/util/thread.c | 44 ++++++++++-
tools/perf/util/thread.h | 6 +
tools/perf/util/tool.h | 2
36 files changed, 643 insertions(+), 15 deletions(-)
create mode 100644 tools/perf/util/namespaces.c
create mode 100644 tools/perf/util/namespaces.h


2016-12-15 18:38:04

by Hari Bathini

Subject: [PATCH v4 1/3] perf: add PERF_RECORD_NAMESPACES to include namespaces related info

With the advent of container technologies like docker, which depend
on namespaces for isolation, there is a need for tracing support for
namespaces. This patch introduces a new PERF_RECORD_NAMESPACES event
for tracing based on namespace-related info.

Signed-off-by: Hari Bathini <[email protected]>
---
include/linux/perf_event.h | 2 +
include/uapi/linux/perf_event.h | 31 +++++++++
kernel/events/core.c | 135 +++++++++++++++++++++++++++++++++++++++
kernel/fork.c | 3 +
kernel/nsproxy.c | 5 +
5 files changed, 175 insertions(+), 1 deletion(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 4741ecd..42d8aa6 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1110,6 +1110,7 @@ extern int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks

extern void perf_event_exec(void);
extern void perf_event_comm(struct task_struct *tsk, bool exec);
+extern void perf_event_namespaces(struct task_struct *tsk);
extern void perf_event_fork(struct task_struct *tsk);

/* Callchains */
@@ -1312,6 +1313,7 @@ static inline int perf_unregister_guest_info_callbacks
static inline void perf_event_mmap(struct vm_area_struct *vma) { }
static inline void perf_event_exec(void) { }
static inline void perf_event_comm(struct task_struct *tsk, bool exec) { }
+static inline void perf_event_namespaces(struct task_struct *tsk) { }
static inline void perf_event_fork(struct task_struct *tsk) { }
static inline void perf_event_init(void) { }
static inline int perf_swevent_get_recursion_context(void) { return -1; }
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index c66a485..80024f4 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -344,7 +344,8 @@ struct perf_event_attr {
use_clockid : 1, /* use @clockid for time fields */
context_switch : 1, /* context switch data */
write_backward : 1, /* Write ring buffer from end to beginning */
- __reserved_1 : 36;
+ namespaces : 1, /* include namespaces data */
+ __reserved_1 : 35;

union {
__u32 wakeup_events; /* wakeup every n events */
@@ -610,6 +611,23 @@ struct perf_event_header {
__u16 size;
};

+struct perf_ns_link_info {
+ __u64 dev;
+ __u64 ino;
+};
+
+enum {
+ NET_NS_INDEX = 0,
+ UTS_NS_INDEX = 1,
+ IPC_NS_INDEX = 2,
+ PID_NS_INDEX = 3,
+ USER_NS_INDEX = 4,
+ MNT_NS_INDEX = 5,
+ CGROUP_NS_INDEX = 6,
+
+ NAMESPACES_MAX, /* maximum available namespaces */
+};
+
enum perf_event_type {

/*
@@ -862,6 +880,17 @@ enum perf_event_type {
*/
PERF_RECORD_SWITCH_CPU_WIDE = 15,

+ /*
+ * struct {
+ * struct perf_event_header header;
+ * u32 pid;
+ * u32 tid;
+ * struct perf_ns_link_info link_info[NAMESPACES_MAX];
+ * struct sample_id sample_id;
+ * };
+ */
+ PERF_RECORD_NAMESPACES = 16,
+
PERF_RECORD_MAX, /* non-ABI */
};

diff --git a/kernel/events/core.c b/kernel/events/core.c
index faf073d..c76485b 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -46,6 +46,8 @@
#include <linux/filter.h>
#include <linux/namei.h>
#include <linux/parser.h>
+#include <linux/proc_ns.h>
+#include <linux/mount.h>

#include "internal.h"

@@ -375,6 +377,7 @@ static DEFINE_PER_CPU(struct pmu_event_list, pmu_sb_events);

static atomic_t nr_mmap_events __read_mostly;
static atomic_t nr_comm_events __read_mostly;
+static atomic_t nr_namespaces_events __read_mostly;
static atomic_t nr_task_events __read_mostly;
static atomic_t nr_freq_events __read_mostly;
static atomic_t nr_switch_events __read_mostly;
@@ -3882,6 +3885,8 @@ static void unaccount_event(struct perf_event *event)
atomic_dec(&nr_mmap_events);
if (event->attr.comm)
atomic_dec(&nr_comm_events);
+ if (event->attr.namespaces)
+ atomic_dec(&nr_namespaces_events);
if (event->attr.task)
atomic_dec(&nr_task_events);
if (event->attr.freq)
@@ -6382,6 +6387,7 @@ static void perf_event_task(struct task_struct *task,
void perf_event_fork(struct task_struct *task)
{
perf_event_task(task, NULL, 1);
+ perf_event_namespaces(task);
}

/*
@@ -6484,6 +6490,128 @@ void perf_event_comm(struct task_struct *task, bool exec)
}

/*
+ * namespaces tracking
+ */
+
+struct namespaces_event_id {
+ struct perf_event_header header;
+
+ u32 pid;
+ u32 tid;
+ struct perf_ns_link_info link_info[NAMESPACES_MAX];
+};
+
+struct perf_namespaces_event {
+ struct task_struct *task;
+
+ struct namespaces_event_id event_id;
+};
+
+static int perf_event_namespaces_match(struct perf_event *event)
+{
+ return event->attr.namespaces;
+}
+
+static void perf_fill_ns_link_info(struct perf_ns_link_info *ns_link_info,
+ struct task_struct *task,
+ const struct proc_ns_operations *ns_ops)
+{
+ struct path ns_path;
+ struct inode *ns_inode;
+ void *error;
+
+ error = ns_get_path(&ns_path, task, ns_ops);
+ if (!error) {
+ ns_inode = ns_path.dentry->d_inode;
+ ns_link_info->dev = new_encode_dev(ns_inode->i_sb->s_dev);
+ ns_link_info->ino = ns_inode->i_ino;
+ }
+}
+
+static void perf_event_namespaces_output(struct perf_event *event,
+ void *data)
+{
+ struct perf_namespaces_event *namespaces_event = data;
+ struct perf_output_handle handle;
+ struct perf_sample_data sample;
+ struct namespaces_event_id *ei;
+ struct task_struct *task = namespaces_event->task;
+ int ret;
+
+ if (!perf_event_namespaces_match(event))
+ return;
+
+ ei = &namespaces_event->event_id;
+ perf_event_header__init_id(&ei->header, &sample, event);
+ ret = perf_output_begin(&handle, event, ei->header.size);
+ if (ret)
+ return;
+
+ ei->pid = perf_event_pid(event, task);
+ ei->tid = perf_event_tid(event, task);
+
+ perf_fill_ns_link_info(&ei->link_info[MNT_NS_INDEX],
+ task, &mntns_operations);
+
+#ifdef CONFIG_USER_NS
+ perf_fill_ns_link_info(&ei->link_info[USER_NS_INDEX],
+ task, &userns_operations);
+#endif
+#ifdef CONFIG_NET_NS
+ perf_fill_ns_link_info(&ei->link_info[NET_NS_INDEX],
+ task, &netns_operations);
+#endif
+#ifdef CONFIG_UTS_NS
+ perf_fill_ns_link_info(&ei->link_info[UTS_NS_INDEX],
+ task, &utsns_operations);
+#endif
+#ifdef CONFIG_IPC_NS
+ perf_fill_ns_link_info(&ei->link_info[IPC_NS_INDEX],
+ task, &ipcns_operations);
+#endif
+#ifdef CONFIG_PID_NS
+ perf_fill_ns_link_info(&ei->link_info[PID_NS_INDEX],
+ task, &pidns_operations);
+#endif
+#ifdef CONFIG_CGROUPS
+ perf_fill_ns_link_info(&ei->link_info[CGROUP_NS_INDEX],
+ task, &cgroupns_operations);
+#endif
+
+ perf_output_put(&handle, namespaces_event->event_id);
+
+ perf_event__output_id_sample(event, &handle, &sample);
+
+ perf_output_end(&handle);
+}
+
+void perf_event_namespaces(struct task_struct *task)
+{
+ struct perf_namespaces_event namespaces_event;
+
+ if (!atomic_read(&nr_namespaces_events))
+ return;
+
+ namespaces_event = (struct perf_namespaces_event){
+ .task = task,
+ .event_id = {
+ .header = {
+ .type = PERF_RECORD_NAMESPACES,
+ .misc = 0,
+ .size = sizeof(namespaces_event.event_id),
+ },
+ /* .pid */
+ /* .tid */
+ /* .link_info[NAMESPACES_MAX] */
+ },
+ };
+
+ perf_iterate_sb(perf_event_namespaces_output,
+ &namespaces_event,
+ NULL);
+}
+
+/*
* mmap tracking
*/

@@ -9028,6 +9156,8 @@ static void account_event(struct perf_event *event)
atomic_inc(&nr_mmap_events);
if (event->attr.comm)
atomic_inc(&nr_comm_events);
+ if (event->attr.namespaces)
+ atomic_inc(&nr_namespaces_events);
if (event->attr.task)
atomic_inc(&nr_task_events);
if (event->attr.freq)
@@ -9542,6 +9672,11 @@ SYSCALL_DEFINE5(perf_event_open,
return -EACCES;
}

+ if (attr.namespaces) {
+ if (!capable(CAP_SYS_ADMIN))
+ return -EACCES;
+ }
+
if (attr.freq) {
if (attr.sample_freq > sysctl_perf_event_sample_rate)
return -EINVAL;
diff --git a/kernel/fork.c b/kernel/fork.c
index 869b8cc..c7b71f0 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2289,6 +2289,9 @@ SYSCALL_DEFINE1(unshare, unsigned long, unshare_flags)
free_fs_struct(new_fs);

bad_unshare_out:
+ if (!err)
+ perf_event_namespaces(current);
+
return err;
}

diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index 782102e..4c25e6e 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -26,6 +26,7 @@
#include <linux/file.h>
#include <linux/syscalls.h>
#include <linux/cgroup.h>
+#include <linux/perf_event.h>

static struct kmem_cache *nsproxy_cachep;

@@ -264,6 +265,10 @@ SYSCALL_DEFINE2(setns, int, fd, int, nstype)
switch_task_namespaces(tsk, new_nsproxy);
out:
fput(file);
+
+ if (!err)
+ perf_event_namespaces(tsk);
+
return err;
}


2016-12-15 18:38:27

by Hari Bathini

Subject: [PATCH v4 3/3] perf tool: add cgroup identifier entry in perf report

This patch introduces a cgroup identifier entry field in perf report to
identify or distinguish data of different cgroups. It uses the device
number and inode number of the cgroup namespace, included in perf data
with the new PERF_RECORD_NAMESPACES event, as the cgroup identifier. With
the assumption that each container is created with its own cgroup
namespace, this allows assessment/analysis of multiple containers at once.

Shown below is the output of perf report, sorted by cgroup id, on a
system that was running three containers at the time of perf record.
It clearly shows one container using considerably more kernel memory
than the others:


$ perf report -s cgroup_id,sample --stdio
#
# Total Lost Samples: 0
#
# Samples: 2K of event 'kmem:kmalloc'
# Event count (approx.): 2176
#
# Overhead cgroup id (dev/inode) Samples
# ........ ..................... ............
#
89.20% 3/0xf00000cf 1941
3.31% 3/0xeffffffb 72
3.08% 3/0xf00000d0 67
2.25% 3/0xf00000d1 49
2.16% 0/0x0 47
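
The "cgroup id (dev/inode)" column above pairs a device number with an
inode number. A minimal sketch of how such identifiers might be ordered
and formatted follows. cgroup_id_cmp() and cgroup_id_format() are
hypothetical helpers, not the functions added by this patch, and the
explicit three-way comparison is a choice made for this sketch so that
the result does not depend on the sign of a 64-bit subtraction.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Same shape as the identifier shown in the report: device + inode. */
struct namespace_id {
	uint64_t dev;
	uint64_t ino;
};

/* Three-way comparison of two unsigned 64-bit values: -1, 0 or 1. */
static int u64_cmp(uint64_t a, uint64_t b)
{
	return (a > b) - (a < b);
}

/* Order by device number first, then by inode number. */
static int cgroup_id_cmp(const struct namespace_id *left,
			 const struct namespace_id *right)
{
	int ret = u64_cmp(left->dev, right->dev);

	return ret ? ret : u64_cmp(left->ino, right->ino);
}

/* Format as "<dev>/0x<inode in hex>", like the report column. */
static int cgroup_id_format(const struct namespace_id *id,
			    char *buf, size_t size)
{
	return snprintf(buf, size, "%" PRIu64 "/0x%" PRIx64,
			id->dev, id->ino);
}
```

The PRIu64/PRIx64 macros keep the format string correct regardless of
whether a 64-bit integer is "unsigned long" or "unsigned long long" on
the build host.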

Signed-off-by: Hari Bathini <[email protected]>
---
tools/perf/util/hist.c | 7 +++++++
tools/perf/util/hist.h | 1 +
tools/perf/util/sort.c | 41 +++++++++++++++++++++++++++++++++++++++++
tools/perf/util/sort.h | 7 +++++++
4 files changed, 56 insertions(+)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 6770a96..9996e0c 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -2,6 +2,7 @@
#include "build-id.h"
#include "hist.h"
#include "session.h"
+#include "namespaces.h"
#include "sort.h"
#include "evlist.h"
#include "evsel.h"
@@ -168,6 +169,7 @@ void hists__calc_col_len(struct hists *hists, struct hist_entry *h)
hists__set_unres_dso_col_len(hists, HISTC_MEM_DADDR_DSO);
}

+ hists__new_col_len(hists, HISTC_CGROUP_ID, 20);
hists__new_col_len(hists, HISTC_CPU, 3);
hists__new_col_len(hists, HISTC_SOCKET, 6);
hists__new_col_len(hists, HISTC_MEM_LOCKED, 6);
@@ -573,9 +575,14 @@ __hists__add_entry(struct hists *hists,
bool sample_self,
struct hist_entry_ops *ops)
{
+ struct namespaces *ns = thread__namespaces(al->thread);
struct hist_entry entry = {
.thread = al->thread,
.comm = thread__comm(al->thread),
+ .cgroup_id = {
+ .dev = ns ? ns->link_info[CGROUP_NS_INDEX].dev : 0,
+ .ino = ns ? ns->link_info[CGROUP_NS_INDEX].ino : 0,
+ },
.ms = {
.map = al->map,
.sym = al->sym,
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index d4b6514..d7696fd 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -30,6 +30,7 @@ enum hist_column {
HISTC_DSO,
HISTC_THREAD,
HISTC_COMM,
+ HISTC_CGROUP_ID,
HISTC_PARENT,
HISTC_CPU,
HISTC_SOCKET,
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index df622f4..9f5f404 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -536,6 +536,46 @@ struct sort_entry sort_cpu = {
.se_width_idx = HISTC_CPU,
};

+/* --sort cgroup_id */
+
+static int64_t _sort__cgroup_dev_cmp(u64 left_dev, u64 right_dev)
+{
+ return (int64_t)(right_dev - left_dev);
+}
+
+static int64_t _sort__cgroup_inode_cmp(u64 left_ino, u64 right_ino)
+{
+ return (int64_t)(right_ino - left_ino);
+}
+
+static int64_t
+sort__cgroup_id_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+ int64_t ret;
+
+ ret = _sort__cgroup_dev_cmp(right->cgroup_id.dev, left->cgroup_id.dev);
+ if (ret != 0)
+ return ret;
+
+ return _sort__cgroup_inode_cmp(right->cgroup_id.ino,
+ left->cgroup_id.ino);
+}
+
+static int hist_entry__cgroup_id_snprintf(struct hist_entry *he,
+ char *bf, size_t size,
+ unsigned int width __maybe_unused)
+{
+ return repsep_snprintf(bf, size, "%lu/0x%lx", he->cgroup_id.dev,
+ he->cgroup_id.ino);
+}
+
+struct sort_entry sort_cgroup_id = {
+ .se_header = "cgroup id (dev/inode)",
+ .se_cmp = sort__cgroup_id_cmp,
+ .se_snprintf = hist_entry__cgroup_id_snprintf,
+ .se_width_idx = HISTC_CGROUP_ID,
+};
+
/* --sort socket */

static int64_t
@@ -1418,6 +1458,7 @@ static struct sort_dimension common_sort_dimensions[] = {
DIM(SORT_GLOBAL_WEIGHT, "weight", sort_global_weight),
DIM(SORT_TRANSACTION, "transaction", sort_transaction),
DIM(SORT_TRACE, "trace", sort_trace),
+ DIM(SORT_CGROUP_ID, "cgroup_id", sort_cgroup_id),
};

#undef DIM
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index 7aff317..68a5abb 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -54,6 +54,11 @@ struct he_stat {
u32 nr_events;
};

+struct namespace_id {
+ u64 dev;
+ u64 ino;
+};
+
struct hist_entry_diff {
bool computed;
union {
@@ -91,6 +96,7 @@ struct hist_entry {
struct map_symbol ms;
struct thread *thread;
struct comm *comm;
+ struct namespace_id cgroup_id;
u64 ip;
u64 transaction;
s32 socket;
@@ -211,6 +217,7 @@ enum sort_type {
SORT_GLOBAL_WEIGHT,
SORT_TRANSACTION,
SORT_TRACE,
+ SORT_CGROUP_ID,

/* branch stack specific sort keys */
__SORT_BRANCH_STACK,

2016-12-15 18:38:25

by Hari Bathini

Subject: [PATCH v4 2/3] perf tool: add PERF_RECORD_NAMESPACES to include namespaces related info

This patch updates perf tool to examine PERF_RECORD_NAMESPACES events
emitted by the kernel when fork, clone, setns or unshare are invoked.
Also, it synthesizes PERF_RECORD_NAMESPACES events for processes that
were running prior to invocation of perf record, the data for which
is taken from /proc/$PID/ns. These changes make way for analyzing
events with regard to namespaces.
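
The synthesis step described above can be approximated in userspace as
follows. This is a sketch under stated assumptions, not the code in this
patch: NS_MAX, struct ns_link_info, the ns_names table and
fill_ns_link_info() are hypothetical, with table positions chosen to
mirror the *_NS_INDEX values used in the patch. Entries that cannot be
read (for example, a namespace type the running kernel does not
provide) are left zeroed.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

#define NS_MAX 7	/* hypothetical stand-in for NAMESPACES_MAX */

struct ns_link_info {
	uint64_t dev;
	uint64_t ino;
};

/* /proc/<pid>/ns entry names, indexed like the *_NS_INDEX values. */
static const char *const ns_names[NS_MAX] = {
	"net", "uts", "ipc", "pid", "user", "mnt", "cgroup",
};

/*
 * Fill @info with the device/inode pair of every namespace of @pid.
 * Returns the number of namespaces that could be read.
 */
static int fill_ns_link_info(pid_t pid, struct ns_link_info info[NS_MAX])
{
	char path[64];
	struct stat st;
	int i, len, found = 0;

	memset(info, 0, NS_MAX * sizeof(info[0]));

	for (i = 0; i < NS_MAX; i++) {
		len = snprintf(path, sizeof(path), "/proc/%d/ns/%s",
			       (int)pid, ns_names[i]);
		if (len < 0 || (size_t)len >= sizeof(path))
			continue;
		if (stat(path, &st) != 0)
			continue;	/* leave this entry zeroed */

		info[i].dev = (uint64_t)st.st_dev;
		info[i].ino = (uint64_t)st.st_ino;
		found++;
	}

	return found;
}
```

Using snprintf() with a length check, rather than an unchecked
sprintf(), is a defensive choice of this sketch.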

Signed-off-by: Hari Bathini <[email protected]>
---
tools/include/uapi/linux/perf_event.h | 31 +++++++
tools/perf/builtin-annotate.c | 1
tools/perf/builtin-diff.c | 1
tools/perf/builtin-inject.c | 14 +++
tools/perf/builtin-kmem.c | 1
tools/perf/builtin-kvm.c | 2
tools/perf/builtin-lock.c | 1
tools/perf/builtin-mem.c | 1
tools/perf/builtin-record.c | 33 +++++++-
tools/perf/builtin-report.c | 1
tools/perf/builtin-sched.c | 1
tools/perf/builtin-script.c | 41 ++++++++++
tools/perf/builtin-trace.c | 3 -
tools/perf/perf.h | 1
tools/perf/util/Build | 1
tools/perf/util/data-convert-bt.c | 2
tools/perf/util/event.c | 138 ++++++++++++++++++++++++++++++++-
tools/perf/util/event.h | 18 ++++
tools/perf/util/evsel.c | 3 +
tools/perf/util/machine.c | 25 ++++++
tools/perf/util/machine.h | 3 +
tools/perf/util/namespaces.c | 27 ++++++
tools/perf/util/namespaces.h | 18 ++++
tools/perf/util/session.c | 7 ++
tools/perf/util/thread.c | 44 ++++++++++-
tools/perf/util/thread.h | 6 +
tools/perf/util/tool.h | 2
27 files changed, 412 insertions(+), 14 deletions(-)
create mode 100644 tools/perf/util/namespaces.c
create mode 100644 tools/perf/util/namespaces.h

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index c66a485..80024f4 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -344,7 +344,8 @@ struct perf_event_attr {
use_clockid : 1, /* use @clockid for time fields */
context_switch : 1, /* context switch data */
write_backward : 1, /* Write ring buffer from end to beginning */
- __reserved_1 : 36;
+ namespaces : 1, /* include namespaces data */
+ __reserved_1 : 35;

union {
__u32 wakeup_events; /* wakeup every n events */
@@ -610,6 +611,23 @@ struct perf_event_header {
__u16 size;
};

+struct perf_ns_link_info {
+ __u64 dev;
+ __u64 ino;
+};
+
+enum {
+ NET_NS_INDEX = 0,
+ UTS_NS_INDEX = 1,
+ IPC_NS_INDEX = 2,
+ PID_NS_INDEX = 3,
+ USER_NS_INDEX = 4,
+ MNT_NS_INDEX = 5,
+ CGROUP_NS_INDEX = 6,
+
+ NAMESPACES_MAX, /* maximum available namespaces */
+};
+
enum perf_event_type {

/*
@@ -862,6 +880,17 @@ enum perf_event_type {
*/
PERF_RECORD_SWITCH_CPU_WIDE = 15,

+ /*
+ * struct {
+ * struct perf_event_header header;
+ * u32 pid;
+ * u32 tid;
+ * struct perf_ns_link_info link_info[NAMESPACES_MAX];
+ * struct sample_id sample_id;
+ * };
+ */
+ PERF_RECORD_NAMESPACES = 16,
+
PERF_RECORD_MAX, /* non-ABI */
};

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index ebb6283..1b63dc4 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -393,6 +393,7 @@ int cmd_annotate(int argc, const char **argv, const char *prefix __maybe_unused)
.comm = perf_event__process_comm,
.exit = perf_event__process_exit,
.fork = perf_event__process_fork,
+ .namespaces = perf_event__process_namespaces,
.ordered_events = true,
.ordering_requires_timestamps = true,
},
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 9ff0db4..c52552f 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -354,6 +354,7 @@ static struct perf_tool tool = {
.exit = perf_event__process_exit,
.fork = perf_event__process_fork,
.lost = perf_event__process_lost,
+ .namespaces = perf_event__process_namespaces,
.ordered_events = true,
.ordering_requires_timestamps = true,
};
diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index b9bc7e3..c5ddc73 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -333,6 +333,19 @@ static int perf_event__repipe_comm(struct perf_tool *tool,
return err;
}

+static int perf_event__repipe_namespaces(struct perf_tool *tool,
+ union perf_event *event,
+ struct perf_sample *sample,
+ struct machine *machine)
+{
+ int err;
+
+ err = perf_event__process_namespaces(tool, event, sample, machine);
+ perf_event__repipe(tool, event, sample, machine);
+
+ return err;
+}
+
static int perf_event__repipe_exit(struct perf_tool *tool,
union perf_event *event,
struct perf_sample *sample,
@@ -660,6 +673,7 @@ static int __cmd_inject(struct perf_inject *inject)
session->itrace_synth_opts = &inject->itrace_synth_opts;
inject->itrace_synth_opts.inject = true;
inject->tool.comm = perf_event__repipe_comm;
+ inject->tool.namespaces = perf_event__repipe_namespaces;
inject->tool.exit = perf_event__repipe_exit;
inject->tool.id_index = perf_event__repipe_id_index;
inject->tool.auxtrace_info = perf_event__process_auxtrace_info;
diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index 35a02f8..862000c 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -965,6 +965,7 @@ static struct perf_tool perf_kmem = {
.comm = perf_event__process_comm,
.mmap = perf_event__process_mmap,
.mmap2 = perf_event__process_mmap2,
+ .namespaces = perf_event__process_namespaces,
.ordered_events = true,
};

diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index 08fa88f..18e6c38 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -1044,6 +1044,7 @@ static int read_events(struct perf_kvm_stat *kvm)
struct perf_tool eops = {
.sample = process_sample_event,
.comm = perf_event__process_comm,
+ .namespaces = perf_event__process_namespaces,
.ordered_events = true,
};
struct perf_data_file file = {
@@ -1348,6 +1349,7 @@ static int kvm_events_live(struct perf_kvm_stat *kvm,
kvm->tool.exit = perf_event__process_exit;
kvm->tool.fork = perf_event__process_fork;
kvm->tool.lost = process_lost_event;
+ kvm->tool.namespaces = perf_event__process_namespaces;
kvm->tool.ordered_events = true;
perf_tool__fill_defaults(&kvm->tool);

diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
index ce3bfb4..d750cca 100644
--- a/tools/perf/builtin-lock.c
+++ b/tools/perf/builtin-lock.c
@@ -858,6 +858,7 @@ static int __cmd_report(bool display_info)
struct perf_tool eops = {
.sample = process_sample_event,
.comm = perf_event__process_comm,
+ .namespaces = perf_event__process_namespaces,
.ordered_events = true,
};
struct perf_data_file file = {
diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
index d1ce29b..da55056 100644
--- a/tools/perf/builtin-mem.c
+++ b/tools/perf/builtin-mem.c
@@ -342,6 +342,7 @@ int cmd_mem(int argc, const char **argv, const char *prefix __maybe_unused)
.lost = perf_event__process_lost,
.fork = perf_event__process_fork,
.build_id = perf_event__process_build_id,
+ .namespaces = perf_event__process_namespaces,
.ordered_events = true,
},
.input_name = "perf.data",
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index fa26865..e46a987 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -842,6 +842,9 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
signal(SIGTERM, sig_handler);
signal(SIGSEGV, sigsegv_handler);

+ if (rec->opts.record_namespaces)
+ tool->namespace_events = true;
+
if (rec->opts.auxtrace_snapshot_mode || rec->switch_output) {
signal(SIGUSR2, snapshot_sig_handler);
if (rec->opts.auxtrace_snapshot_mode)
@@ -949,6 +952,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
*/
if (forks) {
union perf_event *event;
+ pid_t tgid;

event = malloc(sizeof(event->comm) + machine->id_hdr_size);
if (event == NULL) {
@@ -962,10 +966,28 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
* cannot see a correct process name for those events.
* Synthesize COMM event to prevent it.
*/
- perf_event__synthesize_comm(tool, event,
- rec->evlist->workload.pid,
- process_synthesized_event,
- machine);
+ tgid = perf_event__synthesize_comm(tool, event,
+ rec->evlist->workload.pid,
+ process_synthesized_event,
+ machine);
+ free(event);
+
+ if (tgid == -1)
+ goto out_child;
+
+ event = malloc(sizeof(event->namespaces) + machine->id_hdr_size);
+ if (event == NULL) {
+ err = -ENOMEM;
+ goto out_child;
+ }
+
+ /*
+ * Synthesize NAMESPACES event for the command specified.
+ */
+ perf_event__synthesize_namespaces(tool, event,
+ rec->evlist->workload.pid,
+ tgid, process_synthesized_event,
+ machine);
free(event);

perf_evlist__start_workload(rec->evlist);
@@ -1387,6 +1409,7 @@ static struct record record = {
.fork = perf_event__process_fork,
.exit = perf_event__process_exit,
.comm = perf_event__process_comm,
+ .namespaces = perf_event__process_namespaces,
.mmap = perf_event__process_mmap,
.mmap2 = perf_event__process_mmap2,
.ordered_events = true,
@@ -1501,6 +1524,8 @@ struct option __record_options[] = {
"opts", "AUX area tracing Snapshot Mode", ""),
OPT_UINTEGER(0, "proc-map-timeout", &record.opts.proc_map_timeout,
"per thread proc mmap processing timeout in ms"),
+ OPT_BOOLEAN(0, "namespaces", &record.opts.record_namespaces,
+ "Record namespaces events"),
OPT_BOOLEAN(0, "switch-events", &record.opts.record_switch_events,
"Record context switch events"),
OPT_BOOLEAN_FLAG(0, "all-kernel", &record.opts.all_kernel,
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index d2afbe4..5256767 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -694,6 +694,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
.mmap = perf_event__process_mmap,
.mmap2 = perf_event__process_mmap2,
.comm = perf_event__process_comm,
+ .namespaces = perf_event__process_namespaces,
.exit = perf_event__process_exit,
.fork = perf_event__process_fork,
.lost = perf_event__process_lost,
diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 1a3f1be..8800924 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -2966,6 +2966,7 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
.tool = {
.sample = perf_sched__process_tracepoint_sample,
.comm = perf_event__process_comm,
+ .namespaces = perf_event__process_namespaces,
.lost = perf_event__process_lost,
.fork = perf_sched__process_fork_event,
.ordered_events = true,
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 2f3ff69..32af2d3 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -830,6 +830,7 @@ struct perf_script {
bool show_task_events;
bool show_mmap_events;
bool show_switch_events;
+ bool show_namespaces_events;
bool allocated;
struct cpu_map *cpus;
struct thread_map *threads;
@@ -1118,6 +1119,41 @@ static int process_comm_event(struct perf_tool *tool,
return ret;
}

+static int process_namespaces_event(struct perf_tool *tool,
+ union perf_event *event,
+ struct perf_sample *sample,
+ struct machine *machine)
+{
+ struct thread *thread;
+ struct perf_script *script = container_of(tool, struct perf_script, tool);
+ struct perf_session *session = script->session;
+ struct perf_evsel *evsel = perf_evlist__id2evsel(session->evlist, sample->id);
+ int ret = -1;
+
+ thread = machine__findnew_thread(machine, event->namespaces.pid,
+ event->namespaces.tid);
+ if (thread == NULL) {
+ pr_debug("problem processing NAMESPACES event, skipping it.\n");
+ return -1;
+ }
+
+ if (perf_event__process_namespaces(tool, event, sample, machine) < 0)
+ goto out;
+
+ if (!evsel->attr.sample_id_all) {
+ sample->cpu = 0;
+ sample->time = 0;
+ sample->tid = event->namespaces.tid;
+ sample->pid = event->namespaces.pid;
+ }
+ print_sample_start(sample, thread, evsel);
+ perf_event__fprintf(event, stdout);
+ ret = 0;
+out:
+ thread__put(thread);
+ return ret;
+}
+
static int process_fork_event(struct perf_tool *tool,
union perf_event *event,
struct perf_sample *sample,
@@ -1293,6 +1329,8 @@ static int __cmd_script(struct perf_script *script)
}
if (script->show_switch_events)
script->tool.context_switch = process_switch_event;
+ if (script->show_namespaces_events)
+ script->tool.namespaces = process_namespaces_event;

ret = perf_session__process_events(script->session);

@@ -2097,6 +2135,7 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
.mmap = perf_event__process_mmap,
.mmap2 = perf_event__process_mmap2,
.comm = perf_event__process_comm,
+ .namespaces = perf_event__process_namespaces,
.exit = perf_event__process_exit,
.fork = perf_event__process_fork,
.attr = process_attr,
@@ -2180,6 +2219,8 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
"Show the mmap events"),
OPT_BOOLEAN('\0', "show-switch-events", &script.show_switch_events,
"Show context switch events (if recorded)"),
+ OPT_BOOLEAN('\0', "show-namespaces-events", &script.show_namespaces_events,
+ "Show namespaces events (if recorded)"),
OPT_BOOLEAN('f', "force", &file.force, "don't complain, do it"),
OPT_BOOLEAN(0, "ns", &nanosecs,
"Use 9 decimal places when displaying time"),
diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 206bf72..fc1d8fa 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -2414,8 +2414,9 @@ static int trace__replay(struct trace *trace)
trace->tool.exit = perf_event__process_exit;
trace->tool.fork = perf_event__process_fork;
trace->tool.attr = perf_event__process_attr;
- trace->tool.tracing_data = perf_event__process_tracing_data;
+ trace->tool.tracing_data = perf_event__process_tracing_data;
trace->tool.build_id = perf_event__process_build_id;
+ trace->tool.namespaces = perf_event__process_namespaces;

trace->tool.ordered_events = true;
trace->tool.ordering_requires_timestamps = true;
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 9a0236a..867e732 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -50,6 +50,7 @@ struct record_opts {
bool running_time;
bool full_auxtrace;
bool auxtrace_snapshot_mode;
+ bool record_namespaces;
bool record_switch_events;
bool all_kernel;
bool all_user;
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 3840e3a..59eb10d 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -42,6 +42,7 @@ libperf-y += pstack.o
libperf-y += session.o
libperf-$(CONFIG_AUDIT) += syscalltbl.o
libperf-y += ordered-events.o
+libperf-y += namespaces.o
libperf-y += comm.o
libperf-y += thread.o
libperf-y += thread_map.o
diff --git a/tools/perf/util/data-convert-bt.c b/tools/perf/util/data-convert-bt.c
index 7123f4d..1fcacf1 100644
--- a/tools/perf/util/data-convert-bt.c
+++ b/tools/perf/util/data-convert-bt.c
@@ -1468,6 +1468,7 @@ int bt_convert__perf2ctf(const char *input, const char *path,
.lost = perf_event__process_lost,
.tracing_data = perf_event__process_tracing_data,
.build_id = perf_event__process_build_id,
+ .namespaces = perf_event__process_namespaces,
.ordered_events = true,
.ordering_requires_timestamps = true,
},
@@ -1479,6 +1480,7 @@ int bt_convert__perf2ctf(const char *input, const char *path,
c.tool.comm = process_comm_event;
c.tool.exit = process_exit_event;
c.tool.fork = process_fork_event;
+ c.tool.namespaces = process_namespaces_event;
}

perf_config(convert__config, &c);
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 8ab0d7d..3c80f8a 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -31,6 +31,7 @@ static const char *perf_event__names[] = {
[PERF_RECORD_LOST_SAMPLES] = "LOST_SAMPLES",
[PERF_RECORD_SWITCH] = "SWITCH",
[PERF_RECORD_SWITCH_CPU_WIDE] = "SWITCH_CPU_WIDE",
+ [PERF_RECORD_NAMESPACES] = "NAMESPACES",
[PERF_RECORD_HEADER_ATTR] = "ATTR",
[PERF_RECORD_HEADER_EVENT_TYPE] = "EVENT_TYPE",
[PERF_RECORD_HEADER_TRACING_DATA] = "TRACING_DATA",
@@ -203,6 +204,70 @@ pid_t perf_event__synthesize_comm(struct perf_tool *tool,
return tgid;
}

+static void perf_event__get_ns_link_info(pid_t pid, const char *ns,
+ struct perf_ns_link_info *ns_link_info)
+{
+ struct stat64 st;
+ char proc_ns[128];
+
+ sprintf(proc_ns, "/proc/%u/ns/%s", pid, ns);
+ if (stat64(proc_ns, &st) == 0) {
+ ns_link_info->dev = st.st_dev;
+ ns_link_info->ino = st.st_ino;
+ }
+}
+
+int perf_event__synthesize_namespaces(struct perf_tool *tool,
+ union perf_event *event,
+ pid_t pid, pid_t tgid,
+ perf_event__handler_t process,
+ struct machine *machine)
+{
+ struct perf_ns_link_info *ns_link_info;
+
+ if (!tool->namespace_events)
+ return 0;
+
+ memset(&event->namespaces, 0,
+ sizeof(event->namespaces) + machine->id_hdr_size);
+
+ event->namespaces.pid = tgid;
+ event->namespaces.tid = pid;
+
+ ns_link_info = event->namespaces.link_info;
+
+ perf_event__get_ns_link_info(pid, "mnt",
+ &ns_link_info[MNT_NS_INDEX]);
+
+ perf_event__get_ns_link_info(pid, "net",
+ &ns_link_info[NET_NS_INDEX]);
+
+ perf_event__get_ns_link_info(pid, "uts",
+ &ns_link_info[UTS_NS_INDEX]);
+
+ perf_event__get_ns_link_info(pid, "ipc",
+ &ns_link_info[IPC_NS_INDEX]);
+
+ perf_event__get_ns_link_info(pid, "pid",
+ &ns_link_info[PID_NS_INDEX]);
+
+ perf_event__get_ns_link_info(pid, "user",
+ &ns_link_info[USER_NS_INDEX]);
+
+ perf_event__get_ns_link_info(pid, "cgroup",
+ &ns_link_info[CGROUP_NS_INDEX]);
+
+ event->namespaces.header.type = PERF_RECORD_NAMESPACES;
+
+ event->namespaces.header.size = (sizeof(event->namespaces) +
+ machine->id_hdr_size);
+
+ if (perf_tool__process_synth_event(tool, event, machine, process) != 0)
+ return -1;
+
+ return 0;
+}
+
static int perf_event__synthesize_fork(struct perf_tool *tool,
union perf_event *event,
pid_t pid, pid_t tgid, pid_t ppid,
@@ -434,8 +499,9 @@ int perf_event__synthesize_modules(struct perf_tool *tool,
static int __event__synthesize_thread(union perf_event *comm_event,
union perf_event *mmap_event,
union perf_event *fork_event,
+ union perf_event *namespaces_event,
pid_t pid, int full,
- perf_event__handler_t process,
+ perf_event__handler_t process,
struct perf_tool *tool,
struct machine *machine,
bool mmap_data,
@@ -455,6 +521,11 @@ static int __event__synthesize_thread(union perf_event *comm_event,
if (tgid == -1)
return -1;

+ if (perf_event__synthesize_namespaces(tool, namespaces_event, pid,
+ tgid, process, machine) < 0)
+ return -1;
+
+
return perf_event__synthesize_mmap_events(tool, mmap_event, pid, tgid,
process, machine, mmap_data,
proc_map_timeout);
@@ -488,6 +559,11 @@ static int __event__synthesize_thread(union perf_event *comm_event,
if (perf_event__synthesize_fork(tool, fork_event, _pid, tgid,
ppid, process, machine) < 0)
break;
+
+ if (perf_event__synthesize_namespaces(tool, namespaces_event, _pid,
+ tgid, process, machine) < 0)
+ break;
+
/*
* Send the prepared comm event
*/
@@ -516,6 +592,7 @@ int perf_event__synthesize_thread_map(struct perf_tool *tool,
unsigned int proc_map_timeout)
{
union perf_event *comm_event, *mmap_event, *fork_event;
+ union perf_event *namespaces_event;
int err = -1, thread, j;

comm_event = malloc(sizeof(comm_event->comm) + machine->id_hdr_size);
@@ -530,10 +607,15 @@ int perf_event__synthesize_thread_map(struct perf_tool *tool,
if (fork_event == NULL)
goto out_free_mmap;

+ namespaces_event = malloc(sizeof(namespaces_event->namespaces) +
+ machine->id_hdr_size);
+ if (namespaces_event == NULL)
+ goto out_free_fork;
+
err = 0;
for (thread = 0; thread < threads->nr; ++thread) {
if (__event__synthesize_thread(comm_event, mmap_event,
- fork_event,
+ fork_event, namespaces_event,
thread_map__pid(threads, thread), 0,
process, tool, machine,
mmap_data, proc_map_timeout)) {
@@ -559,7 +641,7 @@ int perf_event__synthesize_thread_map(struct perf_tool *tool,
/* if not, generate events for it */
if (need_leader &&
__event__synthesize_thread(comm_event, mmap_event,
- fork_event,
+ fork_event, namespaces_event,
comm_event->comm.pid, 0,
process, tool, machine,
mmap_data, proc_map_timeout)) {
@@ -568,6 +650,8 @@ int perf_event__synthesize_thread_map(struct perf_tool *tool,
}
}
}
+ free(namespaces_event);
+out_free_fork:
free(fork_event);
out_free_mmap:
free(mmap_event);
@@ -587,6 +671,7 @@ int perf_event__synthesize_threads(struct perf_tool *tool,
char proc_path[PATH_MAX];
struct dirent *dirent;
union perf_event *comm_event, *mmap_event, *fork_event;
+ union perf_event *namespaces_event;
int err = -1;

if (machine__is_default_guest(machine))
@@ -604,11 +689,16 @@ int perf_event__synthesize_threads(struct perf_tool *tool,
if (fork_event == NULL)
goto out_free_mmap;

+ namespaces_event = malloc(sizeof(namespaces_event->namespaces) +
+ machine->id_hdr_size);
+ if (namespaces_event == NULL)
+ goto out_free_fork;
+
snprintf(proc_path, sizeof(proc_path), "%s/proc", machine->root_dir);
proc = opendir(proc_path);

if (proc == NULL)
- goto out_free_fork;
+ goto out_free_namespaces;

while ((dirent = readdir(proc)) != NULL) {
char *end;
@@ -620,13 +710,16 @@ int perf_event__synthesize_threads(struct perf_tool *tool,
* We may race with exiting thread, so don't stop just because
* one thread couldn't be synthesized.
*/
- __event__synthesize_thread(comm_event, mmap_event, fork_event, pid,
- 1, process, tool, machine, mmap_data,
+ __event__synthesize_thread(comm_event, mmap_event, fork_event,
+ namespaces_event, pid, 1, process,
+ tool, machine, mmap_data,
proc_map_timeout);
}

err = 0;
closedir(proc);
+out_free_namespaces:
+ free(namespaces_event);
out_free_fork:
free(fork_event);
out_free_mmap:
@@ -1008,6 +1101,28 @@ size_t perf_event__fprintf_comm(union perf_event *event, FILE *fp)
return fprintf(fp, "%s: %s:%d/%d\n", s, event->comm.comm, event->comm.pid, event->comm.tid);
}

+size_t perf_event__fprintf_namespaces(union perf_event *event, FILE *fp)
+{
+ return fprintf(fp, " %d/%d - [cgroup: %lu/0x%lx, ipc: %lu/0x%lx,"
+ " mnt: %lu/0x%lx, net: %lu/0x%lx, pid: %lu/0x%lx,"
+ " user: %lu/0x%lx, uts: %lu/0x%lx]\n\n",
+ event->namespaces.pid, event->namespaces.tid,
+ (u64)event->namespaces.link_info[CGROUP_NS_INDEX].dev,
+ (u64)event->namespaces.link_info[CGROUP_NS_INDEX].ino,
+ (u64)event->namespaces.link_info[IPC_NS_INDEX].dev,
+ (u64)event->namespaces.link_info[IPC_NS_INDEX].ino,
+ (u64)event->namespaces.link_info[MNT_NS_INDEX].dev,
+ (u64)event->namespaces.link_info[MNT_NS_INDEX].ino,
+ (u64)event->namespaces.link_info[NET_NS_INDEX].dev,
+ (u64)event->namespaces.link_info[NET_NS_INDEX].ino,
+ (u64)event->namespaces.link_info[PID_NS_INDEX].dev,
+ (u64)event->namespaces.link_info[PID_NS_INDEX].ino,
+ (u64)event->namespaces.link_info[USER_NS_INDEX].dev,
+ (u64)event->namespaces.link_info[USER_NS_INDEX].ino,
+ (u64)event->namespaces.link_info[UTS_NS_INDEX].dev,
+ (u64)event->namespaces.link_info[UTS_NS_INDEX].ino);
+}
+
int perf_event__process_comm(struct perf_tool *tool __maybe_unused,
union perf_event *event,
struct perf_sample *sample,
@@ -1016,6 +1131,14 @@ int perf_event__process_comm(struct perf_tool *tool __maybe_unused,
return machine__process_comm_event(machine, event, sample);
}

+int perf_event__process_namespaces(struct perf_tool *tool __maybe_unused,
+ union perf_event *event,
+ struct perf_sample *sample,
+ struct machine *machine)
+{
+ return machine__process_namespaces_event(machine, event, sample);
+}
+
int perf_event__process_lost(struct perf_tool *tool __maybe_unused,
union perf_event *event,
struct perf_sample *sample,
@@ -1196,6 +1319,9 @@ size_t perf_event__fprintf(union perf_event *event, FILE *fp)
case PERF_RECORD_MMAP:
ret += perf_event__fprintf_mmap(event, fp);
break;
+ case PERF_RECORD_NAMESPACES:
+ ret += perf_event__fprintf_namespaces(event, fp);
+ break;
case PERF_RECORD_MMAP2:
ret += perf_event__fprintf_mmap2(event, fp);
break;
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index c735c53..82effd9 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -39,6 +39,12 @@ struct comm_event {
char comm[16];
};

+struct namespaces_event {
+ struct perf_event_header header;
+ u32 pid, tid;
+ struct perf_ns_link_info link_info[NAMESPACES_MAX];
+};
+
struct fork_event {
struct perf_event_header header;
u32 pid, ppid;
@@ -485,6 +491,7 @@ union perf_event {
struct mmap_event mmap;
struct mmap2_event mmap2;
struct comm_event comm;
+ struct namespaces_event namespaces;
struct fork_event fork;
struct lost_event lost;
struct lost_samples_event lost_samples;
@@ -587,6 +594,10 @@ int perf_event__process_switch(struct perf_tool *tool,
union perf_event *event,
struct perf_sample *sample,
struct machine *machine);
+int perf_event__process_namespaces(struct perf_tool *tool,
+ union perf_event *event,
+ struct perf_sample *sample,
+ struct machine *machine);
int perf_event__process_mmap(struct perf_tool *tool,
union perf_event *event,
struct perf_sample *sample,
@@ -636,6 +647,12 @@ pid_t perf_event__synthesize_comm(struct perf_tool *tool,
perf_event__handler_t process,
struct machine *machine);

+int perf_event__synthesize_namespaces(struct perf_tool *tool,
+ union perf_event *event,
+ pid_t pid, pid_t tgid,
+ perf_event__handler_t process,
+ struct machine *machine);
+
int perf_event__synthesize_mmap_events(struct perf_tool *tool,
union perf_event *event,
pid_t pid, pid_t tgid,
@@ -653,6 +670,7 @@ size_t perf_event__fprintf_itrace_start(union perf_event *event, FILE *fp);
size_t perf_event__fprintf_switch(union perf_event *event, FILE *fp);
size_t perf_event__fprintf_thread_map(union perf_event *event, FILE *fp);
size_t perf_event__fprintf_cpu_map(union perf_event *event, FILE *fp);
+size_t perf_event__fprintf_namespaces(union perf_event *event, FILE *fp);
size_t perf_event__fprintf(union perf_event *event, FILE *fp);

u64 kallsyms__get_function_start(const char *kallsyms_filename,
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index b2365a63..667f6158 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -932,6 +932,9 @@ void perf_evsel__config(struct perf_evsel *evsel, struct record_opts *opts,
attr->mmap2 = track && !perf_missing_features.mmap2;
attr->comm = track;

+ if (opts->record_namespaces)
+ attr->namespaces = track;
+
if (opts->record_switch_events)
attr->context_switch = track;

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 9b33bef..850ff8b 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -482,6 +482,29 @@ int machine__process_comm_event(struct machine *machine, union perf_event *event
return err;
}

+int machine__process_namespaces_event(struct machine *machine __maybe_unused,
+ union perf_event *event,
+ struct perf_sample *sample __maybe_unused)
+{
+ struct thread *thread = machine__findnew_thread(machine,
+ event->namespaces.pid,
+ event->namespaces.tid);
+ int err = 0;
+
+ if (dump_trace)
+ perf_event__fprintf_namespaces(event, stdout);
+
+ if (thread == NULL ||
+ thread__set_namespaces(thread, sample->time, &event->namespaces)) {
+ dump_printf("problem processing PERF_RECORD_NAMESPACES, skipping event.\n");
+ err = -1;
+ }
+
+ thread__put(thread);
+
+ return err;
+}
+
int machine__process_lost_event(struct machine *machine __maybe_unused,
union perf_event *event, struct perf_sample *sample __maybe_unused)
{
@@ -1519,6 +1542,8 @@ int machine__process_event(struct machine *machine, union perf_event *event,
ret = machine__process_comm_event(machine, event, sample); break;
case PERF_RECORD_MMAP:
ret = machine__process_mmap_event(machine, event, sample); break;
+ case PERF_RECORD_NAMESPACES:
+ ret = machine__process_namespaces_event(machine, event, sample); break;
case PERF_RECORD_MMAP2:
ret = machine__process_mmap2_event(machine, event, sample); break;
case PERF_RECORD_FORK:
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index 354de6e..e494368 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -97,6 +97,9 @@ int machine__process_itrace_start_event(struct machine *machine,
union perf_event *event);
int machine__process_switch_event(struct machine *machine,
union perf_event *event);
+int machine__process_namespaces_event(struct machine *machine,
+ union perf_event *event,
+ struct perf_sample *sample);
int machine__process_mmap_event(struct machine *machine, union perf_event *event,
struct perf_sample *sample);
int machine__process_mmap2_event(struct machine *machine, union perf_event *event,
diff --git a/tools/perf/util/namespaces.c b/tools/perf/util/namespaces.c
new file mode 100644
index 0000000..3f4d602
--- /dev/null
+++ b/tools/perf/util/namespaces.c
@@ -0,0 +1,27 @@
+#include "namespaces.h"
+#include "util.h"
+#include "event.h"
+#include <stdlib.h>
+#include <stdio.h>
+
+struct namespaces *namespaces__new(struct namespaces_event *event)
+{
+ struct namespaces *namespaces = zalloc(sizeof(*namespaces));
+
+ if (!namespaces)
+ return NULL;
+
+ namespaces->end_time = -1;
+
+ if (event) {
+ memcpy(namespaces->link_info, event->link_info,
+ sizeof(namespaces->link_info));
+ }
+
+ return namespaces;
+}
+
+void namespaces__free(struct namespaces *namespaces)
+{
+ free(namespaces);
+}
diff --git a/tools/perf/util/namespaces.h b/tools/perf/util/namespaces.h
new file mode 100644
index 0000000..91922a7
--- /dev/null
+++ b/tools/perf/util/namespaces.h
@@ -0,0 +1,18 @@
+#ifndef __PERF_NAMESPACES_H
+#define __PERF_NAMESPACES_H
+
+#include "../perf.h"
+#include <linux/list.h>
+
+struct namespaces_event;
+
+struct namespaces {
+ struct list_head list;
+ u64 end_time;
+ struct perf_ns_link_info link_info[NAMESPACES_MAX];
+};
+
+struct namespaces *namespaces__new(struct namespaces_event *event);
+void namespaces__free(struct namespaces *namespaces);
+
+#endif /* __PERF_NAMESPACES_H */
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index f268201..3ce081e 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1239,6 +1239,8 @@ static int machines__deliver_event(struct machines *machines,
return tool->mmap2(tool, event, sample, machine);
case PERF_RECORD_COMM:
return tool->comm(tool, event, sample, machine);
+ case PERF_RECORD_NAMESPACES:
+ return tool->namespaces(tool, event, sample, machine);
case PERF_RECORD_FORK:
return tool->fork(tool, event, sample, machine);
case PERF_RECORD_EXIT:
@@ -1494,6 +1496,11 @@ int perf_session__register_idle_thread(struct perf_session *session)
err = -1;
}

+ if (thread == NULL || thread__set_namespaces(thread, 0, NULL)) {
+ pr_err("problem inserting idle task.\n");
+ err = -1;
+ }
+
/* machine__findnew_thread() got the thread, so put it */
thread__put(thread);
return err;
diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index f5af87f..b9fe432 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -7,6 +7,7 @@
#include "thread-stack.h"
#include "util.h"
#include "debug.h"
+#include "namespaces.h"
#include "comm.h"
#include "unwind.h"

@@ -40,6 +41,7 @@ struct thread *thread__new(pid_t pid, pid_t tid)
thread->tid = tid;
thread->ppid = -1;
thread->cpu = -1;
+ INIT_LIST_HEAD(&thread->namespaces_list);
INIT_LIST_HEAD(&thread->comm_list);

comm_str = malloc(32);
@@ -66,7 +68,8 @@ struct thread *thread__new(pid_t pid, pid_t tid)

void thread__delete(struct thread *thread)
{
- struct comm *comm, *tmp;
+ struct namespaces *namespaces, *tmp_namespaces;
+ struct comm *comm, *tmp_comm;

BUG_ON(!RB_EMPTY_NODE(&thread->rb_node));

@@ -76,7 +79,12 @@ void thread__delete(struct thread *thread)
map_groups__put(thread->mg);
thread->mg = NULL;
}
- list_for_each_entry_safe(comm, tmp, &thread->comm_list, list) {
+ list_for_each_entry_safe(namespaces, tmp_namespaces,
+ &thread->namespaces_list, list) {
+ list_del(&namespaces->list);
+ namespaces__free(namespaces);
+ }
+ list_for_each_entry_safe(comm, tmp_comm, &thread->comm_list, list) {
list_del(&comm->list);
comm__free(comm);
}
@@ -104,6 +112,38 @@ void thread__put(struct thread *thread)
}
}

+struct namespaces *thread__namespaces(const struct thread *thread)
+{
+ if (list_empty(&thread->namespaces_list))
+ return NULL;
+
+ return list_first_entry(&thread->namespaces_list, struct namespaces, list);
+}
+
+int thread__set_namespaces(struct thread *thread, u64 timestamp,
+ struct namespaces_event *event)
+{
+ struct namespaces *new, *curr = thread__namespaces(thread);
+
+ new = namespaces__new(event);
+ if (!new)
+ return -ENOMEM;
+
+ list_add(&new->list, &thread->namespaces_list);
+
+ if (timestamp && curr) {
+ /*
+ * setns syscall must have changed few or all the namespaces
+ * of this thread. Update end time for the namespaces
+ * previously used.
+ */
+ curr = list_next_entry(new, list);
+ curr->end_time = timestamp;
+ }
+
+ return 0;
+}
+
struct comm *thread__comm(const struct thread *thread)
{
if (list_empty(&thread->comm_list))
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index 99263cb..b18b5a2 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -28,6 +28,7 @@ struct thread {
bool comm_set;
int comm_len;
bool dead; /* if set thread has exited */
+ struct list_head namespaces_list;
struct list_head comm_list;
u64 db_id;

@@ -40,6 +41,7 @@ struct thread {
};

struct machine;
+struct namespaces;
struct comm;

struct thread *thread__new(pid_t pid, pid_t tid);
@@ -62,6 +64,10 @@ static inline void thread__exited(struct thread *thread)
thread->dead = true;
}

+struct namespaces *thread__namespaces(const struct thread *thread);
+int thread__set_namespaces(struct thread *thread, u64 timestamp,
+ struct namespaces_event *event);
+
int __thread__set_comm(struct thread *thread, const char *comm, u64 timestamp,
bool exec);
static inline int thread__set_comm(struct thread *thread, const char *comm,
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index ac2590a..829471a 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -40,6 +40,7 @@ struct perf_tool {
event_op mmap,
mmap2,
comm,
+ namespaces,
fork,
exit,
lost,
@@ -66,6 +67,7 @@ struct perf_tool {
event_op3 auxtrace;
bool ordered_events;
bool ordering_requires_timestamps;
+ bool namespace_events;
};

#endif /* __PERF_TOOL_H */
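
A note on the dump format used by perf_event__fprintf_namespaces() above: each namespace is printed as a device number followed by a hexadecimal inode number, using %lu and %lx, which match a 64-bit value only where unsigned long is 64 bits wide. A minimal sketch of the same "name: dev/0xino" formatting with the <inttypes.h> macros is shown below. The struct and the function name format_ns_link_info are illustrative assumptions, not part of the patch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Mirrors the dev/ino pair from the patch; redefined so the sketch stands alone. */
struct ns_link_info {
	uint64_t dev;
	uint64_t ino;
};

/*
 * Format one namespace as "<name>: <dev>/0x<ino>" into buf.
 * Returns what snprintf() returns: the length the full string needs.
 * format_ns_link_info is a hypothetical helper name.
 */
int format_ns_link_info(char *buf, size_t len, const char *name,
			const struct ns_link_info *info)
{
	assert(buf != NULL && name != NULL && info != NULL);

	/* PRIu64/PRIx64 expand to the right conversion on 32- and 64-bit builds. */
	return snprintf(buf, len, "%s: %" PRIu64 "/0x%" PRIx64,
			name, info->dev, info->ino);
}
```

The return value can be compared against the buffer length to detect truncation, in the same way as any other snprintf() call.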

2016-12-15 18:47:07

by Peter Zijlstra

Subject: Re: [PATCH v4 1/3] perf: add PERF_RECORD_NAMESPACES to include namespaces related info

On Fri, Dec 16, 2016 at 12:07:06AM +0530, Hari Bathini wrote:
> +struct perf_ns_link_info {
> + __u64 dev;
> + __u64 ino;
> +};
> +
> +enum {
> + NET_NS_INDEX = 0,
> + UTS_NS_INDEX = 1,
> + IPC_NS_INDEX = 2,
> + PID_NS_INDEX = 3,
> + USER_NS_INDEX = 4,
> + MNT_NS_INDEX = 5,
> + CGROUP_NS_INDEX = 6,
> +
> + NAMESPACES_MAX, /* maximum available namespaces */
> +};
> +
> enum perf_event_type {
>
> /*
> @@ -862,6 +880,17 @@ enum perf_event_type {
> */
> PERF_RECORD_SWITCH_CPU_WIDE = 15,
>
> + /*
> + * struct {
> + * struct perf_event_header header;
> + * u32 pid;
> + * u32 tid;
> + * struct namespace_link_info link_info[NAMESPACES_MAX];
> + * struct sample_id sample_id;
> + * };
> + */
> + PERF_RECORD_NAMESPACES = 16,
> +
> PERF_RECORD_MAX, /* non-ABI */
> };

What happens if a future kernel adds another namespace?

2016-12-16 07:07:13

by Hari Bathini

Subject: Re: [PATCH v4 1/3] perf: add PERF_RECORD_NAMESPACES to include namespaces related info



On Friday 16 December 2016 12:16 AM, Peter Zijlstra wrote:
> On Fri, Dec 16, 2016 at 12:07:06AM +0530, Hari Bathini wrote:
>> +struct perf_ns_link_info {
>> + __u64 dev;
>> + __u64 ino;
>> +};
>> +
>> +enum {
>> + NET_NS_INDEX = 0,
>> + UTS_NS_INDEX = 1,
>> + IPC_NS_INDEX = 2,
>> + PID_NS_INDEX = 3,
>> + USER_NS_INDEX = 4,
>> + MNT_NS_INDEX = 5,
>> + CGROUP_NS_INDEX = 6,
>> +
>> + NAMESPACES_MAX, /* maximum available namespaces */
>> +};
>> +
>> enum perf_event_type {
>>
>> /*
>> @@ -862,6 +880,17 @@ enum perf_event_type {
>> */
>> PERF_RECORD_SWITCH_CPU_WIDE = 15,
>>
>> + /*
>> + * struct {
>> + * struct perf_event_header header;
>> + * u32 pid;
>> + * u32 tid;
>> + * struct namespace_link_info link_info[NAMESPACES_MAX];
>> + * struct sample_id sample_id;
>> + * };
>> + */
>> + PERF_RECORD_NAMESPACES = 16,
>> +
>> PERF_RECORD_MAX, /* non-ABI */
>> };
> What happens if a future kernel adds another namespace?
>

No impact, unless NAMESPACES_MAX in include/uapi/linux/perf_event.h is
updated to accommodate it. And if it is updated, the corresponding change
is expected in the perf tool as well.

Thanks
Hari

2016-12-16 08:21:01

by Peter Zijlstra

Subject: Re: [PATCH v4 1/3] perf: add PERF_RECORD_NAMESPACES to include namespaces related info

On Fri, Dec 16, 2016 at 11:57:47AM +0530, Hari Bathini wrote:
>
>
> On Friday 16 December 2016 12:16 AM, Peter Zijlstra wrote:
> >On Fri, Dec 16, 2016 at 12:07:06AM +0530, Hari Bathini wrote:
> >>+struct perf_ns_link_info {
> >>+ __u64 dev;
> >>+ __u64 ino;
> >>+};
> >>+
> >>+enum {
> >>+ NET_NS_INDEX = 0,
> >>+ UTS_NS_INDEX = 1,
> >>+ IPC_NS_INDEX = 2,
> >>+ PID_NS_INDEX = 3,
> >>+ USER_NS_INDEX = 4,
> >>+ MNT_NS_INDEX = 5,
> >>+ CGROUP_NS_INDEX = 6,
> >>+
> >>+ NAMESPACES_MAX, /* maximum available namespaces */
> >>+};
> >>+
> >> enum perf_event_type {
> >> /*
> >>@@ -862,6 +880,17 @@ enum perf_event_type {
> >> */
> >> PERF_RECORD_SWITCH_CPU_WIDE = 15,
> >>+ /*
> >>+ * struct {
> >>+ * struct perf_event_header header;
> >>+ * u32 pid;
> >>+ * u32 tid;
> >>+ * struct namespace_link_info link_info[NAMESPACES_MAX];
> >>+ * struct sample_id sample_id;
> >>+ * };
> >>+ */
> >>+ PERF_RECORD_NAMESPACES = 16,
> >>+
> >> PERF_RECORD_MAX, /* non-ABI */
> >> };
> >What happens if a future kernel adds another namespace?
> >
>
> No impact unless NAMESPACES_MAX in include/uapi/linux/perf_event.h is
> updated to accommodate that..
> And if it is updated, the corresponding change is expected in perf-tool as
> well..

And what happens if you try and process old data files with the new
tools or the other way around?

You must not expect lock-step updates for this to work.

2016-12-16 12:14:28

by Alban Crequy

Subject: Re: [PATCH v4 0/3] perf: add support for analyzing events for containers

Hi,

> Currently, there is no trivial mechanism to analyze events based on
> containers. perf -G can be used, but it will not filter events for the
> containers created after perf is invoked, making it difficult to assess/
> analyze performance issues of multiple containers at once.
>
> This patch-set overcomes this limitation by using cgroup identifier as
> container unique identifier. A new PERF_RECORD_NAMESPACES event that
> records namespaces related info is introduced, from which the cgroup
> namespace's device & inode numbers are used as cgroup identifier. This
> is based on the assumption that each container is created with it's own
> cgroup namespace allowing assessment/analysis of multiple containers
> using cgroup identifier.
>
> The first patch introduces PERF_RECORD_NAMESPACES in kernel while the
> second patch makes the corresponding changes in perf tool to read this
> PERF_RECORD_NAMESPACES events. The third patch adds a cgroup identifier
> column in perf report, which contains the cgroup namespace's device and
> inode numbers.

I have a question about the pid namespace: does the new perf event give
the pid namespace of the task, or the pid_ns_for_children from the
nsproxy? From my limited understanding, v4 seems to do the former, as
opposed to v3.

When synthesizing events from /proc/$PID/ns/pid, it cannot take the
pid_ns_for_children, so I wanted to make sure it is consistent.

Cheers,
Alban
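
For reference, the synthesized path discussed above reads the namespace identity from procfs with stat(), as perf_event__get_ns_link_info() does in the patch. The following is a minimal standalone sketch of that lookup, assuming a Linux procfs mounted at /proc. The names ns_link_path and read_ns_link_info are hypothetical, and unlike the patch the sketch reports failure to the caller instead of silently leaving the fields untouched.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

struct ns_link_info {
	uint64_t dev;
	uint64_t ino;
};

/* Build "/proc/<pid>/ns/<name>"; returns snprintf()'s result. */
int ns_link_path(char *buf, size_t len, pid_t pid, const char *name)
{
	assert(buf != NULL && name != NULL);
	return snprintf(buf, len, "/proc/%d/ns/%s", (int)pid, name);
}

/*
 * Fill info from the namespace link of the given task.
 * Returns 0 on success, -1 if the path was truncated or stat() failed
 * (for example because the task exited or the file does not exist).
 * On failure, info is left unmodified.
 */
int read_ns_link_info(pid_t pid, const char *name, struct ns_link_info *info)
{
	char path[64];
	struct stat st;
	int n = ns_link_path(path, sizeof(path), pid, name);

	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	if (stat(path, &st) != 0)
		return -1;

	info->dev = (uint64_t)st.st_dev;
	info->ino = (uint64_t)st.st_ino;
	return 0;
}
```

Calling read_ns_link_info(pid, "pid", &info) yields whatever procfs exposes under /proc/<pid>/ns/pid, which is the point of the consistency question: the kernel-generated record and the synthesized one should describe the same namespace.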

2016-12-16 18:22:07

by Hari Bathini

Subject: Re: [PATCH v4 1/3] perf: add PERF_RECORD_NAMESPACES to include namespaces related info

Hi Peter,


On Friday 16 December 2016 01:27 PM, Peter Zijlstra wrote:
> On Fri, Dec 16, 2016 at 11:57:47AM +0530, Hari Bathini wrote:
>>
>> On Friday 16 December 2016 12:16 AM, Peter Zijlstra wrote:
>>> On Fri, Dec 16, 2016 at 12:07:06AM +0530, Hari Bathini wrote:
>>>> +struct perf_ns_link_info {
>>>> + __u64 dev;
>>>> + __u64 ino;
>>>> +};
>>>> +
>>>> +enum {
>>>> + NET_NS_INDEX = 0,
>>>> + UTS_NS_INDEX = 1,
>>>> + IPC_NS_INDEX = 2,
>>>> + PID_NS_INDEX = 3,
>>>> + USER_NS_INDEX = 4,
>>>> + MNT_NS_INDEX = 5,
>>>> + CGROUP_NS_INDEX = 6,
>>>> +
>>>> + NAMESPACES_MAX, /* maximum available namespaces */
>>>> +};
>>>> +
>>>> enum perf_event_type {
>>>> /*
>>>> @@ -862,6 +880,17 @@ enum perf_event_type {
>>>> */
>>>> PERF_RECORD_SWITCH_CPU_WIDE = 15,
>>>> + /*
>>>> + * struct {
>>>> + * struct perf_event_header header;
>>>> + * u32 pid;
>>>> + * u32 tid;
>>>> + * struct namespace_link_info link_info[NAMESPACES_MAX];
>>>> + * struct sample_id sample_id;
>>>> + * };
>>>> + */
>>>> + PERF_RECORD_NAMESPACES = 16,
>>>> +
>>>> PERF_RECORD_MAX, /* non-ABI */
>>>> };
>>> What happens if a future kernel adds another namespace?
>>>
>> No impact unless NAMESPACES_MAX in include/uapi/linux/perf_event.h is
>> updated to accommodate that..
>> And if it is updated, the corresponding change is expected in perf-tool as
>> well..
> And what happens if you try and process old data files with the new
> tools or the other way around?

It works fine either way. I tested that.

> You must not expect lock-step updates for this to work.
>

Not required, unless the tool is expected to process the new namespace data.

Thanks
Hari

2016-12-16 18:27:14

by Hari Bathini

Subject: Re: [PATCH v4 0/3] perf: add support for analyzing events for containers

Hi Alban,


On Friday 16 December 2016 05:44 PM, Alban Crequy wrote:
> Hi,
>
>> Currently, there is no trivial mechanism to analyze events based on
>> containers. perf -G can be used, but it will not filter events for the
>> containers created after perf is invoked, making it difficult to assess/
>> analyze performance issues of multiple containers at once.
>>
>> This patch-set overcomes this limitation by using cgroup identifier as
>> container unique identifier. A new PERF_RECORD_NAMESPACES event that
>> records namespaces related info is introduced, from which the cgroup
>> namespace's device & inode numbers are used as cgroup identifier. This
>> is based on the assumption that each container is created with it's own
>> cgroup namespace allowing assessment/analysis of multiple containers
>> using cgroup identifier.
>>
>> The first patch introduces PERF_RECORD_NAMESPACES in kernel while the
>> second patch makes the corresponding changes in perf tool to read this
>> PERF_RECORD_NAMESPACES events. The third patch adds a cgroup identifier
>> column in perf report, which contains the cgroup namespace's device and
>> inode numbers.
> I have a question for the pid namespace: does the new perf event gives
> the pid namespace of the task, or the pid_ns_for_children from the
> nsproxy? From my limited understanding, v4 seems to do the former, as
> opposed to v3.

Ah! How did I miss that?!

> When synthesizing events from /proc/$PID/ns/pid, it cannot take the
> pid_ns_for_children, so I wanted to make sure it is consistent.
>

So, in the end, this version sounds like the right way of doing it?

Thanks
Hari

2016-12-16 20:06:53

by Peter Zijlstra

Subject: Re: [PATCH v4 1/3] perf: add PERF_RECORD_NAMESPACES to include namespaces related info

On Fri, Dec 16, 2016 at 11:51:20PM +0530, Hari Bathini wrote:
> Hi Peter,
>
>
> On Friday 16 December 2016 01:27 PM, Peter Zijlstra wrote:
> >On Fri, Dec 16, 2016 at 11:57:47AM +0530, Hari Bathini wrote:
> >>
> >>On Friday 16 December 2016 12:16 AM, Peter Zijlstra wrote:
> >>>On Fri, Dec 16, 2016 at 12:07:06AM +0530, Hari Bathini wrote:
> >>>>+struct perf_ns_link_info {
> >>>>+ __u64 dev;
> >>>>+ __u64 ino;
> >>>>+};
> >>>>+
> >>>>+enum {
> >>>>+ NET_NS_INDEX = 0,
> >>>>+ UTS_NS_INDEX = 1,
> >>>>+ IPC_NS_INDEX = 2,
> >>>>+ PID_NS_INDEX = 3,
> >>>>+ USER_NS_INDEX = 4,
> >>>>+ MNT_NS_INDEX = 5,
> >>>>+ CGROUP_NS_INDEX = 6,
> >>>>+
> >>>>+ NAMESPACES_MAX, /* maximum available namespaces */
> >>>>+};
> >>>>+
> >>>> enum perf_event_type {
> >>>> /*
> >>>>@@ -862,6 +880,17 @@ enum perf_event_type {
> >>>> */
> >>>> PERF_RECORD_SWITCH_CPU_WIDE = 15,
> >>>>+ /*
> >>>>+ * struct {
> >>>>+ * struct perf_event_header header;
> >>>>+ * u32 pid;
> >>>>+ * u32 tid;
> >>>>+ * struct namespace_link_info link_info[NAMESPACES_MAX];
> >>>>+ * struct sample_id sample_id;
> >>>>+ * };
> >>>>+ */
> >>>>+ PERF_RECORD_NAMESPACES = 16,
> >>>>+
> >>>> PERF_RECORD_MAX, /* non-ABI */
> >>>> };
> >>>What happens if a future kernel adds another namespace?
> >>>
> >>No impact unless NAMESPACES_MAX in include/uapi/linux/perf_event.h is
> >>updated to accommodate that..
> >>And if it is updated, the corresponding change is expected in perf-tool as
> >>well..
> >And what happens if you try and process old data files with the new
> >tools or the other way around?
>
> It works fine either way. I tested that..

I don't see how the tool can parse old records (with NAMESPACES_MAX ==
7) if you set its NAMESPACES_MAX to say 10.

Then it will expect the link_info array to be 10 entries and either read
past the end of the record (if !sample_all) or try and interpret
sample_id as link_info records.
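
The arithmetic behind this objection can be sketched as follows. The layout assumed here is the one from the quoted comment (header, pid, tid, then a fixed link_info array, then sample_id). The struct names and helper functions are illustrative only, and the value 10 is just the hypothetical future NAMESPACES_MAX used in the message above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as struct perf_event_header: 8 bytes. */
struct event_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

struct ns_link_info {
	uint64_t dev;
	uint64_t ino;
};

static_assert(sizeof(struct event_header) == 8, "header must be 8 bytes");
static_assert(sizeof(struct ns_link_info) == 16, "dev + ino must be 16 bytes");

/* Byte offset of link_info[idx] in a record laid out as header, pid, tid, link_info[]. */
size_t link_info_offset(size_t idx)
{
	return sizeof(struct event_header) + 2 * sizeof(uint32_t) +
	       idx * sizeof(struct ns_link_info);
}

/* Size of the record up to (not including) sample_id when written with nr entries. */
size_t record_payload_size(size_t nr)
{
	return link_info_offset(nr);
}

/*
 * Number of link_info slots that a reader built for reader_max entries
 * would take from bytes the writer (writer_max entries) never filled in.
 */
size_t misread_entries(size_t writer_max, size_t reader_max)
{
	return reader_max > writer_max ? reader_max - writer_max : 0;
}
```

With 7 entries the payload ends at byte 128, which is exactly where a reader compiled for 10 entries expects link_info[7] to start. Those three extra slots therefore overlap sample_id, or run past the end of the record when no sample_id is present.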

2016-12-17 17:40:21

by Jiri Olsa

Subject: Re: [PATCH v4 2/3] perf tool: add PERF_RECORD_NAMESPACES to include namespaces related info

On Fri, Dec 16, 2016 at 12:07:20AM +0530, Hari Bathini wrote:

SNIP

> +
> +int thread__set_namespaces(struct thread *thread, u64 timestamp,
> + struct namespaces_event *event)
> +{
> + struct namespaces *new, *curr = thread__namespaces(thread);
> +
> + new = namespaces__new(event);
> + if (!new)
> + return -ENOMEM;
> +
> + list_add(&new->list, &thread->namespaces_list);
> +
> + if (timestamp && curr) {
> + /*
> + * setns syscall must have changed few or all the namespaces
> + * of this thread. Update end time for the namespaces
> + * previously used.
> + */
> + curr = list_next_entry(new, list);
> + curr->end_time = timestamp;

hi,
couldn't you just use the curr you got from thread__namespaces?
why retrieve it again via the 'new' pointer?

thanks,
jirka
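
The point raised here can be checked with a small model of the list handling. The sketch below uses a minimal stand-in for the kernel-style circular list (it is not the real <linux/list.h>), and the names list_node, ns_entry, first_entry and push_entry are hypothetical. It shows that after inserting at the head, the element following the new entry is the same one that was first before the insert.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for a kernel-style circular doubly linked list. */
struct list_node {
	struct list_node *next, *prev;
};

struct ns_entry {
	struct list_node list;	/* first member, so a node pointer is also an entry pointer */
	uint64_t end_time;
};

void list_init(struct list_node *head)
{
	head->next = head->prev = head;
}

/* Insert node right after head, i.e. at the front, as list_add() does. */
void list_add_front(struct list_node *node, struct list_node *head)
{
	node->next = head->next;
	node->prev = head;
	head->next->prev = node;
	head->next = node;
}

/* First entry, or NULL when the list is empty (like thread__namespaces()). */
struct ns_entry *first_entry(struct list_node *head)
{
	return head->next == head ? NULL : (struct ns_entry *)head->next;
}

/*
 * Push a new entry and close the previous one at timestamp.
 * Returns the entry that was closed, or NULL if none was.
 */
struct ns_entry *push_entry(struct list_node *head, struct ns_entry *new_entry,
			    uint64_t timestamp)
{
	struct ns_entry *curr = first_entry(head);

	new_entry->end_time = UINT64_MAX;	/* "still in use", like end_time = -1 */
	list_add_front(&new_entry->list, head);

	if (timestamp && curr) {
		/* The old head is now the element right after the new one. */
		assert((struct ns_entry *)new_entry->list.next == curr);
		curr->end_time = timestamp;
		return curr;
	}
	return NULL;
}
```

Under these assumptions the pointer saved before the insert can be used directly, and the second lookup through the new entry returns the same element.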

2016-12-21 13:09:23

by Hari Bathini

Subject: Re: [PATCH v4 1/3] perf: add PERF_RECORD_NAMESPACES to include namespaces related info

Hi Peter,


On Saturday 17 December 2016 01:35 AM, Peter Zijlstra wrote:
> On Fri, Dec 16, 2016 at 11:51:20PM +0530, Hari Bathini wrote:
>> Hi Peter,
>>
>>
>> On Friday 16 December 2016 01:27 PM, Peter Zijlstra wrote:
>>> On Fri, Dec 16, 2016 at 11:57:47AM +0530, Hari Bathini wrote:
>>>> On Friday 16 December 2016 12:16 AM, Peter Zijlstra wrote:
>>>>> On Fri, Dec 16, 2016 at 12:07:06AM +0530, Hari Bathini wrote:
>>>>>> +struct perf_ns_link_info {
>>>>>> + __u64 dev;
>>>>>> + __u64 ino;
>>>>>> +};
>>>>>> +
>>>>>> +enum {
>>>>>> + NET_NS_INDEX = 0,
>>>>>> + UTS_NS_INDEX = 1,
>>>>>> + IPC_NS_INDEX = 2,
>>>>>> + PID_NS_INDEX = 3,
>>>>>> + USER_NS_INDEX = 4,
>>>>>> + MNT_NS_INDEX = 5,
>>>>>> + CGROUP_NS_INDEX = 6,
>>>>>> +
>>>>>> + NAMESPACES_MAX, /* maximum available namespaces */
>>>>>> +};
>>>>>> +
>>>>>> enum perf_event_type {
>>>>>> /*
>>>>>> @@ -862,6 +880,17 @@ enum perf_event_type {
>>>>>> */
>>>>>> PERF_RECORD_SWITCH_CPU_WIDE = 15,
>>>>>> + /*
>>>>>> + * struct {
>>>>>> + * struct perf_event_header header;
>>>>>> + * u32 pid;
>>>>>> + * u32 tid;
>>>>>> + * struct namespace_link_info link_info[NAMESPACES_MAX];
>>>>>> + * struct sample_id sample_id;
>>>>>> + * };
>>>>>> + */
>>>>>> + PERF_RECORD_NAMESPACES = 16,
>>>>>> +
>>>>>> PERF_RECORD_MAX, /* non-ABI */
>>>>>> };
>>>>> What happens if a future kernel adds another namespace?
>>>>>
>>>> No impact unless NAMESPACES_MAX in include/uapi/linux/perf_event.h is
>>>> updated to accommodate that..
>>>> And if it is updated, the corresponding change is expected in perf-tool as
>>>> well..
>>> And what happens if you try and process old data files with the new
>>> tools or the other way around?
>> It works fine either way. I tested that..
> I don't see how the tool can parse old records (with NAMESPACES_MAX ==
> 7) if you set its NAMESPACES_MAX to say 10.
>
> Then it will expect the link_info array to be 10 entries and either read
> past the end of the record (if !sample_all) or try and interpret
> sample_id as link_info records.
>

Right. There will be inconsistency with data the perf tool tries to read
beyond what the kernel supports. IIUC, you mean, include nr_namespaces
field in the record and warn the user if it doesn't match with the one
perf-tool supports before proceeding..?

Thanks
Hari

PS: I am on vacation. Will post next version early January.

2016-12-21 13:19:01

by Hari Bathini

Subject: Re: [PATCH v4 2/3] perf tool: add PERF_RECORD_NAMESPACES to include namespaces related info

Hi Jiri,


On Saturday 17 December 2016 11:10 PM, Jiri Olsa wrote:
> On Fri, Dec 16, 2016 at 12:07:20AM +0530, Hari Bathini wrote:
>
> SNIP
>
>> +
>> +int thread__set_namespaces(struct thread *thread, u64 timestamp,
>> + struct namespaces_event *event)
>> +{
>> + struct namespaces *new, *curr = thread__namespaces(thread);
>> +
>> + new = namespaces__new(event);
>> + if (!new)
>> + return -ENOMEM;
>> +
>> + list_add(&new->list, &thread->namespaces_list);
>> +
>> + if (timestamp && curr) {
>> + /*
>> + * setns syscall must have changed few or all the namespaces
>> + * of this thread. Update end time for the namespaces
>> + * previously used.
>> + */
>> + curr = list_next_entry(new, list);
>> + curr->end_time = timestamp;
> hi,
> couldn't you use just the curr you got from thread__namespaces?
> why to retrieve it again via 'new' pointer?
>

The new namespaces info is added to the list while the old entries are
kept, along with the change timestamp, so that the multiple changes made
to the namespaces of each thread are retained.

Thanks
Hari
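
The point behind Jiri's question can be shown with a minimal sketch.
It assumes a simplified singly linked list standing in for the kernel
list API; the names ns_entry, thread_ns and thread_ns_set are
hypothetical and not from the patch. After a head insertion, the entry
that follows the new one is the entry that was current before the
insert, so the pointer fetched up front can be reused:

```c
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for the per-thread namespaces list (hypothetical). */
struct ns_entry {
	struct ns_entry *next;	/* older entry, or NULL */
	uint64_t end_time;	/* 0 while this entry is still current */
	uint64_t cgroup_ino;	/* illustrative payload */
};

struct thread_ns {
	struct ns_entry *head;	/* newest entry first */
};

/* Record a new set of namespaces and close the previous one at 'timestamp'. */
static int thread_ns_set(struct thread_ns *t, uint64_t timestamp,
			 uint64_t cgroup_ino)
{
	struct ns_entry *curr = t->head;	/* current entry before the insert */
	struct ns_entry *new = calloc(1, sizeof(*new));

	if (!new)
		return -1;

	new->cgroup_ino = cgroup_ino;
	new->next = curr;	/* head insertion keeps the history intact */
	t->head = new;

	/* new->next == curr here, so no second lookup is needed. */
	if (timestamp && curr)
		curr->end_time = timestamp;

	return 0;
}
```

Older entries stay on the list with their end_time set, which is what
keeps the history of changes per thread.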

2016-12-21 13:25:25

by Peter Zijlstra

Subject: Re: [PATCH v4 1/3] perf: add PERF_RECORD_NAMESPACES to include namespaces related info

On Wed, Dec 21, 2016 at 06:39:01PM +0530, Hari Bathini wrote:
> Hi Peter,

> >I don't see how the tool can parse old records (with NAMESPACES_MAX ==
> >7) if you set its NAMESPACES_MAX to say 10.
> >
> >Then it will expect the link_info array to be 10 entries and either read
> >past the end of the record (if !sample_all) or try and interpret
> >sample_id as link_info records.
> >
>
> Right. There will be inconsistency with data the perf tool tries to read
> beyond
> what the kernel supports. IIUC, you mean, include nr_namespaces field in the
> record and warn the user if it doesn't match with the one perf-tool supports
> before proceeding..?

Yes, if you add a nr_namespaces field it's always parsable. If an old
tool finds more namespaces than it has 'names' for it can always display
the raw index number. If a new tool finds the array short, it will not
display the missing ones.
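
A rough sketch of what such count-driven parsing could look like
follows. The payload layout (u32 pid, u32 tid, u64 nr_namespaces, then
the array) and the names ns_link_info, parse_namespaces and ns_name are
assumptions made for illustration, not the layout of the posted patch:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical entry layout, mirroring the dev/ino pair in the patch. */
struct ns_link_info {
	uint64_t dev;
	uint64_t ino;
};

#define TOOL_NAMESPACES_MAX 7	/* entries this tool has names for */

static const char *const tool_ns_names[TOOL_NAMESPACES_MAX] = {
	"net", "uts", "ipc", "pid", "user", "mnt", "cgroup",
};

/*
 * Parse an assumed payload: u32 pid, u32 tid, u64 nr_namespaces,
 * link_info[nr_namespaces]. Copies at most out_max entries into out.
 * Returns the number of entries copied, or -1 if the payload is too
 * short for the count it claims.
 */
static int parse_namespaces(const unsigned char *payload, size_t len,
			    struct ns_link_info *out, size_t out_max,
			    uint64_t *nr_in_record)
{
	const size_t hdr = 2 * sizeof(uint32_t) + sizeof(uint64_t);
	uint64_t nr;
	size_t n;

	if (len < hdr)
		return -1;

	memcpy(&nr, payload + 2 * sizeof(uint32_t), sizeof(nr));
	if (nr > (len - hdr) / sizeof(struct ns_link_info))
		return -1;	/* count claims more than the record holds */

	*nr_in_record = nr;
	n = nr < out_max ? (size_t)nr : out_max;
	memcpy(out, payload + hdr, n * sizeof(*out));
	return (int)n;
}

/* Name for an index, or NULL when the tool should print the raw index. */
static const char *ns_name(size_t idx)
{
	return idx < TOOL_NAMESPACES_MAX ? tool_ns_names[idx] : NULL;
}
```

With this shape, an old tool reading a longer array falls back to the
raw index for entries where ns_name() returns NULL, and a new tool
reading a shorter array only sees the entries the record carries.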

2016-12-21 15:56:31

by Hari Bathini

Subject: Re: [PATCH v4 1/3] perf: add PERF_RECORD_NAMESPACES to include namespaces related info



On Wednesday 21 December 2016 06:54 PM, Peter Zijlstra wrote:
> On Wed, Dec 21, 2016 at 06:39:01PM +0530, Hari Bathini wrote:
>> Hi Peter,
>>> I don't see how the tool can parse old records (with NAMESPACES_MAX ==
>>> 7) if you set its NAMESPACES_MAX to say 10.
>>>
>>> Then it will expect the link_info array to be 10 entries and either read
>>> past the end of the record (if !sample_all) or try and interpret
>>> sample_id as link_info records.
>>>
>> Right. There will be inconsistency with data the perf tool tries to read
>> beyond
>> what the kernel supports. IIUC, you mean, include nr_namespaces field in the
>> record and warn the user if it doesn't match with the one perf-tool supports
>> before proceeding..?
> Yes, if you add a nr_namespaces field its always parsable. If an old
> tool finds more namespace than it has 'names' for it can always display
> the raw index number. If a new tool finds the array short, it will not
> display the missing ones.
>

Sure, Peter. Will post the next version as soon as
I am back from vacation..

Thanks
Hari

2016-12-22 07:24:43

by Eric W. Biederman

Subject: Re: [PATCH v4 1/3] perf: add PERF_RECORD_NAMESPACES to include namespaces related info

Hari Bathini <[email protected]> writes:

> On Wednesday 21 December 2016 06:54 PM, Peter Zijlstra wrote:
>> On Wed, Dec 21, 2016 at 06:39:01PM +0530, Hari Bathini wrote:
>>> Hi Peter,
>>>> I don't see how the tool can parse old records (with NAMESPACES_MAX ==
>>>> 7) if you set its NAMESPACES_MAX to say 10.
>>>>
>>>> Then it will expect the link_info array to be 10 entries and either read
>>>> past the end of the record (if !sample_all) or try and interpret
>>>> sample_id as link_info records.
>>>>
>>> Right. There will be inconsistency with data the perf tool tries to read
>>> beyond
>>> what the kernel supports. IIUC, you mean, include nr_namespaces field in the
>>> record and warn the user if it doesn't match with the one perf-tool supports
>>> before proceeding..?
>> Yes, if you add a nr_namespaces field its always parsable. If an old
>> tool finds more namespace than it has 'names' for it can always display
>> the raw index number. If a new tool finds the array short, it will not
>> display the missing ones.
>>
>
> Sure, Peter. Will post the next version as soon as
> I am back from vacation..

And please make the array the last item in the structure so that
expanding or contracting it does not affect the ability to read the rest
of the structure.

Eric

2016-12-22 07:53:43

by Peter Zijlstra

Subject: Re: [PATCH v4 1/3] perf: add PERF_RECORD_NAMESPACES to include namespaces related info

On Thu, Dec 22, 2016 at 08:21:23PM +1300, Eric W. Biederman wrote:
>
> And please make the array the last item in the structure so that
> expanding or contracting it does not affect the ability to read the rest
> of the structure.

Sorry, sample_id must be last, because hysterical crud :/

(basically because that was the only way to add a field to records like
PERF_RECORD_MMAP which used the record length to determine the
filename[] length, yes I know, we won't ever do that again).

2016-12-22 10:22:42

by Eric W. Biederman

Subject: Re: [PATCH v4 1/3] perf: add PERF_RECORD_NAMESPACES to include namespaces related info

Peter Zijlstra <[email protected]> writes:

> On Thu, Dec 22, 2016 at 08:21:23PM +1300, Eric W. Biederman wrote:
>>
>> And please make the array the last item in the structure so that
>> expanding or contracting it does not affect the ability to read the rest
>> of the structure.
>
> Sorry, sample_id must be last, because hysterical crud :/
>
> (basically because that was the only way to add a field to records like
> PERF_RECORD_MMAP which used the record length to determine the
> filename[] length, yes I know, we won't ever do that again).

Why does historical crud need to affect new records?

Totally confused. This looks like a major mess.

Eric



2016-12-22 13:24:59

by Peter Zijlstra

Subject: Re: [PATCH v4 1/3] perf: add PERF_RECORD_NAMESPACES to include namespaces related info

On Thu, Dec 22, 2016 at 11:19:17PM +1300, Eric W. Biederman wrote:
> Peter Zijlstra <[email protected]> writes:
>
> > On Thu, Dec 22, 2016 at 08:21:23PM +1300, Eric W. Biederman wrote:
> >>
> >> And please make the array the last item in the structure so that
> >> expanding or contracting it does not affect the ability to read the rest
> >> of the structure.
> >
> > Sorry, sample_id must be last, because hysterical crud :/
> >
> > (basically because that was the only way to add a field to records like
> > PERF_RECORD_MMAP which used the record length to determine the
> > filename[] length, yes I know, we won't ever do that again).
>
> Why does historical crud need to affect new records?

Because now the userspace parser expects sample_id to be the tail
field. Basically decoding a record now looks like:

	if (sample_id_all) {
		sample_id = (sample_id *)((char *)record + record->size - sizeof(sample_id));
		record->size -= sizeof(sample_id);
	}
	/* process record */

We could of course create more exceptions..

> Totally confused. This looks like a major mess.

Not major, but yes, it's ugly, but it's also ABI :-(
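
As a self-contained illustration of the tail-field decoding described
above, here is a minimal sketch. The fixed-size sample_id_trailer and
the helper split_record are assumptions for the example; in the real
ABI the sample_id contents depend on the event's sample_type, so its
size is not a compile-time constant:

```c
#include <stddef.h>
#include <stdint.h>

/* Same shape as struct perf_event_header: type, misc, total size. */
struct record_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;	/* whole record in bytes, header included */
};

/* Illustrative fixed-size trailer (assumption, see lead-in). */
struct sample_id_trailer {
	uint32_t pid, tid;
	uint64_t time;
};

/*
 * Split a record into body and trailing sample_id.
 * On success returns 0, sets *id to the trailer (NULL when
 * sample_id_all is off) and *body_len to the payload length without
 * header and trailer. Returns -1 if the record is too short.
 */
static int split_record(const struct record_header *hdr, int sample_id_all,
			const struct sample_id_trailer **id, size_t *body_len)
{
	size_t size = hdr->size;

	*id = NULL;
	if (size < sizeof(*hdr))
		return -1;

	if (sample_id_all) {
		if (size - sizeof(*hdr) < sizeof(**id))
			return -1;
		/* The trailer is located from the end of the record. */
		size -= sizeof(**id);
		*id = (const struct sample_id_trailer *)
			((const char *)hdr + size);
	}

	*body_len = size - sizeof(*hdr);
	return 0;
}
```

Because the trailer is found by counting back from the record's end,
a variable-length array placed before it stays parsable as long as the
record says how many entries it holds.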


2016-12-22 18:24:20

by Eric W. Biederman

Subject: Re: [PATCH v4 1/3] perf: add PERF_RECORD_NAMESPACES to include namespaces related info

Peter Zijlstra <[email protected]> writes:

> On Thu, Dec 22, 2016 at 11:19:17PM +1300, Eric W. Biederman wrote:
>> Peter Zijlstra <[email protected]> writes:
>>
>> > On Thu, Dec 22, 2016 at 08:21:23PM +1300, Eric W. Biederman wrote:
>> >>
>> >> And please make the array the last item in the structure so that
>> >> expanding or contracting it does not affect the ability to read the rest
>> >> of the structure.
>> >
>> > Sorry, sample_id must be last, because hysterical crud :/
>> >
>> > (basically because that was the only way to add a field to records like
>> > PERF_RECORD_MMAP which used the record length to determine the
>> > filename[] length, yes I know, we won't ever do that again).
>>
>> Why does historical crud need to affect new records?
>
> Because now the userspace parser expects sample_id to be the tail
> field. Basically decoding a record now looks like:
>
> if (sample_id_all) {
> sample_id = (sample_id *)((char *)record + record->size - sizeof(sample_id));
> record->size -= sizeof(sample_id);
> }
> /* process record */
>
> We could of course create more exceptions..
>
>> Totally confused. This looks like a major mess.
>
> Not major, but yes, its ugly, but its also ABI :-(

Fair enough. I will just avert my eyes now and deal with my own challenges.

Eric

2016-12-29 01:41:44

by Krister Johansen

Subject: Re: [PATCH v4 0/3] perf: add support for analyzing events for containers

On Fri, Dec 16, 2016 at 12:06:55AM +0530, Hari Bathini wrote:
> This patch-set overcomes this limitation by using cgroup identifier as
> container unique identifier. A new PERF_RECORD_NAMESPACES event that
> records namespaces related info is introduced, from which the cgroup
> namespace's device & inode numbers are used as cgroup identifier. This
> is based on the assumption that each container is created with it's own
> cgroup namespace allowing assessment/analysis of multiple containers
> using cgroup identifier.

Why choose cgroups when the kernel dispenses namespace-unique
identifiers? Cgroup membership can be arbitrary. Moreover, cgroup and
namespace destruction are handled by separate subsystems. It's possible
to have a cgroup notifier run prior to network namespace teardown
occurring.

If it were me, I'd re-use existing convention to identify the namespaces
you want to monitor. The code in nsenter(1) can take a namespace that's
been bind mount'd on a file, or extract the ns information from a task
in /procfs.

My biggest concern is how the sample data is handled after it has been
collected. Neither namespaces nor cgroups survive reboots. Will the
records contain all the persistent state needed to run a report or
script command at a later date?

Does this code attempt to enter alternate namespaces in order to record
stack/symbol information for a '-g' style trace? If so, how are you
holding on to that information? There's no guarantee that a particular
container will be alive or have its filesystems reachable from the host
if the trace data is evaluated at a later time.

-K
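
For reference, the per-task namespace identifiers mentioned above can
be read from procfs with a plain stat(2) call. The sketch below is a
userspace illustration only; the helper read_ns_id and the struct ns_id
are hypothetical names and not part of perf or nsenter:

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Device and inode number pair identifying one namespace. */
struct ns_id {
	uint64_t dev;
	uint64_t ino;
};

/*
 * Fill 'id' for namespace 'ns' ("net", "mnt", "cgroup", ...) of task
 * 'pid' by stat()ing /proc/<pid>/ns/<ns>. Returns 0 on success, -1 if
 * the path does not fit or cannot be stat()ed.
 */
static int read_ns_id(pid_t pid, const char *ns, struct ns_id *id)
{
	char path[64];
	struct stat st;
	int n;

	n = snprintf(path, sizeof(path), "/proc/%ld/ns/%s", (long)pid, ns);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	/* stat() follows the magic symlink to the namespace inode. */
	if (stat(path, &st) != 0)
		return -1;

	id->dev = (uint64_t)st.st_dev;
	id->ino = (uint64_t)st.st_ino;
	return 0;
}
```

Two tasks that report the same dev and ino pair for a given entry are
in the same namespace; the pair is only meaningful while the namespace
exists, which is the persistence concern raised above.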