2012-05-18 10:14:11

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 1/2] perf: Add persistent event facilities

On Wed, 2012-03-21 at 15:34 +0100, Borislav Petkov wrote:
> +int perf_add_persistent_on_cpu(int cpu, struct pers_event_desc *desc,
> + struct dentry *dir, unsigned nr_pages)

OK, so this creates an even and registers it somewhere in debugfs.

- you allow an arbitrary place in debugfs; this might make finding them
'interesting'. Should we put them all in the same place?

- persistent events created from userspace don't seem to get a debugfs
entry and will be lost forever?


In general I think a little more exploring of the semantics and
ramifications might be in order.


2012-05-18 11:04:17

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 1/2] perf: Add persistent event facilities

On Fri, May 18, 2012 at 12:14:00PM +0200, Peter Zijlstra wrote:
> On Wed, 2012-03-21 at 15:34 +0100, Borislav Petkov wrote:
> > +int perf_add_persistent_on_cpu(int cpu, struct pers_event_desc *desc,
> > + struct dentry *dir, unsigned nr_pages)
>
> OK, so this creates an even and registers it somewhere in debugfs.

(debugfs)/mce/

> - you allow an arbitrary place in debugfs; this might make finding
> them 'interesting'. Should we put them all in the same place?

My take on this is that we want to be able to make the same events we
have now, persistent. Basically not trace for the duration of a child
process but in a process-agnostic way, system-wide.

In that case, we probably want to be able to mark events as persistent,
maybe add another node in debugfs:

(debugfs)/tracing/events/mce/mce_record/attr

which can be used for flags or whatever, or something to that effect...

> - persistent events created from userspace don't seem to get a debugfs
> entry and will be lost forever?

Yeah, see above.

> In general I think a little more exploring of the semantics and
> ramifications might be in order.

Absolutely!

Thanks.

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

2012-05-18 11:24:36

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 1/2] perf: Add persistent event facilities

On Fri, 2012-05-18 at 13:03 +0200, Borislav Petkov wrote:
> > - you allow an arbitrary place in debugfs; this might make finding
> > them 'interesting'. Should we put them all in the same place?
>
> My take on this is that we want to be able to make the same events we
> have now, persistent. Basically not trace for the duration of a child
> process but in a process-agnostic way, system-wide.

This would argue against per-task persistent events..

> In that case, we probably want to be able to mark events as persistent,
> maybe add another node in debugfs:
>
> (debugfs)/tracing/events/mce/mce_record/attr
>
> which can be used for flags or whatever, or something to that effect...

( while mce is a user of persistent events, it seems to me in general
persistent events should not be related to mce )

However you raise a valid point about keeping track of what events are
spooled into that buffer.

Note the plural there, it might be very desirable to allow multiple
events into a single persistent buffer.

2012-05-18 12:00:05

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 1/2] perf: Add persistent event facilities


* Peter Zijlstra <[email protected]> wrote:

> On Fri, 2012-05-18 at 13:03 +0200, Borislav Petkov wrote:
> > > - you allow an arbitrary place in debugfs; this might make finding
> > > them 'interesting'. Should we put them all in the same place?
> >
> > My take on this is that we want to be able to make the same events we
> > have now, persistent. Basically not trace for the duration of a child
> > process but in a process-agnostic way, system-wide.
>
> This would argue against per-task persistent events..

Yeah, 'persistency' is something that's per CPU at minimum.

> Note the plural there, it might be very desirable to allow
> multiple events into a single persistent buffer.

very, very, very much so. One (per CPU) buffer, with many events
multiplexed into it.

Thanks,

Ingo

2012-05-18 12:55:24

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 1/2] perf: Add persistent event facilities

On Fri, May 18, 2012 at 01:59:57PM +0200, Ingo Molnar wrote:
>
> * Peter Zijlstra <[email protected]> wrote:
>
> > On Fri, 2012-05-18 at 13:03 +0200, Borislav Petkov wrote:
> > > > - you allow an arbitrary place in debugfs; this might make finding
> > > > them 'interesting'. Should we put them all in the same place?
> > >
> > > My take on this is that we want to be able to make the same events we
> > > have now, persistent. Basically not trace for the duration of a child
> > > process but in a process-agnostic way, system-wide.
> >
> > This would argue against per-task persistent events..
>
> Yeah, 'persistency' is something that's per CPU at minimum.

Yeah, forget the per-task thing - that's me not understanding the code
fully. They need to be per CPU and "taskless."

> > Note the plural there, it might be very desirable to allow
> > multiple events into a single persistent buffer.
>
> very, very, very much so. One (per CPU) buffer, with many events
> multiplexed into it.

Which begs the other question: mce could use buffers which are RO - at
least, I don't see a usecase where we want to delete entries from them -
but other persistent events users might want to delete those events to
free up the buffers.

It is an interesting question how we handle that RO/RW thing? Maybe per
event and do something of a kernel-side filtering which Peter didn't
like in Robert's IBS stuff.

Hmmm...

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

2012-05-18 13:38:04

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 1/2] perf: Add persistent event facilities

On Fri, 2012-05-18 at 14:55 +0200, Borislav Petkov wrote:
> It is an interesting question how we handle that RO/RW thing?

RO buffers overwrite, RW buffers loose events.

We could allow both for persistent under the constraints:

VM_SHARED - must imply !VM_WRITE since multiple people writing to the
control page will get 'interesting' very fast.

!VM_SHARED - must fail mmap() of the debugfs file if there's already
an existing user.

Note that even RW buffers only allows writes to the control page, a
write to the actual data itself will generate a fault.

2012-05-18 14:10:04

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 1/2] perf: Add persistent event facilities

On Fri, May 18, 2012 at 03:37:47PM +0200, Peter Zijlstra wrote:
> On Fri, 2012-05-18 at 14:55 +0200, Borislav Petkov wrote:
> > It is an interesting question how we handle that RO/RW thing?
>
> RO buffers overwrite, RW buffers loose events.
>
> We could allow both for persistent under the constraints:
>
> VM_SHARED - must imply !VM_WRITE since multiple people writing to the
> control page will get 'interesting' very fast.

Ok, I get that one.

> !VM_SHARED - must fail mmap() of the debugfs file if there's already
> an existing user.

What happens here if a second user wants to enable a second set of
persistent events, allocate a second set of per-CPU buffers?

> Note that even RW buffers only allows writes to the control page, a
> write to the actual data itself will generate a fault.

Ok.

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

2012-05-18 14:14:37

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 1/2] perf: Add persistent event facilities

On Fri, 2012-05-18 at 16:09 +0200, Borislav Petkov wrote:
> What happens here if a second user wants to enable a second set of
> persistent events, allocate a second set of per-CPU buffers?

I'm not seeing the relation to VM_SHARED here. But yeah, if you want a
second set of buffers with either the same or other events, you create
another set..

2012-05-18 14:21:50

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 1/2] perf: Add persistent event facilities

On Fri, May 18, 2012 at 04:14:22PM +0200, Peter Zijlstra wrote:
> On Fri, 2012-05-18 at 16:09 +0200, Borislav Petkov wrote:
> > What happens here if a second user wants to enable a second set of
> > persistent events, allocate a second set of per-CPU buffers?
>
> I'm not seeing the relation to VM_SHARED here.

Well, in the sense that if mmap fails in the !VM_SHARED case, I can't
enable any persistent events except the ones which are enabled and so
I need to allocate another set of resources for that second persistent
events user.

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

2012-05-18 14:37:50

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 1/2] perf: Add persistent event facilities

On Fri, 2012-05-18 at 16:21 +0200, Borislav Petkov wrote:
> On Fri, May 18, 2012 at 04:14:22PM +0200, Peter Zijlstra wrote:
> > On Fri, 2012-05-18 at 16:09 +0200, Borislav Petkov wrote:
> > > What happens here if a second user wants to enable a second set of
> > > persistent events, allocate a second set of per-CPU buffers?
> >
> > I'm not seeing the relation to VM_SHARED here.
>
> Well, in the sense that if mmap fails in the !VM_SHARED case, I can't
> enable any persistent events except the ones which are enabled and so
> I need to allocate another set of resources for that second persistent
> events user.

Right, well, you could actually allow operations on the fd, just not a
second mapping :-)

Anyway.. I'd push that error back to the user. If they need a second
set, let them create it.

Do you really need multiple consumers for your MCE stuff? If so, what
would be the problem of using VM_SHARED?

2012-05-18 15:25:19

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 1/2] perf: Add persistent event facilities

On Fri, May 18, 2012 at 04:37:37PM +0200, Peter Zijlstra wrote:
> > > I'm not seeing the relation to VM_SHARED here.
> >
> > Well, in the sense that if mmap fails in the !VM_SHARED case, I can't
> > enable any persistent events except the ones which are enabled and so
> > I need to allocate another set of resources for that second persistent
> > events user.
>
> Right, well, you could actually allow operations on the fd, just not a
> second mapping :-)
>
> Anyway.. I'd push that error back to the user. If they need a second
> set, let them create it.
>
> Do you really need multiple consumers for your MCE stuff? If so, what
> would be the problem of using VM_SHARED?

Nah, for MCE I'm fine with RO buffers, simple :-).

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

2012-05-31 17:33:26

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 1/2] perf: Add persistent event facilities

On Fri, May 18, 2012 at 12:14:00PM +0200, Peter Zijlstra wrote:
> On Wed, 2012-03-21 at 15:34 +0100, Borislav Petkov wrote:
> > +int perf_add_persistent_on_cpu(int cpu, struct pers_event_desc *desc,
> > + struct dentry *dir, unsigned nr_pages)
>
> OK, so this creates an even and registers it somewhere in debugfs.
>
> - you allow an arbitrary place in debugfs; this might make finding them
> 'interesting'. Should we put them all in the same place?
>
> - persistent events created from userspace don't seem to get a debugfs
> entry and will be lost forever?
>
> In general I think a little more exploring of the semantics and
> ramifications might be in order.

Ok, how about the following. This is rough and not in any way ready - it
is supposed to feel out how you guys feel about it. Basically, it adds
another file in (debugfs)/tracing/events/<subsys>/<event_name>/ which
can be used for whatever.

In the persistent event case, I get this:

/mnt/dbg/tracing/events/mce/mce_record/
|-- enable
|-- filter
|-- format
|-- id
`-- pers

and 'pers' contains the file descriptor of the perf per-cpu buffer which
I can mmap in userspace. I can read that out later.

Other events can put other files in there containing other relevant
stuff for them.

The interface trace_add_file() is pretty generic (it can be done
even more so). It is not design-finished - it is only meant to show
the intent. Oh, and don't look at bob_add_trace_file() - it is a
quick'n'dirty hack for testing :-).

So, what do you guys think, Peter, Steven?

Thanks.

--
diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 176a939d1547..1c13fb5e9a50 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -222,6 +222,8 @@ struct ftrace_event_call {
#ifdef CONFIG_PERF_EVENTS
int perf_refcount;
struct hlist_head __percpu *perf_events;
+ /* persistent event filedes */
+ int pers_fd;
#endif
};

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 29111da1d100..57ad0fbe0aef 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -980,6 +980,29 @@ show_header(struct file *filp, char __user *ubuf, size_t cnt, loff_t *ppos)
return r;
}

+static ssize_t
+event_pers_fd_read(struct file *filp, char __user *ubuf, size_t cnt, loff_t *ppos)
+{
+ struct ftrace_event_call *call = filp->private_data;
+ struct trace_seq *s;
+ int r;
+
+ if (*ppos)
+ return 0;
+
+ s = kmalloc(sizeof(*s), GFP_KERNEL);
+ if (!s)
+ return -ENOMEM;
+
+ trace_seq_init(s);
+ trace_seq_printf(s, "%d\n", call->pers_fd);
+
+ r = simple_read_from_buffer(ubuf, cnt, ppos,
+ s->buffer, s->len);
+ kfree(s);
+ return r;
+}
+
static const struct seq_operations show_event_seq_ops = {
.start = t_start,
.next = t_next,
@@ -1058,6 +1081,12 @@ static const struct file_operations ftrace_show_header_fops = {
.llseek = default_llseek,
};

+static const struct file_operations ftrace_persistent_fops = {
+ .open = tracing_open_generic,
+ .read = event_pers_fd_read,
+ .llseek = default_llseek,
+};
+
static struct dentry *event_trace_events_dir(void)
{
static struct dentry *d_tracer;
@@ -1199,6 +1228,56 @@ event_create_dir(struct ftrace_event_call *call, struct dentry *d_events,
return 0;
}

+static int __trace_add_file(struct ftrace_event_call *call, const char *fname,
+ umode_t mode, const struct file_operations *fops)
+{
+ int ret = 0;
+ mutex_lock(&event_mutex);
+
+ if (!trace_create_file(fname, mode, call->dir, call, fops))
+ ret = -EINVAL;
+
+ mutex_unlock(&event_mutex);
+ return ret;
+}
+
+extern struct ftrace_event_call *__start_ftrace_events[];
+extern struct ftrace_event_call *__stop_ftrace_events[];
+
+#define for_each_event(event, start, end) \
+ for (event = start; \
+ (unsigned long)event < (unsigned long)end; \
+ event++)
+
+/*
+ * Assumptions:
+ * - event debugfs dir is already initialized
+ * - trace event is not declared in a module
+ */
+int trace_add_file(const char *dir_name, const char *fname, umode_t mode,
+ const struct file_operations *fops)
+{
+ struct ftrace_event_call *call;
+
+ if (!strlen(fname))
+ return -EINVAL;
+
+ list_for_each_entry(call, &ftrace_events, list) {
+ if (!strncmp(call->name, dir_name, strlen(dir_name)))
+ return __trace_add_file(call, fname, mode, fops);
+ }
+ return -EINVAL;
+}
+
+static __init int bob_add_trace_file(void)
+{
+ int ret = 0;
+
+ ret = trace_add_file("mce_record", "pers", 0444, &ftrace_persistent_fops);
+
+ return ret;
+}
+
static int
__trace_add_event_call(struct ftrace_event_call *call, struct module *mod,
const struct file_operations *id,
@@ -1292,11 +1371,6 @@ void trace_remove_event_call(struct ftrace_event_call *call)
mutex_unlock(&event_mutex);
}

-#define for_each_event(event, start, end) \
- for (event = start; \
- (unsigned long)event < (unsigned long)end; \
- event++)
-
#ifdef CONFIG_MODULES

static LIST_HEAD(ftrace_module_file_list);
@@ -1435,9 +1509,6 @@ static struct notifier_block trace_module_nb = {
.priority = 0,
};

-extern struct ftrace_event_call *__start_ftrace_events[];
-extern struct ftrace_event_call *__stop_ftrace_events[];
-
static char bootup_event_buf[COMMAND_LINE_SIZE] __initdata;

static __init int setup_trace_event(char *str)
@@ -1753,3 +1824,4 @@ static __init int event_trace_self_tests_init(void)
late_initcall(event_trace_self_tests_init);

#endif
+late_initcall(bob_add_trace_file);


--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551