2010-06-22 07:24:39

by Lin Ming

Subject: Re: [rfc] Describe events in a structured way via sysfs

On Tue, 2010-06-22 at 14:59 +0800, Johannes Berg wrote:
> On Tue, 2010-06-22 at 14:25 +0800, Lin Ming wrote:
> > On Mon, 2010-06-21 at 17:34 +0800, Johannes Berg wrote:
> > > On Mon, 2010-06-21 at 16:55 +0800, Lin Ming wrote:
> > >
> > > > 21. iwlwifi_io events
> > > > /sys/devices/pci0000:00/0000:00:1c.1/0000:03:00.0/net/wlan0/events/iwlwifi_dev_ioread32/
> > > > ...
> > > > /sys/devices/pci0000:00/0000:00:1c.1/0000:03:00.0/net/wlan0/events/iwlwifi_dev_ucode_event/
> > >
> > >
> > > That doesn't work, you could have multiple PCI devices in your system.
> >
> > Understood. This is just a "demo".
> >
> > Actually, I mean
> >
> > net/wlan0/events/
> > net/wlan1/events/
> > ....
> > net/wlanN/events/
>
> That's not appropriate either though since you may have multiple network
> interfaces on the same hardware :)

Doesn't net/wlan0...wlanN mean multiple network interfaces on the same
hardware?

If my understanding is wrong, would you show me an example in sysfs?

>
> > > > 23. skb events
> > > > /sys/class/net/events/kfree_skb/
> > > > /sys/class/net/events/skb_copy_datagram_iovec/
> > >
> > > > 25. mac80211 events
> > > > /sys/class/net/events/drv_start/
> > > > ...
> > > > /sys/class/net/events/stop_queue
> > >
> > > It doesn't really seem right to mix all these.
> >
> > Well, agreed, skb events are totally different from mac80211 events.
> > Any idea?
>
> I suppose the most appropriate thing would
> be /sys/class/ieee80211/phyN/events/... for almost all of them.

Good idea.

Thanks,
Lin Ming

>
> johannes
>


2010-06-22 07:34:12

by Johannes Berg

Subject: Re: [rfc] Describe events in a structured way via sysfs

On Tue, 2010-06-22 at 15:22 +0800, Lin Ming wrote:

> > > net/wlan0/events/
> > > net/wlan1/events/
> > > ....
> > > net/wlanN/events/
> >
> > That's not appropriate either though since you may have multiple network
> > interfaces on the same hardware :)
>
> Doesn't net/wlan0...wlanN mean multiple network interfaces on the same
> hardware?

Yes, but the trace points aren't per network interface but rather per
hardware piece.

johannes

2010-06-22 07:39:57

by Johannes Berg

Subject: Re: [rfc] Describe events in a structured way via sysfs

On Tue, 2010-06-22 at 09:33 +0200, Johannes Berg wrote:

> > > > net/wlan0/events/
> > > > net/wlan1/events/
> > > > ....
> > > > net/wlanN/events/
> > >
> > > That's not appropriate either though since you may have multiple network
> > > interfaces on the same hardware :)
> >
> > Doesn't net/wlan0...wlanN mean multiple network interfaces on the same
> > hardware?
>
> Yes, but the trace points aren't per network interface but rather per
> hardware piece.

Which really just means that whoever writes the tracepoint needs to
provide a struct device for where to put it (at least in the case of
driver tracepoints), and then ideally some description of the device
also gets put into the ringbuffer.

Assuming you actually want to have the event show up in sysfs twice if
it has multiple producers? I'd like that, it would make sense for a lot
of cases since you might only care about one of the producers.
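To make the suggestion concrete: if every record carries a description of the device that produced it, a consumer can keep only the producer it cares about. The userspace sketch below is a model, not kernel code; `struct model_device`, `trace_ioread32()` and the fixed-size ring are hypothetical stand-ins for a `struct device`, a driver tracepoint and the ftrace ring buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct device: only its name matters here. */
struct model_device {
	const char *name;	/* e.g. a PCI address such as "0000:03:00.0" */
};

/* One ring-buffer record: the event payload plus a copy of the producer's name. */
struct model_record {
	char dev_name[32];
	unsigned int reg;
	unsigned int val;
};

#define RING_SIZE 8

static struct model_record ring[RING_SIZE];
static unsigned int ring_head;	/* total number of records ever written */

/* Hypothetical tracepoint body: store which device produced the event. */
static void trace_ioread32(const struct model_device *dev,
			   unsigned int reg, unsigned int val)
{
	struct model_record *rec = &ring[ring_head % RING_SIZE];

	assert(dev != NULL && dev->name != NULL);
	snprintf(rec->dev_name, sizeof(rec->dev_name), "%s", dev->name);
	rec->reg = reg;
	rec->val = val;
	ring_head++;
}

/* Consumer side: count the records still in the ring that came from one device. */
static unsigned int count_records_for(const char *dev_name)
{
	unsigned int n = ring_head < RING_SIZE ? ring_head : RING_SIZE;
	unsigned int i, hits = 0;

	for (i = 0; i < n; i++)
		if (strcmp(ring[i].dev_name, dev_name) == 0)
			hits++;
	return hits;
}
```

With two cards tracing into the same buffer, filtering on one device name returns only that card's records; without the per-record device field the two streams would be indistinguishable.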

johannes

2010-06-22 07:49:58

by Lin Ming

Subject: Re: [rfc] Describe events in a structured way via sysfs

On Tue, 2010-06-22 at 15:33 +0800, Johannes Berg wrote:
> On Tue, 2010-06-22 at 15:22 +0800, Lin Ming wrote:
>
> > > > net/wlan0/events/
> > > > net/wlan1/events/
> > > > ....
> > > > net/wlanN/events/
> > >
> > > That's not appropriate either though since you may have multiple network
> > > interfaces on the same hardware :)
> >
> > Doesn't net/wlan0...wlanN mean multiple network interfaces on the same
> > hardware?
>
> Yes, but the trace points aren't per network interface but rather per
> hardware piece.

So it can only trace the whole hardware piece rather than a specific
interface?

If so, I will change it to
- /sys/devices/pci0000:00/0000:00:1c.1/0000:03:00.0/net/wlan0/events/iwlwifi_dev_ioread32/
+ /sys/devices/pci0000:00/0000:00:1c.1/0000:03:00.0/events/iwlwifi_dev_ioread32/
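The path change above hangs the events off the PCI function rather than off each netdev, so every interface on the same card resolves to one shared `events/` directory. The hypothetical helper `hw_events_dir()` below illustrates that mapping; it assumes it is given the resolved `/sys/devices/...` path of the netdev (not the `/sys/class/net` symlink) and that the path contains a `/net/<ifname>` component.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: given a netdev sysfs path such as
 *   /sys/devices/.../0000:03:00.0/net/wlan0
 * write the per-hardware events directory
 *   /sys/devices/.../0000:03:00.0/events
 * into out. Returns 0 on success, -1 if there is no "/net/" component
 * or the result does not fit.
 */
static int hw_events_dir(const char *netdev_path, char *out, size_t outlen)
{
	const char *net = strstr(netdev_path, "/net/");
	int len;

	if (net == NULL)
		return -1;
	/* Keep everything before "/net/", i.e. the parent hardware device. */
	len = snprintf(out, outlen, "%.*s/events",
		       (int)(net - netdev_path), netdev_path);
	if (len < 0 || (size_t)len >= outlen)
		return -1;
	return 0;
}
```

Both `wlan0` and `wlan1` on the same card map to the same directory, which matches the point that these tracepoints are per hardware piece rather than per interface.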

>
> johannes
>

2010-06-22 07:53:03

by Johannes Berg

Subject: Re: [rfc] Describe events in a structured way via sysfs

On Tue, 2010-06-22 at 15:47 +0800, Lin Ming wrote:

> So it can only trace the whole hardware piece rather than a specific
> interface?
>
> If so, I will change it to
> - /sys/devices/pci0000:00/0000:00:1c.1/0000:03:00.0/net/wlan0/events/iwlwifi_dev_ioread32/
> + /sys/devices/pci0000:00/0000:00:1c.1/0000:03:00.0/events/iwlwifi_dev_ioread32/

Yes, I guess you were right from the start and I was just thinking about
it the wrong way. So presumably you'll somehow have to "instantiate"
tracepoints for a given device?

johannes

2010-06-22 08:06:21

by Lin Ming

Subject: Re: [rfc] Describe events in a structured way via sysfs

On Tue, 2010-06-22 at 15:39 +0800, Johannes Berg wrote:
> On Tue, 2010-06-22 at 09:33 +0200, Johannes Berg wrote:
>
> > > > > net/wlan0/events/
> > > > > net/wlan1/events/
> > > > > ....
> > > > > net/wlanN/events/
> > > >
> > > > That's not appropriate either though since you may have multiple network
> > > > interfaces on the same hardware :)
> > >
> > > Doesn't net/wlan0...wlanN mean multiple network interfaces on the same
> > > hardware?
> >
> > Yes, but the trace points aren't per network interface but rather per
> > hardware piece.
>
> Which really just means that whoever writes the tracepoint needs to
> provide a struct device for where to put it (at least in the case of
> driver tracepoints), and then ideally some description of the device
> also gets put into the ringbuffer.

I'm not familiar with tracepoint code. Correct me if I'm wrong.
Do you mean that, for example, if the iwlwifi_dev_ioread32 event is traced,
then the "device" info will get put into the ftrace ringbuffer?

>
> Assuming you actually want to have the event show up in sysfs twice if
> it has multiple producers? I'd like that, it would make sense for a lot
> of cases since you might only care about one of the producers.

Yes, each producer has an "events" sysfs dir.

>
> johannes
>

2010-06-22 08:17:01

by Johannes Berg

Subject: Re: [rfc] Describe events in a structured way via sysfs

On Tue, 2010-06-22 at 16:04 +0800, Lin Ming wrote:

> > Which really just means that whoever writes the tracepoint needs to
> > provide a struct device for where to put it (at least in the case of
> > driver tracepoints), and then ideally some description of the device
> > also gets put into the ringbuffer.
>
> I'm not familiar with tracepoint code. Correct me if I'm wrong.
> Do you mean that, for example, if the iwlwifi_dev_ioread32 event is traced,
> then the "device" info will get put into the ftrace ringbuffer?

No, right now it doesn't.

johannes

2010-06-24 09:37:19

by Ingo Molnar

Subject: Re: [rfc] Describe events in a structured way via sysfs


* Johannes Berg wrote:

> On Tue, 2010-06-22 at 15:22 +0800, Lin Ming wrote:
>
> > > > net/wlan0/events/
> > > > net/wlan1/events/
> > > > ....
> > > > net/wlanN/events/
> > >
> > > That's not appropriate either though since you may have multiple network
> > > interfaces on the same hardware :)
> >
> > Doesn't net/wlan0...wlanN mean multiple network interfaces on the same
> > hardware?
>
> Yes, but the trace points aren't per network interface but rather per
> hardware piece.

Yeah - we generally want events to live at their 'natural' source in sysfs.

So if it's a per device hardware event, it should live with the hardware
piece. If it's a higher level chipset event, it should live where the chipset
driver is in sysfs. If it's a subsystem level event then it should live there.

I think what you mentioned in your other posting makes the most sense: give
flexibility to tracepoint authors to place the event in the most sensible
sysfs place. They are the ones who define the tracepoints and the events, so any
second-guessing by a generic layer will probably get in the way. The generic
tool layer will be content with having the event_source class in sysfs, to see
'all' event sources and their topological structure.

That's probably best achieved via a TRACE_EVENT() variant, by passing in the
sysfs location.

It might even make sense to make this a part of TRACE_EVENT() itself and make
'NULL' the current default, non-sysfs-enumerated behavior. That way we can
gradually (and non-intrusively) find all the right sysfs places for events.
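The proposal amounts to giving each event definition an optional placement, with NULL keeping today's non-enumerated behaviour. The sketch below models that with a hypothetical `struct event_def` (not the real `struct ftrace_event_call`); the placements are illustrative. Note that a single static parent string names only one location per event, so on its own it cannot describe an event that exists once per device instance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical event definition with an optional sysfs placement. */
struct event_def {
	const char *name;
	const char *sysfs_parent;	/* NULL: not enumerated in sysfs (current behaviour) */
};

/* Illustrative placements only. */
static const struct event_def defs[] = {
	{ "sched_switch",         NULL },
	{ "kfree_skb",            "/sys/class/net" },
	{ "i915_ring_wait_begin", "/sys/class/drm/card0" },	/* fixed: only ever card0 */
};

/* Print the events/<name>/ path of every placed event and return how many there are. */
static size_t enumerate_sysfs_events(void)
{
	size_t i, n = 0;

	for (i = 0; i < sizeof(defs) / sizeof(defs[0]); i++) {
		if (defs[i].sysfs_parent == NULL)
			continue;	/* gradual migration: unplaced events are skipped */
		printf("%s/events/%s/\n", defs[i].sysfs_parent, defs[i].name);
		n++;
	}
	return n;
}
```

Events left at NULL keep working exactly as before, which is what makes the migration non-intrusive.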

Thanks,

Ingo

2010-06-24 16:15:10

by Johannes Berg

Subject: Re: [rfc] Describe events in a structured way via sysfs

On Thu, 2010-06-24 at 11:36 +0200, Ingo Molnar wrote:

> That's probably best achieved via a TRACE_EVENT() variant, by passing in the
> sysfs location.
>
> It might even make sense to make this a part of TRACE_EVENT() itself and make
> 'NULL' the current default, non-sysfs-enumerated behavior. That way we can
> gradually (and non-intrusively) find all the right sysfs places for events.

No, this doesn't work. A lot of events are multi-instance. Say you have
an event for each USB device. This event would have to show up in many
places in sysfs, and each trace_foo() invocation needs to get the struct
device pointer, not just the TRACE_EVENT() definition. Additionally, to
create/destroy the sysfs pieces we need something like
init_trace_foo(dev) and destroy_trace_foo(dev) to be called when the sysfs
points for the device should be created/destroyed.

The TRACE_EVENT() just defines the template, but such multi-instance
events really should be standardised in terms of their struct device (or
maybe kobject).

I think that needs some TRACE_DEVICE_EVENT macro that creates the
required inlines etc., including the init/destroy functions that are called
when the event should show up in sysfs.

There's no way you can have the event show up in sysfs at the right spot
with _just_ a TRACE_EVENT macro, since at define time in the header file
you don't even have a valid struct device pointer.
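One way to picture the TRACE_DEVICE_EVENT idea is a macro that expands, per event, into an init, a destroy and a trace function keyed on the device. The userspace sketch below is hypothetical throughout (`MODEL_DEVICE_EVENT`, `struct model_dev` and the fixed instance table are inventions for illustration); it shows that per-device instances are created at init time, when a device pointer exists, not when the macro is expanded in a header.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device. */
struct model_dev {
	const char *name;
};

#define MAX_INSTANCES 4

/*
 * Hypothetical macro: for event 'ev' it defines a table of device
 * instances plus init_trace_<ev>(), destroy_trace_<ev>() and
 * trace_<ev>(). In the kernel, init/destroy would also create and
 * remove events/<ev>/ under the device (and would need locking);
 * here they only update the table.
 */
#define MODEL_DEVICE_EVENT(ev)						\
static const struct model_dev *ev##_instances[MAX_INSTANCES];		\
static unsigned long ev##_hits;						\
									\
static int init_trace_##ev(const struct model_dev *dev)		\
{									\
	size_t i;							\
	for (i = 0; i < MAX_INSTANCES; i++) {				\
		if (ev##_instances[i] == NULL) {			\
			ev##_instances[i] = dev;			\
			return 0;					\
		}							\
	}								\
	return -1;							\
}									\
									\
static void destroy_trace_##ev(const struct model_dev *dev)		\
{									\
	size_t i;							\
	for (i = 0; i < MAX_INSTANCES; i++)				\
		if (ev##_instances[i] == dev)				\
			ev##_instances[i] = NULL;			\
}									\
									\
static void trace_##ev(const struct model_dev *dev)			\
{									\
	size_t i;							\
	for (i = 0; i < MAX_INSTANCES; i++)				\
		if (ev##_instances[i] == dev)				\
			ev##_hits++;					\
}

/* Defines init_trace_urb_submit(), destroy_trace_urb_submit(), trace_urb_submit(). */
MODEL_DEVICE_EVENT(urb_submit)
```

A USB core would call `init_trace_urb_submit(dev)` when a device appears and `destroy_trace_urb_submit(dev)` when it goes away; calls to `trace_urb_submit()` for devices that were never instantiated are simply ignored.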

johannes

2010-06-24 17:33:58

by Ingo Molnar

Subject: Re: [rfc] Describe events in a structured way via sysfs


* Johannes Berg wrote:

> On Thu, 2010-06-24 at 11:36 +0200, Ingo Molnar wrote:
>
> > That's probably best achieved via a TRACE_EVENT() variant, by passing in the
> > sysfs location.
> >
> > It might even make sense to make this a part of TRACE_EVENT() itself and make
> > 'NULL' the current default, non-sysfs-enumerated behavior. That way we can
> > gradually (and non-intrusively) find all the right sysfs places for events.
>
> No, this doesn't work. A lot of events are multi-instance. Say you have an
> event for each USB device. This event would have to show up in many places
> in sysfs, and each trace_foo() invocation needs to get the struct device
> pointer, not just the TRACE_EVENT() definition. Additionally, to
> create/destroy the sysfs pieces we need something like init_trace_foo(dev)
> and destroy_trace_foo(dev) to be called when the sysfs points for the device
> should be created/destroyed.

Yes - but even this could be expressed via TRACE_EVENT(): by giving it a
device-specific function pointer and then instantiating individual events from
a single, central place in sysfs.

That is the place where we already know where it ends up in sysfs, and where
the event-specific function can match up whether that particular node belongs
to it and whether an additional event directory should be created for that
particular sysfs node.
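This alternative keeps instantiation in one central place: each event carries a device-matching callback, and whenever a node is added, the core asks every event whether it belongs there. The sketch below is a userspace model; `struct node`, `struct event_desc`, `node_added()` and the driver-name strings are illustrative, not existing kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sysfs node: a path and the name of its bound driver (or NULL). */
struct node {
	const char *path;
	const char *driver;
};

/* Hypothetical event descriptor carrying a device-specific match callback. */
struct event_desc {
	const char *name;
	int (*match)(const struct node *n);
};

static int match_i915(const struct node *n)
{
	return n->driver != NULL && strcmp(n->driver, "i915") == 0;
}

static int match_iwlagn(const struct node *n)
{
	return n->driver != NULL && strcmp(n->driver, "iwlagn") == 0;
}

static const struct event_desc events[] = {
	{ "i915_gem_object_create", match_i915 },
	{ "i915_ring_wait_begin",   match_i915 },
	{ "iwlwifi_dev_ioread32",   match_iwlagn },
};

/*
 * Central hook, called when a node is added: returns how many
 * events/<name>/ directories would be created under it.
 */
static size_t node_added(const struct node *n)
{
	size_t i, created = 0;

	for (i = 0; i < sizeof(events) / sizeof(events[0]); i++)
		if (events[i].match(n))
			created++;	/* kernel: create events/<name>/ under n */
	return created;
}
```

The trade-off is that every node addition scans every event, whereas creating the directory explicitly in the driver touches only the driver's own events.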

> The TRACE_EVENT() just defines the template, but such multi-instance events
> really should be standardised in terms of their struct device (or maybe
> kobject).
>
> I think that needs some TRACE_DEVICE_EVENT macro that creates the required
> inlines etc., including the init/destroy functions that are called when the event
> should show up in sysfs.
>
> There's no way you can have the event show up in sysfs at the right spot
> with _just_ a TRACE_EVENT macro, since at define time in the header file you
> don't even have a valid struct device pointer.

That would be another possible way to do it - to explicitly create the events
directory. It looks a bit simpler as we wouldn't have to touch TRACE_EVENT()
and because it directly expresses the 'this node has an events directory'
property at the place where we create the device node.

Thanks,

Ingo

2010-06-29 06:18:18

by Lin Ming

Subject: Re: [rfc] Describe events in a structured way via sysfs

On Fri, 2010-06-25 at 01:33 +0800, Ingo Molnar wrote:
> * Johannes Berg wrote:
>
> > On Thu, 2010-06-24 at 11:36 +0200, Ingo Molnar wrote:
> >
> > > That's probably best achieved via a TRACE_EVENT() variant, by passing in the
> > > sysfs location.
> > >
> > > It might even make sense to make this a part of TRACE_EVENT() itself and make
> > > 'NULL' the current default, non-sysfs-enumerated behavior. That way we can
> > > gradually (and non-intrusively) find all the right sysfs places for events.
> >
> > No, this doesn't work. A lot of events are multi-instance. Say you have an
> > event for each USB device. This event would have to show up in many places
> > in sysfs, and each trace_foo() invocation needs to get the struct device
> > pointer, not just the TRACE_EVENT() definition. Additionally, to
> > create/destroy the sysfs pieces we need something like init_trace_foo(dev)
> > and destroy_trace_foo(dev) to be called when the sysfs points for the device
> > should be created/destroyed.
>
> Yes - but even this could be expressed via TRACE_EVENT(): by giving it a
> device-specific function pointer and then instantiating individual events from
> a single, central place in sysfs.
>
> That is the place where we already know where it ends up in sysfs, and where
> the event-specific function can match up whether that particular node belongs
> to it and whether an additional event directory should be created for that
> particular sysfs node.
>
> > The TRACE_EVENT() just defines the template, but such multi-instance events
> > really should be standardised in terms of their struct device (or maybe
> > kobject).
> >
> > I think that needs some TRACE_DEVICE_EVENT macro that creates the required
> > inlines etc., including the init/destroy functions that are called when the event
> > should show up in sysfs.
> >
> > There's no way you can have the event show up in sysfs at the right spot
> > with _just_ a TRACE_EVENT macro, since at define time in the header file you
> > don't even have a valid struct device pointer.
>
> That would be another possible way to do it - to explicitly create the events
> directory. It looks a bit simpler as we wouldn't have to touch TRACE_EVENT()
> and because it directly expresses the 'this node has an events directory'
> property at the place where we create the device node.

Let me take i915 tracepoints as an example.
Do you mean something like below?

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 423dc90..9e7e4a0 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -28,6 +28,7 @@
*/

#include <linux/device.h>
+#include <linux/perf_event.h>
#include "drmP.h"
#include "drm.h"
#include "i915_drm.h"
@@ -413,7 +414,17 @@ int i965_reset(struct drm_device *dev, u8 flags)
static int __devinit
i915_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
{
-	return drm_get_dev(pdev, ent, &driver);
+	struct kobject *kobj;
+	int ret;
+
+	ret = drm_get_dev(pdev, ent, &driver);
+
+	if (!ret) {
+		kobj = &pdev->dev.kobj;
+		perf_sys_register_tp(kobj, "i915");
+	}
+
+	return ret;
}

static void
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 716f99b..2a6d834 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1019,6 +1019,8 @@ extern int perf_swevent_get_recursion_context(void);
extern void perf_swevent_put_recursion_context(int rctx);
extern void perf_event_enable(struct perf_event *event);
extern void perf_event_disable(struct perf_event *event);
+
+extern void perf_sys_register_tp(struct kobject *kobj, char *tp_system);
#else
static inline void
perf_event_task_sched_in(struct task_struct *task) { }
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index 403d180..1b85dad 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -5877,3 +5877,32 @@ static int __init perf_event_sysfs_init(void)
&perfclass_attr_group);
}
device_initcall(perf_event_sysfs_init);
+
+#define for_each_event(event, start, end) \
+	for (event = start; \
+	     (unsigned long)event < (unsigned long)end; \
+	     event++)
+
+extern struct ftrace_event_call __start_ftrace_events[];
+extern struct ftrace_event_call __stop_ftrace_events[];
+
+void perf_sys_register_tp(struct kobject *kobj, char *tp_system)
+{
+	struct ftrace_event_call *call;
+	struct kobject *events_kobj;
+
+	events_kobj = kobject_create_and_add("events", kobj);
+	if (!events_kobj)
+		return;
+
+	for_each_event(call, __start_ftrace_events, __stop_ftrace_events) {
+		if (call->class->system && !strcmp(call->class->system, tp_system)) {
+
+			/* create events/<tracepoint> */
+			kobject_create_and_add(call->name, events_kobj);
+
+			/* create events/<tracepoint>/enable, filter, format, id */
+			/* TBD ... */
+		}
+	}
+}
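The loop in perf_sys_register_tp() can be exercised outside the kernel. In this userspace model the linker-section bounds `__start_ftrace_events`/`__stop_ftrace_events` are replaced by an ordinary array and kobjects by path strings; `struct model_call`, `model_register_tp()` and the example paths are hypothetical. Unlike the draft, the model reports failure of a creation step instead of ignoring it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MODEL_PATH_MAX 128

/* Stand-in for struct ftrace_event_call: the event name and its system. */
struct model_call {
	const char *system;
	const char *name;
};

/* Plays the role of the __start_ftrace_events..__stop_ftrace_events range. */
static const struct model_call model_events[] = {
	{ "i915",  "i915_gem_object_create" },
	{ "i915",  "i915_ring_wait_begin" },
	{ "sched", "sched_switch" },
	{ NULL,    "orphan_event" },	/* no system: must be skipped */
};

/*
 * For every event whose system equals tp_system, write
 * "<parent>/events/<name>" into paths[]. Returns the number of paths
 * written, or -1 if a path does not fit (the point where the kernel
 * version would see kobject_create_and_add() fail).
 */
static int model_register_tp(const char *parent, const char *tp_system,
			     char paths[][MODEL_PATH_MAX], size_t max_paths)
{
	size_t i, n = 0;

	for (i = 0; i < sizeof(model_events) / sizeof(model_events[0]); i++) {
		const struct model_call *call = &model_events[i];
		int len;

		if (call->system == NULL || strcmp(call->system, tp_system) != 0)
			continue;
		if (n == max_paths)
			return -1;
		len = snprintf(paths[n], MODEL_PATH_MAX, "%s/events/%s",
			       parent, call->name);
		if (len < 0 || len >= MODEL_PATH_MAX)
			return -1;
		n++;
	}
	return (int)n;
}
```

Registering the "i915" system under a GPU's device path yields one directory per i915 event and nothing for other systems, which is the filtering the draft performs with `strcmp(call->class->system, tp_system)`.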

2010-06-29 08:56:57

by Ingo Molnar

Subject: Re: [rfc] Describe events in a structured way via sysfs


* Lin Ming wrote:

> On Fri, 2010-06-25 at 01:33 +0800, Ingo Molnar wrote:
> > * Johannes Berg wrote:
> >
> > > On Thu, 2010-06-24 at 11:36 +0200, Ingo Molnar wrote:
> > >
> > > > That's probably best achieved via a TRACE_EVENT() variant, by passing in the
> > > > sysfs location.
> > > >
> > > > It might even make sense to make this a part of TRACE_EVENT() itself and make
> > > > 'NULL' the current default, non-sysfs-enumerated behavior. That way we can
> > > > gradually (and non-intrusively) find all the right sysfs places for events.
> > >
> > > No, this doesn't work. A lot of events are multi-instance. Say you have an
> > > event for each USB device. This event would have to show up in many places
> > > in sysfs, and each trace_foo() invocation needs to get the struct device
> > > pointer, not just the TRACE_EVENT() definition. Additionally, to
> > > create/destroy the sysfs pieces we need something like init_trace_foo(dev)
> > > and destroy_trace_foo(dev) to be called when the sysfs points for the device
> > > should be created/destroyed.
> >
> > Yes - but even this could be expressed via TRACE_EVENT(): by giving it a
> > device-specific function pointer and then instantiating individual events from
> > a single, central place in sysfs.
> >
> > That is the place where we already know where it ends up in sysfs, and where
> > the event-specific function can match up whether that particular node belongs
> > to it and whether an additional event directory should be created for that
> > particular sysfs node.
> >
> > > The TRACE_EVENT() just defines the template, but such multi-instance events
> > > really should be standardised in terms of their struct device (or maybe
> > > kobject).
> > >
> > > I think that needs some TRACE_DEVICE_EVENT macro that creates the required
> > > inlines etc., including the init/destroy functions that are called when the event
> > > should show up in sysfs.
> > >
> > > There's no way you can have the event show up in sysfs at the right spot
> > > with _just_ a TRACE_EVENT macro, since at define time in the header file you
> > > don't even have a valid struct device pointer.
> >
> > That would be another possible way to do it - to explicitly create the events
> > directory. It looks a bit simpler as we wouldn't have to touch TRACE_EVENT()
> > and because it directly expresses the 'this node has an events directory'
> > property at the place where we create the device node.
>
> Let me take i915 tracepoints as an example.
> Do you mean something like below?
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 423dc90..9e7e4a0 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -28,6 +28,7 @@
> */
>
> #include <linux/device.h>
> +#include <linux/perf_event.h>
> #include "drmP.h"
> #include "drm.h"
> #include "i915_drm.h"
> @@ -413,7 +414,17 @@ int i965_reset(struct drm_device *dev, u8 flags)
> static int __devinit
> i915_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> {
> - return drm_get_dev(pdev, ent, &driver);
> + struct kobject *kobj;
> + int ret;
> +
> + ret = drm_get_dev(pdev, ent, &driver);
> +
> + if (!ret) {
> + kobj = &pdev->dev.kobj;
> + perf_sys_register_tp(kobj, "i915");
> + }
> +
> + return ret;

Yeah, something like that - assuming that this means that we'll add the events
directory to the device directory, to all the
/sys/bus/pci/drivers/i915/*/events/ driver directories, right? (I haven't
checked the DRM code)

Small detail, it could be written a bit more compactly, like:

> + int ret;
> +
> + ret = drm_get_dev(pdev, ent, &driver);
> + if (!ret)
> + perf_sys_register_tp(&pdev->dev.kobj, "i915");
> +
> + return ret;

Also, we can (optionally) consider 'generic', subsystem level events to also
show up under:

/sys/bus/pci/drivers/i915/events/

This would give non-device-specific events a way to be listed one level
higher in the sysfs hierarchy.

This too would be done in the driver, not by generic code. It's generally the
driver which knows how the events should be categorized.

I'd imagine something similar for wireless drivers as well - most currently
defined events would show up on a per device basis there.

Can you see practical problems with this scheme?

Ingo

2010-06-29 09:23:54

by Lin Ming

Subject: Re: [rfc] Describe events in a structured way via sysfs

On Tue, 2010-06-29 at 16:55 +0800, Ingo Molnar wrote:
> * Lin Ming wrote:
>
> > On Fri, 2010-06-25 at 01:33 +0800, Ingo Molnar wrote:
> > > * Johannes Berg wrote:
> > >
> > > > On Thu, 2010-06-24 at 11:36 +0200, Ingo Molnar wrote:
> > > >
> > > > > That's probably best achieved via a TRACE_EVENT() variant, by passing in the
> > > > > sysfs location.
> > > > >
> > > > > It might even make sense to make this a part of TRACE_EVENT() itself and make
> > > > > 'NULL' the current default, non-sysfs-enumerated behavior. That way we can
> > > > > gradually (and non-intrusively) find all the right sysfs places for events.
> > > >
> > > > No, this doesn't work. A lot of events are multi-instance. Say you have an
> > > > event for each USB device. This event would have to show up in many places
> > > > in sysfs, and each trace_foo() invocation needs to get the struct device
> > > > pointer, not just the TRACE_EVENT() definition. Additionally, to
> > > > create/destroy the sysfs pieces we need something like init_trace_foo(dev)
> > > > and destroy_trace_foo(dev) to be called when the sysfs points for the device
> > > > should be created/destroyed.
> > >
> > > Yes - but even this could be expressed via TRACE_EVENT(): by giving it a
> > > device-specific function pointer and then instantiating individual events from
> > > a single, central place in sysfs.
> > >
> > > That is the place where we already know where it ends up in sysfs, and where
> > > the event-specific function can match up whether that particular node belongs
> > > to it and whether an additional event directory should be created for that
> > > particular sysfs node.
> > >
> > > > The TRACE_EVENT() just defines the template, but such multi-instance events
> > > > really should be standardised in terms of their struct device (or maybe
> > > > kobject).
> > > >
> > > > I think that needs some TRACE_DEVICE_EVENT macro that creates the required
> > > > inlines etc., including the init/destroy functions that are called when the event
> > > > should show up in sysfs.
> > > >
> > > > There's no way you can have the event show up in sysfs at the right spot
> > > > with _just_ a TRACE_EVENT macro, since at define time in the header file you
> > > > don't even have a valid struct device pointer.
> > >
> > > That would be another possible way to do it - to explicitly create the events
> > > directory. It looks a bit simpler as we wouldn't have to touch TRACE_EVENT()
> > > and because it directly expresses the 'this node has an events directory'
> > > property at the place where we create the device node.
> >
> > Let me take i915 tracepoints as an example.
> > Do you mean something like below?
> >
> > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> > index 423dc90..9e7e4a0 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.c
> > +++ b/drivers/gpu/drm/i915/i915_drv.c
> > @@ -28,6 +28,7 @@
> > */
> >
> > #include <linux/device.h>
> > +#include <linux/perf_event.h>
> > #include "drmP.h"
> > #include "drm.h"
> > #include "i915_drm.h"
> > @@ -413,7 +414,17 @@ int i965_reset(struct drm_device *dev, u8 flags)
> > static int __devinit
> > i915_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> > {
> > - return drm_get_dev(pdev, ent, &driver);
> > + struct kobject *kobj;
> > + int ret;
> > +
> > + ret = drm_get_dev(pdev, ent, &driver);
> > +
> > + if (!ret) {
> > + kobj = &pdev->dev.kobj;
> > + perf_sys_register_tp(kobj, "i915");
> > + }
> > +
> > + return ret;
>
> Yeah, something like that - assuming that this means that we'll add the events
> directory to the device directory, to all the
> /sys/bus/pci/drivers/i915/*/events/ driver directories, right? (I haven't
> checked the DRM code)

I haven't run the code, but I think yes.

>
> Small detail, it could be written a bit more compactly, like:

Thanks for the tip.

>
> > + int ret;
> > +
> > + ret = drm_get_dev(pdev, ent, &driver);
> > + if (!ret)
> > + perf_sys_register_tp(&pdev->dev.kobj, "i915");
> > +
> > + return ret;
>
> Also, we can (optionally) consider 'generic', subsystem level events to also
> show up under:
>
> /sys/bus/pci/drivers/i915/events/
>
> This would give non-device-specific events a way to be listed one level
> higher in the sysfs hierarchy.
>
> This too would be done in the driver, not by generic code. It's generally the
> driver which knows how the events should be categorized.

This is a bit difficult. I'd like not to touch TRACE_EVENT().
How does the driver know if an event is 'generic' if TRACE_EVENT is not
touched?

>
> I'd imagine something similar for wireless drivers as well - most currently
> defined events would show up on a per device basis there.
>
> Can you see practical problems with this scheme?

Not now. I may find some problems when I write more detailed code.

Lin Ming

>
> Ingo

2010-06-29 10:27:40

by Ingo Molnar

Subject: Re: [rfc] Describe events in a structured way via sysfs


* Lin Ming wrote:

> > Also, we can (optionally) consider 'generic', subsystem level events to
> > also show up under:
> >
> > /sys/bus/pci/drivers/i915/events/
> >
> > This would give non-device-specific events a way to be listed one
> > level higher in the sysfs hierarchy.
> >
> > This too would be done in the driver, not by generic code. It's generally
> > the driver which knows how the events should be categorized.
>
> This is a bit difficult. I'd like not to touch TRACE_EVENT(). [...]

We can certainly start with the simpler variant - it's also the more common
case.

> [...] How does the driver know if an event is 'generic' if TRACE_EVENT is
> not touched?

Well, it's per driver code which creates the 'events' directory anyway, so
that code decides where to link things. It can link it to the per driver kobj
- or to the per subsys kobj.
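Because the driver code creates the events directory, it can also hold a small table saying which of its events are per-device and which belong one level up. The table and the `events_parent()` helper below are hypothetical, and the device/driver classification of the i915 events is made up for illustration; the two parent paths follow the examples used in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum event_scope { SCOPE_DEVICE, SCOPE_DRIVER };

/* Hypothetical driver-side placement table; the classification is illustrative only. */
struct event_placement {
	const char *name;
	enum event_scope scope;
};

static const struct event_placement i915_placement[] = {
	{ "i915_gem_object_create", SCOPE_DEVICE },
	{ "i915_ring_wait_begin",   SCOPE_DEVICE },
	{ "i915_example_global",    SCOPE_DRIVER },	/* made-up driver-wide event */
};

/*
 * Return the directory under which the driver would link the event:
 * the per-device node or the per-driver node. NULL if the event is unknown.
 */
static const char *events_parent(const char *event, const char *device_dir,
				 const char *driver_dir)
{
	size_t i;

	for (i = 0; i < sizeof(i915_placement) / sizeof(i915_placement[0]); i++) {
		if (strcmp(i915_placement[i].name, event) != 0)
			continue;
		return i915_placement[i].scope == SCOPE_DEVICE ? device_dir
							       : driver_dir;
	}
	return NULL;
}
```

TRACE_EVENT() itself stays untouched: the generic/per-device decision lives entirely in driver code, which is the property asked for above.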

> > I'd imagine something similar for wireless drivers as well - most
> > currently defined events would show up on a per device basis there.
> >
> > Can you see practical problems with this scheme?
>
> Not now. I may find some problems when I write more detailed code.

Ok. Feel free to post RFC patches (even if they are not fully complete yet),
so that we can see how things are progressing.

I suspect the best approach would be to try to figure out the right sysfs
placement for one or two existing driver tracepoints, so that we can see it
all in practice. (Obviously any changes to drivers will have to go via the
relevant driver maintainer tree(s).)

Thanks,

Ingo

2010-07-02 08:06:21

by Lin Ming

Subject: Re: [rfc] Describe events in a structured way via sysfs

On Tue, 2010-06-29 at 18:26 +0800, Ingo Molnar wrote:
> * Lin Ming wrote:
>
> > > Also, we can (optionally) consider 'generic', subsystem level events to
> > > also show up under:
> > >
> > > /sys/bus/pci/drivers/i915/events/
> > >
> > > This would give non-device-specific events a way to be listed one
> > > level higher in the sysfs hierarchy.
> > >
> > > This too would be done in the driver, not by generic code. It's generally
> > > the driver which knows how the events should be categorized.
> >
> > This is a bit difficult. I'd like not to touch TRACE_EVENT(). [...]
>
> We can certainly start with the simpler variant - it's also the more common
> case.
>
> > [...] How does the driver know if an event is 'generic' if TRACE_EVENT is
> > not touched?
>
> Well, it's per driver code which creates the 'events' directory anyway, so
> that code decides where to link things. It can link it to the per driver kobj
> - or to the per subsys kobj.
>
> > > I'd imagine something similar for wireless drivers as well - most
> > > currently defined events would show up on a per device basis there.
> > >
> > > Can you see practical problems with this scheme?
> >
> > Not now. I may find some problems when I write more detailed code.
>
> Ok. Feel free to post RFC patches (even if they are not fully complete yet),
> so that we can see how things are progressing.
>
> I suspect the best approach would be to try to figure out the right sysfs
> placement for one or two existing driver tracepoints, so that we can see it
> all in practice. (Obviously any changes to drivers will have to go via the
> relevant driver maintainer tree(s).)

Well, taking the i915 tracepoints as an example, the sysfs structure looks like this:

/sys/class/drm/card0/events/
|-- i915_gem_object_bind
| |-- enable
| |-- filter
| |-- format
| `-- id
|-- i915_gem_object_change_domain
| |-- enable
| |-- filter
| |-- format
| `-- id
|-- i915_gem_object_clflush
| |-- enable
| |-- filter
| |-- format
| `-- id
|-- i915_gem_object_create
| |-- enable
| |-- filter
| |-- format
| `-- id
|-- i915_gem_object_destroy
| |-- enable
| |-- filter
| |-- format
| `-- id
|-- i915_gem_object_get_fence
| |-- enable
| |-- filter
| |-- format
| `-- id
|-- i915_gem_object_unbind
| |-- enable
| |-- filter
| |-- format
| `-- id
|-- i915_gem_request_complete
| |-- enable
| |-- filter
| |-- format
| `-- id
|-- i915_gem_request_flush
| |-- enable
| |-- filter
| |-- format
| `-- id
|-- i915_gem_request_retire
| |-- enable
| |-- filter
| |-- format
| `-- id
|-- i915_gem_request_submit
| |-- enable
| |-- filter
| |-- format
| `-- id
|-- i915_gem_request_wait_begin
| |-- enable
| |-- filter
| |-- format
| `-- id
|-- i915_gem_request_wait_end
| |-- enable
| |-- filter
| |-- format
| `-- id
|-- i915_ring_wait_begin
| |-- enable
| |-- filter
| |-- format
| `-- id
`-- i915_ring_wait_end
|-- enable
|-- filter
|-- format
`-- id
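With this layout a tool would discover events by reading the per-event files. In the existing ftrace debugfs interface, each event's `id` file holds the number a tool passes as `perf_event_attr.config` for `PERF_TYPE_TRACEPOINT`; assuming the sysfs copy keeps that meaning, a minimal reader might look like the sketch below. `read_event_id()` is a hypothetical helper built only on standard stdio calls.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read <events_dir>/<event>/id and store the numeric id in *id.
 * Returns 0 on success, -1 if the path is too long, the file is
 * missing, or its contents are not a number.
 */
static int read_event_id(const char *events_dir, const char *event, long *id)
{
	char path[512];
	FILE *f;
	int len, ok;

	len = snprintf(path, sizeof(path), "%s/%s/id", events_dir, event);
	if (len < 0 || (size_t)len >= sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	ok = (fscanf(f, "%ld", id) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

For the tree above, a call would look like `read_event_id("/sys/class/drm/card0/events", "i915_ring_wait_begin", &id)`; the `enable`, `filter` and `format` files would be read or written the same way.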

Below is a very rough draft patch to export the i915 tracepoints in sysfs.
Is this the right direction?

---
drivers/gpu/drm/i915/i915_drv.c | 15 +++-
include/linux/perf_event.h | 2 +
kernel/perf_event.c | 168 +++++++++++++++++++++++++++++++++++++++
4 files changed, 186 insertions(+), 1 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 423dc90..eb7fa9e 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -28,6 +28,7 @@
*/

#include <linux/device.h>
+#include <linux/perf_event.h>
#include "drmP.h"
#include "drm.h"
#include "i915_drm.h"
@@ -413,7 +414,19 @@ int i965_reset(struct drm_device *dev, u8 flags)
static int __devinit
i915_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
{
- return drm_get_dev(pdev, ent, &driver);
+ struct kobject *kobj;
+ struct drm_device *drm_dev;
+ int ret;
+
+ ret = drm_get_dev(pdev, ent, &driver);
+
+ if (!ret) {
+ drm_dev = pci_get_drvdata(pdev);
+ kobj = &drm_dev->primary->kdev.kobj;
+ perf_sys_register_tp(kobj, "i915");
+ }
+
+ return ret;
}

static void
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 716f99b..2a6d834 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1019,6 +1019,8 @@ extern int perf_swevent_get_recursion_context(void);
extern void perf_swevent_put_recursion_context(int rctx);
extern void perf_event_enable(struct perf_event *event);
extern void perf_event_disable(struct perf_event *event);
+
+extern void perf_sys_register_tp(struct kobject *kobj, char *tp_system);
#else
static inline void
perf_event_task_sched_in(struct task_struct *task) { }
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index 403d180..068ee48 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -5877,3 +5877,171 @@ static int __init perf_event_sysfs_init(void)
&perfclass_attr_group);
}
device_initcall(perf_event_sysfs_init);
+
+#define for_each_event(event, start, end) \
+ for (event = start; \
+ (unsigned long)event < (unsigned long)end; \
+ event++)
+
+extern struct ftrace_event_call __start_ftrace_events[];
+extern struct ftrace_event_call __stop_ftrace_events[];
+extern void print_event_filter(struct ftrace_event_call *call,
+ struct trace_seq *s);
+
+struct tp_kobject {
+ struct kobject *kobj;
+ struct ftrace_event_call *call;
+ struct tp_kobject *next;
+};
+
+static struct tp_kobject *tp_kobject_list;
+
+static struct ftrace_event_call *perf_sys_find_tp_call(struct kobject *kobj)
+{
+ struct tp_kobject *tp_kobj;
+
+ tp_kobj = tp_kobject_list;
+
+ while (tp_kobj) {
+ if (kobj == tp_kobj->kobj)
+ return tp_kobj->call;
+
+ tp_kobj = tp_kobj->next;
+ }
+
+ return NULL;
+}
+
+#define TP_ATTR_RO(_name) \
+ static struct kobj_attribute _name##_attr = __ATTR_RO(_name)
+
+#define TP_ATTR(_name) \
+ static struct kobj_attribute _name##_attr = \
+ __ATTR(_name, 0644, _name##_show, _name##_store)
+
+static ssize_t enable_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct ftrace_event_call *call;
+
+ call = perf_sys_find_tp_call(kobj);
+ return sprintf(buf, "%d\n", call->flags & TRACE_EVENT_FL_ENABLED);
+}
+
+static ssize_t enable_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t len)
+{
+ /* Not implemented yet: reject writes (returning 0 makes writers spin) */
+
+ return -EINVAL;
+}
+TP_ATTR(enable);
+
+static ssize_t filter_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct ftrace_event_call *call;
+ struct trace_seq *s;
+ ssize_t len;
+
+ call = perf_sys_find_tp_call(kobj);
+ s = kmalloc(sizeof(*s), GFP_KERNEL);
+ if (!s)
+ return -ENOMEM;
+
+ trace_seq_init(s);
+
+ print_event_filter(call, s);
+
+ memcpy(buf, s->buffer, s->len);
+ len = s->len; /* save before kfree(s); s->len is invalid afterwards */
+ kfree(s);
+
+ return len;
+}
+
+static ssize_t filter_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t len)
+{
+ /* Not implemented yet: reject writes (returning 0 makes writers spin) */
+
+ return -EINVAL;
+}
+TP_ATTR(filter);
+
+static ssize_t format_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ /* Not implemented yet */
+
+ return 0;
+}
+TP_ATTR_RO(format);
+
+static ssize_t id_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct ftrace_event_call *call;
+
+ call = perf_sys_find_tp_call(kobj);
+
+ return sprintf(buf, "%d\n", call->event.type);
+}
+TP_ATTR_RO(id);
+
+static struct attribute *tp_attrs[] = {
+ &enable_attr.attr,
+ &filter_attr.attr,
+ &format_attr.attr,
+ &id_attr.attr,
+ NULL,
+};
+
+static struct attribute_group tp_attr_group = {
+ .attrs = tp_attrs,
+};
+
+static int perf_sys_add_tp(struct kobject *parent, struct ftrace_event_call *call)
+{
+ struct tp_kobject *tp_kobj;
+ struct kobject *event_kobj;
+ int err;
+
+ event_kobj = kobject_create_and_add(call->name, parent);
+ if (!event_kobj)
+ return -ENOMEM;
+ err = sysfs_create_group(event_kobj, &tp_attr_group);
+ if (err) {
+ kobject_put(event_kobj);
+ return err;
+ }
+
+ tp_kobj = kmalloc(sizeof(*tp_kobj), GFP_KERNEL);
+ if (!tp_kobj) {
+ kobject_put(event_kobj);
+ return -ENOMEM;
+ }
+
+ tp_kobj->kobj = event_kobj;
+ tp_kobj->call = call;
+ tp_kobj->next = tp_kobject_list;
+ tp_kobject_list = tp_kobj;
+
+ return 0;
+}
+
+void perf_sys_register_tp(struct kobject *kobj, char *tp_system)
+{
+ struct ftrace_event_call *call;
+ struct kobject *events_kobj;
+
+ events_kobj = kobject_create_and_add("events", kobj);
+ if (!events_kobj)
+ return;
+
+ for_each_event(call, __start_ftrace_events, __stop_ftrace_events) {
+ if (call->class->system && !strcmp(call->class->system, tp_system)) {
+ perf_sys_add_tp(events_kobj, call);
+ }
+ }
+}

2010-07-03 12:55:41

by Ingo Molnar

Subject: Re: [rfc] Describe events in a structured way via sysfs


* Lin Ming <[email protected]> wrote:

> On Tue, 2010-06-29 at 18:26 +0800, Ingo Molnar wrote:
> > * Lin Ming <[email protected]> wrote:
> >
> > > > Also, we can (optionally) consider 'generic', subsystem level events to
> > > > also show up under:
> > > >
> > > > /sys/bus/pci/drivers/i915/events/
> > > >
> > > > This would give a model to non-device-specific events to be listed one
> > > > level higher in the sysfs hierarchy.
> > > >
> > > > This too would be done in the driver, not by generic code. It's generally
> > > > the driver which knows how the events should be categorized.
> > >
> > > This is a bit difficult. I'd like not to touch TRACE_EVENT(). [...]
> >
> > We can certainly start with the simpler variant - it's also the more common
> > case.
> >
> > > [...] How does the driver know if an event is 'generic' if TRACE_EVENT is
> > > not touched?
> >
> > Well, it's per driver code which creates the 'events' directory anyway, so
> > that code decides where to link things. It can link it to the per driver kobj
> > - or to the per subsys kobj.
> >
> > > > I'd imagine something similar for wireless drivers as well - most
> > > > currently defined events would show up on a per device basis there.
> > > >
> > > > Can you see practical problems with this scheme?
> > >
> > > Not now. I may find some problems when write more detail code.
> >
> > Ok. Feel free to post RFC patches (even if they are not fully complete yet),
> > so that we can see how things are progressing.
> >
> > I suspect the best approach would be to try to figure out the right sysfs
> > placement for one or two existing driver tracepoints, so that we can see it
> > all in practice. (Obviously any changes to drivers will have to go via the
> > relevant driver maintainer tree(s).)
>
> Well, take i915 tracepoints as an example, the sys structures as below
>
> /sys/class/drm/card0/events/
> |-- i915_gem_object_bind
> | |-- enable
> | |-- filter
> | |-- format
> | `-- id
> |-- i915_gem_object_change_domain
> | |-- enable
> | |-- filter
> | |-- format
> | `-- id
> |-- i915_gem_object_clflush
> | |-- enable
> | |-- filter
> | |-- format
> | `-- id
> |-- i915_gem_object_create
> | |-- enable
> | |-- filter
> | |-- format
> | `-- id
> |-- i915_gem_object_destroy
> | |-- enable
> | |-- filter
> | |-- format
> | `-- id
> |-- i915_gem_object_get_fence
> | |-- enable
> | |-- filter
> | |-- format
> | `-- id
> |-- i915_gem_object_unbind
> | |-- enable
> | |-- filter
> | |-- format
> | `-- id
> |-- i915_gem_request_complete
> | |-- enable
> | |-- filter
> | |-- format
> | `-- id
> |-- i915_gem_request_flush
> | |-- enable
> | |-- filter
> | |-- format
> | `-- id
> |-- i915_gem_request_retire
> | |-- enable
> | |-- filter
> | |-- format
> | `-- id
> |-- i915_gem_request_submit
> | |-- enable
> | |-- filter
> | |-- format
> | `-- id
> |-- i915_gem_request_wait_begin
> | |-- enable
> | |-- filter
> | |-- format
> | `-- id
> |-- i915_gem_request_wait_end
> | |-- enable
> | |-- filter
> | |-- format
> | `-- id
> |-- i915_ring_wait_begin
> | |-- enable
> | |-- filter
> | |-- format
> | `-- id
> `-- i915_ring_wait_end
> |-- enable
> |-- filter
> |-- format
> `-- id
>
> And below is the very draft patch to export i915 tracepoints in sysfs.
> Is it the right direction?

Yeah, I think so.

The per-driver impact is small and to the point:

> drivers/gpu/drm/i915/i915_drv.c | 15 +++-

> i915_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> {
> - return drm_get_dev(pdev, ent, &driver);
> + struct kobject *kobj;
> + struct drm_device *drm_dev;
> + int ret;
> +
> + ret = drm_get_dev(pdev, ent, &driver);
> +
> + if (!ret) {
> + drm_dev = pci_get_drvdata(pdev);
> + kobj = &drm_dev->primary->kdev.kobj;
> + perf_sys_register_tp(kobj, "i915");
> + }
> +
> + return ret;

(It could be even shorter - the same compactness comment as I made last time
still holds for this function.)

Thanks,

Ingo

2010-07-17 00:20:39

by Corey Ashford

Subject: Re: [rfc] Describe events in a structured way via sysfs

On 07/02/2010 01:06 AM, Lin Ming wrote:
> On Tue, 2010-06-29 at 18:26 +0800, Ingo Molnar wrote:
>> * Lin Ming<[email protected]> wrote:
>>
>>>> Also, we can (optionally) consider 'generic', subsystem level events to
>>>> also show up under:
>>>>
>>>> /sys/bus/pci/drivers/i915/events/
>>>>
>>>> This would give a model to non-device-specific events to be listed one
>>>> level higher in the sysfs hierarchy.
>>>>
>>>> This too would be done in the driver, not by generic code. It's generally
>>>> the driver which knows how the events should be categorized.
>>>
>>> This is a bit difficult. I'd like not to touch TRACE_EVENT(). [...]
>>
>> We can certainly start with the simpler variant - it's also the more common
>> case.
>>
>>> [...] How does the driver know if an event is 'generic' if TRACE_EVENT is
>>> not touched?
>>
>> Well, it's per driver code which creates the 'events' directory anyway, so
>> that code decides where to link things. It can link it to the per driver kobj
>> - or to the per subsys kobj.
>>
>>>> I'd imagine something similar for wireless drivers as well - most
>>>> currently defined events would show up on a per device basis there.
>>>>
>>>> Can you see practical problems with this scheme?
>>>
>>> Not now. I may find some problems when write more detail code.
>>
>> Ok. Feel free to post RFC patches (even if they are not fully complete yet),
>> so that we can see how things are progressing.
>>
>> I suspect the best approach would be to try to figure out the right sysfs
>> placement for one or two existing driver tracepoints, so that we can see it
>> all in practice. (Obviously any changes to drivers will have to go via the
>> relevant driver maintainer tree(s).)
>
> Well, take i915 tracepoints as an example, the sys structures as below
>
> /sys/class/drm/card0/events/
> |-- i915_gem_object_bind
> | |-- enable
> | |-- filter
> | |-- format
> | `-- id
...

Hi Lin,

Sorry for my late reply on this thread. I had missed these posts
earlier because I had an email filter that was set to look for messages
with "perf" in the subject, and so I missed this entire thread.

With your example here, let's say I want to open this event with the
perf_events ABI... how would I go about doing that? Have you figured
out whether the caller would read the id and pass that into the
interface, or perhaps pass in the fd of the id file (or perhaps the fd
of the specific event directory)?

Also, I see the filter and format fields here. Would the caller write
to these fields to set them up? What's the format of the data that's
written to them? Would it be totally device dependent? It seems like
there should be a way for a user space tool to discover what can be
programmed into the filter and format fields.

- Corey

2010-07-20 05:48:16

by Lin Ming

Subject: Re: [rfc] Describe events in a structured way via sysfs

On Sat, 2010-07-17 at 08:20 +0800, Corey Ashford wrote:
> On 07/02/2010 01:06 AM, Lin Ming wrote:
> > On Tue, 2010-06-29 at 18:26 +0800, Ingo Molnar wrote:
> >> * Lin Ming<[email protected]> wrote:
> >>
> >>>> Also, we can (optionally) consider 'generic', subsystem level events to
> >>>> also show up under:
> >>>>
> >>>> /sys/bus/pci/drivers/i915/events/
> >>>>
> >>>> This would give a model to non-device-specific events to be listed one
> >>>> level higher in the sysfs hierarchy.
> >>>>
> >>>> This too would be done in the driver, not by generic code. It's generally
> >>>> the driver which knows how the events should be categorized.
> >>>
> >>> This is a bit difficult. I'd like not to touch TRACE_EVENT(). [...]
> >>
> >> We can certainly start with the simpler variant - it's also the more common
> >> case.
> >>
> >>> [...] How does the driver know if an event is 'generic' if TRACE_EVENT is
> >>> not touched?
> >>
> >> Well, it's per driver code which creates the 'events' directory anyway, so
> >> that code decides where to link things. It can link it to the per driver kobj
> >> - or to the per subsys kobj.
> >>
> >>>> I'd imagine something similar for wireless drivers as well - most
> >>>> currently defined events would show up on a per device basis there.
> >>>>
> >>>> Can you see practical problems with this scheme?
> >>>
> >>> Not now. I may find some problems when write more detail code.
> >>
> >> Ok. Feel free to post RFC patches (even if they are not fully complete yet),
> >> so that we can see how things are progressing.
> >>
> >> I suspect the best approach would be to try to figure out the right sysfs
> >> placement for one or two existing driver tracepoints, so that we can see it
> >> all in practice. (Obviously any changes to drivers will have to go via the
> >> relevant driver maintainer tree(s).)
> >
> > Well, take i915 tracepoints as an example, the sys structures as below
> >
> > /sys/class/drm/card0/events/
> > |-- i915_gem_object_bind
> > | |-- enable
> > | |-- filter
> > | |-- format
> > | `-- id
> ...
>
> Hi Lin,
>
> Sorry for my late reply on this thread. I had missed these posts
> earlier because I had an email filter that was set to look for messages
> with "perf" in the subject, and so I missed this entire thread.

Sorry for my late reply too.
I have been busy with some other stuff. I hope I can send more
functional patches this week.

>
> With your example here, let's say I want to open this event with the
> perf_events ABI... how would I go about doing that? Have you figured
> out whether the caller would read the id and pass that into the
> interface, or perhaps pass in the fd of the id file (or perhaps the fd
> of the specific event directory).

Please just ignore my example above. I now have some incomplete new
patches that export hardware, software and tracepoint events via sysfs,
as shown below.

The event path is passed in with perf's "-e" option, for example:
perf record -e /sys/kernel/events/page-faults -- <some commands>

The caller reads config and type and passes them into perf_event_attr.

1. Hardware events
/sys/devices/system/cpu/cpu0...cpuN/events
|-- L1-dcache-load-misses ===> event name
| |-- config ===> config value for the event
| `-- type ===> event type
|-- cycles
| |-- config
| `-- type
.....

2. Software events
/sys/kernel/events
|-- page-faults
| |-- config
| `-- type
|-- context-switches
| |-- config
| `-- type
....

3. Tracepoint events
/sys/devices/pci0000:00/0000:00:02.0/events
|-- i915_gem_object_create
| |-- config
| `-- type
|-- i915_gem_object_bind
| |-- config
| `-- type
....
....
/sys/devices/system/kvm/kvm0/events
|-- kvm_entry
| |-- config
| `-- type
|-- kvm_hypercall
| |-- config
| `-- type
....
....
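Here is a minimal user-space sketch of the flow described above. It assumes each `type` and `config` file holds a single number, either decimal or 0x-prefixed hex. `read_event_num()` and `attr_from_event_path()` are hypothetical helpers. Under today's ABI, page-faults is `PERF_TYPE_SOFTWARE` with `config = PERF_COUNT_SW_PAGE_FAULTS`, so those are the values one would expect to find in `/sys/kernel/events/page-faults/`.

```c
/*
 * Sketch only: an event is a directory (e.g. /sys/kernel/events/page-faults)
 * holding "type" and "config" files with one number each.  The file
 * contents format is an assumption.
 */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <linux/perf_event.h>

/* Read one unsigned number from <dir>/<name>; 0 on success, -1 on error. */
static int read_event_num(const char *dir, const char *name,
			  unsigned long long *val)
{
	char path[4096], buf[64], *end;
	FILE *f;
	int n;

	n = snprintf(path, sizeof(path), "%s/%s", dir, name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	/* base 0: decimal, 0x-prefixed hex, or leading-0 octal */
	*val = strtoull(buf, &end, 0);
	return end == buf ? -1 : 0;	/* reject files with no digits */
}

/* Build a perf_event_attr from an event directory given to "-e". */
static int attr_from_event_path(const char *event_path,
				struct perf_event_attr *attr)
{
	unsigned long long type, config;
	struct stat st;

	assert(attr != NULL);
	if (stat(event_path, &st) != 0 || !S_ISDIR(st.st_mode))
		return -1;	/* not an event directory */
	if (read_event_num(event_path, "type", &type) ||
	    read_event_num(event_path, "config", &config))
		return -1;
	if (type > 0xffffffffULL)
		return -1;	/* attr.type is only 32 bits wide */

	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = (unsigned int)type;
	attr->config = config;
	return 0;
}
```

Because type and config are two separate files, the tool needs no per-event knowledge. It only copies two numbers into the matching attr members.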

>
> Also, I see the filter and format fields here. Would the caller write
> to these fields to set them up? What's the format of the data that's
> written to them? Would it be totally device dependent? It seems like
> there should be a way for a user space tool to discover what can be
> programmed into the filter and format fields.

For now, only the read-only event attributes (config and type) are
exported. I want to get some minimal functional patches working first,
and then implement the more complex writable attributes.

Lin Ming

Subject: Re: [rfc] Describe events in a structured way via sysfs

On 20.07.10 01:48:28, Lin Ming wrote:
> The caller reads config and type and pass them into perf_event_attr.
>
> 1. Hardware events
> /sys/devices/system/cpu/cpu0...cpuN/events
> |-- L1-dcache-load-misses ===> event name
> | |-- config ===> config value for the event
> | `-- type ===> event type

Wouldn't it be much easier to have a unique sysfs id (could be a
u64):

> |-- L1-dcache-load-misses ===> event name
> | `-- id ===> event id

... and then extend the syscall to enable an event by its sysfs id:

memset(&attr, 0, sizeof(attr));
attr.type = PERF_TYPE_SYSFS;
attr.sysfs_id = sysfs_id;
attr.sample_type = PERF_SAMPLE_CPU | PERF_SAMPLE_RAW;
attr.config = config;
...

The kernel then knows which event is meant, and we don't have to
provide event-specific parameters such as type/config that require an
event-specific setup. The advantage would be that we can open an event
file descriptor for every kind of event in a standardized way.

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center

2010-07-20 17:44:25

by Corey Ashford

Subject: Re: [rfc] Describe events in a structured way via sysfs

On 07/19/2010 10:48 PM, Lin Ming wrote:
> On Sat, 2010-07-17 at 08:20 +0800, Corey Ashford wrote:
>> On 07/02/2010 01:06 AM, Lin Ming wrote:
>>> On Tue, 2010-06-29 at 18:26 +0800, Ingo Molnar wrote:
>>>> * Lin Ming<[email protected]> wrote:
>>>>
>>>>>> Also, we can (optionally) consider 'generic', subsystem level events to
>>>>>> also show up under:
>>>>>>
>>>>>> /sys/bus/pci/drivers/i915/events/
>>>>>>
>>>>>> This would give a model to non-device-specific events to be listed one
>>>>>> level higher in the sysfs hierarchy.
>>>>>>
>>>>>> This too would be done in the driver, not by generic code. It's generally
>>>>>> the driver which knows how the events should be categorized.
>>>>>
>>>>> This is a bit difficult. I'd like not to touch TRACE_EVENT(). [...]
>>>>
>>>> We can certainly start with the simpler variant - it's also the more common
>>>> case.
>>>>
>>>>> [...] How does the driver know if an event is 'generic' if TRACE_EVENT is
>>>>> not touched?
>>>>
>>>> Well, it's per driver code which creates the 'events' directory anyway, so
>>>> that code decides where to link things. It can link it to the per driver kobj
>>>> - or to the per subsys kobj.
>>>>
>>>>>> I'd imagine something similar for wireless drivers as well - most
>>>>>> currently defined events would show up on a per device basis there.
>>>>>>
>>>>>> Can you see practical problems with this scheme?
>>>>>
>>>>> Not now. I may find some problems when write more detail code.
>>>>
>>>> Ok. Feel free to post RFC patches (even if they are not fully complete yet),
>>>> so that we can see how things are progressing.
>>>>
>>>> I suspect the best approach would be to try to figure out the right sysfs
>>>> placement for one or two existing driver tracepoints, so that we can see it
>>>> all in practice. (Obviously any changes to drivers will have to go via the
>>>> relevant driver maintainer tree(s).)
>>>
>>> Well, take i915 tracepoints as an example, the sys structures as below
>>>
>>> /sys/class/drm/card0/events/
>>> |-- i915_gem_object_bind
>>> | |-- enable
>>> | |-- filter
>>> | |-- format
>>> | `-- id
>> ...
>>
>> Hi Lin,
>>
>> Sorry for my late reply on this thread. I had missed these posts
>> earlier because I had an email filter that was set to look for messages
>> with "perf" in the subject, and so I missed this entire thread.
>
> Sorry for my late reply too.
> I have been busy with some other stuff. Hope I can send a more
> functional patches this week.
>
>>
>> With your example here, let's say I want to open this event with the
>> perf_events ABI... how would I go about doing that? Have you figured
>> out whether the caller would read the id and pass that into the
>> interface, or perhaps pass in the fd of the id file (or perhaps the fd
>> of the specific event directory).
>
> Please just ignore my above example. Now I have some uncompleted new
> patches to export hardware/software/tracepoint events via sysfs, like
> below.
>
> The event path is passed in with perf's "-e" option, for example
> perf record -e /sys/kernel/events/page-faults --<some commands>
>
> The caller reads config and type and pass them into perf_event_attr.
>
> 1. Hardware events
> /sys/devices/system/cpu/cpu0...cpuN/events
> |-- L1-dcache-load-misses ===> event name
> | |-- config ===> config value for the event
> | `-- type ===> event type
> |-- cycles
> | |-- config
> | `-- type
> .....
>
> 2. Software events
> /sys/kernel/events
> |-- page-faults
> | |-- config
> | `-- type
> |-- context-switches
> | |-- config
> | `-- type
> ....
>
> 3. Tracepoint events
> /sys/devices/pci0000:00/0000:00:02.0/events
> |-- i915_gem_object_create
> | |-- config
> | `-- type
> |-- i915_gem_object_bind
> | |-- config
> | `-- type
> ....
> ....
> /sys/devices/system/kvm/kvm0/events
> |-- kvm_entry
> | |-- config
> | `-- type
> |-- kvm_hypercall
> | |-- config
> | `-- type
> ....
> ....
>
>>
>> Also, I see the filter and format fields here. Would the caller write
>> to these fields to set them up? What's the format of the data that's
>> written to them? Would it be totally device dependent? It seems like
>> there should be a way for a user space tool to discover what can be
>> programmed into the filter and format fields.
>
> Now only read-only event attributes(config and type) are exported.
> I want to first make some minimal functional patches. Then to implement
> the complex writable attributes.

I'm not seeing the value of writable attributes in sysfs at this point.
Wouldn't that disconnect the event opening between the syscall and the
writing of attributes in user space, with no real way to tie them
together? For example, what if two users wrote to the same attribute
with different values... which one would take precedence when you go to
do the open syscall? I think all of the attribute data should be in the
open call, and sysfs should be read-only.

Earlier, I briefly presented an idea that would allow a caller to read
attribute formatting information, such as a shift and mask value, which
would allow the caller to build up a more complex .config value,
possibly extending into a new attr field - .config_extra[n] as dictated
by the shift value; shift values greater than 63 would place the
attribute into .config_extra[shift amount / 64] shifted by shift amount
% 64. It's not the prettiest interface, but I think it could work and
would be extensible.

- Corey

2010-07-20 17:50:22

by Corey Ashford

Subject: Re: [rfc] Describe events in a structured way via sysfs

On 07/20/2010 08:19 AM, Robert Richter wrote:
> On 20.07.10 01:48:28, Lin Ming wrote:
>> The caller reads config and type and pass them into perf_event_attr.
>>
>> 1. Hardware events
>> /sys/devices/system/cpu/cpu0...cpuN/events
>> |-- L1-dcache-load-misses ===> event name
>> | |-- config ===> config value for the event
>> | `-- type ===> event type
>
> Wouldn't it be much easier to have a unique sysfs id (could be an
> u64):
>
>> |-- L1-dcache-load-misses ===> event name
>> | `-- id ===> event id
>
> ... and then extend the syscall to enable an event by its sysfs id:
>
> memset(&attr, 0, sizeof(attr));
> attr.type = PERF_TYPE_SYSFS;
> attr.sysfs_id = sysfs_id;
> attr.sample_type = PERF_SAMPLE_CPU | PERF_SAMPLE_RAW;
> attr.config = config;
> ...
>
> The kerrnel then knows which event is meant and the don't have to
> provide event specific paramaters such as type/config that requires an
> event specific setup. The advantage would be that we can open an event
> file descriptor of every kind of event in a standardized way.

Your example above still shows the .config member being set. Was that
intentional?

Maybe another way to accomplish this would be to reuse the .config field
for the sysfs_id.

We still need a way to deal with event attributes though, so something
more than a single sysfs_id would be needed to specify the event completely.

- Corey

Subject: Re: [rfc] Describe events in a structured way via sysfs

On 20.07.10 13:50:01, Corey Ashford wrote:

> > ... and then extend the syscall to enable an event by its sysfs id:
> >
> > memset(&attr, 0, sizeof(attr));
> > attr.type = PERF_TYPE_SYSFS;
> > attr.sysfs_id = sysfs_id;
> > attr.sample_type = PERF_SAMPLE_CPU | PERF_SAMPLE_RAW;
> > attr.config = config;
> > ...

> Your example above still shows the .config member being set. Was that
> intentional?
>
> Maybe another way to accomplish this would be to reuse the .config field
> for the sysfs_id.

This was intentional, as config can be used to configure the event;
otherwise there is no way to set up the event with certain
parameters. The config value will then be event-specific, and we can be
sure the parameter belongs to _this_ kind of event.

> We still need a way to deal with event attributes though, so something
> more than a single sysfs_id would be needed to specify the event completely.

It is true that you still need knowledge of what the event is
measuring and how it is set up or configured. Perhaps the configuration
can be left blank if the event can be set up without it. But with this
approach you can get file descriptors for every event a user may be
interested in simply by looking into sysfs.

For example, I was thinking of perfctr events vs. IBS events. The CPU
could set up something like:

/sys/devices/system/cpu/cpu0...cpuN/events/perfctr/id
/sys/devices/system/cpu/cpu0...cpuN/events/ibs_op/id

Both events are set up with one 64-bit config value that is basically
the event's configuration MSR (x86 perfctr or AMD IBS). These are
defined in the hardware specifications, and their formats differ. You
could then open the event file descriptor using the sysfs id and use
the config value to customize the event. No complicated setup or
implementation is needed to detect which kind of event you want to
use, as the id indicates the type of event.

Actually, we could also set up e.g. trace events with this mechanism.

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center

2010-07-20 21:18:51

by Corey Ashford

Subject: Re: [rfc] Describe events in a structured way via sysfs

On 07/20/2010 11:30 AM, Robert Richter wrote:
> On 20.07.10 13:50:01, Corey Ashford wrote:
>
>>> ... and then extend the syscall to enable an event by its sysfs id:
>>>
>>> memset(&attr, 0, sizeof(attr));
>>> attr.type = PERF_TYPE_SYSFS;
>>> attr.sysfs_id = sysfs_id;
>>> attr.sample_type = PERF_SAMPLE_CPU | PERF_SAMPLE_RAW;
>>> attr.config = config;
>>> ...
>
>> Your example above still shows the .config member being set. Was that
>> intentional?
>>
>> Maybe another way to accomplish this would be to reuse the .config field
>> for the sysfs_id.
>
> This was intended as this could be used to configure the event,
> otherwise there is no way to setup the event with certain
> parameters. The config value will be event specific then and we can be
> sure the parameter belongs to _this_ kind of event.
>
>> We still need a way to deal with event attributes though, so something
>> more than a single sysfs_id would be needed to specify the event completely.
>
> It is true that you still need knowledge of what the event is
> measuring and how it is set up or configured. Maybe the configuration
> may left blank if the event can be setup without it. But with this
> approach you can get file descriptors for every event a user may be
> interested in simply by looking into sysfs.
>

Yes, that would be a nice feature.

> For example, I was thinking of perfctr events vs. ibs events. The cpu
> could setup something like:
>
> /sys/devices/system/cpu/cpu0...cpuN/events/perfctr/id
> /sys/devices/system/cpu/cpu0...cpuN/events/ibs_op/id
>
> Both events are setup with one 64 bit config value that is basically
> the event's configuration msr (x86 perfctr or AMD IBS). These are
> definded in the hardware specifications. Its formats differ. You could
> then open the event file descriptor using the sysfs id and use the
> config value to customize the event. You don't have a complicated
> setup or implementation to detect which kind of event you want to use
> as the id indicates the type of event.
>
> Actually, we could setup e.g. also trace events with this mechanism.

In perf_events, as I recall, they started out with a combined type and
config field, but it quickly became obvious that config was going to get
too crowded even with 64 bits available, so they were split up into
separate type and config fields. I fear that's what would happen to the
sysfs_id value as well... it would be too crowded.

Retaining the type and config nodes in sysfs makes it very clear to a
programmer how to use them: just read them and copy them into the
corresponding members of the attr struct. This also requires no changes
to the existing attr struct (at least for the moment).

- Corey