I'm working on getting the hardware performance counters working on a
Raspberry-Pi (BCM2835/ARM1176).
The counters are there, but the overflow interrupt is not hooked up so the
init code disables perf_event.
The following patch enables perf_event and it works fine for simple
"perf stat" type workloads. perf record and anything requiring sampling
doesn't work (as expected).
I thought I would have to add a periodic timer to catch counter overflows,
but it turns out that's unnecessary. From what I can tell even though the
nPMUIRQ interrupt is not hooked up, the overflows are marked in the status
register and this is noticed and handled at context-switch time. So as
long as the counters overflow less frequently than the context switch
interval the registers don't overflow.
So my question, is a patch like this acceptable?
Should the perf_event interface handle setups like this better and work
fine in aggregate mode but return ENOTSUP if a sampled or overflow event
is attempted?
Vince
Signed-off-by: Vince Weaver <[email protected]>
diff --git a/arch/arm/kernel/perf_event_cpu.c b/arch/arm/kernel/perf_event_cpu.c
index d85055c..ff1a752 100644
--- a/arch/arm/kernel/perf_event_cpu.c
+++ b/arch/arm/kernel/perf_event_cpu.c
@@ -97,8 +97,8 @@ static int cpu_pmu_request_irq(struct arm_pmu *cpu_pmu, irq_handler_t handler)
irqs = min(pmu_device->num_resources, num_possible_cpus());
if (irqs < 1) {
- pr_err("no irqs for PMUs defined\n");
- return -ENODEV;
+ printk_once("no irqs for PMUs defined, enabling anyway\n");
+ return 0;
}
for (i = 0; i < irqs; ++i) {
Vince,
I ran into the same problem myself on my Raspberry PI. In fact,
I am really frustrated by ARM platforms not having that IRQ handled
properly. It is pretty common for manufacturers not to provide the code
to route the IRQ, even though the HW does support it. Sometimes the
specs on how to do this are not even public it seems.
Having the PMU IRQ setup properly would make using the PMU on ARM
a lot more attractive.
On Wed, Jan 8, 2014 at 10:28 PM, Vince Weaver <[email protected]> wrote:
>
>
> I'm working on getting the hardware performance counters working on a
> Raspberry-Pi (BCM2835/ARM/1176).
>
> The counters are there, but the overflow interrupt is not hooked up so the
> init code disables perf_event.
>
> The following patch enables perf_event and it works fine for simple
> "perf stat" type workloads. perf record and anything requiring sampling
> doesn't work (as expected).
>
> I thought I would have to add a periodic timer to catch counter overflows,
> but it turns out that's unnecessary. From what I can tell even though the
> nPMUIRQ interrupt is not hooked up, the overflows are marked in the status
> register and this is noticed and handled at context-switch time. So as
> long as the counters overflow less frequently than the context switch
> interval the registers don't overflow.
>
> So my question, is a patch like this acceptable?
>
> Should the perf_event interface handle setups like this better and work
> fine in aggregate mode but return ENOTSUP if a sampled or overflow event
> is attempted?
>
> Vince
>
>
> Signed-off-by: Vince Weaver <[email protected]>
>
> diff --git a/arch/arm/kernel/perf_event_cpu.c b/arch/arm/kernel/perf_event_cpu.c
> index d85055c..ff1a752 100644
> --- a/arch/arm/kernel/perf_event_cpu.c
> +++ b/arch/arm/kernel/perf_event_cpu.c
> @@ -97,8 +97,8 @@ static int cpu_pmu_request_irq(struct arm_pmu *cpu_pmu, irq_handler_t handler)
>
> irqs = min(pmu_device->num_resources, num_possible_cpus());
> if (irqs < 1) {
> - pr_err("no irqs for PMUs defined\n");
> - return -ENODEV;
> + printk_once("no irqs for PMUs defined, enabling anyway\n");
> + return 0;
> }
>
> for (i = 0; i < irqs; ++i) {
On Wed, Jan 08, 2014 at 04:28:20PM -0500, Vince Weaver wrote:
> Should the perf_event interface handle setups like this better and work
> fine in aggregate mode but return ENOTSUP if a sampled or overflow event
> is attempted?
Yeah that would be better, we do something similar for P6 class machines
without lapic IIRC.
On Wed, 8 Jan 2014, Peter Zijlstra wrote:
> On Wed, Jan 08, 2014 at 04:28:20PM -0500, Vince Weaver wrote:
> > Should the perf_event interface handle setups like this better and work
> > fine in aggregate mode but return ENOTSUP if a sampled or overflow event
> > is attempted?
>
> Yeah that would be better, we do something similar for P6 class machines
> without lapic IIRC.
You're right. Something like the following works for me on the rasp-pi,
although maybe if x86 is doing it too things should be moved up into
generic code?
---
Return EOPNOTSUPP if we have no PMU overflow interrupt but a
sampled event is requested.
Signed-off-by: Vince Weaver <[email protected]>
diff --git a/arch/arm/include/asm/pmu.h b/arch/arm/include/asm/pmu.h
index f24edad..f1f27a3 100644
--- a/arch/arm/include/asm/pmu.h
+++ b/arch/arm/include/asm/pmu.h
@@ -87,6 +87,7 @@ struct arm_pmu {
u64 max_period;
struct platform_device *plat_device;
struct pmu_hw_events *(*get_hw_events)(void);
+ int no_overflow_interrupt;
};
#define to_arm_pmu(p) (container_of(p, struct arm_pmu, pmu))
diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
index bc3f2ef..fa4ecc3 100644
--- a/arch/arm/kernel/perf_event.c
+++ b/arch/arm/kernel/perf_event.c
@@ -397,7 +397,7 @@ __hw_perf_event_init(struct perf_event *event)
*/
hwc->config_base |= (unsigned long)mapping;
- if (!hwc->sample_period) {
+ if (!is_sampling_event(event)) {
/*
* For non-sampling runs, limit the sample_period to half
* of the counter width. That way, the new counter value
@@ -407,6 +407,12 @@ __hw_perf_event_init(struct perf_event *event)
hwc->sample_period = armpmu->max_period >> 1;
hwc->last_period = hwc->sample_period;
local64_set(&hwc->period_left, hwc->sample_period);
+ } else {
+ /*
+ * If we have no PMU interrupt we cannot sample
+ */
+ if (armpmu->no_overflow_interrupt)
+ return -EOPNOTSUPP;
}
if (event->group_leader != event) {
diff --git a/arch/arm/kernel/perf_event_cpu.c b/arch/arm/kernel/perf_event_cpu.c
index d85055c..a74e0cd 100644
--- a/arch/arm/kernel/perf_event_cpu.c
+++ b/arch/arm/kernel/perf_event_cpu.c
@@ -97,8 +97,9 @@ static int cpu_pmu_request_irq(struct arm_pmu *cpu_pmu, irq_handler_t handler)
irqs = min(pmu_device->num_resources, num_possible_cpus());
if (irqs < 1) {
- pr_err("no irqs for PMUs defined\n");
- return -ENODEV;
+ printk_once("no irqs for PMUs defined, emulating with hrtimer\n");
+ cpu_pmu->no_overflow_interrupt=1;
+ return 0;
}
for (i = 0; i < irqs; ++i) {
Hi Vince,
On Thu, Jan 09, 2014 at 03:47:19AM +0000, Vince Weaver wrote:
> On Wed, 8 Jan 2014, Peter Zijlstra wrote:
>
> > On Wed, Jan 08, 2014 at 04:28:20PM -0500, Vince Weaver wrote:
> > > Should the perf_event interface handle setups like this better and work
> > > fine in aggregate mode but return ENOTSUP if a sampled or overflow event
> > > is attempted?
> >
> > Yeah that would be better, we do something similar for P6 class machines
> > without lapic IIRC.
>
> You're right. Something like the following works for me on the rasp-pi,
> although maybe if x86 is doing it too things should be moved up into
> generic code?
I'd rather see it in the generic code if at all possible. Maybe we could add
a flags field to perf_pmu_register?
> Return EOPNOTSUPP if we have no PMU overflow interrupt but a
> sampled event is requested.
>
> Signed-off-by: Vince Weaver <[email protected]>
>
> diff --git a/arch/arm/include/asm/pmu.h b/arch/arm/include/asm/pmu.h
> index f24edad..f1f27a3 100644
> --- a/arch/arm/include/asm/pmu.h
> +++ b/arch/arm/include/asm/pmu.h
> @@ -87,6 +87,7 @@ struct arm_pmu {
> u64 max_period;
> struct platform_device *plat_device;
> struct pmu_hw_events *(*get_hw_events)(void);
> + int no_overflow_interrupt;
For the arm bits, we can actually use platform_get_irq on the platform
device and avoid the need for a new field in here.
Cheers,
Will
On Thu, 9 Jan 2014, Will Deacon wrote:
> I'd rather see it in the generic code if at all possible. Maybe we could add
> a flags field to perf_pmu_register?
I can look into adding the check in generic code.
In the meantime, would you consider a patch like this that disables
the IRQ check and lets ARM devices missing an IRQ (such as the rasp-pi)
still have access to the counters?
diff --git a/arch/arm/kernel/perf_event_cpu.c b/arch/arm/kernel/perf_event_cpu.c
index d85055c..ff1a752 100644
--- a/arch/arm/kernel/perf_event_cpu.c
+++ b/arch/arm/kernel/perf_event_cpu.c
@@ -97,8 +97,8 @@ static int cpu_pmu_request_irq(struct arm_pmu *cpu_pmu, irq_handler_t handler)
irqs = min(pmu_device->num_resources, num_possible_cpus());
if (irqs < 1) {
- pr_err("no irqs for PMUs defined\n");
- return -ENODEV;
+ printk_once("no irqs for PMUs defined, sampling events not supported\n");
+ return 0;
}
for (i = 0; i < irqs; ++i) {
On Thu, Jan 09, 2014 at 11:08:47PM -0500, Vince Weaver wrote:
> On Thu, 9 Jan 2014, Will Deacon wrote:
>
> > I'd rather see it in the generic code if at all possible. Maybe we could add
> > a flags field to perf_pmu_register?
>
> I can look into adding the check in generic code.
Adding something like this to the generic code would mean adding a
struct pmu capabilities field and visiting all existing PMU
implementations to properly fill this out.
There's a number of hardware PMU implementations that do not have an
interrupt and would need to set this flag.
On Fri, Jan 10, 2014 at 04:08:47AM +0000, Vince Weaver wrote:
> On Thu, 9 Jan 2014, Will Deacon wrote:
>
> > I'd rather see it in the generic code if at all possible. Maybe we could add
> > a flags field to perf_pmu_register?
>
> I can look into adding the check in generic code.
>
> In the meantime, would you consider a patch like this that disables
> the IRQ check and lets ARM devices missing an IRQ (such as the rasp-pi)
> still have access to the counters?
In the absence of a core change, I think I'd rather have something like your
second patch, but without the extra no_overflow_irq field (you can check the
platform device, as I mentioned previously).
Cheers,
Will
>
> diff --git a/arch/arm/kernel/perf_event_cpu.c b/arch/arm/kernel/perf_event_cpu.c
> index d85055c..ff1a752 100644
> --- a/arch/arm/kernel/perf_event_cpu.c
> +++ b/arch/arm/kernel/perf_event_cpu.c
> @@ -97,8 +97,8 @@ static int cpu_pmu_request_irq(struct arm_pmu *cpu_pmu, irq_handler_t handler)
>
> irqs = min(pmu_device->num_resources, num_possible_cpus());
> if (irqs < 1) {
> - pr_err("no irqs for PMUs defined\n");
> - return -ENODEV;
> + printk_once("no irqs for PMUs defined, sampling events not supported\n");
> + return 0;
> }
>
> for (i = 0; i < irqs; ++i) {
>
>
On Fri, Jan 10, 2014 at 11:08 AM, Will Deacon <[email protected]> wrote:
> On Fri, Jan 10, 2014 at 04:08:47AM +0000, Vince Weaver wrote:
>> On Thu, 9 Jan 2014, Will Deacon wrote:
>>
>> > I'd rather see it in the generic code if at all possible. Maybe we could add
>> > a flags field to perf_pmu_register?
>>
>> I can look into adding the check in generic code.
>>
>> In the meantime, would you consider a patch like this that disables
>> the IRQ check and lets ARM devices missing an IRQ (such as the rasp-pi)
>> still have access to the counters?
>
> In the absence of a core change, I think I'd rather have something like your
> second patch, but without the extra no_overflow_irq field (you can check the
> platform device, as I mentioned previously).
>
But before we do that, can someone confirm that on Pi there is simply
no interrupt
generated by the PMU or is it that we don't know how to route it back?
> Cheers,
>
> Will
>
>>
>> diff --git a/arch/arm/kernel/perf_event_cpu.c b/arch/arm/kernel/perf_event_cpu.c
>> index d85055c..ff1a752 100644
>> --- a/arch/arm/kernel/perf_event_cpu.c
>> +++ b/arch/arm/kernel/perf_event_cpu.c
>> @@ -97,8 +97,8 @@ static int cpu_pmu_request_irq(struct arm_pmu *cpu_pmu, irq_handler_t handler)
>>
>> irqs = min(pmu_device->num_resources, num_possible_cpus());
>> if (irqs < 1) {
>> - pr_err("no irqs for PMUs defined\n");
>> - return -ENODEV;
>> + printk_once("no irqs for PMUs defined, sampling events not supported\n");
>> + return 0;
>> }
>>
>> for (i = 0; i < irqs; ++i) {
>>
>>
On Fri, 10 Jan 2014, Stephane Eranian wrote:
> But before we do that, can someone confirm that on Pi there is simply
> no interrupt
> generated by the PMU or is it that we don't know how to route it back?
The closest I've seen is this posting from one of the rasp-pi employees
who presumably has access to better documentation than we do:
http://www.raspberrypi.org/phpBB3/viewtopic.php?f=33&t=19151
It sounds like the SoC just doesn't bother hooking the nPMUIRQ line up
anything useful on the interrupt controller. The Pi is complicated
because it's really just a GPU chip that just happens to have an ARM1176
core hanging off the side of it.
Vince
On Fri, 10 Jan 2014, Will Deacon wrote:
> In the absence of a core change, I think I'd rather have something like your
> second patch, but without the extra no_overflow_irq field (you can check the
> platform device, as I mentioned previously).
Something like the following? It works on my rasp-pi, still waiting for
the compile to finish on the pandaboard so I haven't verified that the
has-working-interrupt case still works.
---
Allow hardware perf_events to be enabled even if no overflow
interrupt is available. Return EOPNOTSUPP if a sampling event
is attempted and the interrupt is not available.
This is necessary for access to hardware perf_events on
the Raspberry-Pi (bcm2835).
Signed-off-by: Vince Weaver <[email protected]>
diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
index bc3f2ef..e2c4aa2 100644
--- a/arch/arm/kernel/perf_event.c
+++ b/arch/arm/kernel/perf_event.c
@@ -397,7 +397,7 @@ __hw_perf_event_init(struct perf_event *event)
*/
hwc->config_base |= (unsigned long)mapping;
- if (!hwc->sample_period) {
+ if (!is_sampling_event(event)) {
/*
* For non-sampling runs, limit the sample_period to half
* of the counter width. That way, the new counter value
@@ -407,6 +407,14 @@ __hw_perf_event_init(struct perf_event *event)
hwc->sample_period = armpmu->max_period >> 1;
hwc->last_period = hwc->sample_period;
local64_set(&hwc->period_left, hwc->sample_period);
+ } else {
+
+ /*
+ * If we have no PMU interrupt we cannot sample.
+ */
+ if (platform_get_irq(armpmu->plat_device, 0) < 0)
+ return -EOPNOTSUPP;
+
}
if (event->group_leader != event) {
diff --git a/arch/arm/kernel/perf_event_cpu.c b/arch/arm/kernel/perf_event_cpu.c
index d85055c..7a84738 100644
--- a/arch/arm/kernel/perf_event_cpu.c
+++ b/arch/arm/kernel/perf_event_cpu.c
@@ -97,8 +97,8 @@ static int cpu_pmu_request_irq(struct arm_pmu *cpu_pmu, irq_handler_t handler)
irqs = min(pmu_device->num_resources, num_possible_cpus());
if (irqs < 1) {
- pr_err("no irqs for PMUs defined\n");
- return -ENODEV;
+ printk_once("no irqs for PMUs defined, sampled events not supported\n");
+ return 0;
}
for (i = 0; i < irqs; ++i) {
On Fri, 10 Jan 2014, Peter Zijlstra wrote:
> On Thu, Jan 09, 2014 at 11:08:47PM -0500, Vince Weaver wrote:
> > On Thu, 9 Jan 2014, Will Deacon wrote:
> >
> > > I'd rather see it in the generic code if at all possible. Maybe we could add
> > > a flags field to perf_pmu_register?
> >
> > I can look into adding the check in generic code.
>
> Adding something like this to the generic code would mean adding a
> struct pmu capabilities field and visiting all existing PMU
> implementations to properly fill this out.
I don't see an existing pmu capabilities struct... or do you mean
coming up with one?
Would it only hold an "overflow_interrupt_available" flag, or are
there other generic capabilities it would be handy to know about?
> There's a number of hardware PMU implementations that do not have an
> interrupt and would need to set this flag.
Well that can be added gradually, right? Things wouldn't get any worse if
we add a generic check without auditing all code, things will just behave
the same as before for those architectures.
There is some subtlety here though. On ARM (or at least rasp-pi) the
overflow hardware is there, just no interrupt is hooked up. So things
like counter overflow are handled as long as overflows aren't faster than
context switch time. It's just sampled events aren't possible.
On architectures without overflow support at all (I've had such hardware;
some SPARC machines, the Playstation 3 hypervisor) then counter overflow
isn't possible without a periodic timer (sort of like what is done with
Intel uncore). Is that something that should be in generic code too?
Vince
On Mon, Jan 13, 2014 at 11:55:17PM -0500, Vince Weaver wrote:
> On Fri, 10 Jan 2014, Peter Zijlstra wrote:
>
> > On Thu, Jan 09, 2014 at 11:08:47PM -0500, Vince Weaver wrote:
> > > On Thu, 9 Jan 2014, Will Deacon wrote:
> > >
> > > > I'd rather see it in the generic code if at all possible. Maybe we could add
> > > > a flags field to perf_pmu_register?
> > >
> > > I can look into adding the check in generic code.
> >
> > Adding something like this to the generic code would mean adding a
> > struct pmu capabilities field and visiting all existing PMU
> > implementations to properly fill this out.
>
> I don't see an existing pmu capabilities struct... or do you mean
> coming up with one?
Yeah, adding one.
> Would it only hold an "overflow_interrupt_available" flag, or are
> there other generic capabilities it would be handy to know about?
Possible (other) flags could be:
PMU_HAS_INT -- would allow sampling events
PMU_HAS_PRECISE -- would allow any ::precise value
PMU_HAS_FILTER -- would allow all os/user/etc. flags
> > There's a number of hardware PMU implementations that do not have an
> > interrupt and would need to set this flag.
>
> Well that can be added gradually, right? Things wouldn't get any worse if
> we add a generic check without auditing all code, things will just behave
> the same as before for those architectures.
Right, doing a sweep once every so often is useful to find more patterns
though.
> There is some subtlety here though. On ARM (or at least rasp-pi) the
> overflow hardware is there, just no interrupt is hooked up. So things
> like counter overflow are handled as long as overflows aren't faster than
> context switch time. It's just sampled events aren't possible.
>
> On architectures without overflow support at all (I've had such hardware;
> some SPARC machines, the Playstation 3 hypervisor) then counter overflow
> isn't possible without a periodic timer (sort of like what is done with
> Intel uncore). Is that something that should be in generic code too?
Maybe yeah, if there's enough replication of this it certainly makes
sense to lift it into generic code.
Hi Vince,
On Tue, Jan 14, 2014 at 04:42:13AM +0000, Vince Weaver wrote:
> On Fri, 10 Jan 2014, Will Deacon wrote:
> > In the absence of a core change, I think I'd rather have something like your
> > second patch, but without the extra no_overflow_irq field (you can check the
> > platform device, as I mentioned previously).
>
> Something like the following? It works on my rasp-pi, still waiting for
> the compile to finish on the pandaboard so I haven't verified that the
> has-working-interrupt case still works.
[...]
> diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
> index bc3f2ef..e2c4aa2 100644
> --- a/arch/arm/kernel/perf_event.c
> +++ b/arch/arm/kernel/perf_event.c
> @@ -397,7 +397,7 @@ __hw_perf_event_init(struct perf_event *event)
> */
> hwc->config_base |= (unsigned long)mapping;
>
> - if (!hwc->sample_period) {
> + if (!is_sampling_event(event)) {
> /*
> * For non-sampling runs, limit the sample_period to half
> * of the counter width. That way, the new counter value
> @@ -407,6 +407,14 @@ __hw_perf_event_init(struct perf_event *event)
> hwc->sample_period = armpmu->max_period >> 1;
> hwc->last_period = hwc->sample_period;
> local64_set(&hwc->period_left, hwc->sample_period);
> + } else {
> +
> + /*
> + * If we have no PMU interrupt we cannot sample.
> + */
> + if (platform_get_irq(armpmu->plat_device, 0) < 0)
> + return -EOPNOTSUPP;
I think this should be <= 0, but apart from that this looks alright to me in
the absence of a core change.
Will
On Wed, 15 Jan 2014, Will Deacon wrote:
> Hi Vince,
>
> On Tue, Jan 14, 2014 at 04:42:13AM +0000, Vince Weaver wrote:
> > On Fri, 10 Jan 2014, Will Deacon wrote:
> > > In the absence of a core change, I think I'd rather have something like your
> > > second patch, but without the extra no_overflow_irq field (you can check the
> > > platform device, as I mentioned previously).
> >
> > Something like the following? It works on my rasp-pi, still waiting for
> > the compile to finish on the pandaboard so I haven't verified that the
> > has-working-interrupt case still works.
>
> [...]
>
> > diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
> > index bc3f2ef..e2c4aa2 100644
> > --- a/arch/arm/kernel/perf_event.c
> > +++ b/arch/arm/kernel/perf_event.c
> > @@ -397,7 +397,7 @@ __hw_perf_event_init(struct perf_event *event)
> > */
> > hwc->config_base |= (unsigned long)mapping;
> >
> > - if (!hwc->sample_period) {
> > + if (!is_sampling_event(event)) {
> > /*
> > * For non-sampling runs, limit the sample_period to half
> > * of the counter width. That way, the new counter value
> > @@ -407,6 +407,14 @@ __hw_perf_event_init(struct perf_event *event)
> > hwc->sample_period = armpmu->max_period >> 1;
> > hwc->last_period = hwc->sample_period;
> > local64_set(&hwc->period_left, hwc->sample_period);
> > + } else {
> > +
> > + /*
> > + * If we have no PMU interrupt we cannot sample.
> > + */
> > + if (platform_get_irq(armpmu->plat_device, 0) < 0)
> > + return -EOPNOTSUPP;
>
> I think this should be <= 0, but apart from that this looks alright to me in
> the absence of a core change.
It's a shame platform_get_irq() doesn't have better documentation.
I always forget what the result of the "is 0 a valid IRQ" flamewar was.
A grep through the kernel shows more or less an even split of using !,
using <0, and using <=0.
I'll put together a patch with this change and send it off. I'm also
investigating the proper core change, but I'm guessing that's going to
take a bit longer to get together, and it would be nice to have the
rasp-pi counters working sooner rather than later.
Vince
On Tue, 14 Jan 2014, Peter Zijlstra wrote:
> On Mon, Jan 13, 2014 at 11:55:17PM -0500, Vince Weaver wrote:
> > I don't see an existing pmu capabilities struct... or do you mean
> > coming up with one?
>
> Yeah, adding one.
So would it be a struct, or just an integer with flags?
> > Would it only hold an "overflow_interrupt_available" flag, or are
> > there other generic capabilities it would be handy to know about?
>
> Possible (other) flags could be:
>
> PMU_HAS_INT -- would allow sampling events
> PMU_HAS_PRECISE -- would allow any ::precise value
> PMU_HAS_FILTER -- would allow all os/user/etc. flags
should we export these to userspace somehow?
It would be handy to be able to tell you're getting EOPNOTSUP because
PMU_HAS_INT is not set for your pmu, rather than trying to guess why
things are failing.
Vince
On Thu, Jan 16, 2014 at 12:13:10PM -0500, Vince Weaver wrote:
> On Tue, 14 Jan 2014, Peter Zijlstra wrote:
>
> > On Mon, Jan 13, 2014 at 11:55:17PM -0500, Vince Weaver wrote:
>
> > > I don't see an existing pmu capabilities struct... or do you mean
> > > coming up with one?
> >
> > Yeah, adding one.
>
> So would it be a struct, or just an integer with flags?
I hadn't really considered that; per the below proposed names I suppose
I was thinking of 'unsigned long flags' with #define PMU_HAS_flags. But
a named struct would work too I suppose.
> > > Would it only hold an "overflow_interrupt_available" flag, or are
> > > there other generic capabilities it would be handy to know about?
> >
> > Possible (other) flags could be:
> >
> > PMU_HAS_INT -- would allow sampling events
> > PMU_HAS_PRECISE -- would allow any ::precise value
> > PMU_HAS_FILTER -- would allow all os/user/etc. flags
>
> should we export these to userspace somehow?
>
> It would be handy to be able to tell you're getting EOPNOTSUP because
> PMU_HAS_INT is not set for your pmu, rather than trying to guess why
> things are failing.
Yeah I suppose we could do something like that. Maybe something like:
# cat /sys/bus/event_source/devices/cpu/flags
int precise filter
?
On Thu, 16 Jan 2014, Peter Zijlstra wrote:
> Yeah I suppose we could do something like that. Maybe something like:
>
> # cat /sys/bus/event_source/devices/cpu/flags
> int precise filter
wouldn't that violate the "one value per file" rule?
I guess we could also stick it in the mmap page somewhere, as tools like
PAPI already have to check there for things like rdpmc support.
Anyway, here's a first pass at a generic fix for this issue. I'm sure the
code is pretty horrible, I don't usually mess with the perf code at this
level. I haven't tested yet as it touched enough files that the rasp-pi
is going to take a few hours to finish compiling.
Am I on the right track, or totally off here?
diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
index bc3f2ef..6782379 100644
--- a/arch/arm/kernel/perf_event.c
+++ b/arch/arm/kernel/perf_event.c
@@ -397,7 +397,7 @@ __hw_perf_event_init(struct perf_event *event)
*/
hwc->config_base |= (unsigned long)mapping;
- if (!hwc->sample_period) {
+ if (!is_sampling_event(event)) {
/*
* For non-sampling runs, limit the sample_period to half
* of the counter width. That way, the new counter value
diff --git a/arch/arm/kernel/perf_event_cpu.c b/arch/arm/kernel/perf_event_cpu.c
index 20d553c..afb3fc8 100644
--- a/arch/arm/kernel/perf_event_cpu.c
+++ b/arch/arm/kernel/perf_event_cpu.c
@@ -97,8 +97,8 @@ static int cpu_pmu_request_irq(struct arm_pmu *cpu_pmu, irq_handler_t handler)
irqs = min(pmu_device->num_resources, num_possible_cpus());
if (irqs < 1) {
- pr_err("no irqs for PMUs defined\n");
- return -ENODEV;
+ printk_once("no irqs for PMUs defined, sampled events not supported\n");
+ return 0;
}
for (i = 0; i < irqs; ++i) {
@@ -150,6 +150,10 @@ static void cpu_pmu_init(struct arm_pmu *cpu_pmu)
/* Ensure the PMU has sane values out of reset. */
if (cpu_pmu->reset)
on_each_cpu(cpu_pmu->reset, cpu_pmu, 1);
+
+ if (platform_get_irq(cpu_pmu->plat_device, 0) <= 0)
+ cpu_pmu->pmu.capabilities |= PERF_PMU_NO_INTERRUPT;
+
}
/*
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 2e069d1..f5e7f0c 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -251,9 +251,19 @@ struct pmu {
* flush branch stack on context-switches (needed in cpu-wide mode)
*/
void (*flush_branch_stack) (void);
+
+ /*
+ * various common per-pmu feature flags
+ */
+ int capabilities;
};
/**
+ * struct pmu->capabilites flags
+ */
+#define PERF_PMU_NO_INTERRUPT 1
+
+/**
* enum perf_event_active_state - the states of a event
*/
enum perf_event_active_state {
diff --git a/kernel/events/core.c b/kernel/events/core.c
index f574401..f65c6c6 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7056,6 +7056,14 @@ SYSCALL_DEFINE5(perf_event_open,
*/
pmu = event->pmu;
+ if (is_sampling_event(event)) {
+ if (pmu->capabilities & PERF_PMU_NO_INTERRUPT)
+ err = -ENOTSUPP;
+ goto err_alloc;
+ }
+
+
+
if (group_leader &&
(is_software_event(event) != is_software_event(group_leader))) {
if (is_software_event(event)) {
Hi Vince,
On Fri, Jan 17, 2014 at 05:45:04AM +0000, Vince Weaver wrote:
> On Thu, 16 Jan 2014, Peter Zijlstra wrote:
>
> > Yeah I suppose we could do something like that. Maybe something like:
> >
> > # cat /sys/bus/event_source/devices/cpu/flags
> > int precise filter
>
> wouldn't that violate the "one value per file" rule?
>
> I guess we could also stick it in the mmap page somewhere, as tools like
> PAPI already have to check there for things like rdpmc support.
>
>
> Anyway, here's a first pass at a generic fix for this issue. I'm sure the
> code is pretty horrible, I don't usually mess with the perf code at this
> level. I haven't tested yet as it touched enough files that the rasp-pi
> is going to take a few hours to finish compiling.
>
> Am I on the right track, or totally off here?
Where did we get to with this? If the generic changes are going to take some
time, I'm happy to take a temporary (non-invasive) fix in the ARM backend while
you sort out the core.
Will
On Mon, 24 Feb 2014, Will Deacon wrote:
> On Fri, Jan 17, 2014 at 05:45:04AM +0000, Vince Weaver wrote:
> > On Thu, 16 Jan 2014, Peter Zijlstra wrote:
> >
> > > Yeah I suppose we could do something like that. Maybe something like:
> > >
> > > # cat /sys/bus/event_source/devices/cpu/flags
> > > int precise filter
> >
> > wouldn't that violate the "one value per file" rule?
> >
> > I guess we could also stick it in the mmap page somewhere, as tools like
> > PAPI already have to check there for things like rdpmc support.
> >
> >
> > Anyway, here's a first pass at a generic fix for this issue. I'm sure the
> > code is pretty horrible, I don't usually mess with the perf code at this
> > level. I haven't tested yet as it touched enough files that the rasp-pi
> > is going to take a few hours to finish compiling.
> >
> > Am I on the right track, or totally off here?
>
> Where did we get to with this? If the generic changes are going to take some
> time, I'm happy to take a temporary (non-invasive) fix in the ARM backend while
> you sort out the core.
OK, let me revisit the patches. The last set of generic ones I sent out
actually was broken on ARM and I've been meaning to straighten things out
and send a proper follow up patch set but got distracted by other perf
related issues.
Vince