2013-08-05 20:27:27

by Vince Weaver

[permalink] [raw]
Subject: perf,arm -- oops in validate_event


My perf_fuzzer quickly triggers this oops on my ARM Cortex A9 pandaboard
running Linux 3.11-rc4.

Below is the oops, I've attached a simple C test case that triggers the
bug.

Vince


[ 8110.698669] Unable to handle kernel paging request at virtual address fffffffe
[ 8110.706390] pgd = ecd88000
[ 8110.708251] [fffffffe] *pgd=ae7f6821, *pte=00000000, *ppte=00000000
[ 8110.715820] Internal error: Oops: 80000007 [#2] SMP ARM
[ 8110.716033] Modules linked in: bluetooth snd_soc_omap_hdmi omapdss snd_soc_omap_abe_twl6040 snd_soc_twl6040 snd_soc_omap_hdmi_card snd_soc_omap snd_soc_omap_mcpdm snd_soc_omap_mcbsp snd_soc_core snd_compress regmap_spi snd_pcm snd_page_alloc snd_timer snd soundcore
[ 8110.743133] CPU: 1 PID: 28431 Comm: perf_fuzzer Tainted: G D 3.11.0-rc4 #4
[ 8110.743133] task: edab8100 ti: ece5c000 task.ti: ece5c000
[ 8110.760681] PC is at 0xfffffffe
[ 8110.760681] LR is at validate_event+0x3c/0x50
[ 8110.766906] pc : [<fffffffe>] lr : [<c001bf9c>] psr: 20000033
[ 8110.766906] sp : ece5de40 ip : edfbd960 fp : edfbd800
[ 8110.775238] r10: 00000000 r9 : 00000000 r8 : ed8c3ec0
[ 8110.781066] r7 : ed8c3f5c r6 : edfbd800 r5 : ecaed000 r4 : ece5de4c
[ 8110.791107] r3 : ffffffff r2 : 000000d9 r1 : ecaed000 r0 : ece5de50
[ 8110.791107] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA Thumb Segment user
[ 8110.803924] Control: 10c5387d Table: acd8804a DAC: 00000015
[ 8110.814239] Process perf_fuzzer (pid: 28431, stack limit = 0xece5c240)
[ 8110.821197] Stack: (0xece5de40 to 0xece5e000)
[ 8110.821197] de40: 00000000 c001c280 00000002 00000000 00000001 ece5de4c 00000000 c00bf058
[ 8110.831085] de60: 00000000 c008626c 00000000 00000000 00000000 edfbd800 ed8c3ec0 edfbd800
[ 8110.831085] de80: 00000000 c073ffac ece5df20 c00bf160 00000001 00000000 c00bf058 ece5df20
[ 8110.851959] dea0: 00000000 ed8c3ec0 00000000 00000000 00000000 c0cb0818 edab8100 c00bf420
[ 8110.860656] dec0: ece5df20 00000000 edab8100 ecaed000 00000000 00000000 00000000 00000000
[ 8110.862182] dee0: 00000000 ecad5680 edab8100 c00bfe48 00000000 00000000 00000000 c073e7c0
[ 8110.862182] df00: 00000000 ece5c000 c15036e8 ece5c030 00000005 c06eb5c0 6b139c44 00000000
[ 8110.879913] df20: 00000004 00000050 8dfff7d3 00000000 00000000 00000000 00000000 00000000
[ 8110.895507] df40: 00000000 00000000 001d4a0b 00000000 00000000 00000000 00000000 00000000
[ 8110.901062] df60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 8110.911102] df80: 00000000 00000000 00090990 000103a4 0000016c c00128e8 ece5c000 00000000
[ 8110.921112] dfa0: 000107a0 c0012700 00000000 00090990 00090bd0 00000000 00000000 00000004
[ 8110.921112] dfc0: 00000000 00090990 000103a4 0000016c 00090bd0 00090bc8 00090998 000107a0
[ 8110.931060] dfe0: beab7be0 beab7bd0 0000b6c9 b6f016d0 40000010 00090bd0 00000000 00000000
[ 8110.941009] [<c001bf9c>] (validate_event+0x3c/0x50) from [<c001c280>] (armpmu_event_init+0x16c/0x280)
[ 8110.953247] [<c001c280>] (armpmu_event_init+0x16c/0x280) from [<c00bf160>] (perf_init_event+0x108/0x180)
[ 8110.967712] [<c00bf160>] (perf_init_event+0x108/0x180) from [<c00bf420>] (perf_event_alloc+0x248/0x40c)
[ 8110.971069] [<c00bf420>] (perf_event_alloc+0x248/0x40c) from [<c00bfe48>] (SyS_perf_event_open+0x4f4/0x8fc)
[ 8110.981048] [<c00bfe48>] (SyS_perf_event_open+0x4f4/0x8fc) from [<c0012700>] (ret_fast_syscall+0x0/0x48)
[ 8110.998199] Code: bad PC value
[ 8111.001495] ---[ end trace 0e6c892fae28bee4 ]---


Attachments:
arm_perf_oops.c (2.08 kB)

2013-08-05 21:17:36

by Vince Weaver

[permalink] [raw]
Subject: Re: perf,arm -- oops in validate_event

On Mon, 5 Aug 2013, Vince Weaver wrote:

> My perf_fuzzer quickly triggers this oops on my ARM Cortex A9 pandaboard
> running Linux 3.11-rc4.
>
> Below is the oops, I've attached a simple C test case that triggers the
> bug.

Also, if it helps, the disassembled code in question.

It looks like in validate_event() we do

struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
...
return armpmu->get_event_idx(hw_events, event) >= 0;

armpmu is read into r3, and somehow the value at the offset of
armpmu->get_event_idx is either -1 or 0, so when it does a "blx"
branch to the address at this offset we get the ooops.

c001bf8c: e3120010 tst r2, #16
c001bf90: 0a000004 beq c001bfa8 <validate_event+0x48>
c001bf94: e5933070 ldr r3, [r3, #112] ; 0x70
* c001bf98: e12fff33 blx r3
c001bf9c: e1e00000 mvn r0, r0

I'm having trouble tracing the code back past that, and I don't have time
to start adding printk's and recompiling right now.

Vince

> [ 8110.698669] Unable to handle kernel paging request at virtual address fffffffe
> [ 8110.706390] pgd = ecd88000
> [ 8110.708251] [fffffffe] *pgd=ae7f6821, *pte=00000000, *ppte=00000000
> [ 8110.715820] Internal error: Oops: 80000007 [#2] SMP ARM
> [ 8110.716033] Modules linked in: bluetooth snd_soc_omap_hdmi omapdss snd_soc_omap_abe_twl6040 snd_soc_twl6040 snd_soc_omap_hdmi_card snd_soc_omap snd_soc_omap_mcpdm snd_soc_omap_mcbsp snd_soc_core snd_compress regmap_spi snd_pcm snd_page_alloc snd_timer snd soundcore
> [ 8110.743133] CPU: 1 PID: 28431 Comm: perf_fuzzer Tainted: G D 3.11.0-rc4 #4
> [ 8110.743133] task: edab8100 ti: ece5c000 task.ti: ece5c000
> [ 8110.760681] PC is at 0xfffffffe
> [ 8110.760681] LR is at validate_event+0x3c/0x50
> [ 8110.766906] pc : [<fffffffe>] lr : [<c001bf9c>] psr: 20000033
> [ 8110.766906] sp : ece5de40 ip : edfbd960 fp : edfbd800
> [ 8110.775238] r10: 00000000 r9 : 00000000 r8 : ed8c3ec0
> [ 8110.781066] r7 : ed8c3f5c r6 : edfbd800 r5 : ecaed000 r4 : ece5de4c
> [ 8110.791107] r3 : ffffffff r2 : 000000d9 r1 : ecaed000 r0 : ece5de50
> [ 8110.791107] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA Thumb Segment user
> [ 8110.803924] Control: 10c5387d Table: acd8804a DAC: 00000015
> [ 8110.814239] Process perf_fuzzer (pid: 28431, stack limit = 0xece5c240)
> [ 8110.821197] Stack: (0xece5de40 to 0xece5e000)
> [ 8110.821197] de40: 00000000 c001c280 00000002 00000000 00000001 ece5de4c 00000000 c00bf058
> [ 8110.831085] de60: 00000000 c008626c 00000000 00000000 00000000 edfbd800 ed8c3ec0 edfbd800
> [ 8110.831085] de80: 00000000 c073ffac ece5df20 c00bf160 00000001 00000000 c00bf058 ece5df20
> [ 8110.851959] dea0: 00000000 ed8c3ec0 00000000 00000000 00000000 c0cb0818 edab8100 c00bf420
> [ 8110.860656] dec0: ece5df20 00000000 edab8100 ecaed000 00000000 00000000 00000000 00000000
> [ 8110.862182] dee0: 00000000 ecad5680 edab8100 c00bfe48 00000000 00000000 00000000 c073e7c0
> [ 8110.862182] df00: 00000000 ece5c000 c15036e8 ece5c030 00000005 c06eb5c0 6b139c44 00000000
> [ 8110.879913] df20: 00000004 00000050 8dfff7d3 00000000 00000000 00000000 00000000 00000000
> [ 8110.895507] df40: 00000000 00000000 001d4a0b 00000000 00000000 00000000 00000000 00000000
> [ 8110.901062] df60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> [ 8110.911102] df80: 00000000 00000000 00090990 000103a4 0000016c c00128e8 ece5c000 00000000
> [ 8110.921112] dfa0: 000107a0 c0012700 00000000 00090990 00090bd0 00000000 00000000 00000004
> [ 8110.921112] dfc0: 00000000 00090990 000103a4 0000016c 00090bd0 00090bc8 00090998 000107a0
> [ 8110.931060] dfe0: beab7be0 beab7bd0 0000b6c9 b6f016d0 40000010 00090bd0 00000000 00000000
> [ 8110.941009] [<c001bf9c>] (validate_event+0x3c/0x50) from [<c001c280>] (armpmu_event_init+0x16c/0x280)
> [ 8110.953247] [<c001c280>] (armpmu_event_init+0x16c/0x280) from [<c00bf160>] (perf_init_event+0x108/0x180)
> [ 8110.967712] [<c00bf160>] (perf_init_event+0x108/0x180) from [<c00bf420>] (perf_event_alloc+0x248/0x40c)
> [ 8110.971069] [<c00bf420>] (perf_event_alloc+0x248/0x40c) from [<c00bfe48>] (SyS_perf_event_open+0x4f4/0x8fc)
> [ 8110.981048] [<c00bfe48>] (SyS_perf_event_open+0x4f4/0x8fc) from [<c0012700>] (ret_fast_syscall+0x0/0x48)
> [ 8110.998199] Code: bad PC value
> [ 8111.001495] ---[ end trace 0e6c892fae28bee4 ]---

2013-08-06 11:20:15

by Mark Rutland

[permalink] [raw]
Subject: Re: perf,arm -- oops in validate_event

Hi Vince,

Thanks for the report.

On Mon, Aug 05, 2013 at 10:17:37PM +0100, Vince Weaver wrote:
> On Mon, 5 Aug 2013, Vince Weaver wrote:
>
> > My perf_fuzzer quickly triggers this oops on my ARM Cortex A9 pandaboard
> > running Linux 3.11-rc4.
> >
> > Below is the oops, I've attached a simple C test case that triggers the
> > bug.
>
> Also, if it helps, the disassembled code in question.
>
> It looks like in validate_event() we do
>
> struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
> ...
> return armpmu->get_event_idx(hw_events, event) >= 0;
>
> armpmu is read into r3, and somehow the value at the offset of
> armpmu->get_event_idx is either -1 or 0, so when it does a "blx"
> branch to the address at this offset we get the ooops.
>
> c001bf8c: e3120010 tst r2, #16
> c001bf90: 0a000004 beq c001bfa8 <validate_event+0x48>
> c001bf94: e5933070 ldr r3, [r3, #112] ; 0x70
> * c001bf98: e12fff33 blx r3
> c001bf9c: e1e00000 mvn r0, r0
>
> I'm having trouble tracing the code back past that, and I don't have time
> to start adding printk's and recompiling right now.
>
> Vince

I think I can save you the effort :)

>From the looks of the test case and the kernel code in question, it
looks like the following happens:

* We create a software event, which becomes its own group leader.
* We create a hardware event, with the software event as its group
leader.
* When we try to schedule the hardware event, we try to validate all
events in its event group (the leader + siblings), but in doing so we
treat the software event as a hardware event, and erroneously try to
get its (non-existent) arm_pmu container, and call some garbage value
as get_event_idx(...).

This could also happen if we tried to add events from different hardware
PMUs to the same groups. I'm not sure if that's valid, but I couldn't
see any code preventing that, and it seems the x86 validation logic is
wired to allow this. If it's not valid, we could skip validation of
software events by checking with is_software_event.

Either way, I believe the patch below should solve the issue.

Thanks,
Mark.

---->8----

diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
index d9f5cd4..a7609a0 100644
--- a/arch/arm/kernel/perf_event.c
+++ b/arch/arm/kernel/perf_event.c
@@ -262,9 +262,15 @@ validate_event(struct pmu_hw_events *hw_events,
return armpmu->get_event_idx(hw_events, event) >= 0;
}

+static bool is_pmu_event(struct pmu *pmu, struct perf_event *event)
+{
+ return pmu == event->pmu;
+}
+
static int
validate_group(struct perf_event *event)
{
+ struct pmu *pmu = event->pmu;
struct perf_event *sibling, *leader = event->group_leader;
struct pmu_hw_events fake_pmu;
DECLARE_BITMAP(fake_used_mask, ARMPMU_MAX_HWEVENTS);
@@ -276,10 +282,17 @@ validate_group(struct perf_event *event)
memset(fake_used_mask, 0, sizeof(fake_used_mask));
fake_pmu.used_mask = fake_used_mask;

- if (!validate_event(&fake_pmu, leader))
+ if (is_pmu_event(pmu, leader) && !validate_event(&fake_pmu, leader))
return -EINVAL;

list_for_each_entry(sibling, &leader->sibling_list, group_entry) {
+ /*
+ * We do not validate events for other PMUs (e.g. software
+ * events). Software events are always schedulable, and other
+ * PMU events must be validated elsewhere.
+ */
+ if (!is_pmu_event(pmu, sibling))
+ continue;
if (!validate_event(&fake_pmu, sibling))
return -EINVAL;
}

2013-08-06 11:59:38

by Will Deacon

[permalink] [raw]
Subject: Re: perf,arm -- oops in validate_event

On Tue, Aug 06, 2013 at 12:19:32PM +0100, Mark Rutland wrote:
> On Mon, Aug 05, 2013 at 10:17:37PM +0100, Vince Weaver wrote:
> > It looks like in validate_event() we do
> >
> > struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
> > ...
> > return armpmu->get_event_idx(hw_events, event) >= 0;
> >
> > armpmu is read into r3, and somehow the value at the offset of
> > armpmu->get_event_idx is either -1 or 0, so when it does a "blx"
> > branch to the address at this offset we get the ooops.
> >
> > c001bf8c: e3120010 tst r2, #16
> > c001bf90: 0a000004 beq c001bfa8 <validate_event+0x48>
> > c001bf94: e5933070 ldr r3, [r3, #112] ; 0x70
> > * c001bf98: e12fff33 blx r3
> > c001bf9c: e1e00000 mvn r0, r0
> >
> > I'm having trouble tracing the code back past that, and I don't have time
> > to start adding printk's and recompiling right now.
> >
> > Vince
>
> I think I can save you the effort :)
>
> From the looks of the test case and the kernel code in question, it
> looks like the following happens:
>
> * We create a software event, which becomes its own group leader.
> * We create a hardware event, with the software event as its group
> leader.
> * When we try to schedule the hardware event, we try to validate all
> events in its event group (the leader + siblings), but in doing so we
> treat the software event as a hardware event, and erroneously try to
> get its (non-existent) arm_pmu container, and call some garbage value
> as get_event_idx(...).
>
> This could also happen if we tried to add events from different hardware
> PMUs to the same groups. I'm not sure if that's valid, but I couldn't
> see any code preventing that, and it seems the x86 validation logic is
> wired to allow this. If it's not valid, we could skip validation of
> software events by checking with is_software_event.

But we already check `event->pmu != leader_pmu' in validate_event, so we
shouldn't get anywhere nearer calling get_event_idx in the case you
describe. It sounds more like we have an inconsistency with one of the
events.

Can you dump the events as they're processed in validate_group please?

Will

2013-08-06 13:08:39

by Mark Rutland

[permalink] [raw]
Subject: Re: perf,arm -- oops in validate_event

On Tue, Aug 06, 2013 at 12:59:21PM +0100, Will Deacon wrote:
> On Tue, Aug 06, 2013 at 12:19:32PM +0100, Mark Rutland wrote:
> > On Mon, Aug 05, 2013 at 10:17:37PM +0100, Vince Weaver wrote:
> > > It looks like in validate_event() we do
> > >
> > > struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
> > > ...
> > > return armpmu->get_event_idx(hw_events, event) >= 0;
> > >
> > > armpmu is read into r3, and somehow the value at the offset of
> > > armpmu->get_event_idx is either -1 or 0, so when it does a "blx"
> > > branch to the address at this offset we get the ooops.
> > >
> > > c001bf8c: e3120010 tst r2, #16
> > > c001bf90: 0a000004 beq c001bfa8 <validate_event+0x48>
> > > c001bf94: e5933070 ldr r3, [r3, #112] ; 0x70
> > > * c001bf98: e12fff33 blx r3
> > > c001bf9c: e1e00000 mvn r0, r0
> > >
> > > I'm having trouble tracing the code back past that, and I don't have time
> > > to start adding printk's and recompiling right now.
> > >
> > > Vince
> >
> > I think I can save you the effort :)
> >
> > From the looks of the test case and the kernel code in question, it
> > looks like the following happens:
> >
> > * We create a software event, which becomes its own group leader.
> > * We create a hardware event, with the software event as its group
> > leader.
> > * When we try to schedule the hardware event, we try to validate all
> > events in its event group (the leader + siblings), but in doing so we
> > treat the software event as a hardware event, and erroneously try to
> > get its (non-existent) arm_pmu container, and call some garbage value
> > as get_event_idx(...).
> >
> > This could also happen if we tried to add events from different hardware
> > PMUs to the same groups. I'm not sure if that's valid, but I couldn't
> > see any code preventing that, and it seems the x86 validation logic is
> > wired to allow this. If it's not valid, we could skip validation of
> > software events by checking with is_software_event.
>
> But we already check `event->pmu != leader_pmu' in validate_event, so we
> shouldn't get anywhere nearer calling get_event_idx in the case you
> describe. It sounds more like we have an inconsistency with one of the
> events.

Note in my example that the software event was the group leader (so in
fact we'd *only* be checking those events which we can't actually
handle...).

I was also under the impression that in the case of mixed hardware and
software events, a hardware event must be the group leader. That
doesn't seem to be the case. If a hardware event is added to a software
group, the group is moved to hardware context but the original software
event stays as the group leader.

Thanks,
Mark.

>
> Can you dump the events as they're processed in validate_group please?

Sure. Patch and output below. I only get one output line before it
explodes.

Thanks,
Mark.

---->8----

diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
index d9f5cd4..cdff367 100644
--- a/arch/arm/kernel/perf_event.c
+++ b/arch/arm/kernel/perf_event.c
@@ -253,6 +253,11 @@ validate_event(struct pmu_hw_events *hw_events,
struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
struct pmu *leader_pmu = event->group_leader->pmu;

+ printk("Event %p, PMU %p %s, leader PMU %p %s %s\n",
+ event, event->pmu, event->pmu->name,
+ leader_pmu, leader_pmu->name,
+ is_software_event(event) ? "Software" : "Hardware");
+
if (event->pmu != leader_pmu || event->state < PERF_EVENT_STATE_OFF)
return 1;

diff --git a/kernel/events/core.c b/kernel/events/core.c
index f86599e..796f82b 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5668,7 +5668,7 @@ static struct pmu perf_swevent = {
.start = perf_swevent_start,
.stop = perf_swevent_stop,
.read = perf_swevent_read,
-
+ .name = "perf_swevent",
.event_idx = perf_swevent_event_idx,
};

@@ -5788,6 +5788,7 @@ static struct pmu perf_tracepoint = {
.stop = perf_swevent_stop,
.read = perf_swevent_read,

+ .name = "perf_tracepoint",
.event_idx = perf_swevent_event_idx,
};

@@ -6014,7 +6015,7 @@ static struct pmu perf_cpu_clock = {
.start = cpu_clock_event_start,
.stop = cpu_clock_event_stop,
.read = cpu_clock_event_read,
-
+ .name = "perf_cpu_clock",
.event_idx = perf_swevent_event_idx,
};

@@ -6094,7 +6095,7 @@ static struct pmu perf_task_clock = {
.start = task_clock_event_start,
.stop = task_clock_event_stop,
.read = task_clock_event_read,
-
+ .name = "perf_task_clock",
.event_idx = perf_swevent_event_idx,
};

---->8----

Event 87210800, PMU 804d440c perf_task_clock, leader PMU 804d440c perf_task_clock Software
Unable to handle kernel NULL pointer dereference at virtual address 00000f58
pgd = 87380000
[00000f58] *pgd=672f9831, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 1235 Comm: a.out Not tainted 3.11.0-rc4+ #154
task: 87a0f840 ti: 866b6000 task.ti: 866b6000
PC is at 0x80000000
LR is at validate_event+0x98/0xa8
pc : [<80000000>] lr : [<80016ac8>] psr: 20000013
sp : 866b7e08 ip : 00000000 fp : 866b7f20
r10: 87a0f840 r9 : 00000001 r8 : 866b7e3c
r7 : 80417588 r6 : 804d440c r5 : 804d440c r4 : 87210800
r3 : 80000000 r2 : 80612974 r1 : 87210800 r0 : 866b7e3c
Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
Control: 10c53c7d Table: 6738004a DAC: 00000015
Process a.out (pid: 1235, stack limit = 0x866b6238)
Stack: (0x866b7e08 to 0x866b8000)
7e00: 804d440c 80417588 80410e30 87210400 87a5859c 87210400
7e20: 87a58500 87210800 000000d3 87a585a0 00000000 80016cc8 00000000 867501b8
7e40: 866b7e38 87380000 87a58500 87210400 804d42d4 00000000 87210400 800856d0
7e60: 87210800 87a58500 00000000 00000001 00000000 00000002 00000000 800859d4
7e80: 00000000 00000000 00000000 00000000 00000029 00000800 00000000 87a0f840
7ea0: 87210800 00000000 00000000 00000000 866b6000 00000000 8790d9c0 80086754
7ec0: 00000000 00000000 00000000 00000004 00000004 00000000 00000000 00000000
7ee0: 00000000 00000000 00000000 00000000 00000000 00000000 0009104c 866b7fb0
7f00: 00000000 76f3b000 00000000 80008468 8742d388 87ae0000 00000001 00000000
7f20: 00000004 00000050 8dfff7d3 00000000 00000000 00000000 00000000 00000000
7f40: 00000000 00000000 001d4a0b 00000000 00000000 00000000 00000000 00000000
7f60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
7f80: 866b6000 00000000 00000003 00000000 0000016c 8000e348 866b6000 00000000
7fa0: 00000000 8000e1a0 00000000 00000003 00093040 00000000 00000000 00000003
7fc0: 00000000 00000003 00000000 0000016c 00000000 00000000 76f3b000 00000000
7fe0: 7eb41740 7eb41730 00008451 76ec1ed0 40000010 00093040 e4836563 8503c5f2
[<80016ac8>] (validate_event+0x98/0xa8) from [<80016cc8>] (armpmu_event_init+0x1b8/0x27c)
[<80016cc8>] (armpmu_event_init+0x1b8/0x27c) from [<800856d0>] (perf_init_event+0xc8/0x104)
[<800856d0>] (perf_init_event+0xc8/0x104) from [<800859d4>] (perf_event_alloc+0x2c8/0x478)
[<800859d4>] (perf_event_alloc+0x2c8/0x478) from [<80086754>] (SyS_perf_event_open+0x86c/0x9d0)
[<80086754>] (SyS_perf_event_open+0x86c/0x9d0) from [<8000e1a0>] (ret_fast_syscall+0x0/0x30)
Code: bad PC value
---[ end trace 85dac5c0d80aac6d ]---

2013-08-07 13:00:54

by Will Deacon

[permalink] [raw]
Subject: Re: perf,arm -- oops in validate_event

On Tue, Aug 06, 2013 at 02:08:15PM +0100, Mark Rutland wrote:
> On Tue, Aug 06, 2013 at 12:59:21PM +0100, Will Deacon wrote:
> > But we already check `event->pmu != leader_pmu' in validate_event, so we
> > shouldn't get anywhere nearer calling get_event_idx in the case you
> > describe. It sounds more like we have an inconsistency with one of the
> > events.
>
> Note in my example that the software event was the group leader (so in
> fact we'd *only* be checking those events which we can't actually
> handle...).
>
> I was also under the impression that in the case of mixed hardware and
> software events, a hardware event must be the group leader. That
> doesn't seem to be the case. If a hardware event is added to a software
> group, the group is moved to hardware context but the original software
> event stays as the group leader.

Ok, so the following quick hack below should solve the issue (can you confirm
it please, since I don't have access to any hardware atm?)

We should revisit this for 3.12 though, because I'm not sure that our
validation code even does the right thing when there are multiple PMUs
involved.

Will

--->8

diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
index d9f5cd4..0500f10b 100644
--- a/arch/arm/kernel/perf_event.c
+++ b/arch/arm/kernel/perf_event.c
@@ -253,6 +253,9 @@ validate_event(struct pmu_hw_events *hw_events,
struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
struct pmu *leader_pmu = event->group_leader->pmu;

+ if (is_software_event(event))
+ return 1;
+
if (event->pmu != leader_pmu || event->state < PERF_EVENT_STATE_OFF)
return 1;

2013-08-07 13:54:55

by Mark Rutland

[permalink] [raw]
Subject: Re: perf,arm -- oops in validate_event

On Wed, Aug 07, 2013 at 02:00:27PM +0100, Will Deacon wrote:
> On Tue, Aug 06, 2013 at 02:08:15PM +0100, Mark Rutland wrote:
> > On Tue, Aug 06, 2013 at 12:59:21PM +0100, Will Deacon wrote:
> > > But we already check `event->pmu != leader_pmu' in validate_event, so we
> > > shouldn't get anywhere nearer calling get_event_idx in the case you
> > > describe. It sounds more like we have an inconsistency with one of the
> > > events.
> >
> > Note in my example that the software event was the group leader (so in
> > fact we'd *only* be checking those events which we can't actually
> > handle...).
> >
> > I was also under the impression that in the case of mixed hardware and
> > software events, a hardware event must be the group leader. That
> > doesn't seem to be the case. If a hardware event is added to a software
> > group, the group is moved to hardware context but the original software
> > event stays as the group leader.
>
> Ok, so the following quick hack below should solve the issue (can you confirm
> it please, since I don't have access to any hardware atm?)

It works for me when running Vince's test case.

Tested-by: Mark Rutland <[email protected]>

>
> We should revisit this for 3.12 though, because I'm not sure that our
> validation code even does the right thing when there are multiple PMUs
> involved.

Certainly. I suspect we're not alone there.

Thanks,
Mark.

>
> Will
>
> --->8
>
> diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
> index d9f5cd4..0500f10b 100644
> --- a/arch/arm/kernel/perf_event.c
> +++ b/arch/arm/kernel/perf_event.c
> @@ -253,6 +253,9 @@ validate_event(struct pmu_hw_events *hw_events,
> struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
> struct pmu *leader_pmu = event->group_leader->pmu;
>
> + if (is_software_event(event))
> + return 1;
> +
> if (event->pmu != leader_pmu || event->state < PERF_EVENT_STATE_OFF)
> return 1;

2013-08-07 15:17:46

by Vince Weaver

[permalink] [raw]
Subject: Re: perf,arm -- oops in validate_event

On Wed, 7 Aug 2013, Will Deacon wrote:

> Ok, so the following quick hack below should solve the issue (can you confirm
> it please, since I don't have access to any hardware atm?)
>
> We should revisit this for 3.12 though, because I'm not sure that our
> validation code even does the right thing when there are multiple PMUs
> involved.
>
> --->8
>
> diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
> index d9f5cd4..0500f10b 100644
> --- a/arch/arm/kernel/perf_event.c
> +++ b/arch/arm/kernel/perf_event.c
> @@ -253,6 +253,9 @@ validate_event(struct pmu_hw_events *hw_events,
> struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
> struct pmu *leader_pmu = event->group_leader->pmu;
>
> + if (is_software_event(event))
> + return 1;
> +
> if (event->pmu != leader_pmu || event->state < PERF_EVENT_STATE_OFF)
> return 1;

this isn't enough. You can also trigger the oops by using
tracepoint or breakpoint events as group leaders in addition to software
events.

Vince

2013-08-07 15:24:20

by Vince Weaver

[permalink] [raw]
Subject: Re: perf,arm -- oops in validate_event

On Wed, 7 Aug 2013, Vince Weaver wrote:

> On Wed, 7 Aug 2013, Will Deacon wrote:
>
> > Ok, so the following quick hack below should solve the issue (can you confirm
> > it please, since I don't have access to any hardware atm?)
> >
> > We should revisit this for 3.12 though, because I'm not sure that our
> > validation code even does the right thing when there are multiple PMUs
> > involved.
> >
> > --->8
> >
> > diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
> > index d9f5cd4..0500f10b 100644
> > --- a/arch/arm/kernel/perf_event.c
> > +++ b/arch/arm/kernel/perf_event.c
> > @@ -253,6 +253,9 @@ validate_event(struct pmu_hw_events *hw_events,
> > struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
> > struct pmu *leader_pmu = event->group_leader->pmu;
> >
> > + if (is_software_event(event))
> > + return 1;
> > +
> > if (event->pmu != leader_pmu || event->state < PERF_EVENT_STATE_OFF)
> > return 1;
>
> this isn't enough. You can also trigger the oops by using
> tracepoint or breakpoint events as group leaders in addition to software
> events.

I take that back, it turns out tracepoint and breakpoint both
have task_ctx_nr set to perf_sw_context (althouth breakpoint has
a comment saying this may change in the future).

Let me compile and verify the fix. It may take some time for the compile
to finish as it's not a very fast machine.

Vince

2013-08-07 19:59:41

by Vince Weaver

[permalink] [raw]
Subject: Re: perf,arm -- oops in validate_event

On Wed, 7 Aug 2013, Mark Rutland wrote:

> On Wed, Aug 07, 2013 at 02:00:27PM +0100, Will Deacon wrote:
> > On Tue, Aug 06, 2013 at 02:08:15PM +0100, Mark Rutland wrote:
> > > On Tue, Aug 06, 2013 at 12:59:21PM +0100, Will Deacon wrote:
> > Ok, so the following quick hack below should solve the issue (can you confirm
> > it please, since I don't have access to any hardware atm?)
>
> It works for me when running Vince's test case.
>
> Tested-by: Mark Rutland <[email protected]>

I've also tested it and the fix works for the various test cases I've
constructed for this issue.

Tested-by: Vince Weaver <[email protected]>

2013-08-07 22:36:08

by Will Deacon

[permalink] [raw]
Subject: Re: perf,arm -- oops in validate_event

On Wed, Aug 07, 2013 at 04:30:33PM +0100, Vince Weaver wrote:
> On Wed, 7 Aug 2013, Vince Weaver wrote:
> > On Wed, 7 Aug 2013, Will Deacon wrote:
> > > diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
> > > index d9f5cd4..0500f10b 100644
> > > --- a/arch/arm/kernel/perf_event.c
> > > +++ b/arch/arm/kernel/perf_event.c
> > > @@ -253,6 +253,9 @@ validate_event(struct pmu_hw_events *hw_events,
> > > struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
> > > struct pmu *leader_pmu = event->group_leader->pmu;
> > >
> > > + if (is_software_event(event))
> > > + return 1;
> > > +
> > > if (event->pmu != leader_pmu || event->state < PERF_EVENT_STATE_OFF)
> > > return 1;
> >
> > this isn't enough. You can also trigger the oops by using
> > tracepoint or breakpoint events as group leaders in addition to software
> > events.
>
> I take that back, it turns out tracepoint and breakpoint both
> have task_ctx_nr set to perf_sw_context (althouth breakpoint has
> a comment saying this may change in the future).
>
> Let me compile and verify the fix. It may take some time for the compile
> to finish as it's not a very fast machine.

Ok, thanks for testing, I'll send the fix to rmk with a CC for stable.

As for future work, we want to support multiple HW PMUs on ARM yet restrict
event groups to accept events targetting the same HW PMU. If we allow groups
to have a software event as a leader, yet still accept HW events, then this
check becomes a real pain because we essentially have to stash the PMU for
the first HW event added to the group.

I think I'd rather fail validation when adding a HW event to a group with a
SW event as a group leader.

PeterZ: any thoughts? This sort of stuff feels like it should really belong
in core code...

Will